A Deal with Autism Speaks to House Research Data From 10,000 Complete Genomes
Google and Autism Speaks will be announcing a partnership designed to house the sequencing of 10,000 genomes and other clinical data. WSJ's Shirley Wang and Autism Speaks co-founder Bob Wright join Lunch Break with Tanya Rivero. Photo: Google/Autism Speaks.
Google Inc. and Autism Speaks, a major autism research foundation, plan to announce on Tuesday a deal in which the Internet giant will house the sequenced data from 10,000 complete genomes, along with other clinical data, from children with autism and their siblings and parents. Those involved hope the arrangement will accelerate research on the developmental disorder.
Studying genes has been touted as a key to understanding Alzheimer's disease, cancer and autism. But huge DNA databases require computing and storage that many universities and research hospitals don't have.
The database will be part of AUT10K, the Autism Speaks genome-mapping program. It is thought to be the largest collection of whole genomes and will be open to all qualified researchers. The tools needed to analyze the data will also be available on the Google system.
Organizers expect to have an easy-to-use portal for researchers within a year. They hope to have the raw data available sooner.
Robert Ring of Autism Speaks, left, talks with Stephen Scherer, right, in Toronto at the Hospital for Sick Children's Centre for Applied Genomics. Dr. Scherer, who directs the center, will be the director of the Autism Speaks AUT10K program. Photo: Nicolas Cusworth.
Putting the information and analytical tools on remotely accessible servers, such as those Google provides (an arrangement often known as cloud computing), makes collaboration between researchers more seamless. It also gives researchers at institutions without powerful computer systems a way to conduct genomic studies they couldn't run on their own.
"Cloud computing is the great leveler," says Mark DeLong, director of research computing at Duke University. Also, "it opens up new avenues for talent development." Dr. DeLong isn't involved in the partnership.
More broadly, genomic research aims to figure out how diseases work, who may have or develop a certain condition and how to develop new treatments. Genomic research has already led to meaningful findings, including a significant one for heart disease researchers. The PCSK-9 gene is one example of a "gene-to-drug" discovery, some researchers say. In the mid-2000s, scientists discovered that a mutation in the gene reduced production of a particular protein that helps regulate cholesterol levels. People with the mutation were also much less likely to develop heart disease. Clinical trials of experimental drugs that inhibit PCSK-9 are now in late stages.
One of the biggest insights gleaned from genetic research in autism so far is that there isn't just one form of autism, but many. Whole-genome sequencing, which lets scientists look at every single letter (known as a base pair or nucleotide) in a person's DNA, should provide "increased resolution of understanding what autism is," says Robert Ring, chief science officer at Autism Speaks.
His organization has sent teams of clinicians into people's homes to collect samples. The teams have collected more than 10,000 samples over the past 15 years.
Google wants to use its cloud technology to help Autism Speaks and others in genomics get results "better, faster and cheaper," says David Glazer, Google's engineering director in charge of the genomics cloud effort. Dr. Glazer declined to disclose the amount Autism Speaks is paying for these services.
To establish such large genomic databases, researchers must overcome technological challenges. Storage is an issue: The digital representation of a genome takes up roughly 100 gigabytes of storage. Only about 10 whole genomes fit on a typical desktop computer. Some collections of genomic information are already so large that downloading them over the Internet would take too long to be useful, says Duke's Dr. DeLong.
Placing the data on servers for any scientist to use is one way around this problem. Dr. DeLong looks for DNA sequence data that several Duke researchers are using so the university can store each data set once. That way, 40 people don't each need their own space-hungry copy.
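The storage and transfer figures above can be checked with simple arithmetic. The Python sketch below is an illustration only: it assumes roughly 100 gigabytes per genome (the figure quoted above), decimal units, a hypothetical 1-terabyte desktop drive, a hypothetical 100-megabit-per-second network link and, for the sharing example, a hypothetical 100-gigabyte data set used by 40 people. The function names are made up for this example.

```python
# Back-of-the-envelope storage and transfer estimates for whole-genome data.
# Assumptions (illustrative only, not taken from any real system):
#   - ~100 GB per genome (figure quoted in the article)
#   - decimal units: 1 TB = 1,000 GB
#   - a hypothetical 1 TB desktop drive and a 100 Mbit/s network link

GB_PER_GENOME = 100


def total_storage_gb(num_genomes: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Total storage needed for a collection of genomes, in gigabytes."""
    return num_genomes * gb_per_genome


def genomes_per_drive(drive_tb: float, gb_per_genome: float = GB_PER_GENOME) -> int:
    """Number of whole genomes that fit on a drive of the given size."""
    return int(drive_tb * 1_000 // gb_per_genome)


def download_days(size_gb: float, link_mbit_per_s: float) -> float:
    """Days needed to download size_gb over a link, ignoring protocol overhead."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 86_400


def dedup_savings_gb(num_users: int, dataset_gb: float) -> float:
    """Space saved by keeping one shared copy instead of one copy per user."""
    return (num_users - 1) * dataset_gb


if __name__ == "__main__":
    print(f"{total_storage_gb(10_000):,} GB for 10,000 genomes")  # about 1 petabyte
    print(genomes_per_drive(1), "genomes per 1 TB drive")
    print(f"{download_days(total_storage_gb(10_000), 100):.0f} days at 100 Mbit/s")
    print(f"{dedup_savings_gb(40, 100):,} GB saved by sharing one copy")
```

At that assumed link speed, moving 10,000 genomes would take roughly two and a half years, which illustrates why shipping hard drives or analyzing data where it is stored can be more practical than downloading it.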
Accuracy of data can be a giant obstacle with such cloud databases, particularly if they are pieced together from data collected in different places. The data must be labeled clearly, or scientists from other institutions could misinterpret them, which could lead to inaccurate results and findings that can't be reproduced.
Security and confidentiality of the donors' data are also concerns. Some networks require researchers to apply for access. Dr. Ring says AUT10K will be available to qualified researchers who agree to abide by a standard research agreement.
In addition, universities and researchers must figure out whether they want to share their data (something grant funders now sometimes mandate) and how to protect it for their own patents and publications.
When the University of Pennsylvania's Gerard Schellenberg needed 800 whole genomes from collaborators at the University of Southern California, USC bought physical hard drives and shipped them, recalls Dr. Schellenberg, a professor in pathology and laboratory medicine who is a lead investigator of the Alzheimer's Disease Sequencing Project.
With the Alzheimer's project, a collaboration between five universities, scientists have sequenced 580 genomes of people with the progressive memory disease and are sifting through these millions of pieces of data. The institutions put their data up on Amazon's cloud storage service so researchers at the different sites could run analyses, then removed the data because it was too expensive to store in the cloud. Downloading the results, which typically involve comparing DNA of people with a disease to those without it, cost the institutions about $200 per genome, Dr. Schellenberg says.
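To put the quoted download cost in perspective, the short Python sketch below multiplies Dr. Schellenberg's figure of about $200 per genome by the 580 sequenced genomes. It assumes, hypothetically, that the same rate applied uniformly to every genome; the function name is illustrative only.

```python
# Rough total for downloading per-genome analysis results.
# Assumption (hypothetical): the ~$200-per-genome rate quoted in the
# article applies uniformly to every genome in the project.


def total_download_cost(num_genomes: int, dollars_per_genome: float = 200.0) -> float:
    """Estimated total cost, in dollars, of downloading per-genome results."""
    if num_genomes < 0:
        raise ValueError("num_genomes must be non-negative")
    return num_genomes * dollars_per_genome


if __name__ == "__main__":
    print(f"${total_download_cost(580):,.0f} for 580 genomes")
```

Under that assumption the total comes to roughly $116,000, which helps explain why the institutions chose not to keep the data in the cloud.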