Data, data, everywhere!

February 7, 2012

On 23-24 January 2012, CIMMYT’s global maize program received an unprecedented gift: more than 2 billion maize marker data points from 4,000 CIMMYT lines. “For each line, we are now able to detect over half a million markers,” said Gary Atlin, Associate Director of the program. “These ‘signposts’ give us great power to do genetic analysis; they are distributed more or less randomly across the 10 chromosomes of maize, so we are able to track very small pieces of chromosome,” he added.

CIMMYT is currently working with USDA maize geneticist Dr. Ed Buckler at Cornell University’s Institute for Genomic Diversity, whose team produced these data for CIMMYT using genotyping-by-sequencing (GBS) technology. As the operation scales up, CIMMYT is partnering with Diversity Arrays Technology Pty Ltd (DArT P/L) to establish a self-sustaining, GBS-based genetic-analysis service in Mexico: the “Servicio de Análisis Genético para la Agricultura” (SAGA). SAGA will genotype large numbers of genebank accessions for the Seeds of Discovery project, whilst also serving the needs of breeding programs, both at CIMMYT and in Mexican partner organizations.

Using both these data and phenotypic information, researchers will learn how to select lines that perform well under drought or low soil nitrogen, or that resist a particular disease. Previously, CIMMYT used simple sequence repeat (SSR) genotyping, at a cost of around $1 per data point. SSRs span several hundred base pairs, which allows them to detect more alleles and therefore provide four or five times more information than the single nucleotide polymorphisms (SNPs) currently being used. However, there are fewer SSR loci, and SSR visualization technologies are more expensive: whilst the current data set cost less than $160,000 to obtain, it would have cost around $400,000,000 in 2005 using SSRs.

“It’s a new ballgame,” states Atlin. “GBS genotyping costs us about $40 per line, and will likely drop to around $20 next year. This is about the same cost as evaluating the line for yield in a single field plot. At this price, we can genotype all CIMMYT maize breeding lines entering replicated field testing, and build powerful models to predict performance in the field for traits that are difficult and expensive to measure.” He notes that it will also speed up the breeding cycle, resulting in greater yield gains per year.
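The figures quoted above can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: it uses the rounded numbers from the article (2 billion data points, 4,000 lines, $160,000, $1 per SSR data point) and assumes, as one possible reading, that the $400,000,000 estimate reflects needing about five times fewer SSR data points for equivalent information. The function names and the `SSR_INFO_RATIO` constant are hypothetical and not part of any CIMMYT tool.

```python
# Rounded figures as quoted in the article ("over" 2 billion, "less than" $160,000).
TOTAL_DATA_POINTS = 2_000_000_000
NUM_LINES = 4_000
GBS_TOTAL_COST_USD = 160_000
SSR_COST_PER_POINT_USD = 1.0

# ASSUMPTION: one SSR data point carries about as much information as five SNP
# data points (the article says "four or five times"); 5 is the upper end.
SSR_INFO_RATIO = 5


def markers_per_line(total_points, num_lines):
    """Average number of marker data points per maize line."""
    return total_points / num_lines


def cost_per_line(total_cost, num_lines):
    """Average genotyping cost per maize line, in USD."""
    return total_cost / num_lines


def cost_per_point(total_cost, total_points):
    """Average cost of a single marker data point, in USD."""
    return total_cost / total_points


def ssr_equivalent_cost(total_points, info_ratio, ssr_cost_per_point):
    """Cost of an information-equivalent SSR data set under the assumed ratio."""
    return (total_points / info_ratio) * ssr_cost_per_point


if __name__ == "__main__":
    print("Markers per line:", markers_per_line(TOTAL_DATA_POINTS, NUM_LINES))
    print("GBS cost per line (USD):", cost_per_line(GBS_TOTAL_COST_USD, NUM_LINES))
    print("GBS cost per data point (USD):",
          cost_per_point(GBS_TOTAL_COST_USD, TOTAL_DATA_POINTS))
    print("SSR-equivalent cost (USD):",
          ssr_equivalent_cost(TOTAL_DATA_POINTS, SSR_INFO_RATIO, SSR_COST_PER_POINT_USD))
```

Under these assumptions the numbers agree with the article: about half a million markers per line, about $40 per line, and an SSR-equivalent cost of about $400 million. With a ratio of 4 instead of 5, the SSR estimate would rise to about $500 million.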

Getting the two billion marker data points is just the beginning; the next steps include analyzing the data and converting them into usable information. The team plans to generate at least this much data every year from now on. “It’s a huge job,” says Atlin, “but already a significant achievement.”