Annotating the maize genome

Volker Brendel, professor of bioinformatics at Iowa State, spoke at the Maize Genetics Conference about the need for a better system of community annotation of the maize genome. The genome of the popular maize inbred line B73 is sequenced, but we don’t actually know what a lot of the code stands for. It’s going to take a lot of collaborative effort to discover and annotate (explain) the function of each gene and to put all of that information in one place so it will be useful.
Volker reminds us that the Arabidopsis 2010 funding is running out, so we need to assess the plant genetics situation. How many genes do we know the function of? There is still much to learn.
Maize is uniquely positioned to replace Arabidopsis as a focus for basic plant research because of the many resources that are already established, the most important of which is the extensive maize genetics community. (He didn’t say it, but there is another reason why maize is a better choice than Arabidopsis right now: all of our major grains are very closely related, so work on maize applies to rice, wheat, sorghum, and more.) The community needs to work together in the annotation process, assigning functions to the genes that have been sequenced and combining data from a variety of sources to build a bigger picture. Each researcher has a favorite gene (or pathway, organelle, etc.), so how can each of them contribute to the annotation process?
PlantGDB, a comparative genomics site funded by NSF, has information on 14 species, including maize, which makes it very useful. However, no matter how clever the computer programs are, the human touch is still needed. Filling in information on any of these species helps us to better understand all of them. On the site, community members can flag genes for which the models don’t seem to fit, and can contribute alternative explanations. The final goal is to have every gene model approved by the relevant community member(s). When a person annotates a gene, the PlantGDB committee reviews and approves it, and the information is available on the site shortly afterward. Annotating the genes you are working on is your civic duty, something you owe in return for the public funding you receive.
After Volker’s talk, the attendees discussed what the public’s role in the annotation process should be. There are a lot of cases where the gene model can be checked without any lab work, simply by looking at the sequences. Some members of the community think we should harness the brainpower of thousands of biology undergraduate students by assigning annotations as classwork. I like the idea of getting students involved, and hope they follow through. Another point raised was the need for a diversity of people to represent the maize genetics community.
A panel discussion followed, in which a lot of great new ideas for annotation were brought up (unfortunately, I don’t have the names of some of the people who spoke).
One panel member said we need “Zeazomics” (a collection of information including genomics, metabolomics, proteomics, and whatever else we can come up with) to fill in gaps in our knowledge. Being able to link all of this information together will lead to stronger explanations of the phenotypes we see. He said this process will not be definitive; it will create a series of hypotheses that will lead to more hypotheses. The hypothesis testing will lead to functional biology, from physiology to biochemistry to cell biology and more. Additional genome sequencing is necessary to capture the entire diversity of maize. Maize is the model for grasses, for crops, and for future applications like biofuels. Now is the time to push maize research to a much higher level.
To accomplish all this, we’ll need to take care of a few things, as the other panel members and members of the community brought up:

  • Genes need reciprocal links between MaizeGDB and NCBI Entrez Gene. Currently, about 20,000 NCBI Entrez Gene records need links back to MaizeGDB.
  • To help with annotation, Lisa Harper, curator of MaizeGDB, will make a movie that shows the common problems in using the databases, including how the genome changes over time as the contigs are reordered. This is needed because people are often working from older copies of the information for a given gene, which might not be updated frequently enough.
  • There is also a need to integrate microarray data into the databases. Microarrays that are specific to a particular tissue and/or developmental stage are particularly complicated. Volker says that this problem is common and that new technologies with new ways to visualize data are necessary.
  • MaizeGDB needs a forum so that people working on the same genes can coordinate their work.
  • iPlant is organizing a workshop in St. Louis in June to help coordinate the various genome annotation groups.
  • There is a plan to create outreach information that any member of the maize community will be able to download and use to communicate the community’s needs and accomplishments to the public and to government officials.