Minutes of the January meeting of the Aspergillus fumigatus (Af) genome sequencing group

Hinxton Hall, Wellcome Trust Genome Campus, 14th and 15th January 2000

Chairs: David W Denning (DWD), University of Manchester, UK; Joan W Bennett (JB), Tulane University, USA and Celia Caulcott (CC), Wellcome Trust (WT), UK.

Also attending: Michael J Anderson (MA), University of Manchester, UK; Rodolfo Aramayo (RA), Texas A&M University, USA; Bart Barrell (BB), Sanger Centre, UK; Axel Brakhage (AB), Technische Universität Darmstadt, Germany; Dennis Dixon (DD), National Institutes of Health, USA; Tamara Feldblyum, The Institute for Genomic Research (TIGR), USA; Philippe Glaser (PG), Institut Pasteur, France; Rhian Gwilliam (RG), Sanger Centre, UK; June Kwon-Chung (JKC), National Institutes of Health (NIH), USA; Victoria McGovern, Burroughs-Wellcome Foundation, USA; Emilia Mellado, Instituto de Salud Carlos III, Spain; Crispin Miller (CM), University of Manchester, UK; William Nierman (WN), TIGR, USA; Stephen Oliver (SO), University of Manchester, UK (15th only); Julian Parkhill (JP), Sanger Centre, UK; Rolf Prade (RP), Oklahoma State University, USA; Michael Quail (MQ), Sanger Centre, UK; Marie-Adele Rajandream, Sanger Centre, UK; Kazuhiro Tsukamoto, Nagasaki University, Japan; Geoffrey Turner, University of Sheffield, UK; John Woodward (JW), Sanger Centre, UK.

Apologies from: Nancy Kellor, Texas A&M University, USA; Jean-Paul Latgé, Institut Pasteur, France.

Introduction to the meeting.

1) Introduction: David Denning: introduced Af with background information on Georg Fresenius and a medical history of aspergillosis. He described the aims of the meeting which were to discuss the co-ordination of the project, what sort of annotation we should aim for, the naming of genes and to obtain a list of other related projects.

He asked DD to introduce the NIH funding mechanism for sequencing small genomes: a U01 grant procedure with a total cost of $1.5 million per year over 2 years. There was a target list of 5 organisms including 2 fungi: Af and Cryptococcus neoformans (Cn). DD stated that the NIH is currently funding 3/4 applications on Af and ~30 on Cn, and that the genome project should be viewed as a means to increase the size of the community. However, the reviewers would need to be satisfied that the sequencing data would be used, and letters of collaboration were therefore absolutely necessary. DD finished by stating that Michael Gottlieb was trying to get enough money to fund 3 grants/year.

DWD asked PG to introduce the Genoscope application: PG stated that Genoscope had to be satisfied that there was interest in the project and that the sequence data were going to be used. The BAC clones from the Sanger Centre were going to be provided to Genoscope and the annotation was going to be done at the Pasteur. Finally PG stated that Genoscope had to be satisfied that the rest of the genome was going to be sequenced by someone else.

DWD continued by stating that he wanted input from the attendees on the structure of the committees, which he proposed should consist of a sequence project team, a co-ordination committee, a functional annotation team and a list of advisors consisting of two groups: funders and key biologists. He also wanted input on the timeline for the project and on the list of other projects. JB wondered if we should contact industry and RA stated that Schering-Plough were interested in the project but wouldn’t provide any money. His hope was that this project would force Cereon to release the data on A. nidulans (An). DWD stated that 12 of the large drug companies had anti-fungal programmes.

2) Progress report on the pilot project: Michael Anderson and Michael Quail: MA introduced the clinical isolate which is going to be sequenced (AF293) and described his success in preparing DNA encapsulated in agarose plugs. He described his attempts to determine an electrophoretic karyotype for AF293, but reported that so far he had only been able to resolve the smallest chromosomes (chr) into two bands of 1.7 and 1.8 megabases (Mb). MQ described how he and JW have carried out a partial digest of the genomic DNA using an optimal ratio of EcoRI to EcoRI methylase. After gel-extracting DNA of various size ranges, the DNA was ligated into a prepared vector (pBACe3.6). From the results obtained with a 1/20th fraction, MQ estimated that the primary library of insert size 100 to 150 kb should have over 5000 recombinants. A smaller library using DNA of size range 50 to 100 kb has also been constructed. MQ then described the mapping procedure which will be carried out by RG. 3456 clones from the primary library will be gridded out onto membranes (6x6x96) in preparation for hybridisation work. These clones will also be digested with HindIII and sized on agarose gels or digested with Sau3AI and sized on a sequencer. The choice would be based on a costing. The resulting fingerprints would be used to build up contigs. MA described the reasons why 3 loci from chromosome VIII of An had been picked as probes to isolate a 1 Mb contig from Af. There followed a general discussion about probing the clones with suggestions that the available sequence from chromosome IV of An or the genetic map of chr IV be used to select probes.
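The library figures reported by MQ can be checked with a short calculation. The sketch below is illustrative only: it confirms that a 6x6x96 layout holds 3456 clones and estimates the genome coverage of the primary library. The minutes do not give a genome size for Af, so ASSUMED_GENOME_MB is a placeholder chosen for illustration, and the function names are hypothetical.

```python
# Figures quoted in the minutes for the primary BAC library.
GRID_LAYOUT = (6, 6, 96)        # membrane gridding layout "6x6x96"
CLONES_TO_GRID = 3456           # clones to be gridded for hybridisation
MIN_RECOMBINANTS = 5000         # "over 5000 recombinants"
INSERT_RANGE_KB = (100, 150)    # insert size range of the primary library

# ASSUMPTION: the minutes do not state the Af genome size.
# 30 Mb is a placeholder used only to illustrate the calculation.
ASSUMED_GENOME_MB = 30.0


def grid_capacity(layout):
    """Return the number of clone positions implied by a gridding layout."""
    total = 1
    for dimension in layout:
        total *= dimension
    return total


def fold_coverage(n_clones, insert_kb, genome_mb):
    """Return total cloned DNA divided by genome size (fold coverage)."""
    return (n_clones * insert_kb) / (genome_mb * 1000.0)


if __name__ == "__main__":
    print("grid positions:", grid_capacity(GRID_LAYOUT))
    for insert_kb in INSERT_RANGE_KB:
        cov = fold_coverage(MIN_RECOMBINANTS, insert_kb, ASSUMED_GENOME_MB)
        print(f"{insert_kb} kb inserts: about {cov:.1f}-fold coverage")
```

Changing ASSUMED_GENOME_MB shows how sensitive the coverage estimate is to the genome size, which is one reason the electrophoretic karyotype matters for the project.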

3) Overview of sequencing projects: Bart Barrell: stated that international sequencing projects depend on confidence about each other’s data and on the free exchange of these data. The quality of sequence from major centres is not in doubt. Overlapping will occur and permits validation. Annotation is presented in a user-friendly manner using, for instance, Artemis. He stated that the sequencing centres view each other as friendly competitors and that they check each other’s data and agree on standards. BB then demonstrated the software that the Sanger Centre uses and summarised how they locate ORFs and assign putative function. He used the Schizosaccharomyces pombe (Sp) project as an example of how a project is managed at the Sanger Centre. Each project has a professional annotation team: for instance in Sp, there is one person responsible for bioinformatics and 2 Sp postdocs. This team has formed a relationship with the research community.

4) Lessons to be learnt from other international sequencing collaborations: Celia Caulcott: compared a few international projects. The Malaria project is one of the most successful from an international perspective; the genome was carved up and committed chr by chr. The Leishmania major project was approached in the same manner, with chr assigned by the biologists and sequencers with no reference to the funders. The WHO helped with co-ordination. In contrast, with Trypanosoma brucei, a pilot project of 1 chr was done at the Sanger, but the biologists decided on a shotgun rather than a chr by chr approach since this is a quicker way to find genes. However, at a meeting in Bethesda in Autumn 1999, it was decided to do the sequencing contig by contig because of public funding to several centres and the current lack of available computing power to assemble 30 Mb. In summary, CC stated that with an international sequencing project, you must listen to what the sequencers want and to what they have to say about annotation and finishing, and give them due recognition.

5) Data from the web questionnaire: Michael Anderson: summarised the data from the questionnaire posted on the web in December 1999. This questionnaire was designed to address how the Aspergillus research community might use the data from the sequencing project, what type of annotation they would find useful, how they would like genes to be named and what types of projects they would like to be involved in. Comments were also canvassed and some of these were highlighted.

6) Nomenclature: Joan Bennett: summarised the various approaches to the naming of genes noting in particular the approach of the C. elegans and yeast communities. With C. elegans, there is a naming 'czar' to whom researchers are meant to refer for approval of gene names. However people still publish gene names without reference to the 'czar'. With yeast, there is a long list of rules published on the web with the hope that researchers will adopt a standardised approach. JB made the important point that there are probably around 3000 core eukaryotic genes which will probably all be given one name eventually and therefore that there is an argument for not worrying too much about the names of these genes. JB then summarised the accepted naming schemes for the fungi, Neurospora crassa and An and summarised the relevant sections of the web-based questionnaire. During the discussion, the following points were made: it is probably better not to lay down any rigid rules, but perhaps to adopt an ad hoc approach. JB expressed a preference for adoption of the yeast nomenclature system.

Plenary Talk

The A. nidulans chromosome IV sequencing project: Rolf Prade: The minimal tile of cosmids has been used to sequence the 2.9 Mb chr. The cosmids have on average a 30 kb overlap. Each cosmid has been shotgun subcloned by Sau3AI partial digestion. He stated that they were about halfway, with 6.7-fold coverage of each cosmid. So far they have just been building up the contigs from the raw sequence data and the sequence will have to be finished by covering the gaps. Annotation will be done when the sequence is complete. He highlighted the effort of Nigel Dunn-Coleman from Genencor in getting money from the other companies which funded this project. In answer to a question, RP stated that they had not looked systematically for the genetic markers on linkage group IV. JP made the point that with Streptomyces coelicolor, which has a good genetic map, they have not been able to link all the genetic markers with ORFs when they sequenced the cosmids. There followed a discussion about how to use these chr IV data with the Af sequencing project. BB stated that the pilot project had been costed generously to permit fallback and therefore it might be better to spend money on mapping rather than on sequencing more than 1 Mb and to use this 1 Mb to validate the map. JKC made the point that not every An gene would be present in Af, especially if the gene products were involved in secondary metabolism. PG suggested that the project use his Af STS sequence data, which cover ~10 % of the genome, to search against the chr IV sequence to find ORFs to use as probes. The general consensus was that probes for screening the Af library should be developed from both the An chr IV and VIII data.

Presentations about submitted and proposed applications: sequencing projects

1) Genoscope application: Philippe Glaser: stated that the Institut Pasteur application to sequence 5 - 10 Mb had been submitted in November and that the funding meeting was being held in January. The application would be competing with human chr IV. The project would start in October and someone in Genoscope would be appointed to manage it. It is likely that they would just generate good sequence and that therefore the annotation would have to be done at the Pasteur. Money for this annotation would have to be applied for separately.

2) NIH earmarked sequencing money - TIGR application: William Nierman: pointed out the parallels of this project with the Arabidopsis sequencing project, where although the Af genome is smaller, an international collaboration is involved and they started without a contig map. They hold twice-yearly meetings of the sequencers and funders. A lesson to be learnt from the Arabidopsis project was that the funders forced the same software onto all the annotators, which caused dissent. Finally, with the Arabidopsis minimal tile, there was 10 % overlap between the clones. Regarding the Af project and the NIH U01 application, he stated that they would have to demonstrate technical improvements to the sequencing. One improvement he proposed would be to use random shearing of genomic DNA to generate another library with 50 kb inserts which would be end-sequenced to 1x coverage of the genome. This library would act as a useful resource for the project and should be crucial in closing gaps. Another innovation would be to pool 10 BAC clones rather than using just one clone to construct the shotgun libraries for sequencing. Finally he stated that TIGR are developing software which would also use the randomly generated end sequence of library clones during the assembly of sequencing contigs. Discussion: DWD asked what might be considered a suitable publishing unit. BB stated that the sequence of individual BAC clones submitted to EMBL should be considered a form of publishing. Other units would be individual chr or the data from the pilot project. It was also suggested that contigs should be assigned to the electrophoretic karyotype and that for instance TIGR could sequence those that hybridised to the larger chr and Sanger those to the smaller chr.

3) The Spanish contribution to the project: Emilia Mellado: stated that money should be available from the research agency of the Health Ministry (603,000 Euros). They have identified a sequencing centre: Dept Microbiology and Genetics at Salamanca who have already been involved in the Sp project. The proposal was due in March and they would need to show that there was a demand for the data. Discussion: it was suggested that Spain could be assigned a specific region of interest as had been done with the Sp project, but SO stated that it was better to run the project in a systematic fashion rather than to ‘cherry pick’ favourite regions.

4) The Japanese contribution to the project: Kazuhiro Tsukamoto: stated that applications to the University are due in the Spring and in the Autumn to the government. 4 sequencing machines were present in Nagasaki’s School of Medicine with 1 in Prof Kohno’s group. He estimated that it would take 4 months to sequence 1 BAC clone at a cost of £6000. He felt that his group could undertake to sequence 2 BAC clones over 12 - 18 months.

Other potential areas for research and funding applications.

1) The Molecular Genetics of A. fumigatus: Rodolfo Aramayo: made the crucial point that there was a need to develop the molecular tools required to exploit the sequence data. He listed important questions that would need to be answered such as whether or not all the work should be done on the strain being sequenced and what checks should be carried out on the strain. For instance, what is its karyotype, how stable is its genome and pathogenicity after cultivation in the lab and after transformation, and should mutants be generated from it? There were also many questions that would need to be addressed regarding transformation procedures of Af which would be a critical requirement for any functional genomics. In answer to some of his points, AB pointed out that C d’Enfert had developed a ura blaster transformation system for Af though with a different strain. JKC pointed out that after many attempts in her lab she was certain that it was not possible to induce meiosis in Af. The point was made that basic biological questions arising from the sequencing project could be addressed in An and questions about pathogenicity and drug resistance in Af. SO made the point that as with yeast, deletions in the primary series of genes should be done in the sequenced strain. Finally RA stated that he planned to submit an NIH R01 application to develop the molecular tools and mutants.

2) Electrophoretic karyotyping of A. fumigatus: Joan Bennett: summarised the data from other Aspergillus species and stated that there was probably more variation in asexual species because there was no need to align chr during meiosis. A comparative study of An and A. parasiticus showed that loci mapped to differently sized bands. It has also been shown in A. niger that chr sizes were radically different between strains. She and Jo-Anne von Burik (U. Minnesota) will submit an application to the NIH to study the electrophoretic karyotype of several Af strains.

3) Functional gene annotation of the Aspergilli: Geoffrey Turner: summarised the various possible levels of functional annotation, from the extensive type provided by Proteome with its YPD and WormPD databases to the one page summaries of SGD to one paragraph summaries. He displayed some of the supportive comments made about functional annotation from respondents to the web questionnaire. He wondered how the community could help and CM made the point that a database could be set up in such a way that experts would be able to add 2 sentence descriptions to ORFs. DD stated that Proteome had applied to the NIH for money for the YPD database.

4) The Pasteur A. fumigatus transcriptome project: Philippe Glaser: summarised progress so far. A shotgun library from AF293 has been constructed which contains inserts of range 0.6 to 1.0 kb. 75x96 plasmid preps have been done and 6754 of these sequenced (pass rate of 94 %). 4994 sequences remained after removal of low quality and repetitive sequences and vector contamination. Only the DNA has been kept from these clones and an aliquot put aside for PCR. In initial array experiments, the DNA was spotted onto membranes and hybridised with total genomic DNA. The strongest signals were from clones containing mitochondrial DNA and the other strong signals were from clones containing the previously defined retrotransposon Afut1. This experiment indicated that there isn’t much repetitive DNA in Af. PG stated that no sequence analysis had been done yet and that they plan to investigate transcript expression levels of Af growing on human epithelium, on alveolar macrophages and on macrophage and epithelial cell lines. They plan on comparing expression differences in Af mutants and in other filamentous fungi. He finished by saying that they had been unsuccessful in obtaining money from the EU framework V programme.
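The clone counts reported by PG are internally consistent, as the following illustrative sketch shows. It assumes one reading of the minutes, namely that the 94 % pass rate is the number of clones sequenced divided by the number of plasmid preps; the variable names are hypothetical.

```python
# Figures quoted in the minutes for the Pasteur shotgun library.
PLATES = 75                 # "75x96 plasmid preps"
WELLS_PER_PLATE = 96
SEQUENCED = 6754            # clones sequenced
RETAINED = 4994             # sequences left after quality and vector filtering

# Total plasmid preps implied by 75 plates of 96 wells.
total_preps = PLATES * WELLS_PER_PLATE

# ASSUMPTION: "pass rate" is taken to mean sequenced clones / plasmid preps.
pass_rate = SEQUENCED / total_preps

# Fraction of sequenced clones that survived the filtering step.
retained_fraction = RETAINED / SEQUENCED

if __name__ == "__main__":
    print("plasmid preps:", total_preps)
    print(f"pass rate: {pass_rate:.1%}")
    print(f"retained after filtering: {retained_fraction:.1%}")
```

Under this reading the computed pass rate rounds to the 94 % quoted, and roughly three quarters of the sequenced clones were retained after filtering.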

General discussion

DWD raised the suggestion of creating a separate database for the Aspergillus community which could be expanded to include transcriptome and other data. BB stated that AceDB could store map data and include literature references and be tagged with authorship. WN stated that TIGR would not be using AceDB. CC pointed out a new bioinformatics initiative from the WT which includes funding for database curatorship. CM indicated some of the bioinformatics possibilities, which include creating data warehouses containing array data, protein-protein interactions and data from literature sources such as journals and EMBL entries. JB was strongly in favour of a curated database, but RP thought it likely to raise heated discussion within the community. Current costings of finished sequence were $3m for 12 Mb from TIGR and £0.10/base from the Sanger. Genoscope do not operate on a cost basis, but on a contract basis to sequence a certain length of DNA. CC stated that the WT were behind the project, but obviously that they couldn't commit to more sequencing. BB stated that the timing of the next application would depend on when they had generated useful data from the pilot project and on access to sequence capacity. It was stated that an electrophoretic karyotype was important for sizing the genome and to help in the assignment of BAC clones. It was felt that no more mapping or cloning was required for the project beyond that to be done in the Sanger pilot project and in the TIGR project. It was proposed that we should increase the profile of the project and of the medical importance of Af by seeking publication in journals such as Curr Opin Microbiol, Trends Microbiol, ASM News and Microbiol Today. The Sanger Centre will distribute clones and hybridisation grids with charges to cover P&P. Finally it was proposed that the next meeting be held in the Autumn.
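The two costings quoted in the discussion use different units, so a per-base comparison may help. The sketch below is illustrative only and deliberately makes no currency conversion, since the minutes give no exchange rate; the function names are hypothetical.

```python
# Costings of finished sequence quoted in the minutes.
TIGR_TOTAL_USD = 3_000_000      # "$3m for 12 Mb"
TIGR_BASES = 12_000_000
SANGER_GBP_PER_BASE = 0.10      # "£0.10/base"


def cost_per_base(total_cost, n_bases):
    """Return the cost of one finished base in the currency of total_cost."""
    return total_cost / n_bases


def cost_for_bases(per_base, n_bases):
    """Return the cost of n_bases finished bases at a given per-base rate."""
    return per_base * n_bases


if __name__ == "__main__":
    tigr_usd_per_base = cost_per_base(TIGR_TOTAL_USD, TIGR_BASES)
    sanger_gbp_12mb = cost_for_bases(SANGER_GBP_PER_BASE, TIGR_BASES)
    # No exchange rate is applied: the two results stay in USD and GBP.
    print(f"TIGR: ${tigr_usd_per_base:.2f} per base")
    print(f"Sanger: £{sanger_gbp_12mb:,.0f} for 12 Mb")
```

Keeping each figure in its own currency avoids introducing an exchange rate that the minutes do not record.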

Michael J. Anderson

Joan W. Bennett

David W. Denning

Geoffrey Turner

May 23, 2000
