“Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”
He took his vorpal sword in hand:
Long time the manxome foe he sought —
So rested he by the Tumtum tree,
And stood awhile in thought.
And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
One, two! One, two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.
“And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
He chortled in his joy.
[Image shows a portrait of the Pundit as a young man fighting the Jabberwock]
Out of Germany, not Out of Africa — the beastly German germ is a GMO whose parents have been around Germany for 10 years.
But it has now been virtually dismembered and its entrails laid out to dry, thanks to an intense weekend of in silico combat by the super-geeks at the BGI institute and elsewhere in the cloud. The latest results are a sensitive new test for the germ, from BGI, and a full demonstration by Kat Holt of the several gene movement events that generated this strain. The Era7 crowd have provided a detailed annotation of the gene content of this organism, which is now freely available as a detailed paper in Nature Precedings.
The German beast has a main chromosome that is 99.69% identical to the known Escherichia coli EAEC strain 55989 over 96.07% of the chromosome’s length. This strain comes from Africa. Another strain, German strain 01-09591, originally isolated in 2001, is probably even more closely related to the current outbreak strain, but its genome has not yet been completely decoded. BGI could do it in about a day if given the DNA. Interestingly, Kat Holt and others now show the German outbreak strain has inherited a shigatoxin gene as part of an acquired virus cassette inserted into its main chromosome. It also has genes for a gut surface attachment apparatus (aggABCD) as a mobile gene cassette carried on a plasmid: a clear-cut example of a natural GMO.
Gene-jockies are the scientists who go riding along long stretches of DNA data on a computer screen to try to discover its hidden secrets. Their amazing but largely unappreciated work in dissecting the DNA of disease germs to find how they cause harm, and that of their clever colleagues the computer nerds, certainly merits much more publicity than it currently gets.
But first a warning: we are going to occasionally talk in jabberwocky, the language of the gene-jockies. It sometimes frightens newbies, but after a while one gets to enjoy it, and even gets addicted to it. Indeed, it is extremely satisfying to investigate a disease outbreak with the computer, and to see virtual collaboration in action between research groups in different countries. With the current food poisoning emergency, data come from China and France, outbreak DNA samples from Germany, databases from Los Alamos, computers from Bill Gates, Intel corporation and Silicon Valley, and the internet from Al Gore.
Fortunately, because laboratory work with cloned DNA was not banned in 1976 in spite of the efforts of many people to stop it, gene-jockies are now able to decode the genome information in pathogens in a few days. The German public deserve to know that science is their friend in times of trouble, and the efforts of much disparaged DNA scientists need proper recognition. So we are posting here some samples of the feverish effort that’s currently giving the German EHEC beast a heavy working over in silico.
The Pundit always takes pleasure in saluting people who would do something rather than doing nothing and stopping everything. It is fantastic to witness the power of the wisdom of crowds in open-source action over the internet. The donation to the scientific community just a day or so ago of the complete DNA sequence of the German epidemic germ (mentioned in the headline of this post, and described more fully by notes added at the end) is stimulating a lot of open-source analysis (details below).
The full version:
To set the stage with a better understanding of the EHEC germ, let’s have a go at some rather simple German germ genome analysis here, and follow up with a scan through what the super-geeks like David Studholme, Kat Holt and Nick Loman do.
Using the genetic leads and data deposited in public databases such as Genbank graciously provided by the gene jockies (key sections of which are appended below) one can use the computer and the internet to discover valuable new information about the German outbreak strain.
For example, gene sequence data from BGI indicate it is closely related to E. coli strain EAEC 55989 (proof for this is given below), and we find from searching the open-access genetic databases that EAEC E. coli strain 55989 is fully described by Touchon and others (2009). We find in these databases that its disease-related plasmid pAA and main chromosome are both completely characterised by database reports that were presciently placed there a year ago by the following fine French scientists:
Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, et al. 2009 Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths. PLoS Genet 5(1): e1000344. doi:10.1371/journal.pgen.1000344
REF 86. Mossoro C, Glaziou P, Yassibanda S, Lan NT, Bekondi C, et al. (2002) Chronic diarrhea, hemorrhagic colitis, and hemolytic-uremic syndrome associated with HEp-2 adherent Escherichia coli in adults infected with human immunodeficiency virus in Bangui, Central African Republic. J Clin Microbiol 40: 3086–3088.
REF 87. Bernier C, Gounon P, Le Bouguénec C (2002) Identification of an aggregative adhesion fimbria (AAF) type III-encoding operon in enteroaggregative Escherichia coli as a sensitive probe for detecting the AAF-encoding operon family. Infect Immun 70: 4302–11.
Wow! How fascinating. An African pedigree. The Pundit is now going to continue hunting through these DNA sequences to tease out in further detail the virulence characteristics of the German strain.
He’s hunting for the aggR disease plasmid and the stx2 toxin prophage. Not the Snark or the Jabberwock. As he finds it he is going to translate the work from jabberwocky into ordinary English.
Gene-Jocky Sleuthing results:
From the BGI genome data, the Pundit was able to confirm plausible natural mechanisms for gene movement of the virulent gene aggR and the toxin gene stx2 into the German outbreak strain from other E. coli strains known to carry them. These mechanisms involve mating to transfer a virulence plasmid, and virus mediated or assisted movement of the toxin gene stx2.
Some database sleuthing details
A little bit of reading of gene sequences provided by the gene-jockies, plus some software-assisted sleuthing in the main Genbank gene databases, easily turns up the DNA data files on EAEC strain E. coli 55989 kindly deposited in the databases a year ago by the French workers for open use by the scientific community. These notes are referenced by hyperlinks to the Genbank entries and brief descriptions later on in this post.
From these entries the Pundit discovered that Bernier, Gounon and Le Bouguenec did some key work way back in 2002. They describe a germ virulence capability known as aggregative adhesion fimbria (AAF). These hair-like objects are on the surface of the bacteria and enable the bacteria to attach firmly to the gut wall. They are alternatively called pili (singular pilus). These pili are associated with a special mechanism that enables these bacteria to inject a protein into the gut lining cells of their host animal. Microbiologists call this a type III secretion system. Type III secretion systems of various sorts are frequently a crucial disease-related germ attribute. They inject many different proteins into animal cell targets.
Another fact can be gleaned by scrutiny of the Genbank database entries on EAEC E. coli 55989. The African strain has no gene for making shigatoxin, whose presence is a key feature of EHEC germs. Thus if the German beast evolved from the African EAEC germ, it must have captured a new shigatoxin stx2 gene, quite likely via a virus-mediated or virus-assisted addition of new genes into its main chromosome.
EAEC virulence components are transmissible.
More sleuthing yields an authoritative 2009 review, “Pathogenomics of the Virulence Plasmids of Escherichia coli” by Timothy Johnson and Lisa Nolan, which says this:
EAEC strains are the most recently described of the E. coli intestinal pathotypes (77). These bacteria were first described by Nataro et al. in 1987, based upon their distinct aggregative adherence phenotype, which is seen as a brick-like pattern when the bacteria adhere to cultured HEp-2 cells (129). EAEC strains are considered to be an emergent diarrheal pathotype implicated in traveler’s diarrhea and affecting immunocompromised children in developing countries (77). In fact, EAEC strains are second only to ETEC strains as being the most common agent of traveler’s diarrhea. It is thought that food and water are the most likely means of transmission (77). Epidemiological studies involving this strain have demonstrated that EAEC virulence is heterogeneous, complex, and likely dependent on multiple bacterial factors and host immune status (126). EAEC pathogenesis is thought to involve three primary steps. First, the bacteria adhere to the intestinal mucosa using aggregative adherent fimbriae (AAF). Second, the bacteria produce a mucus-mediated biofilm on the enterocyte surface. Finally, the bacteria release toxins that affect the inflammatory response, intestinal secretion, and mucosal toxicity (77). Aspects of each of these steps involve plasmid-encoded traits.
A primary virulence factor of EAEC is that encoding the aggregative adherence phenotype (72). This trait was found to be associated with AAF (127) and is localized to a 55- to 65-MDa plasmid, termed the “pAA plasmid” (129). Like ETEC CFs, allelic variants of AAF have been identified. AAF from prototypical EAEC strain 17-2 (127) is genetically distinct from AAF from prototypical strain O42 (126), and their respective allelic variants are named types AAF/I and AAF/II. Other allelic variants of AAF have been described, including AAF/III from prototypical strain 55989 (9) and AAF/IV from strain C1010-00 (16). All the identified AAF allelic types appear to be plasmid encoded, and most of the strains analyzed tend to possess only a single AAF allelic type (72). AAF genes are regulated by an AraC-like transcriptional activator, AggR, and strains containing AggR have been termed “typical” EAEC strains (131). The AAF regulon contains both fimbrial genes and a regulator linked to one another on the pAA-type plasmid. There is evidence that AggR is a global regulator of EAEC virulence, as it exhibits effects on a number of chromosomal virulence factors as well (125). The major AAF pilins regulated by AggR include aggA (AAF/I), aafA (AAF/II), and agg3 (AAF/III) (9, 35, 131). AggR also regulates the expression of aap, a dispersin that is highly prevalent among EAEC isolates and facilitates the movement of EAEC across the intestinal mucosa for subsequent aggregation and adherence (77). This dispersin is exported out of the EAEC cell via the antiaggregation protein transporter system, encoded by the genes aatPABCD (8). This ABC transporter system is highly prevalent among EAEC populations, highly conserved, and regulated by AggR (77). While few studies have involved large numbers of EAEC isolates, recent work by Jenkins et al. found that two groups of EAEC exist based upon gene clustering.
They are distinguished by the presence or absence of genes encoded on plasmid pAA and en bloc sets of genes located on genomic islands near the pheU and glyU loci (83). The definition of “typical” versus “atypical” EAEC strains has thus been supported by such results, with typical EAEC strains possessing pAA-associated genes and certain chromosomal islands, apparently coinherited.
The EAEC plasmids also encode toxins such as the plasmid-encoded toxins Pet and EAST1 (45). Pet appears to belong to the serine protease autotransporter family and has been shown to confer cytoskeletal rearrangements, suggesting a role for Pet in EAEC pathogenesis (24, 188). EAST1 has been found to activate guanylate cyclase, resulting in ion secretion (128). However, relatively few EAEC strains actually possess the genes encoding Pet and EAST1, so their role in EAEC pathogenesis may be limited (36).
Three EAEC plasmids have been completely sequenced: pO42, belonging to AAF/II+ strain O42; 55989p, belonging to AAF/III+ strain 55989; and pO86A1, containing a novel AAF-like operon. All three of these plasmids are F-type plasmids with stability, maintenance, and transfer regions (Fig. 2). Plasmid 55989p is considerably smaller than plasmids pO42 and pO86A1, which is due to truncations in the F transfer region. This plasmid also differs from pO86A1 and pO42 in that it contains a RepFIC replication region instead of RepFIIA, although all three plasmids also contain a second replication region known as RepFIB (Fig. 2). All three plasmids encode their respective AAF types, and each contains the AAF regulatory gene aggR. While the AAF types possess considerable genetic diversity, aggR is generally highly conserved among the plasmid sequences available. A phylogenetic comparison based upon a nucleotide alignment of available aggR sequences revealed that aggR genes from AAF types I and III appear to be most closely related, whereas other AAF types are more divergent (Fig. 3). Also, sharing nucleotide similarity with aggR is the AraC-type transcriptional regulator rns of human ETEC plasmid types (26).
The features common to all three sequenced EAEC plasmids are the AAF operons, aggR, and aatPABCD (Fig. 2). In all three plasmids, these sequences are present on a RepFIB/FIIA-type backbone. Each of these plasmids also has unique regions not present in the other two sequenced plasmids, including the pet gene in pO42, the ipd gene in pO86A1 encoding an extracellular serine protease, and the Ets iron transport system in pO42 (Fig. 2). The acquisition of Eit by pO42 is particularly interesting because it was previously found only within ExPEC ColV and ColBM plasmids on a RepFIB/FIIA plasmid backbone (87, 88). Although the EAEC plasmids share a common plasmid backbone and core EAEC-associated virulence genes, the gross genetic composition and synteny of these three plasmids are quite different from one another. This would suggest that a significant amount of gene shuffling and rearrangement has occurred since the introduction of their virulence-associated module or that this module has been introduced on different occasions.
Another Wow! Have these gene-jockies been busy! And what do they say: that the EAEC plasmids are F-plasmids (described here). This most likely means that they are mobile or can be easily mobilised by fully functional F-plasmids. Their disease-causing ability is infectious and is predicted to be easily transmissible to other bacteria. They are related to the first plasmid ever characterised, the fertile one (called F) discovered by Joshua Lederberg in 1946! They are able to create new natural GMOs.
But all this is no surprise to microbiologists: it is quite predictable from what they are taught in Germs 101. But the gene databases provide strong confirmatory evidence on the sexual prowess of EAEC type pathogenic E. coli. Thank you again gene-jockies and database nerd-people. Bioinformatics Rules OK!
Plausible mechanisms for toxin gene movement into the German outbreak strain can be deduced from the BGI DNA data:
From DNA sequence information on the German strain provided by BGI, several phage (silent virus) genes can be identified as being inserted in the main chromosome, e.g. a gene for phage antitermination protein Q. This gene is near the shigatoxin chromosomal genes in EHEC bacteria (e.g. strain EDL933, analysed by Perna et al. in 2001). This finding supports a possible role of bacterial virus genes in transfer of toxin-forming ability into the German outbreak strain. (Kat Holt provides strong confirmation of this; see later for details.)
Gene database searches show these virus genes are also widely dispersed in other E. coli bacteria. A simple explanation is that virus genes in some way facilitate toxin gene movement between strains. Several mechanisms can be plausibly suggested for this, but the bottom line is that plausible routes for interstrain toxin gene movement (horizontal gene transfer, HGT) are suggested by the DNA sequence evidence donated to the science community by BGI.
Full gene content of the German isolate described
Escherichia coli EHEC Germany outbreak preliminary functional annotation using BG7 system
by: Raquel Tobes, Marina Manrique, Pablo Pareja-Tobes, Eduardo Pareja-Tobes, Eduardo Pareja
Free access to full text
Nature Precedings, No. 713. (6 June 2011) doi:10.1038/npre.2011.6001.1
We have annotated the European outbreak E. coli EHEC genome sequenced by BGI (6-2-2011) and assembled with MIRA by Nick Loman (6-2-2011). Our system BG7, Bacterial Genome annotation of Era7 Bioinformatics, predicts ORFs and annotates them based on fragments of similarity with Uniprot proteins. We have predicted 6327 genes, 6156 encoding proteins and 171 corresponding to ribosomal and tRNA. Based on the preliminary results of our semi-automated method of annotation we have selected some predicted proteins with potential implications in pathogenicity and virulence. There are 33 predicted genes annotated as toxins and we have found three putative hemolysins: Hemolysin E, a putative hemolysin expression modulating protein and a channel protein, hemolysin III family. We have found 31 predicted genes that could be related to specific antibiotic resistance: beta-lactamic, aminoglycoside, macrolide, polymyxin, tetracycline, fosfomycin and deoxycholate, novobiocin, chloramphenicol, bicyclomycin, norfloxacin and enoxacin and 6-mercaptopurine. This strain is rich in adhesion, secretion systems, pathogenicity and virulence related proteins. It seems to have a restriction-modification system, many proteins involved in Fe transport and utilization (siderophores as aerobactin and enterobactin), lysozyme, one inhibitor of pancreatic serine proteases, proteins involved in anaerobic respiration, antimicrobial peptides, proteins involved in quorum sensing and biofilm formation that could confer competitive advantage to this strain.
Now for something really geekerful. The blow-by-blow wounding of the Jabberwock by the crowd working in the cloud. May the force be with you.
First we listen to Dave Studholme’s story:
E. coli TY2482: strain-specific genes
Posted on June 4, 2011 by david j studholme
Within days of the reported E. coli outbreak, BGI have released five runs of genome-wide sequence data generated using Ion Torrent. With equally astonishing speed, Nick Loman has generated a preliminary de novo sequence assembly and Marina Manrique generated a preliminary annotation of that assembly.
I was curious to know: is there anything in the genome of TY2482 that is unique (i.e. not found in any previously seen E. coli genomes)? The answer is probably yes — but not much! The unique genome regions are listed in the table below.
I performed BLASTN searches of Nick Loman’s TY2482 assembly against each of the Escherichia genomes in the NCBI RefSeq database using an E-value threshold of 1e-10. The set of 221 RefSeq genomes (including chromosomes and plasmids) is listed below the table at the bottom of this page. I then pulled out all the regions of the TY2482 assembly that showed no BLASTN matches against any of the RefSeq sequences. I found just four such regions, ranging in length from 1. kb to 1.5 kb. The lists of predicted genes in these regions are taken from Marina’s preliminary annotation.
Please note that the list of sequences against which I compared TY2482 is not comprehensive. In other words, there are other E. coli (and closely related species) sequences in the public databases that are not included in my list. (…go to link for details.)
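A Pundit aside for the code-curious: the "pull out every region with no BLASTN match" step described above can be sketched in a few lines of Python. This is not Dave's actual script. It assumes the hits for one contig have already been read from BLAST tabular output as (query start, query end) pairs, and the function names and the `min_length` parameter are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the contig


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, non-overlapping list."""
    merged: List[Interval] = []
    # Normalise each pair so start <= end before sorting.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unmatched_regions(contig_length: int, hits: List[Interval],
                      min_length: int = 1) -> List[Interval]:
    """Return the stretches of a contig that no hit covers (hypothetical helper)."""
    gaps: List[Interval] = []
    position = 1  # first base not yet known to be covered
    for start, end in merge_intervals(hits):
        if start > position:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if position <= contig_length:
        gaps.append((position, contig_length))
    # Keep only gaps long enough to be interesting.
    return [(s, e) for s, e in gaps if e - s + 1 >= min_length]


if __name__ == "__main__":
    # Toy contig of 1000 bases with three hits, one given in reverse order.
    print(unmatched_regions(1000, [(1, 200), (150, 400), (900, 700)]))
```

Running the same function over every contig, with the hits pooled across all reference genomes, would give a list of candidate strain-specific regions to hand over for annotation.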
Wait! There is more:
Comparisons of E. coli TY2482 against previously sequenced E. coli genomes
Posted on June 5, 2011 by david j studholme
Quote
I have aligned Nick Loman’s TY2482 assembly, and the BGI’s raw Ion Torrent sequence reads, against the complete E. coli genome sequences from the NCBI RefSeq database. I used BLASTN to align Nick’s contigs and used BWA to align BGI’s raw reads. I used CGView to display the results of the alignments.
I also aligned Nick’s assembly against all these genome sequences using Mummer. The results in Excel format are here and OpenOffice format here. Looks like the most similar genome is Escherichia coli 55989 NC_011748: 99.69% nucleotide sequence identity over 96.07% of the chromosome’s length.
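For newbies wondering what "99.69% identity over 96.07% of the chromosome’s length" means in practice, here is a minimal Python sketch of one way to get two such numbers from a list of alignment blocks. It is an illustration only. It assumes each block is a (reference start, reference end, percent identity) triple with start no greater than end, and it is not claimed to reproduce exactly how Mummer or Dave's spreadsheet arrive at their figures.

```python
from typing import List, Tuple

Block = Tuple[int, int, float]  # (ref_start, ref_end, percent_identity), 1-based inclusive


def summarise_alignment(reference_length: int, blocks: List[Block]) -> Tuple[float, float]:
    """Return (length-weighted mean identity %, % of the reference covered)."""
    if reference_length <= 0 or not blocks:
        raise ValueError("need a positive reference length and at least one block")

    # Coverage: walk the blocks in order and count each reference base once,
    # even where blocks overlap.
    covered = 0
    current_end = 0
    for start, end, _ in sorted(blocks):
        start = max(start, current_end + 1)
        if end >= start:
            covered += end - start + 1
            current_end = end

    # Identity: average the per-block identities, weighted by block length.
    total_aligned = sum(end - start + 1 for start, end, _ in blocks)
    identity = sum((end - start + 1) * pid for start, end, pid in blocks) / total_aligned

    return identity, 100.0 * covered / reference_length


if __name__ == "__main__":
    # Toy example: two overlapping blocks on a 1000-base reference.
    identity, coverage = summarise_alignment(1000, [(1, 500, 100.0), (401, 900, 99.0)])
    print(f"{identity:.2f}% identity over {coverage:.2f}% of the reference")
```

The two numbers answer different questions: coverage says how much of the reference has any counterpart in the new genome, and identity says how similar those matching parts are.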
E. coli TY2482: strain-specific genes
Posted on June 4, 2011 by david j studholme
UPDATE Monday 6th June 2011
I have investigated the candidate ‘novel’ genes below a bit more carefully. Thanks to @kamounlab for some help with this (but any mistakes are mine!). As I stated previously, none of these shows significant nucleotide sequence similarity to any of the sequenced E. coli genomes listed at the bottom of this post. However, we do find some similarities to other E. coli sequences in the public databases. The only truly ‘unique’ sequence in the TY2482 is about 1 kb on contig husec41_c1687.
- husec41_c1060: This contig shares a lot of sequence similarity with Stx2-converting phage 86 (NC_008464.1), which was previously seen in Stx2 phage of EHEC O86:H- strain DIJ1. It shares 96% identity over 75% of the contig’s length.
- husec41_c1408: This contig is almost identical to an E. coli strain Ec222 pathogenicity island GenBank: AY151282.1.
- husec41_c1687: About two-thirds of this contig shows no significant nucleotide sequence similarity to anything in the NR or RefSeq databases. However, BLASTX reveals some protein-level sequence similarity to E. coli transposon Tn21 resolvase at the 3′ end of the contig. But the section in the middle of the contig has no detectable similarity to anything at either the DNA or the protein level.
- husec41_c1496: Although this contig shows no detectable nucleic acid sequence similarity to any full-sequenced genome in RefSeq, it is almost identical to several sequences in GenBank, including a microcin operon from strain CA58.
E. coli TY2482 genome compared versus E. coli EAEC strain 55989
Of the E. coli strains whose genomes have been fully sequenced previously, EAEC 55989 is the one most closely related to TY2482. So what are the genomic differences between TY2482 and its sibling 55989?
The following CGView plots illustrate alignments of the BGI’s TY2482 Ion Torrent data against the EAEC 55989 genome (using BWA).
- Ty2482 data aligned against whole chromosome of EAEC 55989
- Ty2482 data aligned against whole plasmid of EAEC 55989
- Type III secretion system
- ‘Deletion’ in gene for outer membrane protein
The Pundit: Oooooooooh! Very nice. Plasmid gene movement to the main chromosome! Maybe not. Those are probably plasmid contigs matching 55989 data. Still nice. Maybe there is a lot of rearrangement of the plasmid?
Then there is the super-geek woman Kat H, who lives near the Pundit:
EHEC genomes
Professional paper:
Escherichia coli EHEC Germany outbreak preliminary functional annotation using BG7 system
Analysis
- 3-Jun Nick Loman EHEC genome assembly
- 3-Jun Mike the Mad Biologist I Don’t Think the German Outbreak E. coli Strain Is Novel: Something Very Similar Was Isolated Ten Years Ago…
- 3-Jun Marina Manrique (Era7) Automatic annotation of E. coli TY2482
- 3-Jun Simon Gladman Automatic annotation of E. coli TY2482
- 4-Jun Kat Holt Two strain SNP comparison
- 5-Jun David Studholme Comparisons of E. coli TY2482 against previously sequenced E. coli genomes
- 5-Jun Kat Holt EHEC genomes – plasmid
- 5-Jun Konrad Paszkiewicz TY2482,-LB226692-vs-Genbank-Ecoli
- 5-Jun Phylogeo HUSEC41 German outbreak strains are not that ‘new’
- 5-Jun Mariam Rizkallah Automatic annotations with RAST
- 5-Jun Kat Holt Comparative genomics – acquired material
- 5-Jun Raquel Tobes Preliminary functional manual annotation of E. coli TY-2482
- 5-Jun Raquel Tobes Identification of genes involved in colonization, adhesion, pathogenicity and metal resistance
- 5-Jun Raquel Tobes Analysis of TY-2482 genome plasticity
- 6-Jun David Vallenet Explore LB226692 annotations with MicroScope
- 6-Jun Kat Holt EAEC plasmids and aggregative fimbriae
- 7-Jun Kat Holt Gene content and horizontal transfer analysis
- 7-Jun Wolfgang Gerlach Taxonomic analysis of 3057 genes from TY-2482 with CARMA3
- 7-Jun Raquel Tobes & Marina Manrique (Era7) Automatic annotation of E. coli TY2482 BGI V2 assembly
- 8-Jun Raquel Tobes & Marina Manrique (Era7) Automatic annotation of E. coli LB226692 genome
- 8-Jun David Studholme A cluster of E. coli 55989 genes missing from outbreak strain TY22428. Is this a T6SS?
- 8-Jun Kat Holt Annotation of phage in the latest (6/6; HiSeq data) assembly
- 8-Jun Kat Holt Typing schemes & PCR targets for the outbreak strain
- 9-Jun Scott Weissman & Kat Holt Plasmid MLST analysis of IncI/blaCTX-M plasmid
- 9-Jun Nico Petty via Kat Holt More detailed analysis of prophage
- 9-Jun Konrad Paszkiewicz Pfam domain comparison
- 10-Jun Raquel Tobes Mauve comparison of E. coli H112180280 and TY-2482 strains
- 10-Jun Kwan lab The outbreak strains have similar pathogencity as Ecoli EAEC strain 55989: Alignment of virulence factors from VFDB
- 11-Jun Kat Holt EAEC plasmid comparison with new scaffold assemblies
- 11-Jun Kat Holt Comparison of new BGI and HPA scaffolds
- 11-Jun Kat Holt Resistance genes in the same scaffold as chromosome
- 11-Jun Günter Klambauer, Martin Heusel, Djork-Arné Clevert and Sepp Hochreiter Copy number analysis of the outbreak strain
- 11-Jun Marina Manrique & Raquel Tobes (Era7) Automatic annotation of BGI V3 assembly of E. coli TY-2482 genome
- 11-Jun Marina Manrique & Raquel Tobes (Era7) Automatic annotation of HPA assembly of E. coli H112180280 genome
- 12-Jun Kat Holt Analysis and manual annotation of chromosomal insertion containing adhesin/pathogenicity island plus multiple drug resistance operons
- 13-Jun Kat Holt Biofilm-associated genes and acquired pic mucinase
- 12-Jun Patrik D’haeseleer Alignment of all three assemblies with 55989 and plasmids
- 13-Jun Kwan Lab The outbreak strains harbor the whole set of EAEC secreted proteins
- 14-Jun Peter Slickers Read length matters: identifying the phiStx2 att site
- 14-Jun Lisa Crossman Salmonella matches & annotations on two TY2482 plasmid scaffolds
- 14-Jun Konrad Paszkiewicz & Kat Holt SNP-base phylogeny confirms similarity of E. coli outbreak to EAEC Ec55989
- 14-Jun Peter Slickers MLST and serotyping 55989 in silico
- 15-Jun Lisa Crossman Rearrangements in the plasmids from TY2482 and H112180280
- 16-Jun Kat Holt BRIG visualisation of 4 available genomes
***
More Gene-Jocky jabberwocky talk follows for the record:
- Mike the Mad Biologist
Category: E. coli • Genomics
Posted on: June 3, 2011 8:10 AM, by Mike
…in Europe. I’ll get to that in a moment. You’ve probably heard of the E. coli outbreak sweeping through Germany and now other European countries that has caused over one thousand cases of hemolytic uremic syndrome (‘HUS’). What’s odd is that the initial reports are calling this a novel hybrid or some new strain of E. coli.
BGI has done some sequencing using Ion Torrent of one of these isolates, and Nick Loman assembled the data. Without getting too technical, the genome is actually in about 3,000 pieces, but with those data (and thanks to Nick for assembling them and releasing them) I was able to perform multilocus sequence typing (‘MLST’). Basically, we look at the partial sequences of several genes (in this case, seven) to identify its sequence type; think of it as a molecular barcode (for the scheme and details, see here).
So what did I find?
This EHEC strain is most likely a very close relative of ST678 (details in a bit). In fact, according to the mlst.net strain database, there is a strain “Jan-91”, isolated in 2001* from Europe (no further geographic information is provided). That strain belongs to phylogroup D, and is associated with HUS…just like the outbreak strain. And the older strain also has the exact same serotype as the outbreak strain, O104:H4. ….(continues at Mike’s blog)
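Translating Mike's "molecular barcode" from jabberwocky: each of the seven gene fragments is matched to a numbered allele, and the resulting string of seven numbers is looked up in a table of known sequence types. The Python sketch below shows only that lookup logic. The locus names, sequences, allele numbers and ST numbers in it are toy values invented for illustration, not entries from any real MLST database, and it is not Mike's actual method or code.

```python
from typing import Dict, Optional, Sequence, Tuple

# Toy scheme: seven placeholder loci (hypothetical names, not the real genes).
LOCI: Tuple[str, ...] = tuple(f"locus{i}" for i in range(1, 8))

# Toy allele database: for every locus, known sequence -> allele number.
ALLELE_DB: Dict[str, Dict[str, int]] = {locus: {"ACGT": 1, "ACGA": 2} for locus in LOCI}

# Toy sequence-type table: allele profile -> ST number (made-up values).
ST_TABLE: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 100,
    (1, 1, 1, 1, 1, 1, 2): 101,
}


def call_alleles(sample: Dict[str, str],
                 allele_db: Dict[str, Dict[str, int]]) -> Dict[str, Optional[int]]:
    """Give each locus the allele number of its exact sequence match, or None if novel."""
    return {locus: allele_db.get(locus, {}).get(seq.upper()) for locus, seq in sample.items()}


def sequence_type(profile: Dict[str, Optional[int]],
                  st_table: Dict[Tuple[int, ...], int],
                  loci: Sequence[str]) -> Optional[int]:
    """Look up the ST for a complete allele profile; None if any allele or the profile is unknown."""
    key = tuple(profile.get(locus) for locus in loci)
    if None in key:
        return None
    return st_table.get(key)


if __name__ == "__main__":
    sample = {locus: "ACGT" for locus in LOCI}
    sample["locus7"] = "acga"  # lower case on purpose; matching ignores case
    profile = call_alleles(sample, ALLELE_DB)
    print(profile, sequence_type(profile, ST_TABLE, LOCI))
```

With a draft assembly in thousands of pieces, a locus can be split across contigs or carry a sequencing error, in which case an exact-match lookup like this returns nothing and a nearest-match step would be needed; that may be why the result is phrased as "most likely a very close relative".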
- The work of Nick Loman :
A heady mix of bacterial pathogenomics, next-generation sequencing, type-III secretion, bioinformatics and evolution!
http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
EHEC Genome Assembly
By Nick Loman on June 2, 2011
BGI have released 5 runs of Ion Torrent data for the German EHEC/VTEC outbreak strain. I hope it is released with no specific restrictions on use for the benefit of the entire community, but the site doesn’t make that entirely clear. Thanks to the BGI for putting it up!
Shall we crowd source some analysis? This comes at a very timely moment as I am currently helping organise the Applied Bioinformatics & Public Health conference in Hinxton (#ABPH11 http://search.twitter.com/search?q=ABPH11), where we are discussing the use of whole-genome sequencing in epidemiology. The problem is I don’t have much time to dig into the data.
But I’ve put a first-pass de novo assembly up using MIRA (3.2.1.17_dev) here: http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt (3,057 contigs, total bases: 5,491,032, N50 3,675). If you want the alignment files etc. get the big file here (282Mb).
Parameters are: mira --job=denovo,genome,accurate,iontor -GE:not=1
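A note from the Pundit on the N50 figure quoted with the assembly: it is the contig length such that contigs of that length or longer hold at least half of all assembled bases. Below is a minimal Python sketch of the standard calculation, with a small helper that reads contig lengths out of FASTA text. The helper names are illustrative and this is not part of MIRA or of Nick's pipeline.

```python
from typing import Iterable, List


def contig_lengths(fasta_text: str) -> List[int]:
    """Return the length of every record in a FASTA-formatted string."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, longest first, reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


if __name__ == "__main__":
    toy = ">c1\nACGTACGT\n>c2\nACGT\nAC\n>c3\nAC\n"
    sizes = contig_lengths(toy)
    print(sizes, n50(sizes))
```

An N50 of a few thousand bases in a genome of about five million is what one would expect from a first-pass assembly in roughly 3,000 pieces.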
Update 3/6/11 09:15 GMT+1
Marina Manrique has run the assembly through their BG7 bacterial genome annotation pipeline, results are here. http://www.era7bioinformatics.com/en/E_Coli_EHEC_O104_STRAIN_EU_OUTBREAK_era7bioinformatics.html
Torsten Seeman and Simon Gladman from the Victorian Bioinformatics Consortium have sent me the results of their in-house annotation pipeline. Results are available: contigs reordered according to E. coli EAEC 55989 and TWEC.
NCBI have also posted a preliminary assembly (of a different isolate – LB226692) – although it is not a true de novo assembly. The approach is a bit different. “Reads were mapped with TMAP against the publicly available E. coli 55989 chromosome (CU928145.2) and the derived consensus was split into contigs at zero-coverage regions. These contigs were used as a ‘backbone’ for mapping of reads, followed by de novo assembly of unmapped reads with the MIRA assembler (v 3.2.1). A small number of de novo and consensus contigs were merged using CAP3.”
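The phrase "the derived consensus was split into contigs at zero-coverage regions" can be pictured with a short Python sketch. It assumes a consensus string and a per-base read depth list of the same length. The function name is made up, and this illustrates only the splitting idea, not the TMAP, MIRA or CAP3 steps of the quoted pipeline.

```python
from typing import List, Tuple


def split_at_zero_coverage(consensus: str, coverage: List[int]) -> List[Tuple[int, str]]:
    """Return (0-based start, sequence) for each maximal stretch with read depth above zero."""
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage must have the same length")

    contigs: List[Tuple[int, str]] = []
    start = None  # start index of the covered stretch being extended, if any
    for i, depth in enumerate(coverage):
        if depth > 0:
            if start is None:
                start = i
        elif start is not None:
            # Depth dropped to zero: close the current stretch.
            contigs.append((start, consensus[start:i]))
            start = None
    if start is not None:
        contigs.append((start, consensus[start:]))
    return contigs


if __name__ == "__main__":
    # Toy consensus of ten bases with two uncovered gaps.
    print(split_at_zero_coverage("ACGTACGTAC", [1, 2, 0, 0, 3, 3, 3, 0, 1, 1]))
```

The uncovered stretches are where the new isolate has no reads matching the 55989 reference, which is why the leftover unmapped reads then need a de novo assembly of their own.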
Update 3/6/11 16:50 GMT+1
There are two O104 isolates sequenced from this outbreak now. This first – named TY2482 – was done by BGI in collaboration with University Medical Centre Hamburg-Eppendorf and the second was done by Life Tech in-house in collaboration with University of Muenster – this is called LB226692. So opportunities for comparison exist now.
In summary: TY2482 assembly (BGI reads, my assembly), LB226692 assembly (Life Tech reads, assembly).
Mike the Mad Biologist has looked at the TY2482 assembly and concludes it is ST678 (or closely related) which agrees with the original molecular typing release from the Robert Koch Institute.
I’ve heard from another group they are planning on sequencing another isolate. I am going to try and find a place where the latest information can be collated to aid in further crowd-sourcing analysis.
Update 3/6/11 19:50 GMT+1
BGI just released two more 314 chips worth of data and their own assembly of TY2482. I don’t have any details on program used or parameters just yet but I’ve enquired.
Who will take on the challenge of building a whole-genome phylogeny?
- Era 7 Bioinformatics

E Coli genome draft annotation by Era7 Bioinformatics (Era7 Information Technologies SLU) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at www.era7bioinformatics.com/en/E_Coli_EHEC_O104_STRAIN_EU_OUTBREAK_era7bioinformatics.html.
- Genetic database entries describing EAEC E. coli strain 55989 and providing access to highly relevant papers about these pathogenic variants.
LOCUS AF411067 12012 bp DNA linear BCT 29-JUL-2002
DEFINITION Escherichia coli strain 55989 plasmid pAA-like agg3 gene cluster, complete sequence.
ACCESSION AF411067
VERSION AF411067.1 GI:22001085
KEYWORDS .
SOURCE Escherichia coli 55989
ORGANISM Escherichia coli 55989
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia.
REFERENCE 1 (bases 1 to 12012)
AUTHORS Bernier,C., Gounon,P. and Le Bouguenec,C.
TITLE Identification of an aggregative adhesion fimbria (AAF) type III-encoding operon in enteroaggregative Escherichia coli as a sensitive probe for detecting the AAF-encoding operon family
JOURNAL Infect. Immun. 70 (8), 4302-4311 (2002)
PUBMED 12117939
LOCUS NC_011748 5154862 bp DNA circular BCT 15-MAY-2010
DEFINITION Escherichia coli 55989, complete genome.
ACCESSION NC_011748
VERSION NC_011748.1 GI:218693476
DBLINK Project: 59383
KEYWORDS .
SOURCE Escherichia coli 55989
ORGANISM Escherichia coli 55989
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
Enterobacteriaceae; Escherichia.
REFERENCE 1 (bases 1 to 5154862)
AUTHORS Touchon,M., Hoede,C., Tenaillon,O., Barbe,V., Baeriswyl,S., Bidet,P., Bingen,E., Bonacorsi,S., Bouchier,C., Bouvet,O., Calteau,A., Chiapello,H., Clermont,O., Cruveiller,S., Danchin,A., Diard,M., Dossat,C., Karoui,M.E., Frapy,E., Garry,L., Ghigo,J.M., Gilles,A.M., Johnson,J., Le Bouguenec,C., Lescat,M., Mangenot,S., Martinez-Jehanne,V., Matic,I., Nassif,X., Oztas,S., Petit,M.A., Pichon,C., Rouy,Z., Ruf,C.S., Schneider,D., Tourret,J., Vacherie,B., Vallenet,D., Medigue,C., Rocha,E.P. and Denamur,E.
TITLE Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths
JOURNAL PLoS Genet. 5 (1), E1000344 (2009)
PUBMED 19165319
REFERENCE 2 (bases 1 to 5154862)
LOCUS NC_011752 72482 bp DNA circular BCT 16-APR-2010
DEFINITION Escherichia coli 55989 plasmid 55989p, complete sequence.
ACCESSION NC_011752
VERSION NC_011752.1 GI:218511148
DBLINK Project: 33333
KEYWORDS .
SOURCE Escherichia coli 55989
ORGANISM Escherichia coli 55989
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia.
REFERENCE 1 (bases 1 to 72482)
AUTHORS Touchon,M., Hoede,C., Tenaillon,O., Barbe,V., Baeriswyl,S., Bidet,P., Bingen,E., Bonacorsi,S., Bouchier,C., Bouvet,O., Calteau,A., Chiapello,H., Clermont,O., Cruveiller,S., Danchin,A., Diard,M., Dossat,C., Karoui,M.E., Frapy,E., Garry,L., Ghigo,J.M., Gilles,A.M., Johnson,J., Le Bouguenec,C., Lescat,M., Mangenot,S., Martinez-Jehanne,V., Matic,I., Nassif,X., Oztas,S., Petit,M.A., Pichon,C., Rouy,Z., Ruf,C.S., Schneider,D., Tourret,J., Vacherie,B., Vallenet,D., Medigue,C., Rocha,E.P. and Denamur,E.
TITLE Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths
JOURNAL PLoS Genet. 5 (1), E1000344 (2009)
PUBMED 19165319
REFERENCE 2 (bases 1 to 72482)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (18-DEC-2008) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (bases 1 to 72482)
AUTHORS Genoscope -,C.E.A.
CONSRTM Institut Pasteur and Genoscope
TITLE Direct Submission
JOURNAL Submitted (15-DEC-2008) Genoscope – Centre National de Sequencage :
BP 191 91006 EVRY cedex – FRANCE (E-mail : seqref@genoscope.cns.fr)
There are several things I’d like to know.
How long does it take to set up an accurate, high-throughput system to detect a specific/novel strain of E. coli? The availability and use of such a testing regime argues in favor of a non-vegetable vector as the culprit. For instance, bottled water. I understand a ‘challenge dose’ of this bacterium is 100 (one hundred) bacteria, so an extremely attenuated population can be virulent.
Doesn’t this incident provide support for ‘cold pasteurization’/irradiation? In turn, wouldn’t this cheap, effective food safety measure provide support for labeling of ground beef/raw vegetables etc., that hasn’t been irradiated, as ‘May Contain Live Fecal Bacteria’?
What are the odds that someone could come up with a human vaccine against E. coli? Are there too many strains, too diverse, to make that possible? But also, doesn’t the fact that ‘friendly’ E. coli actually provides benefits in the human gut mean that a vaccine would be a bad idea?
Detecting something “novel” would be real hard to do. If it is novel, you can’t look for a specific tracer via PCR or antibodies. There are zillions of novel things out there.
Typically what is looked for are “coliforms” which cover a very large group of organisms, most of which are not disease causing. Everyone has about a pound of these in their gut and the specific ones in your gut are virtually always not causing any problems.
Daedalus,
I see your point about novelty — but it appears that in this case the novelty is now gone. Specific gene sequences have been identified as unique to this pathogen.
So the question becomes: how fast can a lab ‘scale up’ a high-throughput test when the unique sequences are known? Week, month, year?
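Once the outbreak genome is known, one way to picture the search for strain-specific test targets is to look for short subsequences (k-mers) that occur in the outbreak genome but in none of the comparison genomes. The Python sketch below is only an illustration of that idea under stated assumptions: the function names and toy sequences are invented here, and a real PCR assay would also need primer design, melting-temperature checks and laboratory validation.

```python
from typing import Iterable, Set

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def unique_kmers(target: str, backgrounds: Iterable[str], k: int) -> Set[str]:
    """k-mers found in target but on neither strand of any background."""
    candidates = kmers(target, k)
    for background in backgrounds:
        candidates -= kmers(background, k)
        candidates -= kmers(reverse_complement(background), k)
    return candidates


# Toy sequences, far shorter than real genomes (hypothetical data).
toy_outbreak = "ACGTACGTTTGACCA"
toy_relatives = ["ACGTACGTAAAA"]
signature = unique_kmers(toy_outbreak, toy_relatives, k=5)
print(sorted(signature))
```

With real genomes, k would be closer to primer length, and the background set would include as many related E. coli genomes as are available.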
Time and time again I try to correct students who describe a problem like this as being caused by E. coli. There are so many different types of this germ, and most are harmless. Let’s call them “this pathogenic strain of E. coli” or some such label.
Yes, we depend on E. coli for some vitamins.
Working fast, it could take 48 hours to develop, plus another day to validate.
Is it possible that the pathogen involved is not a chimera, but rather, different strains coexisting?
No, not very likely.
The bacteria are routinely cultured from single cells, and E. coli consistently behaves as clonable cells. The genome sequences also consist of CONTIGuous stretches of DNA, so within a contig the sequence data themselves support a hybrid event. Mixtures would generate overlapping signals.
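The "overlapping signals" point can be made concrete with a rough Python sketch. It assumes the reads have already been aligned and reduced to one string of observed bases per genome position; that input format, the function name and the thresholds are illustrative assumptions, not the method used by any of the groups mentioned here. A clonal culture should show almost no positions with a well-supported second base, whereas a mixture of strains would show many.

```python
from collections import Counter
from typing import List


def mixed_positions(
    pileup: List[str], min_depth: int = 10, min_minor_frac: float = 0.2
) -> List[int]:
    """Indices of positions where a second base is well supported.

    pileup:         one string of observed read bases per position
    min_depth:      skip positions covered by fewer reads than this
    min_minor_frac: fraction of reads the second base needs to be flagged
    """
    flagged = []
    for pos, bases in enumerate(pileup):
        if len(bases) < min_depth:
            continue  # too few reads to judge this position
        top_two = Counter(bases.upper()).most_common(2)
        if len(top_two) < 2:
            continue  # every read agrees
        minor_count = top_two[1][1]
        if minor_count / len(bases) >= min_minor_frac:
            flagged.append(pos)
    return flagged


# Toy pileups (hypothetical): one sequencing error versus a true mixture.
clonal = ["A" * 20, "C" * 19 + "T", "G" * 20]
mixture = ["A" * 20, "C" * 12 + "T" * 8, "G" * 20]
print(mixed_positions(clonal))
print(mixed_positions(mixture))
```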