Bklein7 Week 11

From LMU BioDB 2015

Assessment of the Genome Sequencing Paper for Bordetella pertussis

All of the content in this section discusses the following genome sequencing paper:

  • Parkhill, J., Sebaihia, M., Preston, A., Murphy, L. D., et al. (2003). Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nature Genetics, 35(1), 32-40. doi:10.1038/ng1227
  • PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/12910271
  • PubMed Central: Not available on PubMed Central.
  • Publisher Full Text (HTML): http://www.nature.com/ng/journal/v35/n1/full/ng1227.html
  • Publisher Full Text (PDF): http://www.nature.com/ng/journal/v35/n1/pdf/ng1227.pdf
  • Copyright: ©2003 Nature Publishing Group (information found on PDF version of article). This article is not Open Access, but it is freely available 6 months after publication.
  • Publisher: Nature Publishing Group (for-profit).
  • Availability: In print and online.
  • Did LMU pay a fee for this article: Yes, LMU pays a subscription fee for access to the journal Nature Genetics.

The Parkhill et al. (2003) paper was accessed using the link listed above as "Publisher Full Text (PDF)".

Defining Unknown Biological Terms from Parkhill et al.

Unknown terms from the Parkhill et al. (2003) paper were entered into search engines, and the results were vetted until a definition for each term was found in a quality source. Links to the sites from which the definitions were obtained are included below.

  1. Pseudogenes
    • Pseudogenes are genomic DNA sequences similar to normal genes but non-functional; they are regarded as defunct relatives of functional genes.
    • Citation: http://pseudogene.org/background.php
  2. Fimbriae
    • Modern term for short, hair-like projections or appendages (organelles) on the outer surface of certain bacteria composed of protein subunits (pilin) extending outward from the surface that act as a virulence factor by promoting adherence; formerly known as pili.
    • Citation: http://www.life.umd.edu/classroom/bsci424/Definitions.htm
  3. Auxotrophy
  4. Ortholog
  5. Insertion Sequence Elements (ISEs)
    • Insertion sequences, or insertion-sequence (IS) elements, are now known to be segments of bacterial DNA that can move from one position on a chromosome to a different position on the same chromosome or on a different chromosome. When IS elements appear in the middle of genes, they interrupt the coding sequence and inactivate the expression of that gene.
    • Citation: http://www.ncbi.nlm.nih.gov/books/NBK21779/
  6. Horizontal Gene Transfer
  7. Prophage
  8. Autotransporter
    • The key feature of an autotransporter is that it contains all the information for secretion in the precursor of the secreted protein itself. Autotransporters comprise three functional domains: (1) an N-terminal targeting domain (amino-terminal leader sequence) that functions as a signal peptide to mediate targeting to and translocation across the inner membrane; (2) a C-terminal translocation domain (carboxy-terminal) that forms a beta-barrel pore allowing the secretion of (3) the passenger domain, which is the secreted mature protein.
    • Citation: https://www.ebi.ac.uk/interpro/entry/IPR006315
  9. Type-III Secretion System
    • The Type III Secretion System (T3SS) is a supramolecular protein nanomachine that injects bacterial virulence proteins into eukaryotic cells to modulate their physiology for the benefit of the pathogen.
    • Citation: http://lab.rockefeller.edu/stebbins/research/T3SS
  10. Constitutive Expression

Comparative Analysis of the Genome Sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica Outline

The following outline was adapted from the original paper published by Parkhill et al. (2003) in Nature Genetics.

Introduction

Establishing the Importance of B. pertussis, B. parapertussis, and B. bronchiseptica

  • All three of these bacteria are pathogens that colonize the respiratory tracts of mammals.
    • B. bronchiseptica: infects a wide range of mammals.
    • B. parapertussis: infects both humans and sheep.
    • B. pertussis: specific to human hosts.
      • B. pertussis is the causative agent of whooping cough.
      • Despite vaccination programs, whooping cough is still endemic in some countries, causing hundreds of thousands of deaths every year.
  • Evidence suggests that B. pertussis and B. parapertussis may have evolved in the recent past from a common ancestor, possibly the more genetically diverse species B. bronchiseptica.
    • The species exhibit similar virulence factors.
    • Virulence gene expression in these species is regulated by the two-component BvgA/S regulatory system.
      • Bvg-plus phase: host conditions detected. Virulence-activated genes (vags) are up-regulated and virulence-repressed genes (vrgs) are down-regulated.
      • Bvg-minus phase: outside the host, in the environment. The pattern is reversed: vags are down-regulated and vrgs are expressed.

Experimental Design: Sequencing B. pertussis, B. parapertussis, and B. bronchiseptica

  • Specific strains of each Bordetella species were sequenced.
  • Genome sequences were compared to:
    1. Compare genetic background.
    2. Assess factors influencing variable disease severity.
    3. Assess factors influencing variable host range.

Results

Structure of the Genomes

  • Figure 1 compares the general properties of the three sequenced genomes using circular representations. Genes are labelled to show similarities and differences between genomes and color-coded to show association with several key gene ontology (GO) terms.
    • Suggests that B. pertussis and B. parapertussis evolved from an ancestor similar to B. bronchiseptica.
    • Calculations of time to most recent common ancestor (MRCA) suggest a recent bottleneck as opposed to recent descent from B. bronchiseptica.
  • Table 1 quantifies the general features of the sequenced genomes presented in Figure 1. Counts for categories such as "pseudogenes" and insertion sequence elements (ISEs) are included for use in later analyses.
    • Initially used to support the conclusions drawn from Figure 1.
  • Figure 2 presents a linear genomic comparison between the sequenced genomes. Red lines indicate similarities and black triangles indicate ISEs.
    • B. parapertussis and B. bronchiseptica are more similar to each other than B. pertussis and B. bronchiseptica are.
    • Suggests that frequent recombination and deletion have occurred in the genomes of B. bronchiseptica and B. pertussis.
    • Gene losses in the B. pertussis genome are attributed to the expansion of ISEs in this species, which resulted in ISE-mediated deletion events.

Gene Complements

  • Figure 3 presents a Venn diagram comparing the gene complements of the three Bordetella spp.
    • Supports the hypothesis that B. pertussis and B. parapertussis are derivatives of B. bronchiseptica.
      • Very few genes are unique to B. pertussis and B. parapertussis (114 and 50, respectively).
      • B. bronchiseptica contains over 600 genes that are not present in the genomes of the other two species.
  • Figure 4 categorizes the genes lost by B. pertussis and B. parapertussis based on their associated GO terms.
    • Demonstrates that genes lost in the derivative Bordetella species are involved in the following processes: membrane transport, small-molecule metabolism, regulation of gene expression, and synthesis of surface structures.
  • B. pertussis and B. parapertussis appear to have lost the function of many genes present in B. bronchiseptica through the formation of pseudogenes (in addition to deletion losses).
    • Figure 4 demonstrates that the genes lost in this manner are, once again, associated with the same processes as above (e.g. membrane transport).
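The region counts behind a Venn diagram like Figure 3 can be derived directly from per-species gene sets. The Python sketch below assumes each gene has already been mapped to a shared ortholog-group ID (for example, with the best-match FASTA comparisons described under Methods); the function name and the toy IDs are hypothetical and are not data from the paper.

```python
from collections import Counter


def venn_region_counts(gene_sets):
    """Count genes in each exclusive region of a Venn diagram.

    gene_sets maps a species name to the set of ortholog-group IDs found in
    that genome. Each key of the returned Counter is the frozenset of species
    that share a gene, so {"B. bronchiseptica"} alone means "unique to it".
    """
    all_genes = set().union(*gene_sets.values())
    return Counter(
        frozenset(name for name, genes in gene_sets.items() if gene in genes)
        for gene in all_genes
    )


# Hypothetical ortholog-group IDs; toy data, not values from Parkhill et al.
gene_sets = {
    "B. pertussis": {"og1", "og2", "og3", "og7"},
    "B. parapertussis": {"og1", "og2", "og4", "og8"},
    "B. bronchiseptica": {"og1", "og2", "og3", "og4", "og5", "og6"},
}
counts = venn_region_counts(gene_sets)
unique_to_bronchiseptica = counts[frozenset({"B. bronchiseptica"})]  # og5, og6
core = counts[frozenset(gene_sets)]  # og1, og2 are shared by all three species
```

Keying regions by the set of species that carry a gene means every gene lands in exactly one region, which is the defining property of a Venn diagram's exclusive areas.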

Metabolism

  • Genomic analysis demonstrates that all three isolates predominantly share a common central and intermediary metabolism.
  • All three Bordetella species use glutamate as their main carbon source, as they do not synthesize or break down glucose.
  • None of the three species has a complete pathway for the biosynthesis of cysteine.
    • Suggests that Bordetella have lost an ancestral ability to use external sulfur sources for cysteine synthesis.
  • Despite these similarities, B. bronchiseptica can survive in the environment without a host while B. pertussis and B. parapertussis cannot, which suggests that the latter two have lost many accessory pathways for using alternative nutrient sources.

Host Range and Pathogenicity

  • FhaB (protein involved in attachment to host cells) variation.
    • Ortholog present in all three species, but there are internal variations in each.
    • Two extra genes encoding FhaB-like proteins are contained in the B. pertussis genome.
    • Variation is also present in other fimbrial systems.
    • Suggests that variable host specificity among Bordetella species may be influenced by changes in this receptor-ligand interaction.
  • Autotransporter protein variation.
    • Table 2 lists the autotransporter proteins encoded by B. pertussis, B. parapertussis, and B. bronchiseptica. This table demonstrates that autotransporter complements differ among the three species.
    • B. pertussis and B. parapertussis have fewer genes and more pseudogenes.
    • May also impact host specificity.
  • Siderophore similarities and variation.
    • All three species contain operons for the siderophore alcaligin, suggesting they scavenge iron from their hosts.
    • Table 3 lists the TonB-dependent ferric complex receptors encoded by B. pertussis, B. parapertussis, and B. bronchiseptica. This table demonstrates that each species encodes up to 16 of these receptors, which also suggests iron-scavenging behavior.
    • B. pertussis and B. parapertussis have fewer genes and more pseudogenes that code for siderophores.
    • Lacking necessary siderophores may influence host specificity.
  • Variation is present in the following virulence structures: type-III secretion systems, O-antigens, and flagella.
    • Variations in the O-antigen that were identified supported claims made in previous literature.
    • The full flagellar operon is intact in B. bronchiseptica but inactivated in B. parapertussis and B. pertussis.
      • Suggests that this lack of motility contributes to host restriction.
  • Discovered a locus that codes for a polysaccharide capsule.
    • Typical type II capsule arrangement.
      • The presence of this type of capsule in Bordetella spp. was argued for in previous literature.
    • Expression of this capsule is variable among the three species:
      • B. bronchiseptica has the intact capsule locus.
      • B. parapertussis has acquired a stop codon in the locus that prevents expression of this capsule.
      • B. pertussis has mutations at the beginning and end of the locus, suggesting that little to no capsule expression occurs.
    • These findings suggest that the capsule does not influence pathogenesis in humans, and that it instead aids in survival in the environment.
  • Overall conclusion: the absence of surface structures such as flagella, fimbriae, and a capsule increases the virulence of B. pertussis (and B. parapertussis) by reducing the number of immune system targets.
  • Variations in toxin production.
    • The pertussis toxin operon is present in all three species but expressed differently:
      • B. parapertussis and B. bronchiseptica contain the operon but do not express it due to changes in the ptxA promoter region.
        • This supports previous claims made in the literature.
      • B. pertussis produces the toxin.
    • Figure 5 compares the ptxA promoter region present in the pertussis toxin operon among all three Bordetella species.
      • Demonstrates that the majority of base changes (62%) are present only in B. pertussis.
      • Suggests recent mutations allowed for pertussis toxin expression.
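The 62% figure from Figure 5 is a proportion of variable positions in an alignment of the three ptxA promoter regions. One way such a proportion could be computed is sketched below in Python, assuming a gap-free alignment and counting a position as B. pertussis-specific when the other two species share one base and B. pertussis carries a different one. This counting rule and the function name are assumptions, the paper's exact method may differ, and the sequences are placeholders rather than the real promoter.

```python
def lineage_specific_fraction(alignment, target):
    """Fraction of variable alignment columns where only `target` differs.

    A column counts as target-specific when every other sequence shares one
    base and the target carries a different base (a simple parsimony-style
    rule; this is an assumption, not necessarily the paper's method).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    others = [name for name in alignment if name != target]
    variable = specific = 0
    for i in range(lengths.pop()):
        column = {name: seq[i] for name, seq in alignment.items()}
        if len(set(column.values())) == 1:
            continue  # conserved column, not a base change
        variable += 1
        other_bases = {column[name] for name in others}
        if len(other_bases) == 1 and column[target] not in other_bases:
            specific += 1
    return specific / variable if variable else 0.0


# Hypothetical, gap-free toy alignment (NOT the real ptxA promoter sequence).
promoter_alignment = {
    "B. pertussis":      "ATGTTCCA",
    "B. parapertussis":  "ACGATGCC",
    "B. bronchiseptica": "ACGATGCT",
}
print(lineage_specific_fraction(promoter_alignment, "B. pertussis"))  # 0.75
```

In this toy alignment, four columns vary and three of them differ only in the B. pertussis sequence, giving 0.75. The last column varies in all three sequences and is therefore not attributed to any single lineage.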

Discussion

Comparing the genomes of B. pertussis, B. parapertussis, and B. bronchiseptica elucidates differences in host range and pathogenesis.

  • Host interaction factors and virulence determinants were not recently acquired by the more virulent strains (B. pertussis and B. parapertussis). This was contrary to expectation.
  • Individual traits in the derivative species (B. pertussis and B. parapertussis) that have enabled virulence were generated by independent gene deletions and inactivations by creation of pseudogenes.
  • Increased virulence can be explained by:
    1. Overexpression of virulence traits in human hosts.
    2. Loss of surface structures that made immune recognition easier.
    • In the process of explaining increased virulence, several findings supported pre-existing literature:
      • The pertussis toxin operon is present in B. parapertussis and B. bronchiseptica but not expressed.
      • B. bronchiseptica and possibly B. pertussis have polysaccharide capsules.
      • O-antigen variations are present between the three species.
  • Why were these changes selected for?
    • Coevolution with the expansion of Homo sapiens.
    • Higher transmission rates no longer necessitate environmental survival (e.g. loss of capsule) or limited damage to the host (e.g. pertussis toxin).

Overall Impact

  • This paper sequenced the genomes of B. pertussis, B. parapertussis, and B. bronchiseptica; provided evidence for the common ancestry of these organisms; identified new information regarding their capsules; and proposed the genetic roots of increased virulence in the two species that are human pathogens. These findings can be applied to genomic analyses of future pathogens to assess virulence.

Methods

DNA Preparation and Sequencing

  • Strains were provided to the researchers by the following sources:
    • Food and Drug Administration (B. pertussis Tohama I).
    • UCLA (B. bronchiseptica RB50).
    • Universität Erlangen (B. parapertussis strain 12822).
  • Bacteria were grown on Bordet-Gengou agar supplemented with 15% defibrinated horse blood at 37 °C for three days.
  • Initial genome assemblies were obtained from thousands of paired-end sequences derived from genomic shotgun libraries using dye terminator chemistry on automated sequencers.
  • Thousands of paired-end sequences from a pBACe3.6 library with insert sizes ranging from 15 to 48 kb were used as scaffolding.
  • Thousands of further sequence reads were generated during sequencing.

Annotation and Analysis

  • Artemis was used for data collection and annotation.
  • Best-match FASTA comparisons were used to identify orthologous genes.
  • Pseudogenes were identified by direct comparison.
  • Calculations were made using the genome data produced:
    • Synonymous substitution frequency values were calculated from orthologous gene pairs.
    • Ages of divergence were estimated.
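The annotation steps above can be illustrated with a small Python sketch. It treats "best-match" orthology as reciprocal best hits between two genomes (a common reading of that approach, but an assumption here). It also converts a synonymous substitution value (Ks) into a divergence time with the standard molecular-clock relation T = Ks / (2r), where r is the substitution rate per lineage. Ks is assumed to have been computed elsewhere, for example with a Nei-Gojobori-type method. The gene IDs, scores, and rate below are hypothetical and are not values from Parkhill et al.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene."""
    return {
        query: max(subjects, key=lambda hit: hit[1])[0]
        for query, subjects in hits.items()
        if subjects
    }


def reciprocal_best_hits(a_to_b, b_to_a):
    """Return (gene_a, gene_b) pairs that are each other's best match."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def divergence_time(ks, rate):
    """Molecular-clock estimate T = Ks / (2 * rate).

    ks: synonymous substitutions per synonymous site between two orthologs.
    rate: assumed synonymous substitutions per site per year in ONE lineage;
    the factor of 2 reflects that both lineages diverge after the split.
    """
    return ks / (2 * rate)


# Hypothetical similarity scores (gene IDs and values are toy data).
a_to_b = {"a1": [("b1", 250.0), ("b2", 40.0)], "a2": [("b2", 180.0)], "a3": [("b2", 90.0)]}
b_to_a = {"b1": [("a1", 248.0)], "b2": [("a2", 175.0), ("a3", 95.0)]}
orthologs = reciprocal_best_hits(a_to_b, b_to_a)  # {("a1", "b1"), ("a2", "b2")}
years = divergence_time(ks=0.01, rate=1e-9)       # about 5 million years, made-up inputs
```

Requiring the match to hold in both directions filters out cases like "a3" above, whose best hit "b2" matches "a2" more strongly. This is the usual reason reciprocal best hits are preferred over one-way best hits for calling orthologs.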

Preliminary Presentation

File Link: File:Genomepaper cw20151116.pdf

Bordetella pertussis Model Organism Database

Identifying the Bordetella pertussis MOD

On Thursday, November 12, several Google searches were conducted to find a potential MOD for Bordetella pertussis. The following queries returned unsuccessful results:

  • "Bordetella Pertussis Model Organism Database": too narrow a search for this organism.
  • "Bordetella Pertussis Database": too broad and unspecific a search for a MOD.

However, the query "Bordetella Pertussis Gene Database" brought me to GeneDB, a candidate MOD for Bordetella pertussis. To confirm that this database was the MOD, I did the following:

  1. I assessed the information present on the database homepage.
    • A reference to the original genome sequencing paper for Bordetella pertussis that I outlined above was present, which was promising. The contact information for the primary author of this paper was also included.
    • Links to the sequence data for this organism were present.
    • A search bar and various gene search filters were present.
  2. I did a sample search for a Bordetella pertussis gene.
  3. I confirmed the GeneDB website as the Bordetella pertussis MOD with Dr. Dahlquist.

Assessing the GeneDB MOD for Bordetella pertussis

Citation for GeneDB:

  • Logan-Klumpler, F. J., De Silva, N., Boehme, U., et al. (2012). GeneDB—an annotation database for pathogens. Nucleic acids research, 40(D1), D98-D108. doi: 10.1093/nar/gkr1032
  1. What types of data can be found in the database (sequence, structures, annotations, etc.); is it a primary or “meta” database; is it curated electronically, manually [in-house], or manually [community])?
    • GeneDB contains Bordetella pertussis gene sequences, locations, extensive annotations, and information regarding the proteins that the genes code for such as the proteins' lengths in amino acids and molecular masses.
    • GeneDB is a primary database [1].
    • GeneDB undergoes extensive manual [in-house] curation [2]. For Bordetella pertussis, manual curation is directed by Dr. Julian Parkhill.
  2. What individual or organization maintains the database?
    • GeneDB is maintained by the Sanger Institute [3].
  3. What is their funding source(s)?
    • GeneDB is funded by grants held both by researchers at the Sanger Institute and by researchers elsewhere who collaborate in the GeneDB project [4].
  4. Is there a license agreement or any restrictions on access to the database?
    • Access to the GeneDB database is subject to their Terms and Conditions.
    • There are no restrictions on access to the data present in the database. However, the Sanger Institute has a Data Use Statement in which it expressly states that "permission of the principal investigator should be obtained before publishing analyses of the sequence/open reading frames/genes on a chromosome or genome scale" [5].
  5. How often is the database updated?
    • GeneDB is updated every 24 hours [6].
  6. Are there links to other databases?
    • The only link to an external site that is not hosted by the Sanger Institute is to the Bordetella pertussis Wikipedia Page.
    • No links to other databases are present. All of the tools presented by GeneDB such as their BLAST tool are entirely self-contained.
  7. Can the information be downloaded?
    • Information regarding genes and the proteins that they code for can be downloaded directly from GeneDB. The download page, however, is somewhat tricky to access. To reach it, follow these instructions from the FAQ Page:
      • GeneDB Download Instructions.png
    • Additionally, there is an FTP link to the genome sequence data for Bordetella pertussis on the Main Page.
    • In what file formats?
      • Gene/Protein files and formats: .csv, .xls, .fas
      • FTP link files and formats:
      • GeneDB BP Files.png
  8. Evaluate the “user-friendliness” of the database.
    • Is the Web site well-organized?
      • The GeneDB web site is certainly well-organized. The site has a clean aesthetic look that is not overwhelming and provides quick access to powerful tools. Titles for tools and search options are intuitive, while the names of headings cover many of the main concerns the typical user would have. For instance, it was very easy to navigate to information regarding how the database is curated and where the information within the database comes from starting at the home page. Further, many of the links appear to be terminal (i.e. they do not include links to deeper GeneDB pages), which means that I never felt lost while using the database. The Home Page was almost always one click away. A screenshot of the Home Page is available below for reference:
      • GeneDB Home Page.png
    • Does it have a help section or tutorial?
      • The Main Page contains a link to a FAQ Page that answers some common questions about how to use the database, such as how files are downloaded. However, this FAQ page is not a complete tutorial on how to use the database.
      • Unfortunately, I was not able to navigate to a more complete tutorial, and I do not believe one is present on the GeneDB website.
      • Although GeneDB is a relatively small database, it includes niche tools such as "Web Artemis", "JBrowse", and "AmiGO" that are not intuitive for an inexperienced user. Documentation for these tools should be included in or linked from the GeneDB website for completeness and user-friendliness.
    • Run a sample query. Do the results make sense?
      • I ran a sample query for "BP3783", the gene that codes for pertussis toxin. I was instantly brought to a page that summarized the gene "BP3783" and provided links to detailed information regarding the gene such as its sequence. The results are all clearly labelled, and I found them very easy to read and interpret. As an added bonus, the search result was added to my search history for future access. The only part of the results window that did not immediately make sense to me was the "Web Artemis" window. A screenshot of the results page for "BP3783" is included below:
      • GeneDB Sample Query.png
  9. What is the format (regular expression) of the main type of gene ID for this species (the "ordered locus name" ID)? (for example, for Vibrio cholerae it was VC#### or VC_####).
    • The format of the main type of gene ID for Bordetella pertussis is BP####.
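GeneDB offers gene and protein downloads in FASTA (.fas) format, and its ordered locus names follow the BP#### pattern, so a downloaded file can be checked programmatically. The Python sketch below is a minimal illustration, assuming a plain FASTA layout whose first header token is the locus name. The function names and sample records are hypothetical, and the header layout of actual GeneDB files was not verified.

```python
import re

# Ordered locus names for B. pertussis follow the BP#### pattern.
BP_LOCUS_TAG = re.compile(r"BP[0-9]{4}")


def is_bp_locus_tag(identifier):
    """True if identifier is exactly 'BP' followed by four digits."""
    return BP_LOCUS_TAG.fullmatch(identifier) is not None


def parse_fasta(text):
    """Parse FASTA text into {first header token: sequence}.

    Assumes the ID is the first whitespace-delimited token after '>'; the
    real header layout of GeneDB .fas downloads may differ.
    """
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Placeholder records; sequences are dummy strings, not real GeneDB data.
sample = """>BP3783 placeholder description
ACGTACGT
ACGT
>BPX001 malformed ID for illustration
ACGT
"""
records = parse_fasta(sample)
valid = {name: is_bp_locus_tag(name) for name in records}
# valid == {"BP3783": True, "BPX001": False}
```

Using `fullmatch` rather than `match` ensures that IDs with extra trailing characters (for example "BP37830") are rejected instead of matching on their first six characters.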
