Blitvak Week 11

From LMU BioDB 2015

Initial Project Work

Work done on 11/10

  • A possible MOD covering B. cenocepacia was found: Burkholderia Genome Database
  • A test search was conducted for 'Burkholderia cenocepacia J2315'

11/12

  • The possible MOD was accessed and a test search was conducted:

Test search BL.png

  • It was noticed that most "locus tags" were in the format BCAL####
  • Some of the tags included an A at the end
  • It was noticed that instead of BCAL, some genes were BCAM or BCAS
  • Another test search was performed to observe the total number of genes with an ID that begins with "BCA":

BL testSearch 2.png

    • This search yielded 7341 records (potentially, 7341 different genes)
  • A similar search was conducted using the term "BCAL", which should cover all of genes with IDs that start with "BCAL"
    • This search yielded 3603 records
  • Another search was done using "BCAM"
    • This search yielded 2859 records
  • A search was done which covered the term "BCAS"
    • This search yielded 779 records
  • The last three searches were summed, yielding 7241 records (100 are unaccounted for)
  • An advanced search was done in order to find the 100 records with an unknown starting pattern:

Final TestSearch BL.png

    • It was found that the last 100 records started with pBCA (100 results, only)
    • In this last search, it was noticed that most of the tags only had 3 numbers and that some ended with a lowercase a
  • Reviewing the website, it was noticed that "%" is interpreted as a wildcard by the database. A search was then done using the term "BCA%a" in order to see the number of records that ended with an "A". Some of these results included a lower-case r before the numbers; looking at the other columns in the search result, such as "Product name", it was realized that the tags that included the r corresponded to genes encoding different types of tRNA (which correspond to the different amino acids)
  • In the latest news section of the database it was stated that there is an updated beta version of the MOD available (http://beta.burkholderia.com/); this updated website will be used in future work
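
The record counts from the searches above can be cross-checked with a few lines of arithmetic. This is only a minimal Python sketch; the variable names are illustrative, and the counts are copied from the searches described in this journal entry.

```python
# Record counts reported by the test searches described above.
total_bca = 7341  # search term "BCA"
prefix_counts = {"BCAL": 3603, "BCAM": 2859, "BCAS": 779}

# Sum the three four-letter prefixes and find what is left over.
accounted = sum(prefix_counts.values())
unaccounted = total_bca - accounted

print(accounted)    # 7241
print(unaccounted)  # 100, the same as the number of pBCA records found
```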

Preparation for Genome Paper Presentation

Unfamiliar Biology Terms from the Manuscript

  • Saprophyte: An organism that absorbs, feeds upon, or grows upon decaying organic matter or waste (matter could be originally from animal or plant sources)
  • Orthologous gene/Ortholog: A gene that is found within two or more species that likely originated within a common ancestor (the genes, if orthologous, can be traced back to a common ancestor)
  • CDSs: abbreviated form of "coding sequences" (the regions of DNA that are transcribed and then translated into protein)
  • Replicon: Unit of DNA that contains a DNA replication origin, a termination point, and the potential for self-replication OR a linear or circular segment of DNA/RNA which replicates (sequentially) as a unit
  • Concatenation (in relation to genetics): The joining of two DNA fragments (in a lab, physically, or in software)
  • Genomic Island: Gene clusters that likely originated due to horizontal gene transfer (>8 kb in size) in bacterial/archaeal genomes. Genomic islands encode for genes that are notable adaptations with environmental and medical interest (these genes play an important role in the evolution, or population change, of such microbes with the islands)
  • Mobile genetic elements: small mobile sequences of DNA which can replicate and insert themselves at random sites within chromosomes (transposons are one example). In bacteria, MGEs come in simple (only code for the genes needed for insertion) and complex (contain genes in addition to what's needed for insertion) forms.
  • Rearrangement: Structural change of a chromosome that leads to a change in the loci order
  • Prophage: Genome of a lysogenic bacteriophage that has come to be incorporated into the chromosome of the bacterial host (prophage is replicated along with the host chromosome)
  • Efflux system: An active transport system (localized in the cytoplasmic membrane) with the purpose of moving substrates (e.g. antibiotics) out of a bacterial cell
  • Peritrichous: Related to cilia/appendage organs projecting from around the cell; uniform distribution of flagella over a cell
  • Fimbrial: Relating to fimbriae, short filamentous projections on a bacterial cell that are used for adherence to other bacterial cells or to animal cells (not for motility)

Article Outline

Link to Article: http://jb.asm.org/content/191/1/261.long

Importance of the work

  • B. cenocepacia is a very clinically relevant member of the B. cepacia complex (BCC), which is a group of hardy (high degree of antibiotic resistance) gram-negative bacteria that typically reside in water or soil (18 different species, with some being plant or human pathogens). B. cenocepacia is an opportunistic pathogen which causes lung infections in CF (cystic fibrosis) patients; infection by B. cenocepacia is extremely difficult to treat due to a high level of antibiotic resistance, and thus, infection is tied to increased levels of mortality and a decline in the functioning of the lung. The manuscript covered the genome of B. cenocepacia J2315, which is a member of a recently emerged (1990s) epidemic lineage of B. cenocepacia that was extremely transmissible (especially between people with CF); this epidemic lineage is known as the ET12 lineage, which is a part of the IIIA subgroup of B. cenocepacia (subgroups were found, phylogenetically, through the analysis of the recA gene). IIIA strains, unlike those associated with the other subgroups, are rarely encountered in a natural environment, suggesting that the strains have strongly adapted to a host-associated pathogen lifestyle (versus that of a soil saprophyte). There also exist many virulence markers that are encountered more frequently with IIIA strains than with other subgroups; the ET12 isolates, additionally, are known to have a cable pilus which permits binding to molecules within the host environment, such as mucins (which are abundant in the lung). J2315, specifically, is an isolate derived from a CF patient and it exhibits strong levels of antibiotic resistance.
The value of the genomic analysis of J2315 lies in the fact that it will give some elucidation regarding the factors responsible for the success of the strain (via CF patient infection); genomic analysis will also help explain how the members of the ET12 lineage adapted, recently, to holding a niche via human infection (instead of holding a niche in the soil, as a soil saprophyte). In short, J2315 represents a unique and extremely significant pathogen in the realm of CF treatment as it possesses properties that allow it to thrive even further in the lung environment than other related strains/subgroups; genomic analysis will produce something that will serve as an essential resource for future investigations into J2315 and the disease that is caused by Burkholderia cenocepacia.

Methods Employed in the Study

Sequencing
  • Used strains of B. cenocepacia in this study: K56-2, BC7, LMG 13307 (BCC0162), CEP0791 (BCC0077), LMG 13320 (BCC0179), FC0504 (BCC0313), LMG 18827 (BCC0016), BCC1261, CEP0826 (BCC0222).
  • J2315 was grown via broth culture and was harvested through centrifugation. Bacterial pellets were suspended in a solution designed for cell lysis; the lysate was then incubated and the DNA was purified (proteins and polysaccharides were precipitated and then removed by centrifugation). DNA was collected from the lysate through ethanol precipitation
    • Note: Protocol for DNA extraction was not directly covered in the genome paper; another paper was cited with these methods, which is: Identification and characterization of a novel DNA marker associated with epidemic Burkholderia cepacia strains recovered from patients with cystic fibrosis (authors stated "DNA was extracted exactly as described previously")
  • Sequence data were derived from the creation of genomic shotgun libraries (m13mp18 and pUC18 libraries); the shotgun sequencing led to 215,165 end sequences (which represents 11.9 fold coverage).
  • Sequence was annotated using Artemis software and initial coding sequence predictions were done through the use of software (Orpheus, Glimmer2, and EasyGene). The predictions made by the software were combined and they were further refined using comparisons to nonredundant protein databases via BLAST/FASTA software, positional base preference methods, and codon usage analysis
  • The whole DNA sequence, using all 6 possible reading frames, was also compared against UniProt, via BLASTX, to improve the quality of previous work (purpose was to identify any possible coding sequences that were missed in earlier work)
  • Protein structural motifs were identified through the use of Pfam and Prosite, transmembrane domains were found through TMHMM; signal sequences were identified through the use of SignalP version 2.0
  • Stable RNAs and tRNAs were identified through the use of Rfam and tRNAscan-SE, respectively
  • rRNAs were identified through the use of BLASTN alignment with defined rRNAs from EMBL nucleotide database
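
To illustrate what "all 6 possible reading frames" means in the BLASTX comparison above, the following is a minimal Python sketch that splits a DNA string into codons for the three forward and three reverse-complement frames. It is not the authors' pipeline (BLASTX performs the translation internally); the function names and the example sequence are hypothetical.

```python
# Minimal six-frame enumeration for a DNA string (illustrative only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Split a sequence into codons for all six reading frames.

    Keys are +1, +2, +3 (forward strand) and -1, -2, -3 (reverse strand).
    Trailing bases that do not fill a whole codon are dropped.
    """
    frames = {}
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[sign * (offset + 1)] = codons
    return frames


example = "ATGGCCTAA"  # hypothetical 9-base sequence, not taken from the paper
frames = six_frames(example)
print(frames[1])   # ['ATG', 'GCC', 'TAA']
print(frames[-1])  # ['TTA', 'GGC', 'CAT']
```
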
Genome Sequence Comparison
  • The J2315 genome was compared to B. vietnamiensis strain G4, B. ambifaria strain AMMD, Ralstonia solanacearum strain GMI1000, B. thailandensis strain E264, B. mallei strain ATCC 23344, B. pseudomallei strain K96243, B. contaminans strain 383, B. xenovorans strain LB400, and B. cenocepacia strains AU1054 and HI2424
  • Artemis Comparison Tool was used to support the comparison of genome sequences; it allowed the visualization of TBLASTX and BLASTN comparisons.
  • FASTA, with manual curation, was utilized to identify orthologous proteins as "reciprocal best matches"
  • Inactivating mutations in pseudogenes were checked against the original sequencing data
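
The "reciprocal best matches" idea used to call orthologs can be sketched as follows. This is a simplified Python illustration, assuming similarity scores are already available as nested dictionaries; the gene names and scores are made up, and the paper's actual FASTA thresholds and manual curation are not modeled.

```python
def best_hits(scores: dict) -> dict:
    """Map each query gene to its highest-scoring subject gene."""
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in scores.items()
        if subjects
    }


def reciprocal_best_matches(a_vs_b: dict, b_vs_a: dict) -> list:
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {
    "geneA1": {"geneB1": 900, "geneB2": 300},
    "geneA2": {"geneB2": 250},
}
b_vs_a = {
    "geneB1": {"geneA1": 880},
    "geneB2": {"geneA1": 310, "geneA2": 240},
}

pairs = reciprocal_best_matches(a_vs_b, b_vs_a)
print(pairs)  # [('geneA1', 'geneB1')]
```

Here geneA2's best hit is geneB2, but geneB2's best hit is geneA1, so that pair is not reciprocal and is not reported.
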
PCR
  • PCR amplification was conducted using the primers BCAL3517 (annealing temp. 63 to 68°C), BCAL3223 (60 to 68°C), BCAL3125 (60°C), BCAM2228 (68°C), and BCAM0856 (68°C)
  • PCR was done using Platinum Pfx DNA polymerase with 1/10 enhancer solution
  • Amplification: First 94°C for 10 min, then 40 cycles of 94°C for 30 seconds, and 68°C for 1 min per kilobase, then a final extension of 10 min at 68°C
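
As a rough worked example of the cycling program above, the total programmed time can be estimated from the amplicon length. This Python sketch follows the steps exactly as summarized in this journal entry (no separate annealing step is listed, so none is modeled) and ignores ramp times between temperatures; the function name is hypothetical.

```python
def pcr_run_seconds(amplicon_kb: float, cycles: int = 40) -> float:
    """Estimate programmed PCR time in seconds, ignoring ramp times."""
    initial_denaturation = 10 * 60   # 94°C for 10 min
    denaturation = 30                # 94°C for 30 s per cycle
    extension = 60 * amplicon_kb     # 68°C for 1 min per kilobase per cycle
    final_extension = 10 * 60        # 68°C for 10 min
    return initial_denaturation + cycles * (denaturation + extension) + final_extension


# A 1 kb amplicon: 600 + 40 * (30 + 60) + 600 = 4800 s
print(pcr_run_seconds(1.0) / 60)  # 80.0 (minutes)
```
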
Sequence and Annotation Deposit
  • Complete genomic sequence of B. cenocepacia strain J2315 was placed in the EMBL database (accession numbers: AM747720, AM747721, AM747722, and AM747723)

Figure/Table Results

  • FIG. 1: Complete genome of J2315 comprises three circular chromosomes and a plasmid. Chromosomes are of sizes 3,870,082, 3,217,062, and 875,977 bp; plasmid is 92,661 bp. There exist several relatively large RODs and genomic islands (many other B. cenocepacia strains lack orthologues with respect to the genomic islands). Chromosome 1 seems to have a moderate number of genomic islands and many RODs, Chromosome 2 appears to have the smallest number and size with respect to the genomic islands, Chromosome 3 seems to have the largest number of genomic islands and RODs (and, consequently, the smallest percent of orthologous genes compared to the other strains and to Ralstonia solanacearum).
  • FIG. 2: Chromosomes 2 and 3 contain a greater number (proportion) of coding sequences that have an accessory role (involved with functions like horizontal gene transfer and protective response) or with an unknown function. Chromosome 1 has a greater proportion of coding sequences that are related to core cell functions, such as cell division/chromosome replication, macromolecule/amino acid/nucleotide biosynthesis (genes related to metabolism and division).
  • FIG. 3: Number/percent of orthologous coding sequences in the J2315 was greatest in groups that were more taxonomically related to J2315. 78 to 63% of total coding sequences in BCC members, 56 to 50% in other Burkholderia species, and 37% of all coding sequences in Ralstonia solanacearum. Chromosome 1 has the highest degree of conservation, chromosome 2 a little less, and chromosome 3 has the lowest degree of conservation.
  • TABLE 1: 8,055,782 total bp; chromosome 1 is the largest, chromosome 2 is similar in size but smaller, and chromosome 3 is the smallest (a small plasmid also exists). G+C content % is similar between all four replicons (smallest in plasmid, followed by chromosome 1). There are 7,261 total coding sequences (85.9% of DNA involved with coding). Plasmid has the smallest average gene length. Chromosome 1 possesses the vast majority of tRNA related genes (66, compared to the 6 and 2 of chromosome 2 and 3, respectively). Chromosome 1 also holds the majority of IS elements (vast majority), pseudogenes/partial genes (moderate majority), and miscellaneous RNA (vast majority).
  • TABLE 2: 15 genomic islands (most and largest appear on chromosome 1); many genomic islands also have IS elements which are integrated into various sites in the bacterial genome (all islands, except one, contain one putative integrase). Many islands are prophages/have phage origins and most are miscellaneous islands. Many resistances are linked to the cenocepacia island (antibiotic/arsenic resistance, along with stress response coding sequences).
  • TABLE 3: Variety of virulence functions are encoded in the J2315 genome; several genes related to the virulence functions are absent from other strains of B. cenocepacia. Cable pilus coding sequences are unique to J2315 compared to other strains; five BuHA family proteins are also unique to J2315
  • TABLE 4: Many drug resistance determinants target unknown antibiotics/antimicrobial compounds. Some coding sequences are strain-specific (J2315 has elevated drug resistance, compared to other strains). Six families of transport systems were identified (ABC, MFS, MATE, RND, SMR, and fusaric acid resistance family proteins)
  • TABLE 5: Many virulence determinants found in other B. cenocepacia strains were found to be pseudogenes in J2315
  • TABLE 6: All tested ET12 strains possess pseudogenes which disrupt cepacian capsule functions and pyochelin biosynthesis. Only J2315 has a pseudogene at BCAL3517 (T2SS) with a 110 bp deletion. O antigen is also interrupted in J2315 and similarly in BCC0016 (which is an ET12 strain), all others are uninterrupted or with no product. BCAL3223 is also interrupted (differently from the K56-2 strain).
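
The replicon sizes listed for FIG. 1 can be checked against the total in TABLE 1 with a short calculation. This is a minimal Python sketch using only the numbers quoted above; the dictionary keys are just labels.

```python
# Replicon sizes in base pairs, as listed for FIG. 1.
replicon_bp = {
    "chromosome 1": 3_870_082,
    "chromosome 2": 3_217_062,
    "chromosome 3": 875_977,
    "plasmid": 92_661,
}

total_bp = sum(replicon_bp.values())
print(total_bp)  # 8055782, which agrees with the total given for TABLE 1

# Share of the genome carried by each replicon, as a percentage.
for name, size in replicon_bp.items():
    print(f"{name}: {100 * size / total_bp:.1f}%")
```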

How the results relate to other work

  • Exchange of mobile genetic elements/genomic island movement supports the spread of genetic information between diverse bacteria (a benefit to a bacterium in its current environment, or something that can allow the bacterium to adapt to new niches like that of the CF lung via a host-associated pathogen lifestyle)
  • J2315 genome has 14 genomic islands that are absent from the other B. cenocepacia strains; acquisition of genomic island genes is suggested to have introduced adaptations and functions that promote survival/pathogenic character within the lung environment
    • CCI (BcenGI11) island is involved with infection, is found throughout the members of the ET12 lineage, and occurs more frequently in IIIA strains than IIIB; this island is important with respect to the virulence and survival of the strain within the lung environment, while the importance/contribution of the other genomic islands is still unclear (many islands are involved with metabolic functions or have still unknown functions)
  • Other RODs do not have mobile genetic element properties and are a more stable part of the genome of J2315; cable pilus locus and AdhA adhesin protein are located within these unique regions. Cable pilus/AdhA complex has been tied to the binding of mucin (abundant in the CF lung environment as a result of poor clearance) and cytokeratin 13 (cytoplasmic protein that is known to become surface exposed during chronic infection in individuals with CF). The complex is also related to the ability of the species to bind/invade epithelial lung cells (is known to bind to explant culture lung tissue from CF patients). Each piece of the complex is suggested to have been acquired independently but it exhibits synergy (orthologs absent from other BCC strains)
    • Many other virulence related genes have been found in the RODs specific to J2315
  • B. cenocepacia strains have been shown to be very hardy and resistant to a variety of antibiotics in recent studies; strains derived from the ET12 lineage have different antibiotic sensitivities compared to other lineages/strains.
    • Among the members of the ET12 lineage, J2315 has been shown to have an enhanced level of antibiotic resistance to several antimicrobial agents
    • It was found that the J2315 genome has drug resistance related genes within the islands and RODs (conclusions: horizontal gene transfer plays a role in the acquirement/evolution of drug resistance in already resistant organisms).
  • Enhanced drug resistance is suggested to have been the result of point mutation (a mutation affecting only one or few nucleotides in a sequence)
    • Gene loss via mutation, in conjunction with the acquisition of new functions, played an essential role in the success and development of J2315
  • Other work has found that BCC have many characteristic virulence determinants that hold important roles in the survival within the "natural" environment (soil); this study found that many of these determinants are pseudogenes or are nonfunctional within the J2315 strain
    • The work suggests that many of these functions are not essential or are even superfluous for the lung environment (lack of these genes could have actually boosted the survival and success of J2315 as a host-associated pathogen)
  • All screened ET12 strains were found to have the exact same frameshift mutation (insertions or deletions of a number of nucleotides) in the pchF gene (pyochelin siderophore biosynthesis); other studies have found that pyochelin-negative strains occur more frequently in patients with moderate/mild CF related infections whereas pyochelin-positive strains were more frequently found in patients with serious pulmonary disease (cases with a high possible mortality). Pyochelin production was linked in previous studies to a more severe disease progression (towards mortality) and the lack of pyochelin production in J2315 sheds some light on its infectious nature.
    • No expression of pyochelin production genes in ET12 strains could support the persistence of the strain within the lung and could facilitate its spread between patients (effects of infection are less pronounced, initially)
    • Chronic infections (J2315) put pressure against the selection of functions that lead to acute infections (rapid onset of disease) -> J2315 is tied to a long-term infection which facilitates a spread between individuals
    • Other recent studies have suggested that phenotype switching occurs in B. cenocepacia strains that reside in the lung environment of CF individuals (study of CF patients in the Vancouver area over a 26-year period); loss of EPS (extracellular polymeric substance) production was tied to increased disease severity (rather than persistence within the lung), ET12 lineage members like J2315 and K56-2 have mutations in the EPS cluster. Recent studies regarding the adaptations/mutations of other bacterial species/strains during infection reflect, in some aspects, what was found with J2315 (some parallels and differences; key difference is the difference between mucoid and nonmucoid character with respect to virulence). Nonmucoid character is tied to lesser virulence in one species but is tied to more virulence with ET12 lineage strains (differences exist in the roles that EPS plays in the pathogenicity and host interactions among CF pathogens)
  • Overall, J2315 has a genome that promotes CF lung growth and spread (knowledge of it can help explain its character and infectious nature)
    • Many functions were lost through mutations; many functions were gained through horizontal gene transfer (both appear to have promoted growth/persistence within the CF lung environment and contributed to the strain's success)

Exploring the MOD

  • CITATION: Winsor, G. L., Khaira, B., Van Rossum, T., Lo, R., Whiteside, M. D., & Brinkman, F. S. (2008). The Burkholderia Genome Database: facilitating flexible queries and comparative analyses. Bioinformatics, 24(23), 2803-2804.
  1. What types of data can be found in the database (sequence, structures, annotations, etc.); is it a primary or “meta” database; is it curated electronically, manually [in-house], or manually [community])?
    • Sequences, annotations, replicon information, sample information (provider, geographic location, host), subcellular localization details, along with a general strain overview are made available for 355 Burkholderia strains (ortholog and subcellular localization predictions are provided). This is a "meta" database that organizes and works with information provided by other databases (NCBI primary databases are utilized). The database is manually curated (in-house) by the Brinkman Lab at Simon Fraser University; however, annotation updates are based upon other sequence databases, literature review, and research submissions (log of annotation updates is available).
  2. What individual or organization maintains the database?
    • The Burkholderia Genome Database is managed by The Brinkman Lab at Simon Fraser University
  3. What is their funding source(s)?
    • Funding is provided by: Cystic Fibrosis Foundation Therapeutics, Inc. (CFFT), "a non-profit drug discovery and development affiliate of the Cystic Fibrosis Foundation".
  4. Is there a license agreement or any restrictions on access to the database?
    • Publicly available
  5. How often is the database updated?
    • Functional characterizations (TIGRFAM, COG, PFAM or GO) are updated on an annual basis (updates that utilize other databases occur annually). Updates based on input from the literature/researchers occur much more frequently (all updates are logged, annotation updates occur at variable times, sometimes monthly)
  6. Are there links to other databases?
    • Links were found to NCBI BioSample, NCBI Taxonomy, NCBI BioProject, NCBI Assembly (as Cross-References on the "Profile" for each strain)
    • On the gene pages, cross-references were found to NCBI Gene, NCBI Protein, UniProt, STRING, and to Ensembl
  7. Can the information be downloaded?
    • Yes, for sequences (each strain): genomic, ORF DNA, Intergenic, and amino acid data is available. Gene annotations are also downloadable.
  8. In what file formats?
    • Data related to the sequences are downloadable as text in FASTA format
    • Gene annotations can be downloaded in TAB, CSV, GBK, EMBL, GFF3, and GTF formats
  9. Evaluate the “user-friendliness” of the database.
    • Database appears very clean and every section is clear and easily found. Searches are simple to conduct and information is available regarding each type of search (with its purpose and result). Site is user-friendly with a clear interface; all searches were easy to conduct and assistance was given in the form of information on several pages.
  10. Is the Web site well-organized?
    • It is very well-organized with clearly defined sections (no valuable section is relatively hidden, everything is very clear and easy to find)
  11. Does it have a help section or tutorial?
    • No explicit tutorial/help section but there is an FAQ page and several pages that explain the various searches. Many pages/links are accompanied by descriptive text which makes every section very clear (notes are often provided on pages that explain certain details or clarify others)
  12. Run a sample query. Do the results make sense?
    • A sample query was run via the simple search using the J2315 strain, return was set to similar match, search term utilized was "membrane". The results made sense and were organized in a good manner.
    • Sample search results MOD BL 11.15.png
  13. What is the format (regular expression) of the main type of gene ID for this species (the "ordered locus name" ID)? (for example, for Vibrio cholerae it was VC#### or VC_####).
    • Several different common formats: BCAL####, BCAM####, BCAS####, pBCA### (A's after the numbers were also encountered)
    • Also noticed: Formats of all types with lower-case r's before the numbers (e.g. BCASr0269A); tags with lower-case r's, which are for tRNA related sequences, often also have letters at the end to signify the type of tRNA (alphabetical order, for related tRNA genes, starting from A)
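
The ID formats observed above could be captured by a single regular expression along the following lines. This is a tentative Python sketch, not an official specification of the locus tag format: it assumes 3 or 4 digits for every prefix, an optional lower-case r before the digits, and at most one trailing letter. The tag "pBCA001" is a hypothetical example of the plasmid format; the other matching tags appear earlier on this page.

```python
import re

# Prefix (BCAL, BCAM, BCAS, or pBCA), optional "r", 3-4 digits,
# optional single trailing letter (e.g. "A" or "a").
LOCUS_TAG = re.compile(r"(?:BCA[LMS]|pBCA)r?\d{3,4}[A-Za-z]?")


def is_locus_tag(tag: str) -> bool:
    """Return True if the whole string fits the assumed locus tag pattern."""
    return LOCUS_TAG.fullmatch(tag) is not None


for tag in ["BCAL3517", "BCAM0856", "BCASr0269A", "pBCA001", "VC0001"]:
    print(tag, is_locus_tag(tag))
```

Using fullmatch rather than search keeps partial matches, such as a valid tag followed by extra characters, from being accepted.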

Journal Club Presentation

Genome Paper Presentation Week 11



Brandon Litvak
BIOL 367, Fall 2015
