Ksherbina Week 11

From LMU BioDB 2013
Katrina Sherbina

Paper for Journal Club:

Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282: 754-759. doi: 10.1126/science.282.5389.754

10 Biological Terms from Paper

  • Biovar - A strain differentiated from other strains by biochemical or other non-serological means.
Abedon, S. T. (1998) Supplemental Lecture (98/04/14 update). <http://www.mansfield.ohio-state.edu/~sabedon/biol3010.htm>. Accessed 10 November 2013.
  • Clusters of orthologous groups (COGs) - A group of at least three proteins that have the same evolutionary origin.
Koonin, E.V. (2013) The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes. In J. McEntyre & J. Ostell (Eds). The NCBI Handbook. Retrieved from http://www.ncbi.nlm.nih.gov/books/NBK21090/. Accessed 11 November 2013.
  • Contig - Copies of pieces of DNA that represent the overlapping regions of a chromosome.
U.S. Department of Energy Human Genome Project (2013) The Human Genome Project Information Archive 1990-2003. <http://web.ornl.gov/sci/techresources/Human_Genome/glossary.shtml>. Accessed 10 November 2013.
  • Enoyl-acyl carrier protein reductase - The last enzyme in the fatty acid elongation cycle.
Massengo-Tiasse, R.P. & Cronan, J. E. (2009) Diversity in enoyl-acyl carrier protein reductases. Cell. Mol. Life Sci. 66: 1507–1517.
  • Entner-Doudoroff pathway - A pathway that converts glucose to pyruvate and glyceraldehyde-3-phosphate by producing and then dehydrating 6-phosphogluconate.
EMBL-EBI (2013) GO:0009255 Entner-Doudoroff pathway. <http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0009255>. Accessed 10 November 2013.
  • Hydrodynamic shearing - A technique that fragments DNA molecules by forcing them through a small orifice or the bore of a small-diameter tube at high velocity.
Joneja, A. & Huang, X. (2009) Supplementary Material For: A device for automated hydrodynamic shearing of genomic DNA. BioTechniques 46: 553–556.
  • Paralog - Genes that are homologous and are a result of gene duplication.
Paralogy. (n.d.) Biology Online. <http://www.biology-online.org/dictionary/Paralogy>. Accessed 11 November 2013.
  • Pathogenicity island - Genetic element within an organism's genome that is responsible for the ability of the organism to cause disease.
Pathogenicity island. (n.d.) MedicineNet.com. <http://www.medterms.com/script/main/art.asp?articlekey=20705>. Accessed 11 November 2013.
  • Serology - A type of laboratory medicine that analyzes blood serum for signs of infection by looking at antigen-antibody interactions in vitro.
Serology. (n.d.). Medical-Dictionary. <http://medical-dictionary.thefreedictionary.com/serology>. Accessed 10 November 2013.
  • Type II secretion system - A secretory pathway in Proteobacteria correlated to a bacterium's pathogenesis that is responsible for the secretion of toxins and hydrolytic enzymes.
Sandkvist, M. (2001) Biology of type II secretion. Molecular Microbiology 40: 271–283.

Outline of the Article

Importance of sequencing the Chlamydia trachomatis genome

  • C. trachomatis causes several diseases in human beings, including trachoma, which leads to blindness.
  • C. trachomatis infections in humans may increase the risk of HIV infection.
  • At the time the article was released, little was known about the two developmental stages of C. trachomatis (i.e. the elementary body and the reticulate body) as well as the bacterial pathogen's physiology and genetics.

Method used to sequence the genome of Chlamydia trachomatis strain D/UW-3/CX

  • Chlamydial elementary bodies (EBs) were isolated from infected host cells using sonication on ice and then centrifuged to create pellets that were suspended in 5 mL of Hank's balanced salt solution.
  • After two washes through centrifugation, extracellular DNA was removed by incubating the cells in 5 µg/mL of DNase and RNase at 37°C for 30 min.
  • The cells were purified using 30%, 40%, 45%, and 50% Renografin and then suspended in HBSS or SPG and stored at -70°C.
  • After hydrodynamic shearing, the fragmented DNA was cloned into M13 phage.
  • 28,458 sequencing reactions were performed using dye-labeled primers with the ABI Catalyst 800 Turbo robot followed by 4,688 dye-terminator reactions.
  • 23 contigs from 4 to 164 kbp were observed using the Phrap and Phred software.
  • Physical gaps in the genome were closed by either sequencing a PCR product that spans the gap or by using custom oligonucleotide primers.
  • The plasmid that was sequenced from strain D/UW-3/CX had two fewer codons than previously sequenced chlamydial plasmids.
  • Two methods were used to validate the assembled sequence. First, the predicted restriction digest map of the sequence was compared to the physical genome map for NotI and SgrAI. Second, the restriction map and fragment sizes were analyzed after BamHI digestion of fragments defined by oligonucleotides spaced 15 kbp apart.
  • The results showed that the strain contains a 1,042,519 bp chromosome and a 7,493 bp plasmid.
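
The first validation step above compares a restriction digest predicted from the assembled sequence with a physical map. A minimal sketch of how such a prediction could be computed is given below. It is not the authors' method: the function names and the toy sequence are made up for illustration, only NotI and BamHI are included (SgrAI has a degenerate recognition site and is left out), and a site that spans the origin of a circular sequence is not detected.

```python
import re

# Recognition site and the offset within the site after which the top strand is cut.
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
    "BamHI": ("GGATCC", 1),    # G^GATCC
}

def cut_positions(seq, site, offset):
    """Return 0-based cut positions for every occurrence of site in seq."""
    # The lookahead also finds overlapping occurrences of the site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]

def fragment_sizes(seq, enzyme, circular=True):
    """Predict fragment sizes (largest first) for a single-enzyme digest."""
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [len(seq)]
    if circular:
        # On a circle, the last fragment wraps around to the first cut.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])
    else:
        bounds = [0] + cuts + [len(seq)]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)

# Toy 26 bp sequence with two BamHI sites and no NotI site (illustrative only).
toy = "AAAA" + "GGATCC" + "TTTTTT" + "GGATCC" + "CCCC"
print(fragment_sizes(toy, "BamHI"))
print(fragment_sizes(toy, "BamHI", circular=False))
```

The predicted sizes can then be compared by eye, or with a tolerance, against the fragment sizes measured on a gel.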

Methods used to annotate the genome

  • The programs PEPDATA and FRAMES were used to translate the chlamydial genome.
  • BLASTP was used to find open reading frames in the C. trachomatis genome.
  • RNAse P RNA, tRNAs, and rRNAs were identified using BLASTN.
  • BLASTP-(-mp4-option) and CLUSTALW were used to estimate the location of start codons by looking for conserved sequences.
  • GeneMark and Glimmer were used to evaluate the predicted start codons.
  • When more than one start codon was identified for a given protein sequence, the first of the codons to appear in the genome was set as the start codon.
  • PSI-BLAST was used to detect homologs to chlamydial protein sequences.
  • EMOTIF was used to analyze sequence motifs.
  • The COGNITOR program was used to compare chlamydial protein sequences to Clusters of Orthologous Groups.
  • The above analysis resulted in the identification of 894 protein-coding genes and functional assignment for 604 of these genes.
  • 35 chlamydial protein-coding genes were found to be similar to known genes in other bacteria.
  • There are 256 chlamydial proteins that are paralogs belonging to 58 families of similar genes.

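The rule above for choosing between several candidate start codons (take the first one to appear) can be illustrated with a small open reading frame scan. This is only a sketch under simplifying assumptions, not what GeneMark or Glimmer do: it scans the forward strand only, treats ATG as the only start codon, and uses a made-up function name and toy sequence.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs, 0-based and end-exclusive, for forward-strand ORFs.

    For each stop codon, the most upstream in-frame ATG since the previous
    stop is used as the start, mirroring the "first start codon" rule.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START and start is None:
                start = i                      # keep only the first ATG
            elif codon in STOPS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # reset after every stop codon
    return sorted(orfs)

# Toy sequence with two in-frame ATGs before one stop codon (illustrative only).
toy = "CCATGAAAATGCCCTAGGG"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

In the toy sequence the second ATG is ignored because an earlier in-frame ATG already opened the reading frame.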
Analysis of gene expression pathways

  • Enzymes were discovered in C. trachomatis orthologous to enzymes in other bacteria that are involved in DNA replication, repair, transcription, and translation.
  • The bacterium seems to have DNA repair and recombination systems, as is evident by the presence of two DNA helicases of the Swi2/Snf2 family.
  • Genes were identified in the bacterial genome that code for aminoacyl-transfer RNA synthetases, two identical ribosomal RNA operons, RNA modification enzymes, translation factors, and a complete set of ribosomal proteins.
  • The chlamydial genome codes for two alternative sigma factors, σ28 and σ54, which are likely involved in initiating differentiation from one developmental stage to another.
  • The presence of a σ-factor regulatory system in C. trachomatis suggests that this system regulates different stages within the developmental cycle, monitors ATP status, or participates in the heat-shock response similar to that in Bacillus subtilis.

Analysis of metabolic pathways and macromolecule synthesis

  • Genes were missing for both the Entner-Doudoroff pathway and the TCA cycle.
  • Because a complete glycogen synthesis and degradation system is present, the authors suggest that glycogen is the primary carbon source and may be involved in developmental stage differentiation.
  • Genes were found that are necessary for aerobic respiration and that protect the bacterium from the toxic oxygen intermediates created through respiration.
  • In contrast to the prevailing belief that the bacterium only obtains ATP from its host, genes were found in the chlamydial genome that are involved in ATP synthesis.
  • While few genes were found that are involved in amino acid biosynthesis, many genes were discovered that encode enzymes involved in fatty acid and phospholipid biosynthesis.
  • Contrary to past belief that C. trachomatis lacks peptidoglycan, the authors of the paper found that the bacterium's genome codes for the entire peptidoglycan biosynthesis pathway. This finding suggests that the bacterium utilizes peptidoglycan differently than other bacteria.
  • The C. trachomatis genome lacks genes that are involved in purine and pyrimidine nucleotide synthesis but does contain genes to synthesize and convert deoxyribonucleotides.

Membrane, Intracellular Vacuole, and Pathogenesis

  • The presence of numerous amino acid and peptide transporters as well as porins and membrane transport components suggests that C. trachomatis obtains the nutrients it needs from its host through membrane transport systems.
  • Figure 1 discusses the identification of nine paralogous genes encoding Pmp outer membrane proteins and proteases located in two clusters (one of the genes was not found in either cluster).
  • Part A of the figure shows the orientation of the genes within each cluster indicating the direction of the coding strand.
  • Part B of the figure displays the predicted molecular mass and pI for each of the nine Pmp proteins. While the proteins differed in their sequence overall, two types of tetrameric amino acid repeat motifs were found in each of the proteins toward the N-terminus: FXXN and GGAI.
  • Part C of the figure shows that certain motifs are conserved between the proteases encoded in the C. trachomatis genome and adenovirus proteases.
  • The C. trachomatis genome contains genes orthologous to type III secretion systems that determine the virulence of Gram-negative bacterial pathogens. It is proposed that these genes are involved in modifying host cell processes to facilitate invasion and remodeling the inclusion membrane.
  • Two genes homologous to those found in Chlamydia psittaci were found in C. trachomatis that, along with a third protein, form an operon that may be involved in the inclusion (i.e. intracellular vacuole) membrane remodeling and transport.
  • Six paralogous chlamydial proteins were found in C. trachomatis that belong to the HKD superfamily and may be involved in the modification of host cell phospholipids. However, Figure 2 shows that these proteins show little sequence similarity to other proteins previously identified in the HKD superfamily.
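
The tetrameric repeat motifs mentioned for the Pmp proteins (FXXN and GGAI) can be located in a protein sequence with a simple pattern search. The sketch below assumes that X stands for any amino acid; the function name and the toy peptide are hypothetical and do not come from the paper.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
MOTIFS = {
    "FXXN": re.compile(r"(?=F..N)"),   # F, any two residues, then N
    "GGAI": re.compile(r"(?=GGAI)"),
}

def motif_positions(protein):
    """Return the 0-based start positions of each motif in a protein sequence."""
    protein = protein.upper()
    return {
        name: [m.start() for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }

# Toy peptide containing two copies of each motif (illustrative only).
toy_peptide = "MKFSTNAGGAIQFAANGGAI"
print(motif_positions(toy_peptide))
```

Counting the hits per protein, and checking whether they fall toward the N-terminus, would be one way to reproduce the kind of summary given in Figure 1.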

Phylogeny and Horizontal Gene Transfer

  • Within the inclusion, present-day C. trachomatis likely does not often exchange genetic information with the host, because the chlamydial genome contains no genes that are homologous to transposons or to genes involved in transformation or the acquisition of foreign DNA.
  • Despite this, Table 1 in the paper provides evidence that a majority of the genes in C. trachomatis are a result of horizontal gene transfer with bacterial ancestors or eukaryotic hosts.
  • Figure 3 traces the origins of the chlamydial enoyl-acyl carrier protein reductase through various organisms supporting the theory that some chlamydial genes originated from horizontal gene transfer with eukaryotes.
  • The mechanisms of chromatin condensation-decondensation may be related to those in eukaryotes, as evidenced by the discovery in the chlamydial genome of SET and SWIB domains, which were previously found only in eukaryotes.

EnsemblBacteria Database for Chlamydia trachomatis A/HAR-13

  • NOTE: The Model Organism Database was found for the strain A/HAR-13 rather than the strain in the genome paper because the former strain is the one for which we will be analyzing the microarray data.
  1. This database contains the full genome sequence of the bacterium including a map of the different genes with the option to search for particular genes. In addition, the database contains information on cDNA, non-coding RNA, and protein sequences. This is an electronically curated meta database.
  2. The European Molecular Biology Laboratory maintains the database.
  3. The database is funded by the European Molecular Biology Laboratory, United Kingdom Biotechnology and Biosciences Research Council, The Bill and Melinda Gates Foundation, and The Wellcome Trust.
  4. The EMBL Terms and Conditions state that the data are free for any individual to use for any purpose, as long as proper citations are used.
  5. The current database is Release 20. Looking at the notes for previous releases, it seems like the database is updated every 2-4 months.
  6. Within a gene entry, there is a link to the UniProtKB entry for the protein encoded by that gene.
  7. Information regarding the entire genome or specific genes can be exported in a variety of file formats including FASTA, CSV, and tab separated values. In addition, information regarding cDNAs, ncRNAs, and proteins can be exported as a FASTA file.
  8. In the navigation toolbar at the top of the page, there is a link called "Help & Documentation" that takes you to a page with numerous tutorials and other information regarding the database. On the homepage, I clicked on the term "rpIE" beneath the search field to run a sample query. It was frustrating to be taken to a page stating that no results were found. Then, I tried to query for the gene ID "CTA_0498" and was taken to the same page. However, when I clicked on the link 'CTA_0498*' to search with a wildcard, the database was able to find the gene with the exact same ID that I initially searched for. To make the database more user-friendly, it would be helpful to fix whatever may be causing this bug. Aside from this problem, I would say that the database organizes all of the genomic and proteomic information well.
  9. The format of the main type of ID for this species is CTA_####.
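
Since the main ID format is CTA_#### and the sequences can be exported as FASTA, a short script could check IDs and pull them out of an exported file. This is a rough sketch: the FASTA header layout shown is an assumption rather than the database's documented format, the sequences are placeholders, and the function names are made up.

```python
import re

# "CTA_" followed by exactly four digits, matching the ID format noted above.
GENE_ID = re.compile(r"CTA_\d{4}")

def is_gene_id(text):
    """Return True if text is exactly one well-formed CTA_#### ID."""
    return GENE_ID.fullmatch(text) is not None

def ids_from_fasta(fasta_text):
    """Collect gene IDs found on FASTA header lines (lines starting with '>')."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            ids.extend(GENE_ID.findall(line))
    return ids

# Hypothetical export; real header lines may be laid out differently.
example_fasta = ">CTA_0498 example description\nATGAAA\n>CTA_0001 example description\nATGCCC\n"
print(is_gene_id("CTA_0498"))
print(ids_from_fasta(example_fasta))
```

A check like this would catch mistyped IDs before they are used to match microarray data to database entries.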