Troque Week 11

From LMU BioDB 2015



QA/Coder Meetings

  • Thursday, 11/13/2015 -- 7pm

Journal Club

Genome Paper

  • Citation: Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., … Yu, J. (2002). Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Research, 30(20), 4432–4441.
  • Link: HERE

Preparation for Journal Club on Your Species

Your team will split into two halves for journal club presentations that will take place in class on Tuesday, November 17 and Tuesday, November 24. The Coder and Quality Assurance person will present the genome paper for your species and the GenMAPP Users will present the microarray paper for your species. You will decide within your team who will present on which day. Please edit the schedule on the Main Page to show who is presenting on which day.

In preparation for your journal club presentation, you will each individually complete the following assignment on your individual journal page.

  1. Make a list of at least 10 biological terms for which you did not know the definitions when you first read the article. Define each of the terms. You can use the glossary in any molecular biology, cell biology, or genetics textbook as a source for definitions, or you can use one of many available online biological dictionaries. Cite your sources for the definitions by providing the proper citation (for a book) or the URL to the page with the definition for online sources. Each definition must have its own URL citation.
  2. Write an outline of the article. The length should be a minimum of the equivalent of 2 pages of standard 8 1/2 by 11 inch paper (you can use the "Print Preview" option in your browser to see the length). Your outline can be in any form you choose, but you should utilize the wiki syntax of headers and either numbered or bulleted lists to create it. The text of the outline does not have to be complete sentences, but it should answer the questions listed below and have enough information so that others can follow it. However, your outline should be in YOUR OWN WORDS, not copied straight from the article.
    • What is the importance or significance of this work (i.e., your species)?
      • Shigellosis is an inflammatory disease caused by the bacterium S. flexneri, in which the afflicted person suffers from bleeding in the intestinal tract. Because it is one of the leading causes of death in young children in countries such as China, understanding its pathogenicity is of great importance. Gathering more information about the genome of this species could lead to new preventative methods to combat shigellosis, or even contribute to the creation of a vaccine. The paper comments that understanding how the virulence of this organism is regulated at the DNA level requires access to its entire genome sequence. Additionally, the study revealed the volatility of the Shigella chromosome when its genome is compared with those of its relatives, the non-pathogenic E. coli K12 and the pathogenic E. coli O157.
    • What were the methods used in the study?
      • The steps that the research group took mainly involved automating the process of genome sequencing, namely base-calling, identifying open reading frames (ORFs), and comparing the genome of the Shigella flexneri strain under study with related genomes. This particular strain, Sf301, was originally isolated from a patient with an acute case of shigellosis in the Changping District of Beijing in 1984. The culture used was grown in tryptic soy broth agar containing 0.01% Congo red dye at a constant 37 degrees Celsius. Shotgun sequencing involves randomly breaking up DNA into small pieces and then reassembling them by looking at overlapping regions. It initially relied on phred, a highly accurate base-calling program, which significantly reduced human interaction with the DNA sequences and thus the errors that would have resulted from human involvement. After reaching 318 overlapping regions in the species' genome, a program called consed was used for sequence finishing. ORFs were identified with the Glimmer 2.0 software, but some manual inspection was still employed for overlapping ORFs. BLASTP searches and the COGs database were used to identify families of related proteins. Genomic comparison with E. coli K12 was then carried out using the GenomeComp software. The resulting genome sequence is now accessible under accession numbers in GenBank.
    • Briefly state the result shown in each of the figures and tables.
      • The entire genome is composed of a 4,607,203 bp chromosome and a 221,618 bp virulence plasmid, designated pCP301.
      • The chromosome shares a common ‘backbone’ sequence of ∼3.9 Mb with those of E. coli K12 (MG1655) (10) and O157 (EDL933) (11), which is essentially collinear. However, the backbone sequence is interrupted by numerous segments of K12-, O157- and Shigella-specific DNA, designated ‘K-islands’ (KIs), ‘O-islands’ (OIs) and ‘S-islands’ (SIs), respectively (Fig. 1, circle 1). The collinearity is also broken by numerous inversions and translocations compared with the E. coli sequences.
      • The circular genome map is divided into 10 circles, numbered from circle 1, the outermost, to circle 10, the innermost. The map was made in order to compare the chromosomes of S. flexneri and E. coli. The collinear regions in the map show where the genomes share a collinear backbone. Colors are used to differentiate genes by function; for example, salmon is used for genes involved in translation, ribosomal structure, and biogenesis.
      • The second figure represents the translocations and inversions within the genome. Certain areas of importance have been labeled, such as the replication origin and terminus of MG1655. The arrows show the K12 islands whose deletions are crucial for the organism’s virulence.
      • Table 1 of the paper shows a more detailed comparison between the S. flexneri and E. coli strains. The table includes more information on the IS elements, total number of ORFs, percentage of total genome and protein coding genes, etc.
      • Figure 3 shows the amino acid sequence alignment of the N-terminal halves of the IpaH proteins identified in Sf301. The consensus line displayed above the aligned sequences marks identical amino acids with asterisks and conserved residues with dots.
      • Table 2 shows more information about the IS elements identified in genomes of the 301 strain, E. coli MG1655 and EDL933, the virulence plasmid, and pWR501, from serotype 5a. Figure 4 compares the region of genes that are involved in LPS biogenesis. Table 3 describes the pseudogenes with known functions identified in Sf301 genome.
    • How do the results of this study compare to the results of previous studies (See Discussion).
      • The results obtained from comparing the genomes of S. flexneri and E. coli support a previous study that concluded that the two species are, in fact, related to each other and may even belong to the same genus. In addition, even though S. flexneri is pathogenic, it is more closely related to the non-pathogenic strain of E. coli, K12, than to the pathogenic one, O157. One explanation that the paper proposes for the connection between the species is that the characteristics developed by S. flexneri were acquired from a massive virulence plasmid that helps with its survival. Previous studies had already suggested the relationship between Shigella and E. coli, and the paper discusses that this might warrant reclassifying Shigella as a member of the genus Escherichia. This study, which suggests that S. flexneri evolved from multiple E. coli strains, can lead to a better understanding of bacterial evolution and pathogenesis.
    • For the genome paper (Coder and QA only): in addition to the journal article, please find and review the Model Organism Database (MOD) for your species similarly to what you did to review your assigned database for the NAR assignment. In particular, make sure to answer the following:
      • To find our database, we first had to search for our model organism in UniProt. I typed the phrase "shigella flexneri 2a 301" into the search bar at the top, since this is the organism we are studying. Once the results showed up, I copied one of the gene names to the clipboard, googled "shigella flexneri genome database", and pasted the gene name into some of the databases returned by the Google search. Some of the viable databases that I found can be located here and here.
      • In the end, after we discussed the options with Dr. Dahlquist, she suggested that we use this database instead.
      1. What types of data can be found in the database (sequence, structures, annotations, etc.); is it a primary or “meta” database; is it curated electronically, manually [in-house], or manually [community])?
        • The ShiBASE database that we decided to use contains genome maps, gene comparison tools, analysis tools (such as the ability to conduct a BLAST search), and links to related databases.
      2. What individual or organization maintains the database?
        • The database was created and is maintained (from what we can tell from the database itself) by the State Key Laboratory for Molecular Virology and Genetic Engineering, Institute of Pathogen Biology, CAMS.
      3. What is their funding source(s)?
        • The database is supported by the State Key Basic Research Program and the High Technology Project of the Ministry of Science and Technology of China.
      4. Is there a license agreement or any restrictions on access to the database?
        • There are no restrictions on access to the database (open access).
      5. How often is the database updated?
        • The database was last updated on June 11, 2014. Since no update has appeared in the time since then, it appears to be updated infrequently, at intervals of at least 1.5 years.
      6. Are there links to other databases?
        • Yes. The database provides two links to other resources, GenomeComp and VFDB, both of which are also maintained by the MGC.
      7. Can the information be downloaded?
        • Yes.
        • In what file formats?
          • We could only download the files in .fas (FASTA) format.
      8. Evaluate the “user-friendliness” of the database.
        • At first glance, there does not seem to be a way to do a keyword search in this database. However, clicking on the "Quick Guide" allows the user to type in a gene ID for any of the 4 strains of Shigella. Additionally, the clip art on the homepage is a little distracting.
        • Is the Web site well-organized?
          • Yes.
        • Does it have a help section or tutorial?
          • The site does not have a tutorial. However, it has a "User Help" section which contains the color coding and legend explanations in ShiBASE.
        • Run a sample query. Do the results make sense?
          • It is relatively easy to search for a specific gene in this database by looking up the gene in UniProt first and then copying and pasting its ID into the search bars of this site. The database allows Boolean queries when the "Text Query" link on the left-hand navigation bar is clicked. The database also provides a BLAST search option; for this type of query, the input sequence has to be in FASTA format. Note: Certain gene IDs, such as SF4436, do not return any results when entered in the "Quick Search" Text Query section of the database. For example, we were redirected to the page for SF4436 when we clicked on one of the genes in the genome map for Sf301, but when we typed SF4436 into the quick search field, the page said that there were no results. When we typed that ID into the Quick Guide search option, however, we got the result. We could not determine the cause of this discrepancy.
      9. What is the format (regular expression) of the main type of gene ID for this species (the "ordered locus name" ID)? (for example, for Vibrio cholerae it was VC#### or VC_####).
        • The format of the gene ID for this species is SF####.
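
The SF#### format noted above can be written as a regular expression. The following is only a minimal sketch in Python, assuming that each # stands for exactly one digit, so that an ID is the literal prefix "SF" followed by four digits. The function names are hypothetical and are not part of ShiBASE or UniProt.

```python
import re

# Assumption: "SF####" means the prefix "SF" plus exactly four digits.
SF_ID_PATTERN = re.compile(r"SF\d{4}")


def is_sf_locus_id(text: str) -> bool:
    """Return True if the whole string looks like an SF#### gene ID."""
    return SF_ID_PATTERN.fullmatch(text) is not None


def find_sf_locus_ids(text: str) -> list[str]:
    """Return every SF#### ID that appears as a separate word in the text."""
    # \b keeps longer tokens such as "SF12345" from matching partially.
    return re.findall(r"\bSF\d{4}\b", text)
```

For example, is_sf_locus_id("SF4436") accepts the ID used in the sample query above, while a Vibrio cholerae style ID such as "VC1234" is rejected.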

Link to Journal Club Presentation

Assignment Links

Weekly Assignments

Individual Journal Entries

Shared Journal Entries