The Class Whoopers

From LMU BioDB 2015

Team Information & Links

GenMAPP Analysis of Bordetella pertussis Microarray Data

Gene Database Project Links
Overview Deliverables Reference Format Guilds Project Manager GenMAPP User Quality Assurance Coder
Teams Heavy Metal HaterZ The Class Whoopers GÉNialOMICS Oregon Trail Survivors

Journal Entries

Class Whoopers Individual Journal Entries
Brandon Klein Week 11 Week 12 Week 14 Week 15
Lena Olufson Week 11 Week 12 Week 14 Week 15
Mahrad Saeedi Week 11 Week 12 Week 14 Week 15
Team Entries Week 11 Week 12 Week 14 Week 15

Group Members

Team Weekly Assignments

  • Week 10 Creation of page and combined annotated bibliography (midnight 11/10)
  • Week 11 (midnight 11/17)
  • Week 12 (midnight 11/24)
  • Week 14 (midnight 12/8)
  • Week 15 (midnight 12/15)

Deliverables

Bordetella pertussis GenMAPP Analysis Deliverables

Presentation Download Links

File Naming Protocol

All file types generated in this project will receive their own unique names composed of two key parts:

  1. Description
    • This will contain a brief, file-specific description of the file's content.
    • Descriptions for different versions of the same file will remain consistent.
  2. Identifier Tag
    • This tag will be listed as a suffix in the following form: "_cwYYYYMMDD"
      • cw: team name abbreviation
      • YYYYMMDD: date the file was created, in the form year/month/day

Additionally, the following file naming best practices will be observed when creating descriptions for new files:

  • Our species will be referred to consistently as "bpertussis".
  • Spaces will be written as underscores.
  • No capitalization will be used.
  • No special characters will be used.
  • If sequential numbering systems are used, leading zeros will be included for clarity.
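
As an illustration of the protocol above, here is a minimal Python sketch that assembles a file name from a description and a creation date. The function names (make_file_name, numbered) are hypothetical, and the sketch assumes that hyphens are allowed in descriptions (as in "bpertussis-std") and that "special characters" means anything other than lowercase letters, digits, underscores, and hyphens.

```python
import re
from datetime import date

TEAM_TAG = "cw"  # team name abbreviation from the protocol above


def make_file_name(description: str, created: date, extension: str = "") -> str:
    """Build '<description>_cwYYYYMMDD[.ext]' following the naming protocol."""
    # No capitalization; spaces (any whitespace run) become single underscores.
    cleaned = re.sub(r"\s+", "_", description.strip().lower())
    # Assumption: "special characters" = anything except a-z, 0-9, '_' and '-'.
    cleaned = re.sub(r"[^a-z0-9_-]", "", cleaned)
    if not cleaned:
        raise ValueError("description must contain at least one letter or digit")
    name = f"{cleaned}_{TEAM_TAG}{created:%Y%m%d}"
    if extension:
        name += "." + extension.lstrip(".").lower()
    return name


def numbered(stem: str, index: int, width: int = 2) -> str:
    """Append a sequence number with leading zeros, e.g. 'export_03'."""
    return f"{stem}_{index:0{width}d}"


print(make_file_name("bpertussis-std", date(2015, 12, 10), "zip"))
# prints: bpertussis-std_cw20151210.zip
```

Keeping the description fixed and changing only the date is what lets different versions of the same file sort together.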

Weekly Updates

Week 15

  • Goals
    • Assignment due date: Midnight Tuesday, December 15
    • Coder: Adjust the GenMAPP Builder code to account for the one EnsemblBacteria reference ID that was missing in our last export; conduct a new import-export cycle to create the (hopefully) final .gdb file; begin characterizing the exported .gdb file in a Gene Database Testing Report; customize the GenMAPP Builder TallyEngine to account for any changes made.
    • Quality Assurance: Reconfigure the TallyEngine configuration with the Coder to accommodate the gene IDs that were not exported the previous time. Test the revised database by running the TallyEngine count, XMLPipeDB Match, and PostgreSQL. Locate any gene IDs that are still missing.
    • GenMAPP User: Import data into GenMAPP, create ColorSets, and run MAPPFinder. Document and take notes on test runs with GenMAPP. Use the EX.txt file to help the Coder/Quality Assurance team members to validate the .gdb. Create a .mapp file showing one pathway that is changed in your data.
  • Progress
    • Brandon (Coder and Project Manager): I began this week by customizing the GenMAPP Builder TallyEngine to report ORF counts for Bordetella pertussis (see Bklein7_Week_15). After this, I worked with Mahrad to identify the 1 gene ID that was missing in the .gdb file File:Bpertussis-std cw20151203.zip. I found that this gene was a necessary EnsemblBacteria reference ID and edited the GenMAPP Builder code with the help of Dr. Dionisio to include this ID in our next export (see Bklein7_Week_15). I conducted a complete import-export cycle on 12/10/2015 to create the .gdb file File:Bpertussis-std cw20151210.zip. I then characterized this export, authoring sections 1-5.2 of its testing report: Gene_Database_Testing_Report-_cw20151210. During our Sunday meeting, I worked with Lena to use this new gene database in GenMAPP. During our Monday meeting, I worked on our PowerPoint presentation: File:Bpertussis findings powerpoint.pdf.
    • Mahrad (Quality Assurance): I worked closely with the Coder, Brandon, to re-customize TallyEngine to include the 11 missing ORF genes. The specific customizations and the results that followed are detailed in my Week 15 Journal Entry. Having located the missing gene IDs, Brandon went into Eclipse to code for them to be included in the export. Following this, we tested our revised gene database to make sure these missing IDs were actually exported. We ran the TallyEngine count, which gave a total of 3446 gene IDs, demonstrating that the IDs were now exported. Then we ran XMLPipeDB Match, which gave a total of 3447 gene IDs, one additional. Finally, we ran PostgreSQL, which gave a total of 3446 gene IDs. We found that gene "BP3167A" was in the original XML file but not accounted for in the exported file. With further investigation, we concluded that "BP3167A" is a reference ID from EnsemblBacteria and corresponds to "BP3167.1", which was exported.
    • Lena (GenMAPP User): I imported the data into GenMAPP and then created color sets in order to run MAPPFinder. I obtained the ontology results and did some background research on how the top results related to the microarray article. I then used the KEGG pathways for my specific organism to create two separate MAPPs, one for the ribosome and one for the nitrogen cycle.
  • Meetings!
    • This week, our group used class work sessions to coordinate our work:
      • Tuesday, December 8, 2:40 - 4:00
      • Thursday, December 10, 2:40 - 4:00
    • In addition, we scheduled meetings outside of class to work on the final PowerPoint Presentation and deliverables for our project:
      • Sunday, December 13, 7:00 PM - 1:00 AM
      • Monday, December 14, 2:00 PM - 11:00 PM
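
The count comparison described in this week's Quality Assurance progress (3447 IDs by XMLPipeDB Match versus 3446 by TallyEngine and PostgreSQL) amounts to finding which IDs are in the source but not in the export. The Python sketch below shows that idea as a set difference. It is not part of GenMAPP Builder or XMLPipeDB; the function name missing_ids is hypothetical, and the ID lists are toy data for illustration only (with "BP0001" as a made-up placeholder).

```python
def missing_ids(source_ids, exported_ids):
    """Return IDs present in the source list but absent from the export, sorted."""
    return sorted(set(source_ids) - set(exported_ids))


# Toy data for illustration only; these are NOT the real ID lists.
source = ["BP0001", "BP3167A", "BP3167.1"]
exported = ["BP0001", "BP3167.1"]

print(missing_ids(source, exported))
# prints: ['BP3167A']
```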

Week 14

  • Goals
    • Assignment due date: Midnight Tuesday, December 8
    • Coder: Create the custom species profile for Bordetella pertussis, run an export using the customized version of GenMAPP Builder, add further customizations to the custom species profile as appear necessary, and run a second export using the further customized version of GenMAPP Builder.
    • Quality Assurance: Identify gene IDs that are missing in the first custom export, work with the coder to classify these IDs, configure the Tally Engine, and complete a gene database testing report for the second custom export.
    • GenMAPP User: Complete the statistical analysis of the data, format the data for import into GenMAPP, and coordinate with the coder/QA to import this data into GenMAPP using the custom gene database.
  • Progress & Reflection
    • Brandon (Coder and Project Manager): This week, I focused on creating and customizing the species profile for Bordetella pertussis in GenMAPP Builder, the details of which can be found in my Week 14 Journal Entry. I documented the first export I conducted using a custom Bordetella pertussis species profile here: Gene Database Testing Report- cw20151201. I demonstrated that the custom species information implemented in this export worked as intended, but Mahrad and I identified 11 ORF genes that failed to export. I updated the Bordetella pertussis species profile to account for these ORF genes and conducted a new export, detailed here: Gene Database Testing Report- cw20151203. Mahrad analyzed the exported .gdb file. In addition to this, I kept tabs on my fellow group members to keep us on track to accomplish our long-term project goals in a timely manner.
      • What worked?
        • Thus far, we have exported two versions of the Bordetella pertussis gene database that have been created using modified versions of GenMAPP Builder. Both custom exports worked as intended. The first one simply created the Bordetella pertussis custom class. However, we identified 11 ORF genes conforming to the unique patterns "BP####A" and "BP####B" that warranted inclusion in the gene database. Exporting ORF gene IDs is a common issue that other custom classes appear to have had, so implementing this fix was very straightforward in practice.
      • What didn't work?
        • Although all of the changes we implemented to GenMAPP Builder worked as intended, we have yet to produce a comprehensive gene database for Bordetella pertussis. The most recent export included 11 ORF genes that we thought encompassed the only IDs with the patterns "BP####A" and "BP####B". However, we found that there is one more relevant gene ID in the UniProt XML file that conforms to the pattern "BP####A" and was not imported. We will have to find a way to export this ID as well.
      • What will I do next to fix what didn't work?
        • Next, I will confer with Drs. Dahlquist and Dionisio to come up with a strategy for isolating the one missing EnsemblBacteria reference ID and exporting it into our final gene database. After this is done, I will characterize the database for completeness and work on further modifying the TallyEngine. Hopefully, these steps will generate a complete gene database so that we can transition to working on our final deliverables.
      • Bklein7 (talk) 13:39, 7 December 2015 (PST)
    • Mahrad (Quality Assurance): This week, as Quality Assurance, I worked directly with Brandon on the initial data exports. The work is summarized here: Week 14 Journal Entry. We then meticulously characterized regular expression patterns to detect discrepancies in extracting the data from the original samples. In the coming week, I will focus on customizing the Tally Engine configuration for our species, which may take some time and coding assistance from Brandon. Once the Tally Engine has been configured for our species, Lena can proceed with GenMAPP processing. Week 14 reflection:
  1. What worked?
    • We were able to use the various counting systems to detect the total number of gene IDs that were imported into our gdb file. Through our investigation, Brandon and I came to find four specific missing IDs.
  2. What didn't work?
    • We detected four IDs that were missing from our .gdb file. We were able to pinpoint the specific IDs, and the code will now have to be changed to incorporate them into our database.
  3. What will I do next to fix what didn't work?
    • Work more closely with Brandon to ensure that the Tally Engine is configured properly and that we can confirm that all the gene IDs were imported successfully.
    • Lena (GenMAPP User): This week, I made progress on the statistical analysis of the data to prepare it for GenMAPP. I posted my progress for each of the class working sessions on my Week 14 Journal Entry, updating the Excel data sheets after each session. Dr. Dahlquist helped me figure out a problem with the original raw data that was causing the values to be very skewed. I then sent her my updated data sheet, and she used a program to separate the duplicates of the chips. After she sent back the data with the sorted values, I performed the statistical analysis on it. The most recent version of the file can be found on my Week 14 Journal Entry, linked previously.
  1. What worked?
    • I was able to perform the correct statistical alterations to the data in order to prepare it for submission to Dr. Dahlquist, who ran it through her program to split the data, since there are duplicates of the genes. I had little trouble working in Excel and following the protocol from the Vibrio cholerae exercise, and I was able to adapt the protocol to fit my own data. Since there were a lot of columns with the dye swaps, I was careful to stay organized and give my columns appropriate and easily identifiable names so that I would not get confused or mixed up. It was important for me to be meticulous, as I was the only GenMAPP User for my group and did not have another person to check my work with.
  2. What didn't work?
    • This week I faced a challenge when I finished my calculations in Excel, because my values for the averages and standard deviations (and thus many other columns) were much too large. After looking over the data with Dr. Dahlquist, we saw that some of the values were recorded as 100000 or -100000, which threw the results way off. Upon detecting this problem, I went back to the original raw data I downloaded from the microarray site to check whether this was an error included in their data or a result of my own work. I found that the large numbers were included in the raw data, so with the assistance of Dr. Dahlquist I deleted these large numbers from my data, which solved the problem.
  3. What will I do next to fix what didn't work?
    • As I described above, I determined that the large 100000 and -100000 numbers came from the original raw data I downloaded, so they were not an error in my own calculations in Excel. I went into my data and replaced all of the large numbers with blank cells, which solved my problem: my values were now more logical and fell within the expected numerical range.

Lenaolufson (talk) 19:54, 7 December 2015 (PST)
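
The fix described in Lena's reflection (blanking out the 100000 and -100000 values before computing averages and standard deviations) was done in Excel. The same idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the row of numbers is made-up toy data, and None stands in for a blank cell.

```python
from statistics import mean, stdev

SENTINELS = {100000.0, -100000.0}  # placeholder values found in the raw data


def blank_sentinels(values):
    """Replace sentinel placeholders with None (the equivalent of a blank cell)."""
    return [None if v is not None and float(v) in SENTINELS else v for v in values]


def summarize(values):
    """Mean and sample standard deviation over the non-blank values only."""
    kept = [v for v in values if v is not None]
    avg = mean(kept) if kept else None
    sd = stdev(kept) if len(kept) >= 2 else None
    return avg, sd


# Toy row for illustration only; not taken from the actual dataset.
row = [0.5, -0.25, 100000, 0.75]
cleaned = blank_sentinels(row)
print(cleaned)  # prints: [0.5, -0.25, None, 0.75]
print(summarize(cleaned))
```

Blanking the values rather than replacing them with zero keeps them out of both the average and the count of replicates.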

  • Meetings!
    • This week, our group used class work sessions to coordinate our work:
      • Tuesday, December 1, 2:40 - 4:00
      • Thursday, December 3, 2:40 - 4:00
      • Monday, December 7, 10:30 - 12 am
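
The ORF ID patterns named in this week's Coder reflection, "BP####A" and "BP####B", can be written as a regular expression. The sketch below is in Python and is not the GenMAPP Builder code itself; it assumes that each "#" stands for exactly one digit, and the function name is_suffixed_orf_id is hypothetical. The ID "BP0123B" in the usage lines is a made-up example.

```python
import re

# "BP", four digits, then a trailing A or B (assumes each '#' is one digit).
ORF_SUFFIX_PATTERN = re.compile(r"BP\d{4}[AB]")


def is_suffixed_orf_id(gene_id: str) -> bool:
    """True if the whole ID has the form BP####A or BP####B."""
    return ORF_SUFFIX_PATTERN.fullmatch(gene_id) is not None


print(is_suffixed_orf_id("BP3167A"))   # prints: True
print(is_suffixed_orf_id("BP3167.1"))  # prints: False
print(is_suffixed_orf_id("BP0123B"))   # prints: True (made-up ID)
```

Using fullmatch rather than search avoids accepting longer strings that merely contain the pattern.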

Week 12

  • Goals
    • Assignment due date: Midnight Tuesday, November 24
    • Coder: Set up a GitHub repository clone of the XMLPipeDB project on your development device, the development rig, and the initial as-is build for gmbuilder. Complete an import-export cycle in association with QA.
    • Quality Assurance: Complete an import-export cycle for the 1st Bordetella pertussis gene database. Complete a Gene Database Testing Report for this export.
    • GenMAPP Users: Create a Master Raw Data file that contains the IDs and columns of data required for further analysis. Consult with Dr. Dahlquist on how to process the data (normalization, statistics).
  • Progress
    • Brandon (Quality Assurance and Interim Coder): This week, I focused on completing an import-export cycle for our first Bordetella pertussis gene database: File:Bpertussis-std cw20151119.zip. With my QA hat on, I imported the appropriate data, exported the gene database, and discussed the gene database creation & counting protocol here: Gene Database Testing Report- cw20151119. With my Coder hat on, I followed the instructions on the Coder Guild Page to set up a GitHub repository clone of the XMLPipeDB project on my personal laptop, the Eclipse developer rig, and the initial as-is build for gmbuilder. The electronic lab notebook for my QA and Coder work is on my Week 12 Page. Finally, I wrote a PowerPoint presentation on our genome sequencing paper, which is linked on my Week 12 Page as well.
    • Lena (GenMAPP): I worked on downloading the correct data sample files from those provided on the microarray paper page. The files were unzipped and prepared to be imported into Excel. In Excel, the data was manipulated to form a spreadsheet that had all of the gene IDs from the different samples with their appropriate columns to be analyzed. The corrections and further manipulations of the data will continue in the coming week in order to create the desired dataset to be exported from Excel. File:Bpertussis CompiledRawData MS2015.xlsx
    • Mahrad (GenMAPP --> Quality Assurance): This week I downloaded the six data sample files provided by the microarray paper. The process is detailed in my Week 12 Journal Entry. Files were unzipped, imported into Excel, and manipulated to form a single spreadsheet containing all gene IDs from the different samples. Each sample was placed in its respective column to be further analyzed and manipulated in the upcoming week. Following this, I assumed the position of Quality Assurance to cover for Nicole's absence.
    • Nicole was absent this week. Bklein7 (talk) 18:52, 23 November 2015 (PST)
  • Meetings!
    • Monday, November 23: Seaver 120- Brandon and Lena met to work on the GenMAPP testing of the gene IDs from our database.

Week 11

  • Goals
    • For all:
      • Outline your assigned paper on your user page and include a list of 10 defined terms from the paper.
    • Nicole & Brandon
      • Prepare Journal Club presentation on the designated genome sequencing article
      • Slides Due: by midnight, Tuesday, November 17
      • Presentation Date: Tuesday, November 24
    • Lena & Mahrad
      • Prepare Journal Club presentation on the designated microarray paper
      • Slides Due: by midnight, Tuesday, November 17
      • Presentation Date: Tuesday, November 17
  • Progress
    • Nicole Anguiano (Coder): Nicole was absent this week for a medical emergency and is (hopefully) getting some much deserved rest. Bklein7 (talk) 23:14, 16 November 2015 (PST)
    • Brandon Klein (QA): This week I made several edits to the Class Whoopers Team Page in accordance with the Week 11 assignment. These edits included the following: revising the Class Whoopers template, reorganizing the Team Page structure, commenting out unneeded articles in the annotated bibliography, creating the new bibliography entry as requested by Dr. Dahlquist, and writing the naming conventions for our files. Additionally, I outlined our genome sequencing paper for Bordetella pertussis and assessed the GeneDB MOD on my Week 11 Individual Journal Entry. A preliminary draft of the genome sequencing paper presentation, which I will likely be presenting solo, was uploaded there. Finally, I kept tabs on group members as the interim Project Manager. Bklein7 (talk) 23:14, 16 November 2015 (PST)
    • Lena Olufson (GenMAPP): This week Mahrad and I met up and analyzed the microarray paper together. We split up the PowerPoint into two halves; I did the introduction/significance of the study as well as the methods performed. Mahrad and I created our presentation together and worked through a Google Doc to edit it simultaneously as we discussed out loud. We also created a flow chart together that demonstrated the experimental design, so we have the same one included in our individual assignments. We made sure to check in with the temporary project manager and keep him updated on our progress. Lenaolufson (talk) 23:24, 16 November 2015 (PST)
    • Mahrad Saeedi (GenMAPP): This week Lena and I worked on analyzing the microarray paper and creating an outline. The outline and the detailed process involved in the experiment can be found in my Week 11 Journal Entry. We each defined 10 terms separately based upon words we didn't recognize in the article. We then proceeded to produce the PowerPoint presentation for journal club.

Msaeedi23 (talk) 23:46, 16 November 2015 (PST)

  • Meetings!
    • 11/15- Lena & Mahrad met to work on outlining the article and answering questions
    • 11/16- Lena & Mahrad met to prepare the PowerPoint presentation for journal club

Week 10

  • Goals
    • For all:
      • Create an annotated bibliography including one genome sequencing paper and two microarray experiments for Bordetella pertussis
      • Create/update team page & compile group annotated bibliography
      • Assignment due date: Midnight Tuesday, November 10
  • Progress
    • All group members created annotated bibliographies and compiled them on the newly created group page.
  • Meetings!
    • Monday, November 9, 8pm-9pm, Seaver 120

Annotated Bibliography

Genome Sequencing Paper

Neither of these papers is the first to report the genome sequence of B. pertussis. The paper that you will want to use is this one. I found it by looking at the introduction and references of the Zhang et al. (2011) paper. For your Week 11 assignment, please remove your annotated bibliography entries for the two papers below and create one for this new paper by Parkhill et al. (2003). You will use the Parkhill paper for your project. Kdahlquist (talk) 09:54, 10 November 2015 (PST)

  • Parkhill, J., Sebaihia, M., Preston, A., Murphy, L. D., et al. (2003). Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nature Genetics, 35(1), 32-40. doi:10.1038/ng1227
  • PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/12910271
  • PubMed Central: Not available on PubMed Central.
  • Publisher Full Text (HTML): http://www.nature.com/ng/journal/v35/n1/full/ng1227.html
  • Publisher Full Text (PDF): http://www.nature.com/ng/journal/v35/n1/pdf/ng1227.pdf
  • Copyright: ©2003 Nature Publishing Group (information found on PDF version of article). This article is not Open Access, but it is freely available 6 months after publication.
  • Publisher: Nature Publishing Group (for-profit).
  • Availability: In print and online.
  • Did LMU pay a fee for this article: Yes, LMU pays a subscription fee for access to the journal Nature Genetics.

Microarray Paper

This paper is suitable for your project. Kdahlquist (talk) 10:04, 10 November 2015 (PST)

Hoo, R., Lam, J.H., Huot, L., Pant, A., Li, R., Hot, D., & Alonso, S. (2014). Evidence for a Role of the Polysaccharide Capsule Transport Proteins in Pertussis Pathogenesis. PLoS ONE, 9(12):e115243. doi: 10.1371/journal.pone.0115243