Talk:Streptococcus pneumoniae

From LMU BioDB 2013

Week 13 Feedback

GenMAPP User

  • It was difficult for me to find Kevin's Week 13 electronic lab notebook. Please be consistent in adding your links to your team's template and using your team's categories.
  • The compiled raw data file you posted currently contains as many ID columns as there are data columns. Once you are convinced that all the columns are pasted in correct alignment with each other, you can remove all of the ID columns except for the first one on the left.
  • You have put the data in reverse chronological order from left to right. Does this make sense? What order should the data be in to be consistent with the experiment the authors performed?
  • The headers for your data columns should also say what the numerical values are in addition to the correspondence with the sample. Are these data fold changes (ratios) or log2 fold changes (ratios)? Whichever one it is, you need to add that to the column header.
  • You need to figure out which chips are independent biological replicates and which are technical replicates of those samples. Page 3, right column, paragraph 3 of your paper has the description of the replicates you should expect to see. What experiment did they perform in the paper? What is the experimental design?
  • Once you have figured out the experimental design and have the sample values in an appropriate order you will need to:
    • Take the log2 of the ratios (if they aren't already logged).
      • If you do this operation, replace all cells that contain text error messages with a single space character.
    • For the microarrays that have been dye swapped (Cy3 over Cy5 instead of Cy5 over Cy3), you will need to multiply the values by -1 so that the dye orientation of all chips is Cy5 over Cy3, experiment over control.
  • At this point, your spreadsheet will be in a format that is ready to follow the Vibrio microarray data analysis protocol.
    • You would then proceed to normalize the data.
    • Average the technical replicates.
    • Average the biological replicates.
    • Perform the Tstat and pval calculations.
    • Perform the Bonferroni correction.
  • Please let me know if you have questions. It would be ideal if you had this done by Tuesday so that you can work with the Coder and QA to import your data into GenMAPP using the Gene Database they will have ready by then.
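
The processing steps in the list above could be prototyped outside the spreadsheet along the following lines. This is only a sketch under stated assumptions, not the Vibrio protocol itself: the function names are hypothetical, normalization is assumed to mean scaling and centering each chip (subtract the chip mean, divide by the chip standard deviation), the T statistic is assumed to be a one-sample test against a mean log2 ratio of zero, and missing values are represented as None rather than as a single space. The p value calculation is left out because the Python standard library has no t distribution.

```python
import math
import statistics


def log2_ratio(value):
    """Return log2 of a ratio, or None for blank, text, or non-positive cells."""
    try:
        ratio = float(value)
    except (TypeError, ValueError):
        return None  # text error messages and blanks become missing values
    if ratio <= 0 or math.isnan(ratio):
        return None
    return math.log2(ratio)


def orient_chip(log_ratios, dye_swapped):
    """Multiply a dye-swapped chip by -1 so every chip reads Cy5 over Cy3."""
    sign = -1.0 if dye_swapped else 1.0
    return [None if v is None else sign * v for v in log_ratios]


def scale_and_center(log_ratios):
    """Assumed normalization: subtract the chip mean, divide by the chip stdev."""
    present = [v for v in log_ratios if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in log_ratios]


def average_replicates(chips):
    """Average gene by gene across chips, skipping missing values."""
    averaged = []
    for values in zip(*chips):
        present = [v for v in values if v is not None]
        averaged.append(statistics.mean(present) if present else None)
    return averaged


def t_statistic(values):
    """One-sample T statistic for one gene against a mean log2 ratio of zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

The order of the calls mirrors the list: convert each column with log2_ratio, flip the dye-swapped chips with orient_chip, normalize each chip, use average_replicates first on the technical replicates of each biological sample and then on the resulting biological replicates, and only then compute the statistics.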

Kdahlquist (talk) 15:34, 20 November 2013 (PST)

Week 12 Feedback

Copy of e-mail sent to team on 11/15/13, 9:13 AM. Kdahlquist (talk) 09:15, 15 November 2013 (PST)


I reviewed the issue that Dr. Dionisio said came up in class yesterday for the Coder: the UniProt XML file you were using had three taxon IDs. Upon investigation, I saw that these IDs corresponded to:

  • 1313: Streptococcus pneumoniae, the "parent" taxon of the two strains that follow
  • 171101: (strain ATCC BAA-255/R6)
  • 373153: Serotype 2 (strain D39/NCTC 7466)

I then went to the UniProt Complete Proteomes page and re-downloaded the UniProt XML for strain R6 and found that it contained only the 171101 taxon ID. I downloaded it from this page: http://www.uniprot.org/uniprot/?query=organism%3a171101+keyword%3a1185&format=*&compress=yes


So it appears that either you downloaded a different file somehow or that UniProt has changed the file since you downloaded it.
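
One way to check which taxon IDs a downloaded file actually contains is to count them directly. The sketch below is illustrative only: the function name count_taxon_ids and the three-entry sample are made up for the example, and it assumes the UniProt XML layout in which each entry has an organism element holding a dbReference of type "NCBI Taxonomy", under the namespace http://uniprot.org/uniprot.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace of UniProt XML files.
UNIPROT_NS = {"up": "http://uniprot.org/uniprot"}


def count_taxon_ids(xml_source):
    """Count the NCBI Taxonomy IDs found in the organism element of each entry.

    xml_source may be a file name or an open binary file object.
    """
    counts = Counter()
    root = ET.parse(xml_source).getroot()
    for ref in root.findall("up:entry/up:organism/up:dbReference", UNIPROT_NS):
        if ref.get("type") == "NCBI Taxonomy":
            counts[ref.get("id")] += 1
    return counts


# Tiny made-up sample standing in for a real download.
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><organism><dbReference type="NCBI Taxonomy" id="171101"/></organism></entry>
  <entry><organism><dbReference type="NCBI Taxonomy" id="373153"/></organism></entry>
  <entry><organism><dbReference type="NCBI Taxonomy" id="171101"/></organism></entry>
</uniprot>"""

counts = count_taxon_ids(io.BytesIO(SAMPLE))
for taxon_id, n in sorted(counts.items()):
    print(taxon_id, n)
```

A file that belongs to a single strain should produce exactly one taxon ID. For a compressed download, the file can be opened with gzip.open and the resulting file object passed in instead of a file name.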


I was unable to find any documentation on your team's pages of the version of the UniProt files you are using. I also did not see any pages that contained your Gene Database Testing Reports for the three databases you have exported. The Testing Report prompts you to record the version of the files you are using for the export. See Gene Database Testing Report Sample.


Completing a Testing Report should be done simultaneously with each export to document the results of that particular export.


So, before you perform any additional exports, you need to verify the version information for each of the files (UniProt XML, GOA, GO OBO-XML). If you cannot do so, then you need to re-download those files and record the version information and use those as you go forward for exporting new databases.


Also, we had discussed focusing on the TIGR4 strain first, since the microarray data contains the most IDs from that strain, not from R6. I understand that your long-term goal is to create a combined database for all three strains, but the first goal needs to be a complete and validated gene database for the TIGR4 strain.


I have corresponded with Dr. Nikhil Kumar, the contact for the microarray data. He has written back that he will get the column headers to me today. If I don't hear from him, I'll send a reminder so that we will have the information ready for class on Tuesday. Kevin will need to have finished working out the correspondence between samples and data by then for us to move forward. I didn't see any additional notes on this on his Week 12 electronic notebook page.


One final note: your team needs to use your categories consistently on all pages related to the project so that we can easily find your pages (some pages have "Team ATK", some have "Streptococcus pneumoniae", some have both, some have neither). This can be accomplished by using your team's template on all team subpages, including individual electronic notebook pages.


Week 10 Feedback

  • The Sanchez et al. paper is approved for your project. Kdahlquist (talk) 09:06, 5 November 2013 (PST)