Mpetredi Week 8

From LMU BioDB 2013

Mitchell Petredis

[[Team Name]]

Relevant Documents

References

Lab Journal

Part 1

  • Note: all of these steps were performed after the introductory lab on October 10, 2013. Instructions that precede the ones below can be found in the sections above "Sanity Check: Number of genes significantly changed" at the OpenWetWare link above. I followed them word for word without any issues, with one exception: the formula "=TDIST(ABS(R2),degrees of freedom,2)" under the section "Perform statistical analysis on the ratios" was changed to "=TDIST(ABS(Q2),degrees of freedom,2)" to fix an error message in Excel.
  1. Open the Excel document in Microsoft Excel 2010 for PC, then go to Data/Filter in the top menu bar. Small arrows should now appear at the right of every cell in the top row.
  2. Find the "Pvalue" column, click on the first cell labeled "Pvalue", and click on the small downward facing arrow to the right of the cell text. A list of options will open in a small window next to the cell.
    • Clicking on the "Number Filters" option will open a window adjacent to the option, with another list of options to choose from. Select "Custom Filter..."
    • A new dialog box appears with two drop-down boxes and two radio buttons; by default, the first drop-down box reads "equals", the second is empty, and "And" is selected instead of "Or". Click on the "equals" box and change it to "is less than", then type "0.05" in the text box to its right.
    • Clicking "OK" will then display only Pvalues less than 0.05 and change the downward-facing arrow icon to the right of "Pvalue" to a smaller arrow with a filter icon, indicating that the filter is active. The status bar at the bottom left of the Excel window shows how many records match the filter (in this example, 948 of 5221 records have a Pvalue less than 0.05).
      • Repeat the steps in #2 for Pvalues < 0.01, 0.001, and 0.0001.
        • Results for other Pvalues
          • <0.01; 235/5221
          • <0.001; 24/5221
          • <0.0001; 2/5221
  3. "When we use a p value cut-off of p < 0.05, what we are saying is that you would have seen a gene expression change that deviates this far from zero less than 5% of the time."
    • Replacing 0.05 with 0.01, 0.001, or 0.0001 applies a progressively stricter cut-off, giving greater confidence that the changes that remain are not due to chance.
  4. Change Pvalue filter back to 0.05, and add a filter to AvgLogFC_all to display results greater than 0.
    • Results display 352/5221 records
  5. Keep Pvalue filter set to 0.05, but change AvgLogFC_all filter to display results less than 0.
    • Results display 596/5221 records
  6. Keep Pvalue filter set to 0.05, but change AvgLogFC_all filter to display results greater than 0.25.
    • Results display 339/5221 records
  7. Keep Pvalue filter set to 0.05, but change AvgLogFC_all filter to display results less than -0.25.
    • Results display 579/5221 records
  8. Merrell used the Significance Analysis of Microarrays (SAM) program and grouped the genes as differentially regulated (237 genes), induced (44 genes), or repressed (193 genes). I could not find any mention of Pvalues or comparable statistics anywhere in the paper, and therefore could not directly compare Merrell's results to mine.
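
The filters in steps 2 through 7 can also be expressed outside Excel. The following is a minimal sketch, assuming the spreadsheet has been read into a list of dictionaries keyed by the column names "Pvalue" and "AvgLogFC_all"; the four sample rows and the function name count_matching are hypothetical and are not taken from the actual dataset. The last lines check the counts reported above against each other.

```python
# Hypothetical rows: these values are made up for illustration only
# and do not come from the actual spreadsheet.
rows = [
    {"ID": "gene_a", "AvgLogFC_all": 0.80, "Pvalue": 0.003},
    {"ID": "gene_b", "AvgLogFC_all": -0.40, "Pvalue": 0.020},
    {"ID": "gene_c", "AvgLogFC_all": 0.10, "Pvalue": 0.040},
    {"ID": "gene_d", "AvgLogFC_all": -1.20, "Pvalue": 0.300},
]


def count_matching(rows, p_cutoff, fc_min=None, fc_max=None):
    """Count rows with Pvalue < p_cutoff and, optionally,
    AvgLogFC_all > fc_min and/or AvgLogFC_all < fc_max."""
    count = 0
    for row in rows:
        if row["Pvalue"] >= p_cutoff:
            continue
        fc = row["AvgLogFC_all"]
        if fc_min is not None and fc <= fc_min:
            continue
        if fc_max is not None and fc >= fc_max:
            continue
        count += 1
    return count


significant = count_matching(rows, 0.05)               # as in step 2
increased = count_matching(rows, 0.05, fc_min=0)       # as in step 4
decreased = count_matching(rows, 0.05, fc_max=0)       # as in step 5
strongly_up = count_matching(rows, 0.05, fc_min=0.25)  # as in step 6

# Counts reported in this journal (out of 5221 records).
reported = {"p<0.05": 948, "up": 352, "down": 596}
# If no gene has a log fold change of exactly 0, the increased and
# decreased counts should add up to the total at p < 0.05.
totals_agree = reported["up"] + reported["down"] == reported["p<0.05"]
fraction_significant = reported["p<0.05"] / 5221
```

The reported counts are consistent: 352 increased plus 596 decreased equals the 948 records with Pvalue < 0.05, which is about 18% of the 5221 records.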

Part 2

  1. Launch GenMAPP from Programs
  2. Download and extract appropriate gene database file to the Downloads folder. For this assignment, I will use this one: [Vc-Std_External_201001022.zip]
  3. From the extracted zip file, place the Vc-Std_External_201001022.gdb file into C:\GenMAPP 2 Data\Gene Databases
  4. From GenMAPP, look at the menu bar and navigate to Data/Choose Gene Database, which will prompt you to locate the file from step 3. Select it and click Open.
  5. Now that the database is loaded, go to Data/Expression Dataset Manager, and select the text file located at mpetredi Week 8 that I made last Thursday (download it if you don't have it already). Open the text file.
  6. Another prompt will appear asking to check off boxes that have text data in a column. There's no text in any columns, so ignore this and continue.
    • After this step, the program ran and I got an error message from Raw Data File Conversion, saying that "121 errors were detected in your raw data." My partner Gabriel obtained 772 errors with an older version of the database, probably because the newer database is more refined and accurate than the other one.
    • A new text file was created from my original text file, but now with the file extension EX.txt. Open it in Excel and go to Data/Filter.
    • Scroll all the way to the right to find an errors column, and click on the filter icon to the right of the first cell in the errors column. Only have "Gene not found in OrderedLocusNames" checked off.
    • The following errors were discovered in these Gene IDs, going down the list from AutoFilter:
      • VC2209
      • VC2209
      • VCA1031
      • VCA0745
      • VC1476
      • VCA0534
      • VCA0276
      • VCA0276
      • VC1660
      • VC2209
      • VC2209
      • VCA0449
      • VCA0745
      • VC1759
      • VCA0534
      • VCA0276
      • VCA0276
      • VCA0280
      • VC1018
      • VC1307
      • VC1513
      • VC2049
      • VC2049
      • VC2338
      • VC2338
      • VCA0596
      • VC0500
      • VC0518
      • VC0521
      • VC0793
      • VC1620
      • VC1620
      • VC1620
      • VC1625
      • VC1625
      • VC1018
      • VC1307
      • VC1513
      • VC1513
      • VC1807
      • VC2209
      • VC2049
      • VC2338
      • VC2338
      • VCA0596
      • VCA0358
      • VC0499
      • VC0501
      • VC0518
      • VC0521
      • VC0793
      • VC0793
      • VC0818
      • VC1620
      • VC1620
      • VC1620
      • VC1625
      • VC1625
      • VCA0133
      • VC0284
      • VC0284
      • VC0583
      • VC0607
      • VC1660
      • VC2209
      • VC2209
      • VCA0454
      • VCA1031
      • VCA1031
      • VCA1036
      • VCA1073
      • VCA1073
      • VCA0232
      • VCA0232
      • VC0663
      • VC2209
      • VCA1089
      • VCA1104
      • VC0284
      • VC0284
      • VC0583
      • VC0607
      • VC1395
      • VC1660
      • VC2209
      • VC2209
      • VC2209
      • VCA1031
      • VCA1036
      • VCA1073
      • VCA1073
      • VCA0232
      • VC1476
      • VCA1089
      • VCA1104
      • VC1807
      • VC2109
      • VCA0596
      • VCA0887
      • VC1620
      • VC1620
      • VC1620
      • VC2437
      • VC2700
      • VC2700
      • VCA0674
      • VCA0692
      • VCA0938
      • VC1018
      • VC2109
      • VCA0887
      • VC0254
      • VC0518
      • VC1620
      • VC1620
      • VC2437
      • VC2700
      • VC2700
      • VC2700
      • VCA0674
      • VCA0692
  7. In GenMAPP, create a new color set by going to Data/Expression Dataset Manager/Expression Datasets/New, and supply a name for it (mine is mpetredi color set 2013-10-15). The file will be created with the extension ".gex" at the end of it.
  8. Supply a name in the "Color Sets" box (I chose pathogenic vs lab) and select "AvgLogFC_all" from the drop-down box for "Gene Value".
  9. Click on "New" under the "Gene Value" drop-down menu to start building functions for the MAPP. Fill in the information as seen in this picture: Mpetredi 20131017 Expression Dataset Manager.PNG
  10. Save your progress by going to Expression Datasets/Save in the menu bar.
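
Many IDs in the error list from step 6 appear more than once, so it can help to separate distinct genes from repeated rows. The following is a minimal sketch of such a tally using only the first eight IDs from that list; the function name summarize_ids is hypothetical, and the same approach should extend to the full list.

```python
from collections import Counter

# First eight IDs from the error list in step 6, in the order listed.
error_ids = [
    "VC2209", "VC2209", "VCA1031", "VCA0745",
    "VC1476", "VCA0534", "VCA0276", "VCA0276",
]


def summarize_ids(ids):
    """Return the number of distinct IDs and a dict of the IDs
    that occur more than once, with their counts."""
    counts = Counter(ids)
    repeated = {gene: n for gene, n in counts.items() if n > 1}
    return len(counts), repeated


unique_count, repeated = summarize_ids(error_ids)
print(unique_count, repeated)
```

For this excerpt, the eight rows reduce to six distinct gene IDs, with VC2209 and VCA0276 each listed twice.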

MAPPFinder Procedure

  1. Open MAPPFinder from Programs or within GenMAPP by going to Tools/MAPPFinder.
  2. Click on Calculate New Results, and MAPPFinder will automatically detect the color set GEX file created previously. Hit OK.
  3. Select a criterion to filter by (for class purposes, I selected "Decreased").
  4. Enable "Gene Ontology" and p values by clicking on the appropriate checkboxes.
  5. Click "Browse" to save the file, and then click "RunMAPPFinder" to continue the operation. Note that the procedure takes a few minutes to complete and may appear unresponsive. Mpetredi 20131017 MAPPFinder Initial Step.PNG
    • After MAPPFinder finishes, you'll get a window that looks like this: Mpetredi 20131017 MAPPFinder Browser.PNG
    • Anything highlighted in yellow indicates the gene ontology terms that have at least 3 genes measured and a p value < 0.05; any p value < 0.05 is considered highly significant
  6. Find out the most significant gene ontology terms by clicking on "Show Ranked List" in the menu bar of MAPPFinder.
    • My top 10 were as follows:
      • hexose catabolic process
      • glucose catabolic process
      • glycolysis
      • monosaccharide catabolic process
      • cytoplasm
      • alcohol catabolic process
      • cellular carbohydrate catabolic process
      • glucose metabolic process
      • protein folding
      • hexose metabolic process
    • Compared to Gabriel, my gene ontology results were... because...
  7. Click on "Collapse Tree" towards the upper right corner to simplify the list menu, and search for the following gene IDs by typing in these queries in the "Search for a specific Gene ID" text box and select "OrderedLocusNames" in the drop-down box to the right of the text box: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583. Any relevant results will now be highlighted in blue.
    • VC0028
      • branched chain family amino acid biosynthetic process
      • cellular amino acid biosynthetic process
      • metabolic process
      • metal ion binding
      • iron-sulfur cluster binding
      • 4 iron, 4 sulfur cluster binding
      • catalytic activity
      • lyase activity
      • dihydroxy-acid dehydratase activity
    • VC0941
      • glycine metabolic process
      • L-serine metabolic process
      • one-carbon metabolic process
      • cytoplasm
      • pyridoxal phosphate binding
      • catalytic activity
      • transferase activity
      • glycine hydroxymethyltransferase activity
    • VC0869
      • glutamine metabolic process
      • purine nucleotide biosynthetic process
      • 'de novo' IMP biosynthetic process
      • cytoplasm
      • nucleotide binding
      • ATP binding
      • catalytic activity
      • ligase activity
      • phosphoribosylformylglycinamidine synthase activity
    • VC0051
      • purine nucleotide biosynthetic process
      • 'de novo' IMP biosynthetic process
      • nucleotide binding
      • ATP binding
      • catalytic activity
      • lyase activity
      • carboxy-lyase activity
      • phosphoribosylaminoimidazole carboxylase activity
    • VC0647
      • mRNA catabolic process
      • RNA processing
      • cytoplasm
      • mitochondrion
      • RNA binding
      • 3'-5'-exoribonuclease activity
      • transferase activity
      • polyribonucleotide nucleotidyltransferase activity
    • VC0468
      • glutathione biosynthetic process
      • metal ion binding
      • nucleotide binding
      • ATP binding
      • catalytic activity
      • ligase activity
      • glutathione synthase activity
    • VC2350
      • deoxyribonucleotide catabolic process
      • metabolic process
      • cytoplasm
      • catalytic activity
      • lyase activity
      • deoxyribose-phosphate aldolase activity
    • VCA0583
      • transport
      • outer membrane-bounded periplasmic space
      • transporter activity
  8. Pick a GO term from one of the genes (I picked catalytic activity from VC2350)
    • This will open GenMAPP and display a graphical map of all genes associated with the GO term (catalytic activity in my case), along with a color legend for the changed genes (pink = increase, green = decrease). Click on any gene to find out more about it.
      • I clicked on CDD_VIBCH (UniProt ID: Q9KSK5). Its function is described in UniProt as follows: "This enzyme scavenges exogenous and endogenous cytidine and 2’-deoxycytidine for UMP synthesis." Mpetredi 20131017 GenMAPP GO results.PNG
  9. Open your CriterionX-GO.txt file, created when you made the MAPP, in Excel. This shows, in tabular form, how MAPPFinder calculated the results.
    • Compared to Gabriel, my results were... while his results were...
  10. Apply an AutoFilter to the row that starts with "GOID".
    • Apply these specific filters:
      • Z Score: greater than 2
      • PermuteP: less than 0.05
      • Number Changed: greater than or equal to 4 or 5 AND less than 100 (I used 4)
      • Percent Changed: greater than or equal to 25-50%
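
The four filters in step 10 can be combined into a single test per row. The following is a minimal sketch, assuming each row of the CriterionX-GO.txt table has been read into a dictionary keyed by the column names used above; the "GO Name" key, the sample rows, the function name passes_filters, and the choice of 25 as the Percent Changed threshold are assumptions made for illustration.

```python
def passes_filters(row, min_changed=4, min_percent=25.0):
    """Return True if a row meets all four filters from step 10."""
    return (
        row["Z Score"] > 2
        and row["PermuteP"] < 0.05
        and min_changed <= row["Number Changed"] < 100
        and row["Percent Changed"] >= min_percent
    )


# Hypothetical rows: made-up values, not actual MAPPFinder output.
rows = [
    {"GO Name": "term_a", "Z Score": 3.1, "PermuteP": 0.01,
     "Number Changed": 6, "Percent Changed": 40.0},
    {"GO Name": "term_b", "Z Score": 1.5, "PermuteP": 0.01,
     "Number Changed": 6, "Percent Changed": 40.0},   # Z Score too low
    {"GO Name": "term_c", "Z Score": 2.8, "PermuteP": 0.03,
     "Number Changed": 3, "Percent Changed": 60.0},   # too few changed
    {"GO Name": "term_d", "Z Score": 4.0, "PermuteP": 0.20,
     "Number Changed": 10, "Percent Changed": 50.0},  # PermuteP too high
]

kept = [row["GO Name"] for row in rows if passes_filters(row)]
print(kept)
```

Only term_a satisfies all four conditions; each of the other rows fails exactly one filter, which shows that the filters are applied together with AND.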
  • Comparing the Excel results with the top 10 MAPPFinder results, the only matches shared between the two were hexose catabolic process, glucose catabolic process, glycolysis, monosaccharide catabolic process, alcohol catabolic process, and cellular carbohydrate catabolic process.
    • Other Filtered GOs with definitions from geneontology.org
      • organelle organization: A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of an organelle within a cell. An organelle is an organized structure of distinctive morphology and function. Includes the nucleus, mitochondria, plastids, vacuoles, vesicles, ribosomes and the cytoskeleton. Excludes the plasma membrane.
      • translation elongation factor activity: Functions in chain elongation during polypeptide synthesis at the ribosome.
      • translational elongation: The successive addition of amino acid residues to a nascent polypeptide chain during protein biosynthesis.
      • chromosome organization: A process that is carried out at the cellular level that results in the assembly, arrangement of constituent parts, or disassembly of chromosomes, structures composed of a very long molecule of DNA and associated proteins that carries hereditary information.
      • DNA packaging: Any process in which DNA and associated proteins are formed into a compact, orderly structure.
      • chromosome condensation: The progressive compaction of dispersed interphase chromatin into threadlike chromosomes prior to mitotic or meiotic nuclear division, or during apoptosis, in eukaryotic cells.
      • nucleotide catabolic process: The chemical reactions and pathways resulting in the breakdown of nucleotides, any nucleoside that is esterified with (ortho)phosphate or an oligophosphate at any hydroxyl group on the glycose moiety; may be mono-, di- or triphosphate; this definition includes cyclic-nucleotides (nucleoside cyclic phosphates).
      • protein targeting: The process of targeting specific proteins to particular membrane-bounded subcellular organelles. Usually requires an organelle specific protein sequence motif.
      • nucleobase, nucleoside and nucleotide catabolic process: The chemical reactions and pathways resulting in the breakdown of nucleobases, nucleosides, nucleotides and nucleic acids.
      • nucleobase, nucleoside, nucleotide and nucleic acid catabolic process: The chemical reactions and pathways resulting in the breakdown of nucleobases, nucleosides, nucleotides and nucleic acids.
      • unfolded protein binding: Interacting selectively and non-covalently with an unfolded protein.
  • I'd compare my results to Merrell's results if I could, but the embedded link within the paper no longer works, displaying a "Not Found" error.