Vkuehn Week 8

Electronic Lab Notebook

10.10.2013

Data accessed from [Microarray Analysis Vibrio Cholera page]

Downloaded the raw data for Vibrio cholerae in Excel format
Working with the data:
1. Create new worksheet and copy over all the data
2. Compute STDEV and AVG for each column, as new rows at the top
3. Scale and center each column: copy the header into a new column next to it and compute (Value-AVG)/STDEV for every value
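The scale-and-center step above can be sketched outside of Excel as follows. This is only an illustration: the function name and the sample values are hypothetical, and it assumes the sample standard deviation, which is what Excel's STDEV computes.

```python
from statistics import mean, stdev


def scale_and_center(values):
    """Return (value - AVG) / STDEV for each value in one data column."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1), as in Excel's STDEV
    return [(v - avg) / sd for v in values]


# Hypothetical log ratios for one microarray column (not real data).
column = [0.5, -1.0, 1.5, 0.0, -1.0]
scaled = scale_and_center(column)
print([round(x, 3) for x in scaled])
```

After this step each column has an average of 0 and a standard deviation of 1, so the chips can be compared on the same scale.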
Inserted new worksheet called Statistics
1. Added 3 new columns: Avg_Log_FC_A, Avg_Log_FC_B, Avg_Log_FC_C
2. Computed the average log fold change for each patient, e.g. =AVERAGE(B2:E2)
3. Computed the average of the averages of the 3 patients in the column Avg_LogFC_all
4. T-test: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(# of replicates))
5. Determined whether the average is significantly different from zero
6. Created a new column titled P-Value and calculated the p-value for each row; 948 values were significant (p-value lower than 0.05).
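The T statistic in step 4 can be reproduced with a short sketch. The notebook does not say how the p-value itself was calculated in Excel, so the p-value function here is an assumption: it uses the closed-form two-tailed p-value of a t-distribution with exactly 2 degrees of freedom, which applies only when there are 3 values (as in N2:P2). The function names and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(values):
    """One-sample T statistic against zero: AVG / (STDEV / sqrt(n))."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n))


def two_tailed_p_df2(t):
    """Two-tailed p-value for a t-distribution with 2 degrees of freedom.

    Assumption: exactly 3 replicates (df = 2). The closed form below is
    not valid for any other number of replicates.
    """
    return 1 - abs(t) / sqrt(t * t + 2)


# Hypothetical per-patient average log fold changes for one gene (not real data).
patient_averages = [1.0, 2.0, 3.0]
t = t_statistic(patient_averages)
p = two_tailed_p_df2(t)
print(round(t, 4), round(p, 4), p < 0.05)
```

With only 3 values the T statistic has to be fairly large (about 4.3 or more in absolute value) before the p-value drops below 0.05.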
Created new worksheet titled GenMAPP
1. Copy over the data from the Statistics worksheet (using values only)
2. Select all fold changes and format cells under number tab to 2 decimal places
3. Do the same for columns R and S
4. Delete AVG and STDEV rows
5. Insert "SystemCode" column next to the ID column and input "N" for all rows
6. Save as tab-delimited text file
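The formatting steps for the GenMAPP worksheet could be sketched like this. The gene IDs, the single fold-change column, and the function name are hypothetical stand-ins; the real worksheet has more columns, and this only illustrates inserting the SystemCode column, rounding to 2 decimal places, and writing tab-delimited text.

```python
import csv
import io


def to_genmapp_rows(header, rows, system_code="N"):
    """Insert a SystemCode column right after the ID column (first column)."""
    new_header = [header[0], "SystemCode"] + header[1:]
    new_rows = []
    for row in rows:
        gene_id, values = row[0], row[1:]
        formatted = [f"{v:.2f}" for v in values]  # 2 decimal places
        new_rows.append([gene_id, system_code] + formatted)
    return new_header, new_rows


# Hypothetical IDs and fold changes (not real data).
header = ["ID", "Avg_LogFC_all"]
rows = [["GENE_A", 0.31415], ["GENE_B", -0.5]]

new_header, new_rows = to_genmapp_rows(header, rows)

# Write to an in-memory buffer; a real run would open a .txt file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(new_header)
writer.writerows(new_rows)
text = buffer.getvalue()
print(text)
```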

Data accessed from uploaded txt file and excel file:

Merrell Compiled Raw Data VK.xls

Merrell Compiled Raw Data VK.txt

10.15.2013

The Expression Dataset Manager converted the data, resulting in Media:Merrell_Compiled_Raw_Data_Vibrio_VK_2013.10.15.gex

GenMAPP Expression Dataset Procedure

Some lines resulted in errors. During the conversion, the data for 2010 resulted in 121 errors
This generated an exception file: Media:Merrell_Compiled_Raw_Data_Vibrio_VK_2013.10.15.EX.txt
Used autofilter to examine the errors. The errors were all "gene not found". This is because the gene database had not been updated recently, so it is OK to proceed. Media:Merrell_Compiled_Raw_Data_Vibrio_VK_2013.10.15.EX.xls
My partner had many more errors than I did. This is because I was using the newer version while he was using the older dataset. Mine had been updated more recently, so more of the data matched and there were fewer errors.
Customize the new Expression Dataset by creating new Color Sets which contain the instructions to GenMAPP for displaying data on MAPPs.
Input the "Avg_LogFC_all" column for the Vibrio dataset created in the Excel file

Create criteria builder for "increased":
[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05
Uploaded color sets
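The "increased" criterion above is a simple two-part test, which can be sketched as a filter. The gene IDs and values are hypothetical, and the function name is only for illustration; GenMAPP applies the criterion itself through the criteria builder.

```python
def is_increased(avg_logfc_all, pvalue, fc_cutoff=0.25, p_cutoff=0.05):
    """Mirror of: [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05"""
    return avg_logfc_all > fc_cutoff and pvalue < p_cutoff


# Hypothetical rows (not real data).
genes = [
    {"ID": "GENE_A", "Avg_LogFC_all": 0.80, "Pvalue": 0.01},    # passes both tests
    {"ID": "GENE_B", "Avg_LogFC_all": 0.80, "Pvalue": 0.20},    # p-value too high
    {"ID": "GENE_C", "Avg_LogFC_all": 0.10, "Pvalue": 0.01},    # fold change too small
    {"ID": "GENE_D", "Avg_LogFC_all": -0.90, "Pvalue": 0.001},  # decreased, not increased
]

increased = [
    g["ID"] for g in genes if is_increased(g["Avg_LogFC_all"], g["Pvalue"])
]
print(increased)
```

Both conditions have to hold, so a gene with a large fold change but a p-value of 0.05 or higher is not counted as increased.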

MAPPFinder Procedure

Launch MAPPFinder and make sure that the database for the correct species is loaded
Click "Calculate new results" and choose the dataset just created (Merrell_Compiled_Raw_Data_Vibrio_VK_2013.10.15.gex)
Choose the color set and criteria, then select Gene Ontology and p value (see image to the right)
Click "Run MAPPFinder". Result: Media:MerrellVK.mapp
Gene Ontology window will open. All of the Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 will be highlighted yellow. A term with a p value less than 0.05 is considered a "significant" result. Browse through the tree to see your results. Click on "ranked list"

Top 10 Gene Ontology terms (image to the left)
My terms are different from my partner's. This is because he had a different gene dataset that was less up to date, and there were differences in expression because of this.
Find the Gene Ontology term(s) with which a particular gene is associated. Searched for a matched gene from Merrell et al. (2002): VC0647
Below are the GO terms associated with the gene VC0647:
1. mRNA catabolic process
2. RNA processing
3. cytoplasm
4. mitochondrion
5. RNA binding
6. 3'-5' exoribonuclease activity
7. transferase activity
  nucleotidyltransferase activity
One GO term was chosen: clicking on "mRNA catabolic process" resulted in two genes listed on a GenMAPP. The one with the gene ID PNP_VIBCH showed a decrease in Pathogenic vs Lab according to my color chart. This means its expression significantly decreased.
Clicked on the gene ID, creating a backpage File:PNP VIBCH Backpage.txt
From here, the link to a description of the gene was followed. According to AmiGO [1], this gene ID corresponds to 3'-5'-exoribonuclease activity, which is the catalysis of the sequential cleavage of mononucleotides from a free 3' terminus of an RNA molecule.
Uploaded the Criterion-GO file in Excel for the relevant Increased genes. Here is the unfiltered text file version: File:Increased VK-Criterion0-GO-2013-10-17-Criterion0-GO.txt
In Excel, the information was filtered to display approximately 20 terms. To view the exact filters applied, view the results below.
File:Increased VK-Criterion0-GOFiltered.xls
When these results were compared to my partner's, we found that the following categories were the same:
1. Number of probes that met the [Avg_LogFC_All]>0.25 AND [PValue]<0.05 criteria
2. Number of probes in the dataset
The other results were all different. This makes sense because my partner used an older version of the Vibrio Gene Database. The similar terms were the ones linked to UniProt IDs and GO terms, but the other information was more up to date in my dataset. The numbers were larger in my procedure because more links to the genes had been made between the times the two versions of the data were uploaded.
Interpreting the GO terms. Focusing on relatedness
The main recurring terms I found were flagellum assembly and pilus assembly. These were found under cellular component organization, biogenesis at cellular level, and cellular process, under multiple sub-branches. Although there were other differences in gene expression as well, I thought that these were particularly significant. By looking into the gene expression differences between the pathogenic and lab strains, I found that many of these genes had increased activity in the pathogenic strain. This led me to conclude that the structure of the flagella and pili is probably altered in the pathogenic strain in a way that makes it harmful. It is possible that the change in pili structure makes the strain more likely to transfer mutagenic plasmid DNA to other cells, causing them to become pathogenic. The flagella structure could also have changed in this strain and played a role in its virulence. Another significant change in gene expression between the two was that the pathogenic strain had decreased mRNA degradation, which could have altered the number of mutations, contributing to the pathogenicity.
