Kmeilak Week 8

Overview of Microarray Data Analysis

Electronic Lab Notebook

10/10/13

Downloaded Merrell Compiled Raw Data file from Sample Microarray Analysis for Vibrio cholerae page
Saved as Merrell_Compiled_Raw_Data_Vibrio_KM_20131010.xls
Opened file in Excel; created second worksheet and named it scaled_centered
Copied all data from compiled_raw_data worksheet into scaled_centered worksheet
Inserted two rows underneath header row (ID, A1, etc)
Calculated average and standard deviation for each column {i.e. =AVERAGE(B4:B5224); =STDEV(B4:B5224)} by typing function into appropriate labeled row and copying and pasting formulas across all columns.
Calculated the scaled and centered values by subtracting the column average from each value in that column and dividing by the column standard deviation {i.e. =(B4-B$2)/B$3}
Inserted a new worksheet and named it "statistics".
Copied and pasted all of scaled_centered worksheet into statistics worksheet (note: did paste special values only).
Added three new columns: "Avg_LogFC_A", "Avg_LogFC_B", "Avg_LogFC_C"
Computed average log fold change {i.e. =AVERAGE(B2:E2)} for each patient (A, B, and C)
Computed average of averages of three patients in new column titled "Avg_LogFC_all"
Created a new column titled "Tstat" in order to run a T test using the following equation {=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))}. The T test was run in order to see which, if any, of the scaled and centered average log ratios are significantly different from 0 (no change).
Created a new column titled "Pvalue". Calculated P value using the following equation {=TDIST(ABS(R2),degrees of freedom,2)}
Created a new worksheet titled "forGENMAPP".
Copied and pasted everything in "statistics" worksheet into "forGENMAPP" worksheet (note: did paste special values only).
Selected all fold changes and formatted cells under number tab to 2 decimal places.
Columns R and S were set to 4 decimal places in the same manner
Columns N through S were cut and inserted next to column B
Deleted rows "Average" and "StDev"
Added "SystemCode" column to the right of "ID" column and put "N" as value for all rows.
Saved as Tab-delimited Text file.
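As a rough cross-check on the spreadsheet steps above, the scaling and centering, the T statistic, and the two-tailed P value can be sketched in Python. This is only a sketch under assumptions: the function names and sample numbers are made up for illustration (they are not taken from the Merrell data), the number of replicates is taken to be 3 (the three cells in N2:P2), and the P value function uses a closed form that holds only for 2 degrees of freedom.

```python
import math
import statistics


def scale_and_center(column):
    """Subtract the column mean, then divide by the sample standard deviation."""
    mean = statistics.mean(column)
    stdev = statistics.stdev(column)  # sample SD (n - 1), like Excel's STDEV
    return [(x - mean) / stdev for x in column]


def t_statistic(values):
    """One-sample t statistic testing whether the mean differs from 0 (no change)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def two_tailed_p_df2(t):
    """Two-tailed p-value for a t distribution with exactly 2 degrees of freedom.

    Assumption: 3 replicates, so df = 3 - 1 = 2. This closed form is not
    valid for any other number of degrees of freedom.
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Hypothetical numbers for illustration only.
column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = scale_and_center(column)

patient_averages = [0.5, 1.0, 1.5]  # stand-ins for Avg_LogFC_A, _B, _C
t = t_statistic(patient_averages)
p = two_tailed_p_df2(t)
print(f"Tstat = {t:.4f}, Pvalue = {p:.4f}")
```

After scaling and centering, each column should have an average of 0 and a standard deviation of 1, which is a quick way to check the Excel formulas. For a different number of replicates, a statistics library would be needed in place of two_tailed_p_df2.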

10/15/13

Launched GenMAPP
Selected Data > Choose Gene Database and selected Vc-Std_External_20090622.gdb Gene Database (2009). (note: this had to be downloaded from [XMLPipeDB Download Page] and then extracted)
Selected the Data menu then the Expression Dataset Manager which opened the Expression Dataset Manager window.
Selected "new dataset", then selected the Tab-delimited Text file from previous day.
The Data Type Specification window appeared. Did not select any columns as containing character data.
Allowed the Expression Dataset Manager to convert the data. 772 errors were recorded by the completion of the conversion, far more than my partner's 121 errors. This is most likely because I used an older database and she used a newer one. Her database was more up to date and contained more of the known genes for V. cholerae than mine, and therefore produced fewer errors.
Created a Color Set for the Expression Database (pink = increased expression; green = decreased expression; gray = no change; white = no data)
Used Avg_LogFC_all as the gene value.
Clicked the new button to activate the Criteria Builder.
Created and named two criteria by entering the name of the criteria and choosing a color. The two criteria created were "increased" colored pink and "decreased" colored green.
Selected increasing results which had an AvgLogFC change > 0.25 and a p-value less than 0.05 {([AvgLogFC_all]>0.25 AND [Pvalue]<0.05)}
Selected decreasing results which had an AvgLogFC change < -0.25 and a p-value less than 0.05 {([AvgLogFC_all]<-0.25 AND [Pvalue]<0.05)}
Selected Save from Expression Dataset menu, saved as .gex file
Launched MAPPFinder
Chose "calculate new results"
Chose "find file" and selected the saved .gex file from previous steps
Selected "increase" and checked boxes for "Gene Ontology" and "p-value"
Clicked "browse" and saved file
Clicked "run MAPPFinder"
Clicked "show ranked list" (seen below). These results differ from those of my partner, who used the more recently updated version of the database. The difference most likely reflects new information about existing processes, as well as newly added processes that may show a higher level of significance than some of those found in the 2009 version of the database.
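The two Color Set criteria entered in the Criteria Builder in the steps above amount to a simple rule applied to each probe. A minimal Python sketch of that rule follows; the function name, the color lookup, and the example values are hypothetical and only mirror the thresholds and colors recorded in this notebook, not GenMAPP's internal behavior.

```python
# Colors as recorded in the notebook's Color Set.
COLORS = {
    "increased": "pink",
    "decreased": "green",
    "no change": "gray",
    "no data": "white",
}


def classify(avg_logfc_all, p_value, cutoff=0.25, alpha=0.05):
    """Return which criterion a probe meets, using the notebook's thresholds."""
    if avg_logfc_all is None or p_value is None:
        return "no data"
    if avg_logfc_all > cutoff and p_value < alpha:
        return "increased"
    if avg_logfc_all < -cutoff and p_value < alpha:
        return "decreased"
    return "no change"


# Hypothetical probes: (AvgLogFC_all, Pvalue)
examples = [(0.30, 0.01), (-0.30, 0.01), (0.30, 0.20), (None, None)]
for logfc, pval in examples:
    label = classify(logfc, pval)
    print(logfc, pval, label, COLORS[label])
```

Both comparisons are strict, so a probe with a log fold change of exactly 0.25 or a P value of exactly 0.05 meets neither criterion.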

Top 10 Gene Ontology terms

macromolecule metabolic process
cellular macromolecule metabolic process
macromolecule biosynthesis process
biopolymer metabolic process
cell projection organization
branched chain family amino acid metabolic process
amino acid metabolic process
cellular amino acid and derivative metabolic process
cellular nitrogen compound metabolic process
cellular amine metabolic process

Clicked "collapse the tree"
Did a gene id search for VC0028, found nothing
Did a gene id search for VC0941, found nothing
Did a gene id search for VC0869, found nothing
Did a gene id search for VC0051, found nothing
Did a gene id search for VC0647, and selected ordered locus names

Associated GO Terms

3'-5' exoribonuclease activity
transferase activity
nucleotidyltransferase activity
polyribonucleotide nucleotidyltransferase activity

Clicked on 3'-5' exoribonuclease activity which opened GenMAPP
3'-5' exoribonuclease activity was green, which means it was significantly decreased
The VC0647 gene functions in mRNA degradation through its 3'-5' exonuclease activity and is known to hydrolyze single-stranded polyribonucleotides processively in the 3'- to 5'-direction
Clicked on transferase activity which opened GenMAPP
transferase activity genes mostly decreased (green)
Clicked on nucleotidyltransferase activity which opened GenMAPP
nucleotidyltransferase activity genes either did not meet criteria or were decreased
Clicked on polyribonucleotide nucleotidyltransferase activity which opened GenMAPP
polyribonucleotide nucleotidyltransferase activity gene decreased expression
polyribonucleotide nucleotidyltransferase activity "Involved in mRNA degradation. Hydrolyzes single-stranded polyribonucleotides processively in the 3'- to 5'-direction" (GenMAPP)

Did a gene id search for VC0468, found nothing
Did a gene id search for VC2350, found nothing
Did a gene id search for VCA0583, and selected ordered locus names

Associated GO Terms

transport
outer-membrane bound periplasmic space
transporter activity

Clicked on transport, which opened GenMAPP
mix of increased and decreased genes
Clicked on outer-membrane bound periplasmic space, which opened GenMAPP
mix of increased and decreased genes, more increased than decreased
Clicked on transporter activity, which opened GenMAPP
mix of increased and decreased genes, more increased than decreased

backpage Media:PNP_VIBCH_Backpage.txt
Saved Gene Ontology file as .txt file
Opened Gene Ontology file in Excel

Results same as partner's

Number of probes that met the [AvgLogFC_all] > 0.25 AND [Pvalue] < 0.05 criteria.
Number of probes in the dataset

All other results differed from my partner's. These numbers are linked to UniProt IDs or GO terms. Because my partner was using a more recent version of the database, it makes sense that her numbers are greater than mine: UniProt incorporated more proteins, and more GO terms were discovered and linked to genes, in the time between the publishing dates of the two databases.

Filtered the data in the Excel file by Z score and PermuteP, using Excel's filter function to keep rows with Z score greater than 2 and PermuteP less than 0.05.
Filtered Number Changed to be greater than or equal to 4 and less than 100
Saved changes in Excel file
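The same three filters can be sketched in Python for a tab-delimited results file. This is an illustration under assumptions: the column headers ("GO Name", "Number Changed", "Z Score", "PermuteP") are guesses at how the exported columns are labeled, and the rows are invented placeholders rather than results from this analysis.

```python
import csv
import io

# Invented placeholder rows; the header names are assumed, not confirmed.
SAMPLE = (
    "GO Name\tNumber Changed\tZ Score\tPermuteP\n"
    "hypothetical term 1\t12\t3.1\t0.01\n"
    "hypothetical term 2\t3\t4.0\t0.001\n"
    "hypothetical term 3\t150\t2.5\t0.02\n"
    "hypothetical term 4\t20\t1.5\t0.03\n"
    "hypothetical term 5\t4\t2.2\t0.049\n"
)


def passes_filter(row):
    """Keep rows with Z score > 2, PermuteP < 0.05, and 4 <= Number Changed < 100."""
    z_score = float(row["Z Score"])
    permute_p = float(row["PermuteP"])
    number_changed = int(row["Number Changed"])
    return z_score > 2 and permute_p < 0.05 and 4 <= number_changed < 100


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = [row["GO Name"] for row in rows if passes_filter(row)]
print(kept)
```

To run this against a saved .txt file, io.StringIO(SAMPLE) would be replaced with an opened file handle, after checking that the header names match the actual export.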

Closely related GO terms

cell projection organization is closely related to flagellum organization
amino acid metabolic process is closely related to amino acid biosynthesis process
cellular nitrogen compound metabolic process is closely related to cellular amine metabolic process, amine biosynthesis process, and amino acid biosynthesis process
cellular amine metabolic process is closely related to amine biosynthesis process, amino acid biosynthesis process, and amino acid metabolic process
flagellum organization was related to cellular developmental process, cell morphogenesis, cellular structure morphogenesis
magnesium ion binding was related to hydrolase activity
establishment of localization was related to cellular amino acid and derivative metabolic process, amino acid metabolic process, amino acid biosynthesis process
anatomical structure development was related to anatomical structure morphogenesis, cell structure morphogenesis, cell morphogenesis, flagellum organization
anatomical structure morphogenesis was related to anatomical structure development, cell structure morphogenesis, cell morphogenesis, flagellum organization
cellular developmental process was related to cell structure morphogenesis, cell morphogenesis, flagellum organization
cell morphogenesis was related to cell structure morphogenesis, flagellum organization
cell structure morphogenesis was closely related to cell morphogenesis, cell developmental process, flagellum organization
organic acid transport was related to amino acid transport, amine transport, carboxylic acid transport
carboxylic acid transport was related to organic acid transport, amino acid transport, amine transport
hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds was related to hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds in linear amides
hydrolase activity was related to hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds in linear amides, hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds
nitrogen compound biosynthetic process was related to amine biosynthetic process, amino acid biosynthetic process
amine biosynthetic process was related to nitrogen compound biosynthetic process, amino acid biosynthetic process
amino acid transport was related to amine transport, transport
amine transport was related to amino acid transport, transport
hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides was related to hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, hydrolase activity

Final Paragraph

Because so many genes are involved in many of these processes, it is impossible in most cases to say exactly what distinguishes the pathogenic strain of V. cholerae from the non-pathogenic strain. However, the decreased expression of the 3'-5' exonuclease activity and polyribonucleotide nucleotidyltransferase activity genes suggests that the pathogenic strain has significantly decreased 3'-5' exonuclease activity and decreased mRNA degradation. The decreased mRNA degradation and proofreading in the post-transcriptional stages most likely increases the number of mutations seen in the pathogenic strain, which, along with an increased rate of translation resulting from the decreased mRNA degradation, can significantly change the biochemical nature of the bacterium and its processes. This alone may be enough to confer the pathogenic nature of V. cholerae seen in the patient.

Kmeilak (talk) 21:20, 12 September 2013 (PDT)

Files
