Dwilliams Week 8 assignment

Digital Lab Notebook

Part 1

Normalize the log ratios for the set of slides in the experiment

Downloaded the microarray data from the Stanford Microarray Database.
Created a new Excel worksheet ("scaling_centering") for scaling and centering the data.
Created a row for the average and a row for the standard deviation.
- Formula for calculating the average: =AVERAGE(B4:B5224)
- Formula for calculating the standard deviation: =STDEV(B4:B5224)
Applied both functions across all columns.
After computing the average and standard deviation of the log ratios for each chip, performed the actual scaling and centering based on these values.
Created new columns for A1_scaled_centered, A2_scaled_centered, etc.
- Formula for scaling and centering: =(B4-B$2)/B$3
- Applied the scaling and centering equation to each of the columns of data.
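The scaling and centering step above can also be sketched outside of Excel. The following is a minimal Python illustration, not the procedure actually used in this notebook: the function name `scale_and_center` and the sample values are made up for the example. It subtracts the column average from each log ratio and divides by the sample standard deviation, which is what Excel's STDEV computes.

```python
from statistics import mean, stdev


def scale_and_center(log_ratios):
    """Return (x - average) / standard deviation for every value in one chip's column."""
    avg = mean(log_ratios)   # same role as =AVERAGE(B4:B5224)
    sd = stdev(log_ratios)   # sample standard deviation, same role as =STDEV(B4:B5224)
    return [(x - avg) / sd for x in log_ratios]


# Hypothetical log ratios for one chip (not real data from the experiment).
chip_a1 = [0.5, -1.2, 0.3, 2.0, -0.6]
scaled = scale_and_center(chip_a1)
print(scaled)
```

After this step each column should have an average of about 0 and a standard deviation of about 1, which is a quick way to check that the formula was filled down correctly.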

Perform statistical analysis on the ratios

Inserted a new worksheet and named it "statistics".
- Went back to the "scaling_centering" worksheet and copied the first column ("ID").
- Pasted the data into the first column of the new "statistics" worksheet.
- Went back to the "scaling_centering" worksheet and copied Column C ("A1_scaled_centered").
- Went to the new worksheet and clicked on cell B1. Selected "Paste Special" from the Edit menu, clicked the radio button for "Values" in the window that opened, and clicked OK. This pastes the numerical results into the new worksheet instead of the equations, which would otherwise be recalculated on the fly.
- Went to a new column on the right of the worksheet and typed the headers "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cells of the next three columns.
- Computed the average log fold change for the replicates for each patient by typing the equation: =AVERAGE(B2:E2)
Computed the average of the averages.
- Typed the header "Avg_LogFC_all" into the first cell of the next empty column.
Inserted a new column next to the "Avg_LogFC_all" column computed in the previous step and labeled it "Tstat".
- Formula for the T statistic: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))
Created a "Pvalue" column.
- Formula for the P value: =TDIST(ABS(R2),degrees of freedom,2)
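To make the T statistic and P value formulas more concrete, here is a hedged Python sketch of the same calculation for a single gene. The function names and the three sample averages are hypothetical, and the degrees of freedom are assumed to be the number of replicates minus one, as in a standard one-sample t test. The two-tailed P value is obtained by numerically integrating the Student t density with Simpson's rule, which is only meant to approximate what TDIST(...,2) returns.

```python
import math
from statistics import mean, stdev


def t_statistic(values):
    """Average divided by the standard error, as in the Tstat formula."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value: 1 - 2 * area under the density between 0 and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Hypothetical patient averages for one gene (not real data).
averages = [0.5, 0.7, 0.6]
t = t_statistic(averages)
p = two_tailed_p(t, df=len(averages) - 1)
print(t, p)
```

With only three values per gene there are just 2 degrees of freedom, so even a large T statistic may correspond to a fairly modest P value.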

Using Data to Create GenMAPP Files

Insert a new worksheet and name it "forGenMAPP".
- Go back to the "statistics" worksheet and Select All and Copy. Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK.
We will now format this worksheet for import into GenMAPP.
- Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
- Select Columns R and S. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
- Select Columns N through S and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
- Delete Rows 2 and 3 where it says "Average" and "StDev" so that your data rows with gene IDs are immediately below the header row 1.
- Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
- Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK.
Your new *.txt file is now ready for import into GenMAPP. But before we do that, we want to know a few things about our data as shown in the next section.
- Upload both the .xls and .txt files that you have just created to your journal page in the class wiki. Make sure that your file name is distinct from your other classmates so that nobody overwrites anyone else's file.
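The formatting steps above produce a tab-delimited text file with a "SystemCode" column of "N" right after the "ID" column. As a rough illustration only, the same layout could be generated in Python as sketched below. The function name `to_genmapp_text`, the gene IDs, and the values are all made up for the example, and only two data columns are shown instead of the full set.

```python
import csv
import io


def to_genmapp_text(rows, system_code="N"):
    """Build tab-delimited text with a SystemCode column inserted after ID."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "SystemCode", "Avg_LogFC_all", "Pvalue"])
    for gene_id, avg_log_fc, p_value in rows:
        # 2 decimal places for fold changes, 4 for P values, as in the steps above.
        writer.writerow([gene_id, system_code, f"{avg_log_fc:.2f}", f"{p_value:.4f}"])
    return out.getvalue()


# Hypothetical rows (placeholder IDs and values, not real results).
rows = [("GENE_A", 1.2671, 0.01234), ("GENE_B", -0.3549, 0.21)]
text = to_genmapp_text(rows)
print(text)
```

Writing to a string first makes it easy to inspect the header row and check that the data rows follow it immediately, before saving anything to a .txt file.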

Part 2

Map Onto Biological Pathways (GenMAPP & MAPPFinder)

Used the Vc-Std_External_20090622.gdb Gene Database, which was created by the Fall 2008 Biological Databases class.
- Downloaded from: http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020090622/Vc-Std_External_20090622.zip/download

MAPPFinder Procedure Answers

List the top 10 Gene Ontology terms in your individual journal entry:
- Localization
- Macromolecule Metabolic Process
- Cell Projection Organization
- Amino Acid Metabolic Process
- Cellular Nitrogen Compound Metabolic Process
- Cellular Amine Metabolic Process
- Cellular Amino Acid and Derivative Metabolic Process
- Cellular Macromolecule Metabolic Process
- Transporter Activity
My list differs from my partner's. This may be because I am using a Gene Database from 2008, whereas his is from 2010.
List the GO terms associated with each of those genes in your individual journal. (Note: they might not all be found.) Are they the same as your buddy who is using a different Gene Database? Why or why not?
- VC0028: No results found.
- VC0941: No results found.
- VC0869: No results found.
- VC0051: No results found.
- VC0647: mRNA Catabolic Process, RNA Processing, cytoplasm, RNA Binding, 3'-5' exoribonuclease activity, transferase activity, nucleotidyltransferase activity, polyribonucleotide nucleotidyltransferase activity.
- VC0468: No results found.
- VC2350: No results found.
- VCA0583: Transport, outer membrane-bounded periplasmic space.
- My results differed from my partner's: he got results for many of the gene loci for which I found none.

I looked at the GO term "outer membrane-bounded periplasmic space" and then at the gene DSBA_VIBCH, which showed decreased expression.
Function: Involved in disulfide-bond formation. Required for the functional maturation of secreted virulence factors. Acts by transferring its disulfide bond to other proteins.

Comparing the Numbers Between My Partner and Me

339 probes met the [AvgLogFC_all] > 0.25 AND [Pvalue] < 0.05 criteria.
- Partner had 338
291 probes meeting the filter linked to a UniProt ID.
- Partner had 219
184 genes meeting the criterion linked to a GO term.
5221 probes in this dataset.
4449 probes linked to a UniProt ID.
- Partner had 5100
1990 Genes linked to a GO term.
- Partner had 2475
The z score is based on an N of 1990 and an R of 184 distinct genes in the GO.
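The filter criterion and the z score mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not MAPPFinder's own code: the function names and the probe values are hypothetical, and the z score uses the hypergeometric-style formula commonly described for MAPPFinder, where N is the number of genes measured, R the number meeting the criterion, n the number of genes in a given GO term, and r the number in that term meeting the criterion. The n and r values in the example are invented, since per-term counts are not recorded in this notebook.

```python
import math


def passes_filter(avg_log_fc_all, p_value):
    """Criterion from the notebook: [AvgLogFC_all] > 0.25 AND [Pvalue] < 0.05."""
    return avg_log_fc_all > 0.25 and p_value < 0.05


def mappfinder_style_z(N, R, n, r):
    """z = (observed - expected) / standard deviation for one GO term (assumed formula)."""
    expected = n * R / N
    frac = R / N
    variance = n * frac * (1 - frac) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical probes: (name, AvgLogFC_all, Pvalue). Not real results.
probes = [
    ("probe_1", 0.80, 0.010),
    ("probe_2", 0.30, 0.200),
    ("probe_3", -0.90, 0.001),
    ("probe_4", 0.26, 0.049),
]
hits = [name for name, fc, p in probes if passes_filter(fc, p)]

# N and R come from the notebook; n and r are made-up counts for one GO term.
z = mappfinder_style_z(N=1990, R=184, n=50, r=12)
print(hits, z)
```

A positive z score means the term contains more genes meeting the criterion than would be expected by chance given N and R. Differences in N and R between two Gene Databases would therefore change the z scores even for the same expression data.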

Top 20 Filtered Terms:

- localization
- cell projection organization
- amino acid metabolic process
- cellular nitrogen compound metabolic process
- cellular amine metabolic process
- cellular amino acid and derivative metabolic process
- transporter activity
- nitrogen compound metabolic process
- establishment of localization
- transport
- extracellular region
- flagellum organization
- cellular structure morphogenesis
- anatomical structure morphogenesis
- cellular developmental process
- cell morphogenesis
- anatomical structure development
- magnesium ion binding
- carboxylic acid transport
- organic acid transport

Uploaded Files

Concluding Paragraph

To be completely honest, my partner and I both spent an adequate amount of time calculating results and running the tests described in Part 2 of this assignment. However, it was difficult for both of us to interpret what the assignment was asking for. We compared our results to the best of our ability, but we are still confused about exactly what the assignment is asking us to do.
