Johnllopez Week 8

From LMU BioDB 2017

Electronic Lab Notebook

Experimental Design and Getting Ready

The strain I used is dSWI4, and the timepoints I will be analyzing are t30, t60, t90, and t120. Data were not provided for t15. There were 4 replicates for each of the timepoints.

The first steps I took to complete this assignment were performed in class as I followed along with Dr. Dahlquist's instructions. Note that I saved the file after each numbered step in the list below.

  1. After downloading the Excel document, I deleted all of the columns that did not relate to my partner's and my strain (dSWI4). Then, I went through the data and replaced every cell containing "NA" with a blank string. There were 3641 replacements.
  2. I then renamed the document BIOL367_Fall2017_Dahlquist-microarray-data-master_20171017_JL.
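The column filtering and "NA" replacement in step 1 could also be scripted. The following is a minimal Python sketch of that idea, not the procedure used in class. The column names and the two inline rows are hypothetical stand-ins for the master spreadsheet.

```python
# Hypothetical header and rows standing in for the master data sheet.
header = ["MasterIndex", "ID", "Standard Name",
          "wt_LogFC_t30-1", "dSWI4_LogFC_t30-1", "dSWI4_LogFC_t30-2"]
rows = [
    ["1", "YAL001C", "TFC3", "0.12", "NA", "0.40"],
    ["2", "YAL002W", "VPS8", "NA", "-0.31", "NA"],
]

# Identifier columns that are kept no matter which strain is selected.
ID_COLUMNS = {"MasterIndex", "ID", "Standard Name"}


def keep_strain_columns(header, rows, strain):
    """Keep the identifier columns plus every column whose name starts with the strain."""
    keep = [i for i, name in enumerate(header)
            if name in ID_COLUMNS or name.startswith(strain)]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]


def blank_out_na(rows):
    """Replace cells that are exactly "NA" with an empty string and count the replacements."""
    replaced = 0
    cleaned = []
    for row in rows:
        new_row = []
        for cell in row:
            if cell == "NA":
                new_row.append("")
                replaced += 1
            else:
                new_row.append(cell)
        cleaned.append(new_row)
    return cleaned, replaced


strain_header, strain_rows = keep_strain_columns(header, rows, "dSWI4")
clean_rows, n_replaced = blank_out_na(strain_rows)
print(strain_header)
print(n_replaced, "cells replaced")
```

Counting the replacements mirrors the count that Excel reports after a find-and-replace, which gives a quick check that no cells were missed.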

Statistical Analysis Part 1

  1. I created a new worksheet named "dSWI4_ANOVA", which would be used to determine whether any of the genes had a gene expression change significantly different from zero at any timepoint.
  2. I copied the "MasterIndex", "ID", and "Standard Names" columns from the master sheet, and created 5 column headers in the form of "dSWI4_AvgLogFC_(TIME)", and my time values were 15, 30, 60, 90, and 120.
  3. I populated the columns for t30, t60, t90, and t120 by calculating the average of the 4 replicates, using the =AVERAGE() function.
  4. I then created a column that applied the sum of squares calculation (=SUMSQ) to the average log fold changes (as explained in the previous step).
  5. Next, I applied the following formula in 4 columns, one each for the timepoints 30, 60, 90, and 120: =SUMSQ([range of cells for the timepoint])-COUNTA([range of cells for the timepoint])*[average log fold change for the timepoint]^2.
  6. In another column, I computed the sum of the values in the previous 2 steps.
  7. I decided our total n would be 16 because we are analyzing 4 time points and we have 4 replicates.
  8. After letting n = 16, we applied the following two formulas to obtain our dSWI4_Fstat and dSWI4_p-value columns: =((n-4)/4)*(Y2-AD2)/AD2 and =FDIST(AE2,4,n-4).
  9. Finally, I filtered through my p-value data to show only p-values less than 0.05. The result was 5475 records found.
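The per-gene calculation in the steps above can be sketched in Python for a single row. This is an illustration under stated assumptions, not the spreadsheet itself: it assumes the null-hypothesis sum of squares is taken over all replicate values in the row, and that the F statistic divides the difference by the full-model sum of squares (the usual F ratio). The function names and the example numbers are hypothetical. The upper-tail helper plays the role of Excel's FDIST and uses a closed form that only holds when both degrees of freedom are even, as they are here (4 and 12).

```python
from math import comb


def f_upper_tail(f_stat, df1, df2):
    """Upper-tail probability of the F distribution (what FDIST returns).

    Shortcut valid only for even df1 and df2: the regularized incomplete
    beta function with integer arguments reduces to a binomial tail sum.
    """
    if df1 % 2 or df2 % 2:
        raise ValueError("this shortcut needs even degrees of freedom")
    a, b = df2 // 2, df1 // 2
    x = df2 / (df2 + df1 * f_stat)
    trials = a + b - 1
    return sum(comb(trials, j) * x**j * (1 - x)**(trials - j)
               for j in range(a, trials + 1))


def anova_row(groups):
    """Return (F statistic, p-value) for one gene.

    groups holds one list of replicate log fold changes per timepoint.
    Assumes the replicates are not all identical within every timepoint,
    otherwise ss_full is zero and the ratio is undefined.
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    # Null model: every timepoint has a mean of zero (SUMSQ over the row).
    ss_h0 = sum(v * v for v in values)
    # Full model: each timepoint has its own mean (SUMSQ - COUNTA * avg^2).
    ss_full = sum(sum(v * v for v in g) - len(g) * (sum(g) / len(g)) ** 2
                  for g in groups)
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return f_stat, f_upper_tail(f_stat, k, n - k)


# Hypothetical replicate values for t30, t60, t90, and t120.
example_groups = [
    [1.0, 1.2, 0.8, 1.0],
    [2.0, 2.1, 1.9, 2.0],
    [-1.0, -0.9, -1.1, -1.0],
    [0.1, -0.1, 0.0, 0.0],
]
f_stat, p_value = anova_row(example_groups)
print(f_stat, p_value)
```

With 4 timepoints and 4 replicates each, n is 16 and the degrees of freedom are 4 and 12, matching the values used in the spreadsheet formulas.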

Bonferroni and p value Correction

I started this section by creating two new columns with the label "dsWI4_Bonferonni_p-value". I filled the first of these, column AG, using the formula =(dSWI4_p-value * 6189), where 6189 is the number of genes. Then, I filled column AE using the formula =IF(AG2>1,1,AG2), which replaces any value greater than 1 with 1.
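The two formulas above amount to multiplying each p-value by the number of genes and capping the result at 1. A minimal Python sketch follows. The function name is hypothetical, and the two input values are the unadjusted p-values reported for NSR1 and ADH1 later in this notebook.

```python
def bonferroni(p_values, n_tests):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    return [min(p * n_tests, 1.0) for p in p_values]


# Unadjusted p-values for NSR1 and ADH1, corrected for 6189 genes.
corrected = bonferroni([1.196e-7, 0.161], n_tests=6189)
print(corrected)
```

The cap matters for ADH1: 0.161 times 6189 is far above 1, and a probability cannot exceed 1.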

Benjamini and Hochberg p value Correction

To do this, I created a new worksheet for the Benjamini and Hochberg p value correction calculations. I copied the "MasterIndex", "ID", and "Standard Names" columns from the master sheet and the "p-values" column from the ANOVA sheet. Then, I sorted the rows from smallest to largest p-value so that each gene could be given an index (rank) from the smallest p-value to the largest. Next, I applied the 2 Benjamini and Hochberg p-value correction formulas, which were =(D2*6189)/E2 and =IF(F2>1,1,F2). Finally, I put the rows back in ascending order by MasterIndex and copied the last column into my ANOVA sheet.
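The sort, rank, correct, and re-sort sequence described above can be sketched in Python as follows. The function name and the four example p-values are hypothetical. The sketch follows only the two formulas in this notebook (p-value times number of genes divided by rank, capped at 1). Statistical packages usually add a further step that forces the adjusted values to be non-decreasing with rank, which is not done here.

```python
def benjamini_hochberg_simple(p_values):
    """Apply p * m / rank, capped at 1, and return values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    corrected = [0.0] * m
    for rank, i in enumerate(order, start=1):
        # Writing back to position i restores the original (MasterIndex) order.
        corrected[i] = min(p_values[i] * m / rank, 1.0)
    return corrected


example_p = [0.04, 0.001, 0.5, 0.02]
print(benjamini_hochberg_simple(example_p))
```

Storing each result at its original position takes the place of the final sort by MasterIndex in the spreadsheet.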

Sanity Check: Number of genes significantly changed

I then filtered the genes using the criterion p < x, where x is the fraction of the time you would see an expression change this far from zero by chance.

  1. I saw that 2,802 / 6,189 genes have p < .05, or 45.274%.
  2. I saw that 1,842 / 6,189 have p < .01, or 29.762%.
  3. I saw that 975 / 6,189 have p < .001, or 15.754%.
  4. I saw that 512 / 6,189 have p < .0001, or 8.273%.
  5. For the Bonferroni-corrected p-value, 212 / 6,189 have p < .05, or 3.425%.
  6. For the Benjamini-Hochberg-corrected p-value, 2,076 / 6,189 have p < .05, or 33.543%.
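The percentages in the list above are each count divided by the 6,189 genes. As a quick check, the short Python sketch below recomputes them from the counts reported here. The dictionary keys and the helper name are hypothetical labels.

```python
TOTAL_GENES = 6189

# Counts of genes passing each cutoff, as reported in the list above.
counts = {
    "p < .05": 2802,
    "p < .01": 1842,
    "p < .001": 975,
    "p < .0001": 512,
    "Bonferroni p < .05": 212,
    "B-H p < .05": 2076,
}


def percent_of_genes(count, total=TOTAL_GENES):
    """Express a gene count as a percentage of all genes, to 3 decimal places."""
    return round(100.0 * count / total, 3)


percentages = {label: percent_of_genes(n) for label, n in counts.items()}
for label, pct in percentages.items():
    print(label, pct)
```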

I made a PPT slide that contained these values, which you can see here.


Result Comparison

NSR1

  • Unaltered P-Value: 1.196 E-7
  • Bonferroni-corrected P-Value: 0.0007
  • B-H-corrected P-Value: 1
  • Average Log Fold Change @ 30: 3.253
  • Average Log Fold Change @ 60: 3.565
  • Average Log Fold Change @ 90: -3.693
  • Average Log Fold Change @ 120: -.084

Given the change in expression from 60 to 90, it would appear that NSR1 changes expression due to cold shock during this interval.

ADH1

  • Unaltered P-Value: .161
  • Bonferroni-corrected P-Value: 1
  • B-H-corrected P-Value: .554
  • Average Log Fold Change @ 30: -.252
  • Average Log Fold Change @ 60: -1.126
  • Average Log Fold Change @ 90: .144
  • Average Log Fold Change @ 120: -.554

Given the change in expression from 60 to 90, it would appear that ADH1 changes expression due to cold shock during this interval.

Summary

Our ultimate goal was to perform ANOVA tests to determine whether any of the genes we were given had a gene expression change that was significantly different from zero at any of our 4 time points. The ANOVA tests gave us p-values, which tell us how likely it is that an expression change of that size would be seen by chance. After applying the Bonferroni and Benjamini-Hochberg p-value corrections, we found the percentages of genes whose expression changes would deviate this far from zero by chance less than 5% of the time. The corrections gave us stricter percentages. We saw that 45.274% of the genes had an unadjusted p-value below .05, while 3.425% and 33.543% did after the Bonferroni and B-H corrections, respectively.

My Documents

Here are my spreadsheets. Here is my PPT slide.

Acknowledgements and References

Acknowledgements

I worked with my homework partner Corrine Wong in class. We met face-to-face twice on Monday, October 23rd in the computer lab. We each worked through the values with our own calculations to ensure that we had similar values.

While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source.

Johnllopez616 (talk) 21:45, 23 October 2017 (PDT)

References

  • LMU BioDB 2017. (2017). Week 8. Retrieved October 23, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_8
