Week 8

From LMU BioDB 2015

This journal entry is due on Tuesday, October 27, at midnight PDT (Monday night/Tuesday morning), with an interim deadline for part 1 at midnight on Tuesday, October 20.

Overview

The purpose of this assignment is:

  • To gain an in-depth understanding of how DNA microarray data is analyzed by carrying out all of the steps yourself on a dataset for which the "answers" are known.
  • To introduce you to the process we will use for the final projects in the course using a series of in-class and journal assignments where we will first analyze microarray data from Vibrio cholerae, and then learn how to create a GenMAPP-compatible Gene Database for this organism.
  • To have you show, in what is perhaps the most complex assignment to date, discipline and proficiency in day-to-day science and engineering best practices, such as maintaining journals and notebooks, managing your files and code, and critically evaluating scientific and technical information (taken directly from the overall course objectives, emphasis added).

Individual Journal Assignment

  • Store this journal entry as "username Week 8" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.
  • Keep an "electronic lab notebook", containing your methods, results, and interpretations of the Vibrio cholerae microarray analysis part 1 and part 2 in your "username Week 8" journal page. Although you will have assigned partner(s), you will need to fill out your own individual journal page.
    • Your electronic notebook should contain enough information that you or someone else could reproduce what you did given only the information on your page.
      • It is acceptable to copy and paste the methods from the protocol page into your electronic lab notebook as long as you cite the source and change it to reflect what you actually did.
        • That means that you will often need to change the tense of the instructions from imperative to past tense, for example.
        • You need to change the instructions if you did something differently from what is stated or did only a subset of what is stated there.
    • You should use screenshots and hyperlinks as appropriate to illustrate your notebook.
    • Be sure to answer any questions embedded in the protocol. You do not need to make a separate section for these; the methods and results can be mixed.
  • Upload the requested files from part 1 and part 2 to this wiki and link to them on your individual journal page.
    • IMPORTANT: upload your completed spreadsheet (both the .xls and .txt versions) from part 1 by the interim deadline of midnight, Tuesday, October 20 (Monday night/Tuesday morning) so that Dr. Dahlquist can check them before you move on to part 2 of the exercise. She will not be assigning grades at this point; you will have the chance to make corrections, if necessary, before completing part 2.

Homework Partners

The homework partners for this week are listed below. Initially, you will compare your analysis with your partner's to check your work. Then, you and your partner will carry out two slightly different analyses and contrast them with each other. It will be vital for you and your partner to have at least one face-to-face meeting outside of class to complete this assignment.

  • Mary Alverson, Kristin Zebrowski
  • Nicole Anguiano, Emily Simso
  • Brandon Klein, Veronica Pacheco
  • Josh Kuroda, Ronald Legaspi
  • Brandon Litvak, Anu Varshneya
  • Lena Olufson, Kevin Wyllie
  • Trixie Roque, Erich Yanoschik
  • Mahrad Saeedi, Jake Woodlee

Reading

Overview of Microarray Data Analysis

This is a list of steps required to analyze DNA microarray data.

  1. Quantitate the fluorescence signal in each spot in the microarray image.
    • Typically performed by the scanner software, although third party software packages do exist.
    • The image of the microarray slide and this quantitation are considered the "raw-est" form of the data.
    • Ideally, this type of raw data would be made publicly available upon publication.
    • In practice, the image data is usually not made available because the raw image file of one slide could be up to 100 MB in size.
    • Also, some journals do not require data deposition as a condition of publication, so published data are often not available anywhere for download.
    • Microarray data is not centrally located on the web. Some major sources are:
  2. Calculate the ratio of red/green fluorescence
  3. Log(base 2) transform the ratios
  4. Normalize the log ratios on each microarray slide
  5. Normalize the log ratios for a set of slides in an experiment
  6. Perform statistical analysis on the log ratios
  7. Compare individual genes with known data
  8. Look for patterns (expression profiles) in the data (many programs are available to do this; we are going to skip this step)
  9. Perform Gene Ontology term enrichment analysis (we will use MAPPFinder for this)
  10. Map onto biological pathways (we will use GenMAPP for this)
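
Steps 2 through 4 above can be sketched in a few lines of Python. This is only an illustration, not the protocol used in class: the intensity values are hypothetical, the function names are invented for this example, and centering each slide on its mean log ratio is just one simple normalization choice among several.

```python
import math

def log2_ratios(red, green):
    """Steps 2-3: red/green ratio for each spot, then log base 2."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def center(log_ratios):
    """Step 4 (one simple choice, assumed here): subtract the slide's mean log ratio."""
    mean = sum(log_ratios) / len(log_ratios)
    return [x - mean for x in log_ratios]

# Hypothetical background-corrected intensities for four spots on one slide.
red = [400.0, 100.0, 800.0, 200.0]
green = [100.0, 100.0, 200.0, 400.0]

ratios = log2_ratios(red, green)
centered = center(ratios)
```

On this scale, a log ratio of 1 means the red signal is twice the green signal, and a log ratio of -1 means it is half.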

In this week's exercise, we will do steps 5-7 (part 1, using Microsoft Excel) and 9-10 (part 2, using GenMAPP & MAPPFinder).
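
As a rough picture of what steps 5 and 6 involve, the sketch below scales and centers the log ratios from one slide and then computes a one-sample t statistic for one gene across replicate slides. It assumes that normalization means shifting each slide to a mean of 0 and a standard deviation of 1, and that the statistical test asks whether a gene's average log ratio differs from 0; the function names and numbers are hypothetical, and the protocol page remains the authority on what to do in Excel.

```python
import math
import statistics

def scale_and_center(log_ratios):
    """Step 5 (assumed form): shift one slide to mean 0 and scale to standard deviation 1."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample standard deviation
    return [(x - mean) / sd for x in log_ratios]

def one_sample_t(replicates):
    """Step 6 (assumed form): average log ratio and t statistic against a mean of 0."""
    n = len(replicates)
    avg = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    return avg, avg / (sd / math.sqrt(n))

# Hypothetical normalized log ratios for one gene on three replicate slides.
avg, t = one_sample_t([1.0, 2.0, 3.0])
```

The p value would then come from a t distribution with n - 1 degrees of freedom, which this sketch leaves out.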

Statistical Analysis of Vibrio cholerae Microarray Data (Part 1)

MAPPFinder Analysis of Vibrio cholerae Microarray Data (Part 2)
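
To give a feel for what a Gene Ontology term enrichment analysis computes, here is a minimal sketch of a z score under a hypergeometric model: it compares the number of genes in a term that meet a criterion with the number expected by chance. That MAPPFinder's own calculation takes exactly this form is an assumption; the function name, parameter names, and counts are hypothetical.

```python
import math

def enrichment_z(N, R, n, r):
    """z score for over-representation of a gene set (hypergeometric model).

    N: genes measured on the array
    R: genes meeting the criterion (e.g., significantly increased)
    n: genes measured that belong to the GO term
    r: genes in the GO term that meet the criterion
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Hypothetical counts: 15 of 50 genes in a term changed, versus 100 of 1000 overall.
z = enrichment_z(N=1000, R=100, n=50, r=15)
```

A positive z score means the term contains more changed genes than expected by chance; a z score near 0 means about as many as expected.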

Conclusion

  • Write a paragraph that briefly summarizes and gives a scientific conclusion for the work that you did this week.

Optional: downloading and installing the GenMAPP and MAPPFinder Software

  • We will be using GenMAPP and MAPPFinder version 2.1 (http://genmapp.org). This software is Windows-only and is already installed on the machines in the Seaver 120 computer lab.
    • This version is now called "GenMAPP Classic" and can be downloaded from this page.
    • Follow the instructions in the installer.
    • During installation, the installer will open a window called the GenMAPP Data Acquisition Tool. It will not function because it cannot connect to the server. This is OK; you will download your Vibrio cholerae Gene Database from the XMLPipeDB project at SourceForge.org.
  • Click on the link for the Gene Database to which you have been assigned, download the file, save it into the folder C:\GenMAPP 2 Data\Gene Databases (if you accepted the default folders during the installation), and extract it.

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 8 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

View

Now that you've done your own microarray data analysis, we will revisit the case "Deception at Duke".

Reflection

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  • What does Dr. Baggerly recommend for reproducible research? How do his recommendations correspond to what DataONE recommends?
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
  • Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?