Class Journal Week 8

From LMU BioDB 2013

Alina Vreeland

  • What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

Baggerly and Coombs identified inconsistent data as the main issue. DataONE stresses being diligent when entering data into a spreadsheet, and this practice was clearly violated, since others were not able to reproduce the results.

  • What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

He recommends documenting all steps and checking the labels on every gene, etc., so that your peers can easily follow what you're doing, just like keeping an electronic journal in bio databases so that other people can follow the process that you followed. This would include being consistent in how you enter data into your spreadsheet, having all information in one place, and using file types that can be easily used by others in the future, as DataONE stresses in its PowerPoint.

  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I still have the same general feeling about the case. It seems odd that people in the science field would not take better care of their data, and be so lazy when it comes to making their data valid and easy to use by others. In order to have any sort of high reputation in your respective field it seems like you wouldn't want to seem like an amateur when presenting your data to others, especially if you present fraudulent data as something to be prized.

  • Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?

The methods and results were not clearly defined. Therefore, it would be extremely difficult for another person to attempt to reproduce the results, since even the results were not clearly defined. The authors used many terms such as "one possibility," suggesting that even they cannot be certain in their own findings. For one to be able to successfully reproduce the data, more detailed information about the process would be needed.

Ajvree (talk) 22:35, 17 October 2013 (PDT)

Dillon Williams

  1. The main issue with the data is that it didn't match up with the results that Baggerly and Coombs had formulated from the data. DataONE states that data should be valid and organized to support ease of use; the data sets from Duke met neither criterion. Dr. Baggerly noted that mixing up sample labels, gene labels, and group labels are common mistakes that could happen to anybody.
  2. Dr. Baggerly recommends keeping data records as clear and consistent as possible at the research level and advises labeling published data with a code so that others can reproduce the same results effectively. DataONE also recommends keeping better data, especially with regard to keeping data maximally consistent.
  3. I don't really understand how researchers who were actively involved in this experiment were so lax about monitoring their data.
  4. I would not be willing to assume one way or the other. To be honest, I'm not well educated enough in the field to give an accurate hypothesis as to such results.

-Dwilliams (talk) 23:42, 17 October 2013 (PDT)


Miles Malefyt

1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

The main issues with the data and analysis as identified by Baggerly and Coombs were that the end result of the data did not match up with the methods used. The numbers must have been made up or manipulated in many cases in order to get the results, which ended up being non-reproducible.

2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

DataONE and Dr. Baggerly both recommend being consistent with the data used and adhering to the methods described so that the results are reproducible.

3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I feel that much of the scientific community is more oriented towards showing results than making data that is able to be reproduced. It makes me feel like this is more about money and fame than it is about the actual science.

4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?

Mmalefyt (talk) 18:44, 17 October 2013 (PDT)



Lauren Magee

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issue with the data was that it didn't match up with the results that Baggerly and Coombs had formulated from the data. The data points given by the researchers were in between the extremes that were plotted by Baggerly and Coombs (i.e. those resistant to a specific drug or those sensitive to a specific drug). However, the heat maps they created were consistent with those done by Baggerly and Coombs for the most part, so it was their analysis of the graphs that was the main concern for biostatisticians. At some points the researchers had even interpreted the data backwards, so that their conclusion was the opposite of what it should have been. Baggerly also thought it important to note that later on in their analysis the researchers had numerous repeats in their data that were also skewing their final results, and when Baggerly and Coombs suggested taking these out, the researchers produced a new list that still had a few repeats, some of which even contradicted each other. The general issue that Baggerly speaks of is the need for researchers to keep a detailed log of what processes they used to analyze their data. Not only did the researchers in question lack a detailed log of their work, but they also refused to give the biostatisticians some of their data, because it was "confidential".
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Baggerly recommends keeping a detailed journal of your research, making sure that someone looking back at these notes would be able to reproduce your results exactly. If someone starts with the same numbers as you did, they should get the same numbers at the end of the analysis.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I think that this talk speaks volumes about the need for individuals who can analyze big data correctly. In this case, there may have been fraudulent data being produced by the researchers themselves to create a desired outcome, but I also think that in general researchers have issues analyzing their data correctly and effectively. There are so many amazing programs in place to help analyze big data, but the individual must know how to interpret such results to make applicable conclusions. This is one of the reasons I decided to take this class, because I want to be able to enter the scientific research community with the knowledge of how to handle big data.
  4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • I think they communicated their analysis processes well, but I don't think I would be able to reproduce their exact same data. I think they could have provided a lot more detail as to what exactly they were doing at each step, which would allow me to follow their steps exactly without the worry of interpreting their description wrong.

Hilda Delgadillo

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The labeling of the genes associated with the particular drugs was incorrect. Their genes were one off in terms of the list, so the set of genes that the research paper described was not actually describing the corresponding biology. Baggerly and Coombs were not able to replicate the data analysis. One of the common issues mentioned is that the easier steps can often be the erroneous ones, such as the labeling of genes, samples, and groups, all in all very simple mistakes. The practices covered in the DataONE slides that were violated were keeping data consistent and labeling samples correctly.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • He recommends recording everything to avoid incomplete documentation. He recommends labeling as much as possible, for graphs as an example, and as the slides mention, labeling columns is important if charts are used. Also, Dr. Baggerly recommends providing the code used for the analysis.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • It is shocking how the analysis that proved the data to have significant errors was practically ignored, so that eventually the clinical trials were permitted. The deaths of the cancer patients that took part in this research trial could therefore have been prevented.
  4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • I don't think there's enough information to reproduce their data analysis. Some of the descriptions are very specific, but some terms are not explained in detail and are just mentioned. I was also hoping to see their microarray data by clicking on the provided link (http://genome-www5.stanford.edu/microarray/SMD/), but it took me to a "Not Found" page.


HDelgadi (talk) 15:58, 17 October 2013 (PDT)


Tauras

  1. The main issue with the data and analysis was that the procedure did not match up with the results. The data appeared reasonable, but when Baggerly and Coombs attempted to replicate the analysis the results were different than the reported findings. In particular, the labeling schema was incorrect (a violation of best practices), which made it difficult to tell exactly how their analysis was completed and to track data from one part of the analysis to the next. Dr. Baggerly identifies mislabeling, inconsistent terms, and mixed-up group labels as common mistakes in the field.
  2. Dr. Baggerly recommended a clearer system of organizing data that keeps labels and terms consistent. Additionally, he recommended keeping a clear record of intermediate steps and what was done to produce each step of the analysis. This coincides well with the best practices discussed in DataONE, which lays out the same general guidelines and principles in a more formalized and objective system that one can be evaluated against.
  3. Dr. Baggerly's insight hit me on a kinda personal level as I realized a lot of the issues he identified in my own lab work for classes. Although there is no issue of fraud or serious malpractice if I get my variables mistaken and the data is simple enough that I can complete it in a messy format, having clearer formatting and a better organized structure would definitely make it easier for me to write lab reports and efficiently describe my process. However, I really didn't have any more of a reaction to the Duke case and felt like this was moving on from that one example into how to prevent accidental research malpractice.
  4. I think there is sufficient information present for someone knowledgeable in the field to reproduce their results, but I cannot be sure without going through the raw data. They appear to lay out their steps in a well organized manner, but I don't have the technical experience to properly evaluate all components or tell whether there are steps or assumptions they are skipping or glossing over.

Taur.vil (talk) 23:51, 17 October 2013 (PDT)

Lena Hunt

1.) What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  • There was an off-by-one indexing error in the genes, so they were referencing the wrong genes in the paper. Furthermore, of the genes in the paper, some worked to split the test set, some worked to split the training set and some worked to explain why the biology worked, but there was no overlap. Overall, the genes had been mislabeled as sensitive or resistant when they were in fact the opposite. DataONE states that data should be valid and organized to support ease of use; the data sets from Duke were neither. Dr. Baggerly claimed that mixing up sample labels, gene labels, and group labels were common mistakes.
2.) What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
  • Dr. Baggerly recommends keeping better records at the research level and labeling published data with a code so that others can reproduce the results. DataONE also recommends better data keeping, especially with regard to keeping data consistent.
3.) Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
  • I am still shocked at how careless the researchers were about checking their data.
4.) Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
  • I think so; it seems more or less straightforward. It was written for scientists with more knowledge about those particular techniques, and so while I think I understand it from what I have learned in class, I don't know how easy it would be to reproduce in a wetlab because I have never done it before.
Lena (talk) 21:53, 17 October 2013 (PDT)
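
To illustrate the off-by-one indexing error described above, here is a minimal Python sketch. The gene names and scores are made up for illustration and are not from the Duke data, and the `offset` parameter is a hypothetical stand-in for the indexing bug, not the actual cause identified by Baggerly and Coombs.

```python
# Hypothetical gene labels and scores, invented for illustration only.
gene_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
scores = [0.1, 2.5, 0.3, 3.1, 0.2]


def top_genes(names, values, k, offset=0):
    """Return the names of the k highest-scoring rows.

    A nonzero offset simulates looking up names with a shifted index.
    """
    # Row indices of the k largest scores, highest first.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Look up each name, skipping any index pushed outside the list.
    return [names[i + offset] for i in ranked if 0 <= i + offset < len(names)]


correct = top_genes(gene_names, scores, 2)            # ['geneD', 'geneB']
shifted = top_genes(gene_names, scores, 2, offset=1)  # ['geneE', 'geneC']

print("correct:", correct)
print("shifted:", shifted)
```

With the shift applied, the reported list shares no genes with the correct one, even though the underlying numbers are unchanged.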

Mitchell Petredis

  1. The genes in Anil Potti's paper were off by one in the index, which subsequently caused later genes to be misidentified. When other researchers attempted to recreate the results, none of them were successful because of this.
  2. Dr. Baggerly recommends that you include everything and anything that happens throughout experimentation as well as proper labeling in order for others to be able to follow procedures accurately and reproduce consistent results; DataONE agrees with Baggerly's recommendations.
  3. My reaction remains unchanged, as I'm still surprised that there weren't strict proofreading procedures at Duke for this particular case.
  4. While the paper seems too complex for someone like me to reproduce, I'm sure it is possible for someone else with the proper resources to reproduce the same or very similar results. The paper describes their wetlab methods with enough detail to follow along; however, not much information is given as to how to perform the microarray analysis and obtain their results, but I suppose someone familiar with the process could figure out how to repeat it.

Mpetredi (talk) 22:31, 17 October 2013 (PDT)

Stephen Louie

  1. The data was inconsistent with the results produced by Baggerly and Coombs. Data was not consistently recorded and maintained, which made it even more difficult to verify results. Common mistakes, such as mixing up gene labels, sample labels, and group labels, were made frequently.
  2. Baggerly suggests keeping a thorough record of all of one's data and labeling carefully. DataONE recommends that one keep a consistent record of their data throughout the experiment.
  3. It was a bit disappointing to see how something as noticeable as this was not given closer examination. While Potti was mainly culpable for the deaths of the cancer clinical trial patients, I feel that the lax amount of oversight also played a huge part in this incident.
  4. It is unlikely that I would be able to replicate the results of this paper. My impression is that the writers assume their readers are thoroughly proficient in wetlab. Thus only some steps are outlined in detail while others are not as well-covered.

Slouie (talk) 23:43, 17 October 2013 (PDT)

Katrina Sherbina

  1. Baggerly and Coombs were not able to reproduce the results from data analysis. One of the problems they noticed was that the indices for the microarray chips were off. Furthermore, the sample IDs were mislabeled leading to an incorrect determination of the number of unique samples. In addition, Baggerly and Coombs were not able to match heat maps and gene lists describing the effects of different drugs on gene expression. The practice of storing data in a consistent format as described by DataONE was violated. Specifically, the probe sets were not separated by which array was used to obtain the sets. This could also be considered a violation of the practice of using descriptive column names. The name of the probe sets should indicate what array was used to collect the set. Baggerly claimed that mislabeling samples and groups is a very common mistake.
  2. Dr. Baggerly suggests that all those who submit a paper should also clearly label the columns/samples in the data, submit the code that they used, and include descriptions of steps taken that were not part of the script. He further suggests that all of the aforementioned must be submitted to a journal before clinical trials are started.
  3. After viewing Dr. Baggerly's talk, I am astounded at the number of simple mistakes, such as mislabeling samples, that were made by the Duke researchers. Furthermore, it was puzzling to hear that these simple mistakes were not discovered in the initial investigations of the data and analysis by other organizations.
  4. I think that all of the wet-lab procedures are described in enough detail to allow other researchers who are familiar with some of those techniques to repeat them. However, I do not think that there is enough detail given about the microarray data analysis in order to be able to reproduce it. In particular, while the paper mentions that the data was normalized, it is not stated what specific techniques were used to perform this normalization.

Ksherbina (talk) 03:36, 18 October 2013 (PDT)
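
As a rough illustration of the label checks mentioned above, the following Python sketch flags sample IDs that appear more than once and counts the unique samples. The IDs are invented for illustration, the function name `find_duplicate_ids` is hypothetical, and this is not the check Baggerly and Coombs actually ran.

```python
from collections import Counter


def find_duplicate_ids(sample_ids):
    """Return a dict mapping each repeated sample ID to how often it occurs."""
    counts = Counter(sample_ids)
    return {sid: n for sid, n in counts.items() if n > 1}


# Hypothetical sample IDs, invented for illustration only.
sample_ids = ["S01", "S02", "S03", "S02", "S04", "S01"]

duplicates = find_duplicate_ids(sample_ids)
unique_count = len(set(sample_ids))

print("repeated IDs:", duplicates)       # {'S01': 2, 'S02': 2}
print("unique samples:", unique_count)   # 4
print("rows in sheet:", len(sample_ids)) # 6
```

A mismatch between the number of rows and the number of unique IDs is a cheap early warning that samples may be repeated or mislabeled.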

Kevin Meilak

1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

  • The main issue with the data and analysis was that it could not be reproduced, and that the procedure did not match up with the results. The labeling scheme was incorrect (was off by one, a violation of best practices). Mislabeling, inconsistent terms, and group labels were the common issues in the field.

2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

  • Dr. Baggerly recommends precise and accurate data organization to prevent inconsistent labeling that then leads to errors. He also recommends writing out each intermediate step clearly to ensure reproducibility without the need for forensic bioinformatics. This matches up well with the DataONE best practices which recommend clear and simple data storage so that any tests can be reproduced if necessary.

3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

  • I am even more convinced that the data was manipulated intentionally because of the extent and repeated nature of the inconsistencies and "mistakes" in the data. It demonstrates how important data and experimental review are in order to ensure useful data and clinical practices.

4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?

  • Their methods and results are well laid out and are very detailed. There is sufficient information to reproduce their data analysis, as each step is gone over in detail and several are referenced.

Gabriel Leis

  1. The main issues identified by Baggerly and Coombs were issues with data entry and reproducibility. In particular, the mislabeling and incorrect entry of data was detrimental to the research. Baggerly and Coombs noticed that the gene IDs were mislabeled, which invalidated any pattern found in the data. The researchers also had issues with consistency in transferring data. This made the data very hard to track and led to difficulty analyzing the procedure used by the researchers. The researchers violated the DataONE best practices of consistent data entry and of clearly defining the relationships between data sets. Baggerly identified that inconsistent or incorrect data entry is very common, especially when working with new or complex software.
  2. Baggerly suggests a precise and thorough record of all procedures performed in research. He also suggests taking the time to ensure quality data entry and organization to prevent error. Baggerly also recommends submitting the code used to analyze the data so that errors can be identified quickly and easily.
  3. I am amazed at the number of ways that data can be manipulated and how easy it can be to make errors that are deleterious to research. After watching this video I have a better understanding of how hard it can be to identify error in research due to the level of complexity involved in data analysis.
  4. There is plenty of detail in describing the collection of samples and methods used to produce the microarray data. This part of the experiment seems reproducible. However, the description of methodology used to analyze the data seems sparse. Even though all raw data is included I doubt that I could reproduce the analysis of that data.

Gleis (talk) 16:43, 20 October 2013 (PDT)

Viktoria Kuehn

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    The main issue with the article was the ambiguity of the procedure, which made it difficult to reproduce the data and to check the accuracy by retracing the analysis. There was a lack of quality control in the data, which led to a few problems with the entry of the data. For example, the data was mislabeled and off by one, and in other instances it was unclear where the resulting statistical values came from. These are common issues that occur in articles submitted to scientific journals. The quality is often checked, but the actual methods that produced the analysis are not reproduced in detail to check for correctness.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    Dr. Baggerly recommends that the data behind research is easily accessible, and that the procedure is described in detail so that someone can easily check the validity of the information given by following the steps that were originally taken. He also suggests describing the program used for analysis so that the analysis can be reproduced. This would lead to more quality assurance, because it would make it easier for an outsider to reproduce the steps, which would result in more careful steps in the original so that mistakes do not surface when the information is already published. DataONE also stresses the importance of clarity in papers when conducting research.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    I was shocked at the amount of error and the many ambiguities that were encountered when this research was being reproduced. I assumed that the papers published in such journals went through much more quality control before being published, especially those that were being applied to clinical trials and affecting people's lives. I was also surprised at the way many people responded when Dr. Baggerly and his team were trying to access information and notifying the journal that there were mistakes. I thought that they would have addressed this issue more openly and been more adamant about resolving the errors rather than denying them.
  4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    It looks to me as though the methods were explained clearly, and the steps in conducting the experiment seem reproducible. Since I am not an expert in the subject it is hard for me to be certain, but the statistical analysis of the actual data seems to be a little sparse in detail and I do not think I would be able to reproduce this with the given information. There is a good chance, however, that someone more knowledgeable in the area would be able to do it with the information given.