Class Journal Week 7

From LMU BioDB 2019


Mihir Samdarshi's Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

According to Baggerly and Coombes, the reported gene lists did not match the training gene lists. Furthermore, the resistant and sensitive genes were derived from another paper, which conflicted with Potti's data. Additionally, all the genes in the signature were off by one index position, apparently because of a labeling error in the software. Finally, many of the samples were reused.

What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

He recommends that scientists take care of data by openly providing any code used for analysis, thoroughly describing any non-scriptable steps and planned designs, and labeling each of the data categories descriptively. Most of the recommendations put forward by Baggerly were broader in scope than many of the recommendations in DataONE, which have to do with actual data entry into relational databases and spreadsheets.

What best practices did you perform for this week's assignment?

I tried to follow the methods for the analysis to a T. Any and all work that I did was immediately saved and versioned by my computer, and thus someone could follow all of my steps. Additionally, I uploaded all the data to an open source project (this wiki), and shared every single formula within the Excel sheets.

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I have seen this talk before. Since I watched it 2 years ago, I have learned quite a bit about molecular biology and bioinformatics, so it was far easier to follow along with some of the mathematical aspects of this reverse analysis. This gave me the epiphany of how stupid it was for Potti to do this, as anyone with a computer and the right education to analyze the data could have discovered the errors in it.

Msamdars (talk) 20:41, 16 October 2019 (PDT)

Iliana Crespin's Responses

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • Dr. Baggerly states that the intuition of many scientists and biologists about what makes sense is poor: biologists will find patterns even in random lists. Most of the time the documentation is very poor, which leads to "forensic bioinformatics". The DataONE best practices that were violated involved labeling: the labels in the quantification matrix were reversed, and the software required two input files, one of which could not have a header row, which led to mislabeling. He mentions that common issues are the outliers and predictions.
  • What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends that the second input file should not have a header row. Having consistent columns and data in general is similar to what DataONE recommends. Consistency is very important.
  • What best practices did you perform for this week's assignment?
    • When downloading the data files, I put my last name in the file names. In addition, I copied and pasted the methods from the source to make sure it is easy to replicate this assignment.
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • After this case, it is shocking how many misconceptions there are about the collection of data. Over and over, the same mistakes continue to be made. In a non-health setting, such mistakes may not mean much; however, in a case dealing with patients, they can be catastrophic. Many of these patients are barely holding on to hope, and work dealing with sensitive information should be reviewed more thoroughly.

Icrespin (talk) 20:28, 15 October 2019 (PDT)

Naomi Tesfaiohannes's Responses

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

Baggerly states that our intuition of what makes sense is poor. Some research documentation is written poorly, making it more difficult to reproduce the same method. Genes were being referenced even though they were not present; these are outliers. The gene lists were offset by one position. This was likely caused by the software they used, which requires two input files: a quantification matrix and a list of gene names. The second input cannot have a header row. There was likely a swapping of labels in the software, meaning that medication could be given to patients who do not need it. Poor clinical practice is a big issue in this case. Some samples were reused, sometimes labeled resistant and other times not. Of the 95 samples, 15 were duplicated and 6 were inconsistent with each other. When matching the samples, not all lined up and 16 did not match at all. Common mistakes are mixing up sample labels, gene labels, and group labels, and incomplete documentation.
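One way an off-by-one shift of this kind can arise is an unexpected extra first row in a file that is matched to the data purely by position. The following is a minimal Python sketch of that idea only. The gene names, values, and the helper `pair_genes_with_rows` are hypothetical and are not taken from the software used in the original analysis.

```python
# Hypothetical data: three expression rows and the gene-name list that
# should line up with them, row for row.
expression_rows = [[2.1, 0.4], [0.3, 1.9], [1.2, 1.1]]
gene_names_clean = ["geneA", "geneB", "geneC"]
# The same list, but with an extra first row that the reader does not expect.
gene_names_with_header = ["ProbeID", "geneA", "geneB", "geneC"]


def pair_genes_with_rows(names, rows):
    """Pair names with rows by position only, with no check on the names."""
    return dict(zip(names, rows))


correct = pair_genes_with_rows(gene_names_clean, expression_rows)
shifted = pair_genes_with_rows(gene_names_with_header, expression_rows)

# In the shifted pairing, geneA receives the values that belong to geneB,
# and the last gene silently drops out.
print("correct geneA:", correct["geneA"])
print("shifted geneA:", shifted["geneA"])
print("geneC present after shift:", "geneC" in shifted)
```

Because nothing fails loudly, a check that the number of names equals the number of rows would be a cheap safeguard in a setup like this.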

  • What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

He recommends providing data with labeled columns, provenance, code, a description of non-scriptable steps, and a description of the planned design. DataONE recommends consistent columns of data and consistent names, codes, and formats. DataONE also suggests keeping all data in one table. For missing data, leave the field empty or use a distinct value such as 9999 to indicate a missing value.
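The missing-value convention mentioned above can be sketched in Python as follows. This is only an illustration: the table contents, the column names, and the helper `parse_value` are made up, and 9999 is used as the sentinel simply because it is the example value given above.

```python
import csv
import io

MISSING_SENTINEL = "9999"  # assumed sentinel for a missing value

# Hypothetical table with one empty field and one sentinel value.
raw = io.StringIO(
    "sample_id,expression\n"
    "s1,1.5\n"
    "s2,\n"
    "s3,9999\n"
)


def parse_value(text):
    """Return a float, or None when the field is empty or holds the sentinel."""
    if text == "" or text == MISSING_SENTINEL:
        return None
    return float(text)


values = {row["sample_id"]: parse_value(row["expression"])
          for row in csv.DictReader(raw)}

# Only the values that are actually present enter the average, so the
# sentinel never gets mistaken for a real measurement.
present = [v for v in values.values() if v is not None]
mean = sum(present) / len(present)
print(values, mean)
```

The point of the explicit check is that a sentinel such as 9999 would badly distort any statistic if it were read as data.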

  • What best practices did you perform for this week's assignment?

I added my initials to the downloaded file name. This was helpful because multiple people download a file with the same name and upload it to their wiki pages, so having a distinct name keeps files from overwriting one another. This also made it easier to navigate the database for my strain.

  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Poor clinical practice cost the lives of many hopeful patients with stage 4 cancer. They knew their options were slim and put their trust in the clinical trial. The samples were duplicated and inconsistent multiple times. When trying to match the samples, 16 did not match at all and 43 were mislabeled. These errors produced an incorrect validation dataset that was used in clinical trials for 2 years. This video helped me understand how Dr. Potti's mistakes were caught.

Ntesfaio (talk) 10:40, 16 October 2019 (PDT)

Aby Mesfin's Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

One of the main issues Baggerly and Coombes found was that the sensitive and resistant labels in the quantification matrix were intentionally reversed by Potti and his colleagues in order to produce more favorable data. Rather than interpreting 0 as "resistant" and 1 as "sensitive", Potti's team switched how they interpreted the input files. Dr. Baggerly notes that the prominent violation of best practices for data integration was Potti's failure to maintain the provenance of his data. The results of his research were heavily skewed, not only because of the mislabeling of the quantification matrix but also because it contained duplicates.
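To see why swapping the meaning of 0 and 1 matters, consider this small Python sketch. The 0 = resistant, 1 = sensitive encoding comes from the paragraph above. The cell line names, the codes assigned to them, and the helper `decode` are hypothetical.

```python
# Encoding described above: 0 = resistant, 1 = sensitive.
CORRECT = {0: "resistant", 1: "sensitive"}
# The same codes read with the meanings swapped.
REVERSED = {0: "sensitive", 1: "resistant"}

# Hypothetical codes, standing in for the label row of a quantification matrix.
sample_codes = {"cell_line_1": 0, "cell_line_2": 1, "cell_line_3": 1}


def decode(codes, encoding):
    """Translate numeric codes into labels under a given encoding."""
    return {sample: encoding[code] for sample, code in codes.items()}


intended = decode(sample_codes, CORRECT)
swapped = decode(sample_codes, REVERSED)

# Every sample ends up with the opposite label, so a model trained on the
# swapped labels would favor exactly the wrong group.
disagreements = [s for s in sample_codes if intended[s] != swapped[s]]
print(len(disagreements), "of", len(sample_codes), "labels differ")
```

Writing the labels out as words in the data file, rather than as 0 and 1, is one way to make this kind of swap visible.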

What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

Dr. Baggerly recommends that researchers include data, code, descriptions of nonscriptable steps, and descriptions of the planned design, and that they maintain provenance. He also recommends a consistent report structure, executive summaries, and reusing templates. Some of these practices parallel those recommended by DataONE, such as maintaining provenance.

What best practices did you perform for this week's assignment?

The best practice that I used was creating a descriptive file name for the dataset.

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I appreciated how this video broke down the bioinformatics and data sharing that went into discovering the fraud in Potti's research. It helped me better understand the mechanisms behind this discovery while also underlining the value of reproducibility in research.

Ymesfin (talk) 16:03, 17 October 2019 (PDT)

Christina Dominguez's Response

1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

Issues with the data and analysis included swapping 0 and 1 when entering the data, which means that the labels were reversed. The same sample was also labeled as both resistant and sensitive. This caused medicine to be prescribed as the best treatment when it was not, which is unfortunate for the people who were part of the clinical trials. The best practice of using descriptive column names was violated. Common issues include mixing up sample labels, which can be easy to fix in Excel. Creating descriptive column names and organizing your data correctly would help to resolve this.
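A check for reused samples and for samples that carry contradictory labels could look like the Python sketch below. The sample identifiers and labels are invented for illustration and are not the data from the case.

```python
from collections import defaultdict

# Hypothetical records: (sample identifier, assigned label).
records = [
    ("s01", "resistant"),
    ("s02", "sensitive"),
    ("s01", "sensitive"),  # same sample, conflicting label
    ("s03", "resistant"),
    ("s03", "resistant"),  # duplicate with a consistent label
]

labels_by_sample = defaultdict(set)
counts = defaultdict(int)
for sample_id, label in records:
    labels_by_sample[sample_id].add(label)
    counts[sample_id] += 1

# Samples that appear more than once.
duplicated = sorted(s for s, n in counts.items() if n > 1)
# Samples that were given more than one distinct label.
conflicting = sorted(s for s, labels in labels_by_sample.items()
                     if len(labels) > 1)

print("duplicated:", duplicated)
print("conflicting:", conflicting)
```

A check like this only works if every sample has a stable identifier, which is one more reason the labeling practices discussed above matter.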

2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

Consistency is important in reproducible research. Dr. Baggerly explains that always using a certain program as well as maintaining accurate reports and data management serve to make it reproducible. DataONE always emphasizes consistency in column naming and data filing for future use. This is important for accuracy as well as allowing others to come to the same conclusion that you did by reproducing one's research.

3. What best practices did you perform for this week's assignment?

I used the best practice of using a descriptive file name. This is important to be able to track your files and organize them in an efficient way.

4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

It is shocking how much manipulation was done to the data. It makes it even sadder for those who were part of the clinical trials; however, it shows the importance of bioinformatics and its ability to analyze the legitimacy of the data.

Cdomin12 (talk) 22:04, 15 October 2019 (PDT)

Ivy Macaraeg's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issues with the data and analysis identified by Baggerly and Coombes include heat maps and predictions that were not reproducible, misinterpretation of the data (i.e., the switching of resistant and sensitive), and the duplication of data. Best practices from DataONE that were violated include standard representation and files that are readable into the future. The presented data was not in these forms for researchers to analyze. Some of the common issues include confounding the experimental design, mixing up data labels, and simple mistakes, usually from Excel.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends that in their papers, researchers use well-maintained data, provenance, code, descriptions of nonscriptable steps, and descriptions of planned design. These are very similar to the DataONE recommendations.
  3. What best practices did you perform for this week's assignment?
    • For this week's assignment, I tried to practice labeling data correctly (i.e., using descriptive names without spaces) as well as formatting cells correctly and making sure the selected data was complete.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I think my main reaction is surprise that it took so long for Dr. Baggerly's observations about the data to be fully examined. It is sad that these data errors were not taken seriously from the start, and I think a lot of people's time, money, resources, and health could have been better off if these errors had been caught earlier. This presentation reemphasized how important data analysis is, something I hadn't given a second thought to.

Imacarae (talk) 01:55, 16 October 2019 (PDT)

DeLisa Madere's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issue with the data and analysis that Baggerly and Coombes found was that the samples were inaccurate. The scientists scrambled the gene labels so badly that there is large uncertainty as to which samples they belong to, and unfortunately the incorrect samples were the ones used for the drugs in clinical trials that ran for 2 years. The best practices that were violated concern consistency of the data. There were many errors in the labeling of the genes, which left inconsistencies in the gene titles. They also included entries for data that were missing; if data are missing, there should be no entry at all to indicate that, rather than made-up data that can be harmful. Dr. Baggerly claimed that the common issues occurred in the labeling of the genes themselves, creating inaccurate results in the data.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends that for reproducible research, scientists should provide data along with identifiers, provenance, code, descriptions of nonscriptable steps, and descriptions of the planned design. These correspond to DataONE's ideas because DataONE recommends that data be consistent, for example by using titles to properly label each column in a spreadsheet, and that descriptions and data be literate.
  3. What best practices did you perform for this week's assignment?
    • For this assignment, I had to make sure my data file was labeled correctly to ensure that I wasn't interfering with anyone else's files. It is important to label correctly for accuracy in the data.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • After hearing this talk, it seemed even more that the data presented was not complete at all, which implied that the scientists behind the scandal knew this and put only 50% of their effort into this experiment. The fact that the common issues with the data had to do with labeling revealed that they were not being careful about even the simplest things, including the labels.

Dmadere (talk) 21:59, 16 October 2019 (PDT)

David Ramirez's Response

1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

The first issue mentioned was an off-by-one indexing error for all of the genes in the signature. This may have been introduced by the software that was used. Another issue was that the software also gave predictions. Not all heatmaps were labeled correctly. Transferring data from Excel to other software can introduce errors. When predictions were made for Adriamycin, the data suggested that most patients were resistant to the drug, when in actuality it is very effective. The best practices that were violated are that the columns of data were not kept consistent when data was transferred from one piece of software to another. It also seems that the data was not stored in a format that could be used by any application, which would have prevented incorrect analysis by the software used. Dr. Baggerly claimed that changes in data sets during transfer between software are a very common issue.

2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

The recommendation is that the data process should be transparent and reproducible during the integration process. As Dr. Baggerly stated, reproducible data should be accessible to anyone, so that they can read the paper and the reports containing all the code, which serve as proof of the results. In DataONE's terms, this means that others should be able to understand and evaluate the researcher's decision-making process.

3. What best practices did you perform for this week's assignment?

The columns of data are consistent with only text and numbers. Also, the data is all in one table, which is much easier for a statistical program to work with than multiple small tables.
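A quick way to test whether each column holds only one kind of value is sketched below in Python. The rows, the column names, and the helper `column_kinds` are illustrative assumptions and are not part of the assignment files.

```python
def column_kinds(rows):
    """Map each column name to the set of value kinds found in it."""
    kinds = {}
    for row in rows:
        for name, text in row.items():
            try:
                float(text)
                kind = "number"
            except ValueError:
                kind = "text"
            kinds.setdefault(name, set()).add(kind)
    return kinds


# Hypothetical rows from a single table, one record per gene.
rows = [
    {"gene_id": "YAL001C", "log_fold_change": "0.42"},
    {"gene_id": "YAL002W", "log_fold_change": "n/a"},
]

# A column is flagged when it mixes numbers and text.
mixed = sorted(name for name, found in column_kinds(rows).items()
               if len(found) > 1)
print("columns mixing text and numbers:", mixed)
```

Here the stray "n/a" entry is what makes the numeric column inconsistent, which is the sort of thing a statistical program would trip over.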

4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

The talk brought up the point that Duke University said it had become more confident in developing personalized cancer treatment, which seems really odd. I would have thought that Duke would have taken some time off from cancer research, but it has instead become more encouraged to continue it.

Dramir36 (talk) 23:56, 16 October 2019 (PDT)

Jonar Cowan's Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

The main issues revolved around inconsistent data and a lack of proper labeling and instructions. Baggerly and Coombes showed that data replication was inconsistent and that repeating the process brought inconsistent results. There was also mention of data being read incorrectly, which resulted in misinterpretation. DataONE suggests that labeling is an important practice for creating reproducible data. A common issue would be mislabeling data.

What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

Dr. Baggerly suggests that consistency in reusing templates and labeling, as well as having a detailed report of the whole process, is important for reproducible data. DataONE recommends properly labeled data (columns and any data entry) and proper long-term storage; essentially, they are promoting the same advice.

What best practices did you perform for this week's assignment?

For this week's assignment, I practiced proper labeling of data and using the same applications and files.

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

With regard to Potti's work, I am shocked at how long it took for it to be disproved. I had also thought that, with all the regulations and practices in the scientific field, we would not run into such a problem with false data.

Jcowan4 (talk) 22:50, 16 October 2019 (PDT)

Emma Young's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issue that Baggerly and Coombes found in these papers was inconsistency in the data. This showed up as inconsistent labeling of the genes, genes that were not labeled in the graphs, and sample sets that at times did not make sense because of repeats and inconsistent results. At times the results were also mislabelled to the point that they were flipped. The authors definitely did not follow the best practice of consistently labeling their data. They also did not seem to use a common document format or properly label what was on their graphs and charts. Dr. Baggerly claimed that inconsistent labeling was the most common issue.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends publishing the data, labelling the columns, labelling the samples, providing the code, listing the methods, and describing the planned design. He also recommends writing reports in Sweave, which allows anyone to rerun the analysis. The ideas about labeling fit with the best practice of giving descriptive names. Using Sweave as the single format follows the best practice of keeping the data in a consistent format, and it also allows the reports to be read in the future.
  3. What best practices did you perform for this week's assignment?
    • In this week's assignment we used the best practices of labeling our data with distinct and informative names, keeping all our data in one table, and keeping progressively saved files of our data analysis.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • After Dr. Baggerly's talk I am further shocked by how much someone could misinterpret and falsify this data. It also made me realize that, while the video we watched seemed to focus on the errors of one man, there were many other names on that paper, and it could not all have been done by one man.

Eyoung20 (talk) 22:00, 16 October 2019 (PDT)

Michael Armas' Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

  • When Baggerly and Coombes were approached to try to recreate the results reported by Potti, they used the publicly available data to do so. Baggerly and Coombes ran the same test but used known data to obtain "expected" results. When comparing these results to the results gathered from Potti's data, Baggerly and Coombes found data that was not consistent with what was expected. It's as if Potti's team just kept moving on even after receiving results that were not expected. Many data points were "off by one" or completely opposite of what was expected.
  • The main thing Baggerly talked about that would go against the best practices stated by DataONE had a lot to do with data organization. Duke's paper was disorganized, making the data difficult to read. Baggerly showed one plot whose data points were so poorly labelled that the data was almost impossible to interpret.
  • Baggerly claims that many common mistakes pertain to mislabelling samples and the lack of documentation. Both of these are important for the reproducibility of data, which was difficult due to the lack of organization provided by Duke.

What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

  • Dr. Baggerly recommends literate programming, reusing templates, report structure, executive summaries, and appendices. As these points mention many ways to adapt to the standards of the community (such as publishing code, using correct formatting, etc.) this is congruent with DataONE supporting researchers using a set of best practices so that other community members are able to understand the research with ease. As for Dr. Baggerly and his team, they are using Sweave to easily be able to reproduce data by running code that will be able to show the same results as gathered originally. They are stepping away from anything private and want their information to be available to the public to ensure credible research.

What best practices did you perform for this week's assignment?

  • By adding my username to the Excel file that was uploaded, it will be easy for those trying to view my work to differentiate my data from the data of my homework partners. Additionally, by adding commit comments to every edit made, a record of what has been done to this page is accessible to those interested in following along with the journal's development.

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

  • I am shocked to see how much information was mostly ignored by Duke. So many results showed the opposite of what was expected, and they continued to run the clinical trials. After seeing the falsely interpreted data in front of me, I am even more shocked than last week about how Duke handled this. Additionally, the time frame Baggerly gave during his talk was not what I expected. I would expect Duke to shut down clinical trials immediately until an investigation was over, but it took them many months to even start an investigation, and they eventually restarted the trials. This case gets more shocking the more I learn about it.

Marmas (talk) 20:43, 16 October 2019 (PDT)

Marcus Avila's Answers

Links

User Page

Template:mavila9

Assignment Page Individual Journal Entry Class Journal Entry
Week 1 Week 1 (User page) Shared Journal Week 1
Week 2 Mavila9 Week 2 Shared Journal Week 2
Week 3 Gene Page Week 3 Shared Journal Week 3
Week 4 Journal Entry Page Week 4 Shared Journal Week 4
Week 5 RNAct Database Page Week 5 Shared Journal Week 5
Week 6 Journal Entry Page Week 6 Shared Journal Week 6
Week 7 Journal Entry Page Week 7 Shared Journal Week 7
Week 8 Journal Entry Page Week 8 Shared Journal Week 8
Week 9 Journal Entry Page Week 9 Shared Journal Week 9
Week 10 Journal Entry Page Week 10 Shared Journal Week 10
Week 11 Sulfiknights Team Page Shared Journal Week 10
Journal Entry Page Week 11
Week 12/13 Journal Entry Page Week 12 Shared Journal Week 11
Week 12/13 Sulfiknights DA Week 12/13 Shared Journal Week 12
N/A Sulfiknights DA Week 14
  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • Baggerly and Coombs identified that the data were off by one position and that some data samples were entered more than once to increase statistical significance. They also found that the labels of "resistant" and "sensitive" were switched for the data samples. The best practices in DataONE that were violated include using consistent codes in each column. Instead, Potti et al. mixed up the sample labels, the gene labels, and the group labels. Baggerly and Coombs considered these simple mistakes to be the most common.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Baggerly recommends using literate and consistent programming, reusing templates, and including report structure, executive summaries, and appendices. DataONE expresses the need for consistent and complete data entry, and also promotes the storage of data in a consistent format.
  3. What best practices did you perform for this week's assignment?
    • I maintained consistent notes of what I did for the assignment by including edit summaries. Also, while working with others, we maintained transparency about our contributions.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I found it even more disturbing that the scientific community disregarded clear evidence that showed Potti's work was flawed.

Kaitlyn Nguyen's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • At the beginning of the video, the first issue noted was a common "off-by-one" indexing error, which in this case caused the gene lists to reference genes that weren't involved. They then looked at the p-values to see if there were any additional errors, and those were off by one as well. Looking further into the software used, they found that the input must not include a header row, yet the first row contained gene names. In the data and analysis of Duke's medical trials, Baggerly and Coombs correlated the 59 gene vectors and found that 43 samples were mislabeled, while others did not match at all. The data was not reproducible (they could not figure out how to recreate it). The DataONE best practices that were violated concern column names, the order of columns, and the location of information. Other common issues that Dr. Baggerly names are mixing up sample labels, gene labels, group labels, etc.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • For more reproducible research, Dr. Baggerly recommends that scientists label the columns ("tell us which samples are which") and provide code. DataONE similarly recommends creating descriptive column names and file names, and entering complete lines of data.
  3. What best practices did you perform for this week's assignment?
    • First and foremost, there were no instances of plagiarism (using material without citing the source). Other practices performed were acknowledging my partners' and my own contributions, as well as relating any material used to the course and this week's assignment.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • No - I do not have any further reaction(s) to this case.
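
The off-by-one error described in question 1 can be illustrated with a minimal sketch. This is not the software or the data used in the Duke analysis; the gene names, the scores, and the helper label_scores are all hypothetical and chosen only to show how an unexpected header row shifts every label by one position.

```python
# Hypothetical gene list that still contains its header row.
names_with_header = ["GeneName", "geneA", "geneB", "geneC", "geneD"]

# Hypothetical scores, one per real gene, in the same order as the genes.
scores = [0.9, 0.1, 0.8, 0.2]


def label_scores(names, values):
    """Pair names with values purely by position, as software that
    assumes there is no header row would do."""
    return dict(zip(names, values))


def top_genes(labeled, cutoff=0.5):
    """Return the names whose score exceeds the cutoff, in input order."""
    return [name for name, score in labeled.items() if score > cutoff]


# Header left in place: every score is attached to the previous name.
shifted = label_scores(names_with_header, scores)

# Header removed first: each score is attached to the correct gene.
aligned = label_scores(names_with_header[1:], scores)

print(top_genes(shifted))  # names reported when the header is left in
print(top_genes(aligned))  # names reported when the header is removed
```

With the header left in, the genes reported as high scoring are each one row away from the genes that actually have the high scores, which is the pattern the answer above describes.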

User Page

User:knguye66

Template Page

Template:knguye66


Table of all assignments and journal entries for BIO-367-01

Week Individual Journal Entry Shared Journal
Week 1 - Class Journal Week 1
Week 2 knguye66 Week 2 Class Journal Week 2
Week 3 ILT1/YDR090C Week 3 Class Journal Week 3
Week 4 knguye66 Week 4 Class Journal Week 4
Week 5 DrugCentral Week 5 Class Journal Week 5
Week 6 knguye66 Week 6 Class Journal Week 6
Week 7 knguye66 Week 7 Class Journal Week 7
Week 8 knguye66 Week 8 Class Journal Week 8
Week 9 knguye66 Week 9 Class Journal Week 9
Week 10 knguye66 Week 10 Class Journal Week 10
Week 11 knguye66 Week 11 FunGals
Week 12/13 knguye66 Eyoung20 Week 12/13 FunGals
Week 15 knguye66 Eyoung20 Week 15 Class Journal Week 15

Knguye66 (talk) 21:41, 16 October 2019 (PDT)