User talk:Rlegaspi

From LMU BioDB 2015

2015-12-07

  • A new file with the split data has been uploaded to your team's files page: UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx
    • Note that this file is based on "UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH.xlsx". I found another error in that version of the file: a gene was labeled "Gene ID" on the CompiledRawData sheet. This caused a gene to be missing from the MasterSheet and a discrepancy in the fourth decimal place of the scaled and centered data between the two files.

You now need to do the following:

  1. Average together the replicate data from the two spots that are now split. For example, average the "Log2FC-C0-rep1-scaledandcentered" value in cell C2 with the value in cell AG2.
  2. Copy the results and use Paste Special > Paste Values to put them into a new sheet called "statistics".
  3. Compute the average of the biological replicates for each treatment and timepoint. For example, average together all four biological replicates for Log2FC-C0. Repeat for each timepoint.
  4. Compute the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Because the values are in log space, you take the ratio by subtracting instead of dividing, since log2(a/b) = log2(a) - log2(b).
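  5. Perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. Use the formula:
=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)
The third argument (2) requests a two-tailed test, and the fourth argument (3) requests a two-sample test that assumes unequal variances.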
This will return the p value. Send me the link to the file at this point so I can check the results. You can also perform the sanity check. Let me know how it goes.
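The averaging and subtraction steps above can be sketched in Python as a cross-check on the spreadsheet. This is a minimal sketch, assuming each spot or replicate value has already been read out of the sheet as a number, with blank cells represented as None (Excel's AVERAGE likewise skips empty cells). The function names and the example numbers are made up for illustration. `welch_t` computes the statistic behind =TTEST(..., 2, 3), which is Welch's unequal-variance t test; it stops short of the p value, which needs the t distribution (for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the two-tailed Welch p value).

```python
import math
from statistics import mean, variance


def average_ignoring_blanks(*values):
    """Average technical or biological replicates, skipping blank (None) cells."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def log_ratio(avg_treatment, avg_reference):
    """Ratio of two log2 averages: log2(a/b) = log2(a) - log2(b), so subtract."""
    return avg_treatment - avg_reference


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values, or the sample variance is undefined.
    """
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up values: two spots for one replicate, then four replicates per timepoint
rep1_c0 = average_ignoring_blanks(0.2, 0.4)   # analogous to averaging C2 with AG2
c0 = [rep1_c0, 0.1, 0.5, None]                # one replicate left blank
c5 = [1.1, 0.9, 1.4, 1.2]
c5_vs_c0 = log_ratio(average_ignoring_blanks(*c5), average_ignoring_blanks(*c0))
t_stat, dof = welch_t([v for v in c0 if v is not None], c5)
```

Passing the samples in the same order as the TTEST ranges (C0 first, then C5) keeps the sign of the t statistic consistent with the spreadsheet.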

Kdahlquist (talk) 16:07, 7 December 2015 (PST)

2015-12-04 Discrepancies between partner data

Hi Ron and Emily,

I've spent some more time looking at the files you both sent me, and there are multiple issues that need to be addressed before we can move on with the analysis.

  1. Emily, the contents of the most recent file of yours that I downloaded don't seem to match the one that I used to split your data. For example, the CompiledRawData, MasterSheet, and ScalingCentering sheets in the old file had data from the C1, C5, C20, and C60 timepoints, but the new file has data for F1, F5, F20, and F60, which is extremely odd.
  2. The total set of timepoints that you need is C0, C5, C20, C60, F5, F20, and F60. This means that you need to drop C1 and F1 and add C0. It appears that Ron's data has the correct set of timepoints in his CompiledRawData sheet. However, there are other issues that need to be addressed.
  3. In the CompiledRawData sheet, you both have 11520 genes. The very last gene in Ron's sheet has the phrase "Gene ID" instead of the actual ID, which should be SO4357, according to Emily's data. In the next sheet, MasterSheet, Ron's data is missing one row, presumably because he deleted the row with "Gene ID" as the ID. You need to recover that data.
  4. When I compared the numerical data for what should be the same log fold change (same treatment, timepoint, and replicate), Ron's and Emily's values do not match.

Before we can go any further, we need to sort out these discrepancies and come up with a consistent set of data that you both can work on. The two of you need to get together and do the following:

  • Re-create the CompiledRawData sheet with the C0, C5, C20, C60, F5, F20, and F60 data and with all of the correct IDs. You need to come up with one consistent set of data for the log fold changes so that you are not each working on different values. It is clear that there are differences in how you each computed the log fold changes, and you need to decide which set of numbers to proceed with.
  • You just need to re-create the CompiledRawData sheet at this point. When you have that, please upload it to the wiki and send me the link. By the way, you should be linking these files to your team's files page so I can just go there to find them. Please give this new file a name different from the previous files you have both worked on. I will process it through the splitting step myself and send it back to you when I've done that.
  • There were some further issues with the calculations that Emily did after that point, but I will address that once we have a CompiledRawData sheet that we know is correct.
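A quick automated check along these lines could catch both kinds of problem described above (a header label used as a gene ID, and log fold changes that differ between partners) before any further processing. This is a minimal Python sketch, assuming the IDs and one column of log fold changes from each partner's CompiledRawData sheet have been read into lists in the same row order; the function names and the tolerance value are hypothetical.

```python
def check_compiled_ids(ids, expected_count=11520, header_label="Gene ID"):
    """Flag a wrong row count and any data row whose ID is really a header label."""
    problems = []
    if len(ids) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(ids)}")
    # Spreadsheet row 1 holds the column headers, so data starts at row 2
    for row, gene_id in enumerate(ids, start=2):
        if gene_id == header_label:
            problems.append(f"row {row}: header text {header_label!r} used as an ID")
    return problems


def mismatched_values(values_a, values_b, tolerance=1e-6):
    """Return 0-based positions where two partners' log fold changes disagree."""
    if len(values_a) != len(values_b):
        raise ValueError("the two columns have different numbers of rows")
    return [i for i, (a, b) in enumerate(zip(values_a, values_b))
            if abs(a - b) > tolerance]
```

An empty list from both functions means the row count, IDs, and values line up; any positions returned by `mismatched_values` point to the rows worth comparing by hand.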

There is the possibility that I can be online tonight after ~9:30PM, so if you can send me something by then I could potentially look at it tonight. My next window of opportunity is tomorrow at 2:00 PM. To get on track with what needs to be done, it would be really good to try and get me something by tomorrow. If that's not really possible, Sunday at 2:00 is my next window.

Kdahlquist (talk) 16:15, 4 December 2015 (PST)

Week 12 Feedback

  • I have reviewed the work that Emily and you did to compile your raw data this last week and want to acknowledge the hard work that you both put in to organize the files and perform the calculations. I'm putting the feedback here on your page and will put a note on Emily's talk page to refer her to this page.
  • In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
    1. All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
    2. Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
      • Type a "1" in cell B2 and a "2" in cell B3.
      • Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
    3. Next, copy and paste (as values) the "log2" column from each of your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
    4. The next set of manipulations should be performed in a new sheet called "MasterSheet".
    5. Sort the data A to Z based on the "ID" column. Delete all rows whose ID is "Blank", "blank", or "gDNA", or starts with "NC-" or "ORF". Record how many rows were deleted.
    6. Some of your cells are going to have error messages in them because of the previous calculations you did. Use Find and Replace to replace all of these with nothing, and record how many cells were affected.
    7. Create a new worksheet called "ScalingCentering" and use Paste Special to paste all of your data into this new sheet. You will perform the scaling and centering operations as you did for the Vibrio cholerae data.
      • Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
    8. Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
    9. You will average the technical replicate spots for each sample to get one value for each sample.
    10. You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
    11. You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Because the values are in log space, you take the ratio by subtracting instead of dividing.
    12. You will perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for Vibrio; instead, you will use the TTEST function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
    13. After that, you will need to format the data for GenMAPP, and then you'll be ready to import it into GenMAPP and run MAPPFinder.
  • Let me know if you have any questions.
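Steps 2 and 5 through 7 can be sketched in Python as a way to double-check the spreadsheet work. This is illustrative only, with hypothetical function names. It assumes the IDs and each data column have been read into lists, that Excel error values arrive as strings beginning with "#" (such as "#DIV/0!"), and that scaling and centering means subtracting each column's mean and dividing by its standard deviation, which is how this sketch interprets the Vibrio cholerae procedure.

```python
from statistics import mean, stdev

# IDs to delete in step 5: exact matches, plus any ID with one of these prefixes
CONTROL_IDS = {"Blank", "blank", "gDNA"}
CONTROL_PREFIXES = ("NC-", "ORF")


def add_master_index(ids):
    """Pair each ID with its original 1-based position (the MasterIndex column)."""
    return [(index, gene_id) for index, gene_id in enumerate(ids, start=1)]


def drop_control_rows(indexed_ids):
    """Remove blank and control spots; return the kept rows and how many were deleted."""
    kept = [(index, gene_id) for index, gene_id in indexed_ids
            if gene_id not in CONTROL_IDS and not gene_id.startswith(CONTROL_PREFIXES)]
    return kept, len(indexed_ids) - len(kept)


def is_excel_error(value):
    """Treat strings such as '#DIV/0!' or '#VALUE!' as Excel error values."""
    return isinstance(value, str) and value.startswith("#")


def blank_errors(column):
    """Replace error values with None; return the cleaned column and the count."""
    cleaned = [None if is_excel_error(v) else v for v in column]
    return cleaned, sum(1 for v in column if is_excel_error(v))


def scale_and_center(column):
    """Subtract the column mean and divide by its standard deviation, skipping blanks.

    Needs at least two non-blank values to compute a standard deviation.
    """
    present = [v for v in column if v is not None]
    mu, sd = mean(present), stdev(present)
    return [None if v is None else (v - mu) / sd for v in column]
```

Because each row keeps its MasterIndex as the first element, `sorted(kept)` restores the original spot order after the A-to-Z sort by ID.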
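For step 12, the two multiple-testing corrections can be sketched in Python as below, with hypothetical function names. Bonferroni multiplies each p value by the number of tests and caps the result at 1. The Benjamini-Hochberg function is the standard step-up adjustment: rank the p values from smallest to largest, compute p x N / rank, then take a running minimum from the largest rank downward so that adjusted values never decrease with rank. A spreadsheet version that computes only p x N / rank skips that last step and can give slightly different numbers; which variant matches the earlier Vibrio exercise is an assumption to check against that protocol.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Standard Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # positions by ascending p
    adjusted = [0.0] * n
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p value (rank n) down to the smallest (rank 1)
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With one p value per gene, N is the number of genes left after the control rows are deleted.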

Kdahlquist (talk) 13:34, 24 November 2015 (PST)

Week 6 Feedback

I’ve chosen to issue partial feedback sooner than complete feedback later, in case it will help you address issues with Week 8.

Best Practices

  • Individual work was submitted on time.
  • Shared work was not submitted at all.
  • Requisite links to and from the user page as well as page categories are all present.
  • Electronic notebook was partially maintained, but not finished: the section where you supply the sed command sequences for preparing the data files is provided without any notes, even though a placeholder (“Explanation here.”) indicates that you had something to say. You should record your notebook entries as you work so that you don't leave gaps like this at the end.
  • Work was submitted in appropriate frequencies with corresponding summaries.

Database Exercises

Work in progress.

Dondi (talk) 17:19, 25 October 2015 (PDT)

Week 4 Feedback

  • Individual work was not submitted on time, with the final edit landing at 12:48pm on 9/29.
  • Good-habit items are mostly fulfilled:
    • All expected links and categories were noted.
    • Electronic notebook content was seen, but detail drops off significantly in the second and third questions.
    • Summaries were consistently provided over a good number of edits and were notably descriptive.
  • For the exercises, the following issues were seen; all others were correct:
    • A consistent typo appears in all of your answers: the correct notation for six repeated symbols is (.){6}, and not (.)6 as submitted. This caused some initial issues when I was checking your work because I had to track down the offending command.
    • Another consistent typo appears in the command sequence from the stop codon onward: there is a missing space in the line break substitution after the <\/start_codon> pattern. This breaks all subsequent output if the command sequence is run as-is. Again, a lot of effort went into tracking down the problem. Absolute precision is needed when transcribing from the command line into the wiki.
    • There is an inconsistency in your mRNA answer (which also manifests the aforementioned typo): your previous answer on the transcription start site is correct, and the command supplied with your mRNA answer is also correct (assuming the typo is fixed), but the output you provide shows the wrong transcription start site. Now, given that there is insufficient detail in the lab notebook portion here, it cannot be determined whether this was just a typing/copy-paste error, a coding mistake, or some other issue. This is why an electronic notebook is valuable.
    • Your translation answer also exhibits significant transcription problems. Not only are the previous typos present, but the deletion portion in the end (second- and third-to-last lines) shows distinct overlap and redundancy, plus there is a missing pipe symbol (|) on the 9th line. Again, the lack of electronic notebook commentary does not shed light on your full intent or understanding of these commands, and this once more hurts the quality of your answer.
  • Shared responses were provided and they came in on time.
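The difference between the two notations can be checked quickly with Python's `re` module, which uses the same `{n}` interval syntax as extended regular expressions. This is an illustration outside the sed workflow the assignment used; the pattern names are hypothetical.

```python
import re

# The {6} quantifier applies to the preceding group: "any character, six times"
six_of_any = re.compile(r"(.){6}")
# Without braces, 6 is a literal digit: "any one character followed by the character 6"
any_then_six = re.compile(r"(.)6")

match = six_of_any.fullmatch("ATGCAT")
# A repeated capture group keeps only its last repetition, so group(1) is "T" here
last_captured = match.group(1)
```

If the intent is to capture all six characters as one unit, the quantifier has to go inside the group, as in `(.{6})`.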

Dondi (talk) 17:59, 7 October 2015 (PDT)

Week 3 Feedback

  • Individual work was submitted on time, but not enough time remained to finish the shared journal entry on time. Yes, it was only late by 26 seconds, but see what you can do about giving yourself greater margin for error.
  • Aside from punctuality all other good-habit items are fulfilled:
    • All expected links and categories were noted.
    • You were able to phase your work well, and consistently supplied a change summary to all of them.
    • You accompanied your work with electronic notes and processes.
  • What follows is my feedback for the answers you provided:
    • Complement was exactly right.
    • Your 6 reading frames produced the right result, but not in the best way. Specifically, to “clean up” the remaining base pairs, you “pre-cut” them from the end. This does produce the correct result but requires foreknowledge of exactly how long the nucleotide sequence is. What if you were given a gene consisting of thousands of bases? There is a way to process the sequences without having to know how many bases to “chop” at the end.
    • All of your xmlpipedb-match answers were correct except for the last one. The difference in numbers did not have to do with capitalization, but something else. I will leave it at that for now to give you an opportunity to figure it out. If you want to know directly, just ask me.
  • Your initial experiences with the command line definitely reflect what many others experience when they learn this new approach to interacting with a computer. Keep at it and keep practicing. Repetition helps here, and, as you wrote, gradual expansion of your understanding of each command is also worth consciously building toward. Fortunately there are many resources on the Internet, calibrated for various skill levels, available to help you in this area.
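One way to avoid pre-cutting a fixed number of bases is to drop whatever partial codon is left at the end of each frame by using the remainder of the frame length divided by 3. This is sketched in Python rather than with the command-line tools used in the assignment; the function names are hypothetical, and the sketch assumes an uppercase DNA string containing only A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse, to read the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_in_frame(seq, offset):
    """Split seq into codons starting at offset, dropping any trailing partial codon."""
    frame = seq[offset:]
    usable = len(frame) - len(frame) % 3  # works for a sequence of any length
    return [frame[i:i + 3] for i in range(0, usable, 3)]


def six_frames(seq):
    """Codons for frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons_in_frame(seq, offset)
        frames[f"-{offset + 1}"] = codons_in_frame(rev, offset)
    return frames
```

Because the leftover bases are computed from the length of each frame, the same code handles a 7-base example and a gene thousands of bases long.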

Dondi (talk) 00:35, 26 September 2015 (PDT)

Week 2 Feedback

  • Although the Week 2 scores have not yet been posted, I want to give you feedback on the assignment that you can incorporate into your Week 3 submission.
  • First, thank you for submitting your assignment on time.
  • Your translations are correct.
  • You did not include anything in the way of an electronic notebook for this assignment. Although this assignment was pretty straightforward, you still need to document the process of what you did to arrive at the answers, not just supply the answers. Please be sure to do this for your Week 3 submission.
  • You wrote something in the Summary field for 35/35 contributions between the Week 1 and Week 2 deadlines; keep up the good work!
  • You did include the category on your individual journal entry. However, you should actually put this on your template page, instead of adding it separately to the journal entry page. This way, you will never forget to add it as long as you invoke your template on your journal entry page.
  • With regards to your comments on your shared journal entry, as you can see, others found the Kaji and Kaji article difficult to understand. It's often the shortest scientific articles that are the most difficult!

Kdahlquist (talk) 23:17, 18 September 2015 (PDT)

Week 2 Feedback Response

  • Thank you for your feedback Dr. Dahlquist.
  • I will definitely include an "electronic notebook" for the Week 3 assignment and I will also include one for the Week 2 assignment since I have the work I did on paper.
  • I will fix my template to include the categories for the assignments from here on out.
  • The Week 3 assignment is a bit trickier since I'm not acquainted with code and the command line, but Prof. Dionisio did a good job of explaining many of the commands, and I will attempt to finish the individual assignment by the end of tonight.

Rlegaspi (talk) 17:19, 20 September 2015 (PDT)


Week 1 Feedback

  • I answered your question on my User talk page.
  • The scores have not been posted yet, but I wanted to give you feedback on your Week 1 Assignment.
  • Your individual assignment was late (submitted at 03:01), your questions on our talk pages were late (submitted at 02:38 and 02:42), and your shared assignment was late (submitted at 02:45). In the future, make sure to give yourself enough time to complete the assignment so that you do not submit your work late.
  • Your assignment is complete except for the items that I list below. I particularly like how you took advantage of external links in your work history and other areas of your page. Your work history could be improved by making further sub-bullets after you give the location and dates of your employment. You wrote something in the Summary field 100% of the time, good work!
  • Missing items:
    1. You did not send an e-mail to Dr. Dionisio and me answering whether you had any concerns or whether there was anything else you wanted us to know. Please e-mail us with your answers, even if your answers are "no".
    2. You need to upload an image and use it on your page.
    3. You need to upload another type of file and link to it on your page.
  • You will have the opportunity to make up some of the points you missed by completing the tasks listed above by the Week 2 deadline, midnight on September 15.

Kdahlquist (talk) 11:43, 8 September 2015 (PDT)

Week 1 Feedback Response

  • Thank you for answering the question.
  • I apologize for the late submissions. I didn't expect to work so much over the Labor Day weekend and was only able to finish my page after the closing shift. I will upload an image of myself and a few images relating to the content on my page, in addition to another type of file.
  • Again, I appreciate the feedback. I am going to do a better job of spreading out my tasks and staying true to my weekly study plan.

Rlegaspi (talk) 13:59, 8 September 2015 (PDT)


I’ve answered your question on my talk page.

Dondi (talk) 15:56, 12 September 2015 (PDT)