Week 9

From LMU BioDB 2017

This journal entry is due on Tuesday, October 31, at 12:01 AM PDT. 🎃

Objectives

The objective of this assignment is to lay additional computer science groundwork for the semester’s research project by:

  • spending some hands-on time with GRNsight
  • learning how to work with and explore a web service API and its documentation
  • looking at the “favorite gene” pages created by the class

Individual Journal Assignment

  • Store this journal entry as "username Week 9" (i.e., this is the text to place between the square brackets when you link to this page).
  • Invoke your template on your journal entry page so that you:
    • Link from your journal entry page to this Assignment page.
    • Link from your journal entry to your user page.
    • Add the "Journal Entry" category to the end of your wiki page.
  • Because you have invoked your template on your user page, you should also have a:
    • Link from your user page to this Assignment page.
    • Link to your journal entry from your user page.
  • Include both the Acknowledgments and References section as specified by the Week 1 assignment.
  • For your assignment this week, the electronic laboratory notebook you will keep on your individual wiki page retains a crucial role. Because most of this week’s activities are exploratory, it is imperative that these explorations are recorded in sufficient detail so that:
    • You do not lose important information that you will need later on.
    • You do not unnecessarily repeat something that you already tried this week.
    • Readers can get a clear idea of what you tried and didn’t try, what worked and didn’t work.

Homework Partners

Homework partners for this week are listed below. The particular GRNsight functional area and web service API that you and your partner will work on are also indicated below. You are expected to consult with your partner, sharing your domain expertise, in order to complete the assignment. However, each partner must submit his or her own work as the individual journal entry (direct copying of each other's work is not allowed). You must give the details of the interaction with your partner in the Acknowledgments section of your journal assignment.

Hands-On with GRNsight

Each homework pair has been assigned one subset of the GRNsight client-side testing protocol for the current beta version of GRNsight. Follow this protocol and report the results of your tests in the electronic journal. Homework partners have one testing subset each so that you can talk to each other about the requested tests, but the testing itself should still be done and reported individually, in the spirit of seeking reproducible results.

  1. Each feature is to be tested in combination with all three formats that GRNsight can read (Excel workbook, SIF, GraphML). This is already specified in the testing document. Choose one file for each of these formats from this web page for use in your tests and specify them in your electronic notebook. In order to have a basis for comparison, homework partners should use the same test files for their individual test sequences.
    • For the Excel workbooks (.xlsx) in the linked collection above, click on the file then click Download to save the file to your computer.
    • For the SIF (.sif) and GraphML (.graphml) files, click on the file, click on the Raw button, then either copy-paste or save the resulting file to your computer. The .sif and .graphml file extensions are not very well known so the files may end up with .txt added to them; go ahead and remove that from the files after they are downloaded, confirming to the computer that you know what you’re doing.
    • Alternatively, you can right-click on the Raw button and choose the Save Link as… menu item (exact phrasing varies per browser) to jump right to the Save dialog.
  2. Each test specifies a sequence of actions to perform, followed by their expected results. Use the latter to determine whether GRNsight passed a particular test. Report the result of each test in your electronic notebook.
    • The version of GRNsight that you are testing is a beta version, so results that diverge from the expected ones are certainly possible.
    • If the observed result is the same as the expected result, indicate that GRNsight passed that particular test.
    • If the observed result is not the same as the expected result, indicate that GRNsight failed that particular test and document what was different. For many tests, a screenshot will be the most effective way to document this difference, so do not hesitate to supply one.
  3. If you see any other behavior that appears incorrect, erroneous, or confusing, please report those observations in a section of your electronic notebook as well.
  4. As always, make sure to document and acknowledge your interactions with your homework partner in the Acknowledgments section of your individual journal.
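One way to make sure that no feature-and-format combination gets skipped is to lay out the whole test matrix before you start. The following is only a minimal sketch of that idea in Python: the feature names are hypothetical placeholders (substitute the ones from your assigned testing subset), and the log structure is an assumption, not something prescribed by the testing protocol.

```python
from itertools import product

# Hypothetical feature names; substitute the ones from your assigned testing subset.
FEATURES = ["feature A", "feature B"]
# The three formats that GRNsight can read, per the assignment.
FORMATS = ["Excel workbook", "SIF", "GraphML"]


def blank_test_log(features, formats):
    """Return one log entry per (feature, format) pair, all initially unrecorded."""
    return [
        {"feature": feature, "format": fmt, "passed": None, "notes": ""}
        for feature, fmt in product(features, formats)
    ]


def record_result(log, feature, fmt, passed, notes=""):
    """Record the outcome of one test; notes should describe any divergence."""
    for entry in log:
        if entry["feature"] == feature and entry["format"] == fmt:
            entry["passed"] = passed
            entry["notes"] = notes
            return entry
    raise KeyError(f"no such test: {feature!r} with {fmt!r}")


def untested(log):
    """List the (feature, format) pairs that still have no recorded result."""
    return [(e["feature"], e["format"]) for e in log if e["passed"] is None]


log = blank_test_log(FEATURES, FORMATS)
record_result(log, "feature A", "SIF", False, "observed result differed; see screenshot")
print(len(log), "tests planned;", len(untested(log)), "still to run")
```

The same table can just as well be kept by hand on your wiki page; the point is that every feature appears once per format, and every failure carries a note about what was different.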

Web Service API Exploration

Each homework pair has been assigned one of the four gene-related web services that we have used for the “favorite gene page” assignments (Ensembl, NCBI, UniProt, SGD/YeastMine). Because there are only four such services, two homework pairs will be working on the same service, so if you wish, you may join “fources” (sorry) to explore the same web service together. Still, you must write up your findings individually in your own respective words.

Your Mission

For the web service that has been assigned to you, use the information given on this page to discover how to take a gene name/symbol (e.g., ACT1, BRO1, SPT15, etc.) and find your way to its full “data profile” within that service. This process may require multiple web service calls and will involve “reading” web service data formats such as JSON or XML.

Your foundational knowledge for this exercise begins with what you have learned from working with “your favorite gene” and from using the services’ corresponding websites. Furthermore, the final URLs that lead to the full gene data are already known to you: they are in the ajax-starter files from the Week 7 assignment. You will want to use a combination of a web browser and curl, with a code-savvy editor like Atom or Visual Studio Code to help make any received data more readable to you.
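If you would rather not rely on an editor to tidy up what a web service returns, a few lines of Python can re-indent JSON or XML for you. This is a minimal sketch using only the standard library; the two sample payloads are made up for illustration and do not reflect the actual structure returned by any of the four services.

```python
import json
from xml.dom import minidom

# Made-up sample payloads standing in for what a web service might return.
raw_json = '{"gene": {"symbol": "ACT1", "ids": [1, 2]}}'
raw_xml = '<entry><gene><name type="primary">ACT1</name></gene></entry>'


def pretty_json(text):
    """Re-indent a JSON string so that nested objects are easy to follow."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


def pretty_xml(text):
    """Re-indent an XML string, placing each nested element on its own line."""
    return minidom.parseString(text).toprettyxml(indent="  ")


print(pretty_json(raw_json))
print(pretty_xml(raw_xml))
```

To use this on real data, save the output of a curl call to a file and read that file's contents in place of the sample strings.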

The Deliverable

Upon determining how to go from a gene name/symbol to that gene’s individual data record (as shown in the Week 7 ajax-starter files), write up this process as a reproducible “recipe” in your electronic journal. In general, this recipe will consist of:

  • The URLs to access in order to retrieve the desired data
  • Any portions in these URLs that need to be substituted for specific queries, such as the gene name or ID within that web service
  • Specific instructions on how to interpret the data returned by each URL so that you can extract exactly the information you need in order to proceed to the next step
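To make the shape of such a recipe concrete, here is a minimal sketch of one written down as data: each step pairs a URL template with a note on what to extract from the response. Everything specific in it is hypothetical. The example.org URLs, the placeholder names gene and service_id, the “id” field, and the ID value “ABC123” are stand-ins, not any real service’s API; your own recipe will use the URLs and fields that you discover.

```python
from urllib.parse import quote

# A hypothetical two-step recipe. The example.org URLs and the "id" field
# are placeholders, not any real service's API.
RECIPE = [
    {
        "url": "https://example.org/search?symbol={gene}",
        "extract": "read the service-specific ID from the 'id' field",
    },
    {
        "url": "https://example.org/record/{service_id}",
        "extract": "this is the full data record for the gene",
    },
]


def fill(url_template, **values):
    """Substitute URL-escaped values into the template's {placeholders}."""
    escaped = {name: quote(str(value), safe="") for name, value in values.items()}
    return url_template.format(**escaped)


step1 = fill(RECIPE[0]["url"], gene="ACT1")
step2 = fill(RECIPE[1]["url"], service_id="ABC123")  # ID as if read from step 1
print(step1)
print(step2)
```

Writing the substitutable portions as named placeholders makes it obvious which parts of each URL change from gene to gene and which stay fixed.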

This exercise is somewhat unusual in that the work lies in the process of figuring out how to use the web service. Once the steps are known, actually performing these steps is quite straightforward. Thus, although the prospect of doing this may be quite intimidating to those who are new to it, please rest assured that the journey itself is the reward here and it is the very open-endedness of this exploration that we’d like you to experience in this exercise.

That said, it is again imperative that you take good notes about the things you try, and their results, so that you don’t go around in circles and can eventually narrow down your exploration to the desired set of steps.

Per-Service Hints

Finally, Dr. Dionisio has some curated notes to help you get started with each web service. These tidbits are chosen based on one or more of the following criteria:

  • They lead to the information you’ll need to work out the entire “recipe,” but not in an obvious, dead-giveaway manner
  • They involve information that would otherwise be very difficult to look up or figure out if you don’t have a lot of experience in this area
  • They document the final URLs for the gene/protein data, as given in the ajax-starter files

Use this information well!

UniProt
  • Relevant documentation:
  • Supplementary websites:
  • Technical information:
    • You will encounter redirects in these web services; web browsers handle this automatically, but if using curl make sure to add the -L switch (i.e., curl -L …)
    • Your URLs will include ampersands (&), which will need special handling with curl: in these cases, enclose the URL in apostrophes (e.g., curl -L 'http://www.uniprot.org?query=this&type=that')
    • UniProt primarily provides results in XML format; in one relevant step, the data can be provided in tab-delimited format, which might be easier to work with
  • Miscellaneous information:
    • You will encounter the need for a taxon ID, which identifies a specific organism; the taxon ID for our strain of S. cerevisiae is 559292
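The technical notes above can be tried out without touching the network. The sketch below is a minimal illustration in Python, under loud assumptions: the example.org endpoint, the parameter names (gene, taxon, format), and the sample response with its column headers are all made up, and are not UniProt’s actual API. It shows how a multi-parameter URL ends up containing ampersands, how such a URL gets quoted for curl, and how tab-delimited text can be read into rows.

```python
import csv
import io
import shlex
from urllib.parse import urlencode

TAXON_ID = 559292  # taxon ID for our strain of S. cerevisiae, per the notes above

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://example.org/lookup"
params = {"gene": "ACT1", "taxon": TAXON_ID, "format": "tab"}

# urlencode joins the parameters with "&", which is why the URL needs quoting.
url = BASE_URL + "?" + urlencode(params)
# shlex.quote wraps the URL in single quotes because "?" and "&" are special to the shell.
curl_command = "curl -L " + shlex.quote(url)
print(curl_command)

# Made-up tab-delimited response: a header row, then one row per match.
sample_response = "id\tgene\ttaxon\nX00001\tACT1\t559292\n"


def parse_tab_delimited(text):
    """Turn tab-delimited text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = parse_tab_delimited(sample_response)
print(rows[0]["id"])
```

Because each row becomes a dictionary keyed by column header, picking out the one value you need for the next step is a single lookup rather than a hunt through nested XML.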
NCBI
SGD/YeastMine
  • Relevant documentation:
  • Technical information:
  • Miscellaneous information:
    • The data type of interest to us here is Gene
    • YeastMine does not abbreviate the gene name symbol field; it spells it out fully as symbol
    • Recall how SGD IDs look: they begin with a capital S followed by nine digits (e.g., “S000003664”); YeastMine calls this the primaryIdentifier
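The SGD ID pattern described above is simple enough to check mechanically, which helps when you are scanning a large response for the right value. The following is a minimal sketch in Python. It assumes that a Gene result has already been reduced to a dictionary keyed by the field names symbol and primaryIdentifier; the actual shape of YeastMine’s responses is for you to discover, and “GENE1” is a placeholder symbol, not the real symbol for the ID shown.

```python
import re

# A capital S followed by exactly nine digits, as described above.
SGD_ID_PATTERN = re.compile(r"S[0-9]{9}")


def is_sgd_id(text):
    """Return True only if the whole string has the form of an SGD ID."""
    return SGD_ID_PATTERN.fullmatch(text) is not None


# Hypothetical Gene records, already reduced to dicts keyed by field name.
# "GENE1" is a placeholder symbol, not the real symbol for this ID.
gene_records = [{"symbol": "GENE1", "primaryIdentifier": "S000003664"}]


def sgd_id_for(records, symbol):
    """Find the record with the given symbol and return its primaryIdentifier."""
    for gene_record in records:
        if gene_record["symbol"] == symbol:
            identifier = gene_record["primaryIdentifier"]
            if not is_sgd_id(identifier):
                raise ValueError(f"not an SGD ID: {identifier!r}")
            return identifier
    return None  # no record with that symbol


print(sgd_id_for(gene_records, "GENE1"))
```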
Ensembl

One Lifeline Question per Homework Pair

Because of the varying levels of quality in the web services’ documentation and the relative unfamiliarity of most of the class with this kind of exercise, the possibility of your getting “stuck” still looms large despite these hints and supplementary information. To accommodate this, each homework pair is allowed to ask Dr. Dionisio one lifeline question between now and the assignment’s due date. The question must be precise and provide indications that you have done some good-faith exploration on your own. For instance, “How do I access the full protein entry in UniProt given a gene name?” is not a valid lifeline question (obviously, I hope). So choose your question well!

To ask a lifeline question:

  1. Post the question to Dr. Dionisio’s talk page (remember that?)
  2. Send Dr. Dionisio and Dr. Dahlquist an email notifying us of the question (our wiki does not send notifications for new talk page postings)

Posting the question to the wiki is done in case other students will find the answer helpful for their own explorations. This continues the class’s themes of open science, data sharing, and reproducible results.

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 9 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first 👍🏼).
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: You can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

Review

Look at the gene pages that the class has collectively created via the Week 4 and Week 7 assignments:

Decide

On the shared journal page:

  1. Identify two (2) gene pages that you particularly like.
  2. State what you like about each gene page.
    • Format these as separate answers, for a total of three distinct responses in this week’s shared journal.