Week 5

From LMU BioDB 2013

The individual journal entry (UniProt exercise) is due on Friday, September 27, at midnight PDT. (Thursday night/Friday morning)

The shared journal entry, database wiki page, and PowerPoint slides for your presentation are due on Tuesday, October 1, at midnight PDT. (Monday night/Tuesday morning)

A note on the grading for this assignment:

  • The individual journal entry and shared journal entries are worth a total of 10 points. Students will be graded on an individual basis for this portion of the assignment.
  • The database wiki page is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.
  • The presentation is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.

Individual Journal Assignment

  • Store this journal entry as "username Week 5" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.

UniProt Exercise

For this exercise, you will read and follow the links in Chapter 4: Using Protein and Specialized Sequence Databases of the book Bioinformatics for Dummies (on MyLMU Connect). We are delving quite deeply into UniProt in particular because the gene databases that you will generate later in the semester in your final project are going to be derived from UniProt.

For this assignment, you will keep an electronic laboratory notebook on your individual journal page that records the steps you carried out in exploring the UniProtKB.

  • Since the publication of this book in 2003, the SWISS-PROT database has become the UniProt Knowledgebase. The underlying data are the same, but the scope and user interface for the database have been updated. Thus, some of the exact instructions of the chapter have to be changed to reflect the change to UniProt. These changes are noted below by page number.
  • Page 123:
    1. The URL for the UniProtKB/SWISS-PROT server is now http://www.uniprot.org.
    2. The Quick Search field is now found at the middle top of the page.
    • The information described in subsequent pages can all be found, but will be in a different order on the page. There is a set of navigation links near the top of the page to help you jump to each section.
  • General information about the entry (bottom of page 123):
    • This information is found under the header "Entry information" and is near the bottom of the web page, instead of the top.
  • Name and origin of the protein (page 124) is near the top of the page.
  • The References (page 126) are near the middle of the page.
  • The Comments section (page 126) is now known as "General annotation (comments)".
  • The Cross-References (page 128) are even more extensive and are organized by sub-categories of databases.
    • In particular, click on a sample cross-reference link for each of the following databases, and for each, state what type of information is found there:
      • EMBL
      • InterPro
      • PDB
      • Pfam
      • RefSeq
      • GeneID
  • The Keywords (page 130) are now found listed under "Ontologies".
  • The Features (page 131) are now listed as "Sequence annotation (Features)".
  • In the section "Finding Out More about Your Protein" (pages 135-139), some of the databases are defunct, highlighting how biological databases are a moving target (this book was first published in 2003).
  • A new feature of the UniProt interface is that you can view the data in several different formats. Click on the buttons on the top-right of the page to view the data as:
    • TXT: flat file text data, the original format of the SWISS-PROT data (even before it was put in a relational database)
    • XML: text data structured with tags (like the XML you practiced with in last week's assignment)
    • RDF/XML: a semantic web format
    • GFF: a specialized format for genomic information
    • FASTA: a basic text format for sequence information
  • Write a one-paragraph summary of what you have learned about the human EGFR protein from this exercise.
  • Reflect and answer the following questions on your individual journal page:
    1. What was the purpose of this exercise?
    2. What did I learn from this exercise?
    3. What did I not understand (yet) about this exercise?
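
As a small illustration of the FASTA format mentioned above, the following Python sketch reads FASTA text and reports the accession, entry name, and sequence length of each record. It is only a minimal example under stated assumptions: the sample record (accession X00000, entry name DEMO_HUMAN, and its sequence) is made up for demonstration and is not real UniProt data, and parse_fasta is a hypothetical helper name, not part of any UniProt tool. The header layout "db|accession|entry_name description" follows the style used in UniProtKB FASTA downloads.

```python
# A made-up FASTA record for illustration only (not real UniProt data).
SAMPLE_FASTA = """\
>sp|X00000|DEMO_HUMAN Hypothetical demo protein (made-up record)
MKTAYIAKQR
QISFVKSHFS
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one continuous string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = parse_fasta(SAMPLE_FASTA)
for header, sequence in records:
    # The first space-delimited token has the form db|accession|entry_name.
    db, accession, entry_name = header.split(" ")[0].split("|")
    print(accession, entry_name, len(sequence))
```

To try this on a real entry, you could save the FASTA view of the EGFR record to a file and pass the file's contents to parse_fasta in place of SAMPLE_FASTA.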

Additional UniProt Resources

NAR Exercise and Presentation

Each year, the journal Nucleic Acids Research (NAR) devotes the first issue in January to biological databases. The goal of this assignment is to dive into the deep end of the pool and experience the breadth and depth of biological databases available on the Web.

For this exercise, you will work with an assigned buddy. Choose a database from this issue and answer the following questions about that database. Each pair should choose a different database to profile. So, to claim your first choice, go to the Class Journal Week 5 page and stake your claim to a database. When you are choosing your database, look at the other students' entries to make sure you are not doing the same one. The buddy assignments are:

  • Hilda - Mitchell
  • Kurt - Kevin Meilek
  • Lena - Miles - Tauras
  • Viktoria - Kevin McGee
  • Gabriel - Katrina
  • Stephen - Alina
  • Lauren - Dillon

Database Wiki Page

For your assignment, create a new wiki page to profile your database. There will be one page per group; both partners will contribute to the same page.

  • Link to your database page from the Class Journal Week 5 page. These pages will be a resource for the class as we move forward with this unit of the course.
  • Link to your database page from your user page.
  • Link from your database page to the Class Journal Week 5 page.
  • Link from your database page to your user pages.

Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. When you answer the questions below, provide a hyperlink to the page that you got the information from.

  1. What database did you access? (link to the home page of the database)
  2. What is the purpose of the database?
  3. What biological information does it contain?
  4. What species are covered in the database?
  5. What biological questions can it be used to answer?
  6. What type (or types) of database is it (sequence, structure, model organism, or specialty [what?]; primary or “meta”; curated electronically, manually [in-house], manually [community])?
  7. What individual or organization maintains the database?
  8. What are its funding sources?
  9. Is there a license agreement or any restrictions on access to the database?
  10. How often is the database updated? When was the last update?
  11. Are there links to other databases?
  12. Can the information be downloaded?
    • In what file formats?
  13. Evaluate the “user-friendliness” of the database.
    • Is the Web site well-organized?
    • Does it have a help section or tutorial?
    • Run a sample query. Do the results make sense?

Some Definitions

  • Electronic curation occurs when someone writes a program to add information to a database record from another database.
  • Manual curation occurs when a human reviews the information being added to a record to validate it as true.
    • In-house is when the human works for the database organization.
    • Community is when the database allows members of the scientific community who do not work for the database organization to add information to the record.

PowerPoint Presentation

Each group will prepare and give a 10-15 minute PowerPoint presentation based on their chosen database.

  • Four groups will present on Tuesday, 10/1, and three groups will present on Thursday, 10/3. The order of presentations will be determined in class on Thursday, 9/26.
  • Please follow the Presentation Guidelines for how to format your slides.
  • You will need to prepare ~10-15 slides (assume 1 slide per minute of presentation).
  • You need to present the information you gathered about your database that you listed in your wiki above, but organized as a presentation.
  • You may give a live demo of the database if you wish, but practice carefully so that you can do the presentation in 15 minutes.
    • Alternatively, you may choose to show screenshots instead of the live demo.
  • Your PowerPoint slides must be uploaded to the wiki page you created for your database, by midnight Monday/Tuesday, even if your group is scheduled to present on Thursday.
    • You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
  • Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the guidelines shown here.
  • Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:
    1. What is the speaker's take-home message (one short sentence)?
    2. What are the best points about the presentation's content, organization, clarity of visuals, and presentation style? Please give at least 2 specific examples.
    3. What points need improvement? How would you improve them? Please give at least 2 specific examples.

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 5 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

Reflect

The following is a list of core competencies for scientific data literacy. After completing all of the exercises in this assignment, answer the following questions on the shared Class Journal Week 5 page:

  1. Which of these core competencies (if any) were you familiar with before taking this class? How did you become familiar with them?
  2. Which of these core competencies (if any) did you gain a deeper understanding of by doing this exercise? What about the exercise taught you about them?
  3. Which of these core competencies (if any) do you want to know more about? Why?

Scientific Data Literacy Core Competencies

  1. Databases and Data Formats
    • Understand how to query relational databases, and be familiar with data types and formats for the discipline.
  2. Discovery and Acquisition of Data
    • Locate and utilize disciplinary data repositories, and identify appropriate data sources.
  3. Data Management and Organization
    • Understand the lifecycle of data, and use data management plans to track subsets of processed data.
  4. Data Conversion and Interoperability
    • Migrate data from one format to another, and understand the benefits of standard data formats.
  5. Quality Assurance
    • Use metadata and screening procedures to recognize artifacts, incompletion, or corruption of data sets.
  6. Metadata
    • Interpret metadata from external sources, and annotate data so it can be used by external users.
  7. Data Curation and Re-use
    • Recognize the role of curation throughout the data lifecycle and its value in the effective reuse of data.
  8. Cultures of Practice
    • Know the practices, values, and norms of the discipline as they relate to managing, sharing, and curating data.
  9. Data Preservation
    • Understand the technology, resource, and organizational components of preserving data.
  10. Data Analysis
    • Understand the basic analysis tools of the discipline, including workflow management tools.
  11. Data Visualization
    • Use visualization tools of the discipline, and understand the advantages of the different types of visualization.
  12. Ethics, including citation of data
    • Understand intellectual property, privacy, and the ethos of the discipline around sharing and citing data.

