Taur.vil Week 5

From LMU BioDB 2013

Week 5 Individual Journal

  • Traveled to the UniProt website
  • Queried P00533; the page for the epidermal growth factor receptor (EGFR) appeared
  • Found the general information for the entry at the bottom of the page, including name, accession number, entry status, and annotation process.
  • Found the name and origin near the top of the page along with three older names for the protein and a species taxonomy.
  • Found references near the middle of the page (also observed a bar at the top of the page with each section title, a very helpful tool). Currently 84+ references instead of the 19 mentioned in the book.
  • Found the comments section under the general annotations field which included information about enzymatic activity, function, subunit structure, regulation, and involvement in disease.
  • Found the cross-references section, which linked to sequence, 3D structure, interaction, PTM, polymorphism, 2D gel, proteomic, genome annotation, species-specific, phylogenetic, gene expression, and family/domain databases.
    • EMBL: I visited the European Nucleotide Archive at EMBL-EBI. This database contained the mRNA transcript and protein sequence for EGFR along with general information, taxonomy, and main features of the gene.
    • InterPro: The example of an InterPro site I found for EGFR was also hosted by EMBL-EBI. This database contained a general description of the gene and its close relatives and identified it as a Cellular Component, directing me to the GO:0016020 membrane page. If it had found any, this database would have also provided information about the domain relationships.
    • PDB: The PDB entries provided information on the 3D structure of the protein. Several of these were theoretical structures that were no longer available due to a change in curation procedure (e.g., 1DNQ). Still available were experimentally determined structures like 1IVO. This database contained a PDB structure file, a 3D viewer for the protein, a summary of structure origin, the quaternary structure of the protein, and a summary of similar structures.
    • Pfam: The Pfam database brought me to a page for furin-like protein families managed by the Sanger Institute. This database provided information on one of the subunits for EGFR, explaining it had a cysteine-rich region similar to furin. This database linked to other web pages about this type of region and other proteins containing it.
    • RefSeq: This was another genome database containing information for EGFR. This server is maintained by the NIH and contained the DNA transcript, a list of publications, and information about exon sites and other miscellaneous features coded for in the DNA.
    • GeneID: GeneID is another NIH site that contains summary information (similar to what was found in UniProt), the genomic location of the gene, commonly occurring polymorphisms, links to articles in PubMed, a list of phenotypes and variants, interactions with other proteins, general gene and protein information, related sequences, and links to other pages about the same or similar proteins.
  • Found the keywords under the ontology title (a system that makes little sense). Some of the keywords were ATP-binding, Tumor Suppressor, and Alternative Splicing.
  • In the sequence annotation area (or features), I found information about the regions, natural variations, mutagenesis experiments, natural AA modifications, and secondary structure of the protein.
  • I looked at the other file formats and found that my computer had trouble opening many of them. Those that did open were often difficult to read, and the normal website proved to be the easiest interface to work with. However, I recognized a lot of the code in the HTML format and can see how that would be useful if I were importing the data into another database or working with it on the PuTTY command line.
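
As a rough illustration of how the plain-text format could be used outside the website, here is a minimal sketch that groups an entry's cross-references by database. It assumes the text format marks each cross-reference with a line starting with "DR", followed by the database name and an identifier separated by semicolons. The sample entry is hand-written for illustration and is not a real download: the identifiers P00533, 1IVO, and GO:0016020 come from the notes above, while HYPOTHETICAL_1 and HYPOTHETICAL_2 are placeholders, and the function name is made up for this example.

```python
from collections import defaultdict

# Hand-written sample in the style of a UniProt text-format entry.
# HYPOTHETICAL_1 and HYPOTHETICAL_2 are placeholders, not real identifiers,
# and the trailing "-" fields stand in for details omitted here.
SAMPLE_ENTRY = """\
AC   P00533;
DR   EMBL; HYPOTHETICAL_1; -.
DR   PDB; 1IVO; -.
DR   Pfam; HYPOTHETICAL_2; -.
DR   GO; GO:0016020; -.
//
"""


def parse_cross_references(text):
    """Return a dict mapping database name -> list of identifiers.

    Only lines that begin with the "DR" line code are considered.
    """
    refs = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("DR   "):
            continue
        # Drop the line code, the trailing period, then split the fields.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) >= 2:
            database, identifier = fields[0], fields[1]
            refs[database].append(identifier)
    return dict(refs)


# Print one line per database with the identifiers found for it.
for database, identifiers in parse_cross_references(SAMPLE_ENTRY).items():
    print(database, identifiers)
```

The same function should work on a saved text-format copy of the entry, since it ignores every line that is not a cross-reference.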


During this exercise, I developed a very basic understanding of how UniProt is structured and learned how to navigate through other databases. It was very useful to go through each section, but I gained only a very basic level of information. It was a valuable first learning experience, but more in-depth work is needed to really understand how the databases work and what they can be used for.


The purpose of this exercise was to become familiar with using biological databases to find information about genes, particularly in UniProt, which, judging from other databases and the literature, appears to be one of the leading protein databases. One thing I was surprised about was how linked the databases were. UniProt seemed to be less concerned with providing information itself than with referring people to original research and more specialized databases. One thing that seemed very useful was the list of variants, mutations, and phenotypes, which can show how minor changes or mutations to the gene alter the organism. One thing that I really don't understand yet is the individual specialties of the different websites. Some information appeared in most of the databases, but some of it seemed more specialized and was even presented differently in databases of the same type (e.g., EMBL and RefSeq contained a lot of the same information, but each also had its own specialized content). I feel a lot of this confusion will be clarified as I work with these databases more and in a more focused manner.


By Tauras Vilgalys

As part of Biological Databases


Please Remember the Harassing of Deities is Strictly Prohibited

Never Forget Samson
