Troque Week 15

From LMU BioDB 2015

Export Information (Re-imported) Build 2

Version of GenMAPP Builder:

  • gmbuilder-3.0.0-build-5

Computer on which export was run:

  • Front of the room, 3rd computer from the right.

Postgres Database name:

  • Shigella_flexneri_20151208

UniProt XML filename (give filename and upload and link to compressed file):

  • UniProt XML version (The version information can be found at the UniProt News Page): UniProt release 2015_11
  • UniProt XML download link: Click here
  • Time taken to import: 4.43 minutes
    • Note:

GO OBO-XML filename (give filename and upload and link to compressed file):

  • GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the GO Download page has been unzipped): Version created on 11/19/2015 (at 2:24 AM)
  • GO OBO-XML download link: Click here to download.
  • Time taken to import: 6.84 minutes
  • Time taken to process: 5.49 minutes
    • Note:

GOA filename (give filename and upload and link to compressed file):

  • GOA version (News on this page records past releases; current information can be found in the Last modified field on the FTP site): Version released on .
  • GOA download link: Click here to download.
  • Time taken to import: 0.06 minutes
    • Note:

Name of .gdb file (give filename and upload and link to compressed file): Sf-Std 20151208.gdb

  • Time taken to export: 1 hour, 38 minutes, 42 seconds
    • Start time: 4:30:59 PM PDT
    • End time: 6:09:41 PM PDT
    • Note:
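
The reported export duration can be cross-checked from the recorded start and end times. The following is a minimal sketch, assuming both times fall on the same day; the variable names are illustrative and not part of GenMAPP Builder.

```python
from datetime import datetime

# Start and end times as recorded above (same day, so no date is needed).
start = datetime.strptime("4:30:59 PM", "%I:%M:%S %p")
end = datetime.strptime("6:09:41 PM", "%I:%M:%S %p")

# Break the elapsed time into hours, minutes, and seconds.
total_seconds = int((end - start).total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{hours} hour, {minutes} minutes, {seconds} seconds")
# prints "1 hour, 38 minutes, 42 seconds"
```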

I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.

Using TallyEngine

  • The database used is the same one described in the section above: Shigella_flexneri_20151208
  • Note that the image below shows an error in the cells. It turns out that we did not need to add Ordered Locus at all, since it is included by default. We will need to do one last build to fix this issue.

Shigella flexneri tallyEngine results build 2.png

Using XMLPipeDB match to Validate the XML Results from the TallyEngine

Regex1 OTS.png


Regex2 OTS.png

  • When added together, the results become 7566 + 3 = 7569.

Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine

  • The following command in PostgreSQL resulted in 7567 entries:
select value from genenametype where type = 'ordered locus' and value ~ '(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?';
  • The following command resulted in 214 entries:
select value from genenametype where type = 'ORF' and value ~ '(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?';
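
To see which ID shapes the two patterns accept, the same regular expressions can be tried outside the database. This is a rough sketch in Python: the patterns are copied from the queries above, but the sample values are made up for illustration and are not taken from the database. PostgreSQL's `~` operator succeeds when the pattern matches anywhere in the value, so `re.search` is used to mimic it.

```python
import re

# Patterns copied from the two SQL queries above.
ORDERED_LOCUS = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")
ORF = re.compile(r"(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?")

# Hypothetical sample values, for illustration only.
samples = ["SF0001", "S0002", "CP0003", "SF1234.1", "SF_p0005", "thrA"]

def matches(pattern, value):
    # Unanchored match, like PostgreSQL's ~ operator.
    return pattern.search(value) is not None

ordered_hits = [s for s in samples if matches(ORDERED_LOCUS, s)]
orf_hits = [s for s in samples if matches(ORF, s)]

print(ordered_hits)  # ['SF0001', 'S0002', 'CP0003', 'SF1234.1']
print(orf_hits)      # ['SF0001', 'S0002', 'CP0003', 'SF1234.1', 'SF_p0005']
```

Because neither pattern is anchored, a value with extra characters before or after the ID would also match. Wrapping a pattern in `^` and `$` would make the check stricter, if that is what is wanted.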

OriginalRowCounts Comparison

Ms access originalrowcounts.png

  • The OrderedLocusNames row seems to report the same number of IDs as in our previous builds.

Visual Inspection

Perform visual inspection of individual tables to see if there are any problems.

  • Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
    • Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.
  • Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
    • Yes, all of them seem to follow the same format (there are, more or less, three variations of the ID format in each of the tables).
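
Scrolling through a table can miss an odd ID, so a programmatic check could supplement the visual inspection. The sketch below covers only the UniProt table and assumes the accession format published in the UniProt help pages. The function name `malformed` and the sample IDs are hypothetical and are not taken from the exported database.

```python
import re

# Accession number format as documented in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def malformed(ids):
    # Return the IDs that do not match the accession format in full.
    return [i for i in ids if UNIPROT_ACCESSION.fullmatch(i) is None]

# Hypothetical sample values; the last one is an ordered-locus-style ID
# included on purpose to show a failing case.
sample_ids = ["P12345", "Q8ZIN0", "A0A022YWF9", "SF0001"]
bad = malformed(sample_ids)

print(bad)  # ['SF0001']
```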

Excel Inspection

Export Information (final)

  • Date: 12/14/15

Name of .gdb file (give filename and upload and link to compressed file): Sf-Std 20151208.gdb

  • Time taken to export: 1 hour, 38 minutes, 42 seconds
    • Start time: 9:35 PM PDT
    • End time: PM PDT

Powerpoint Presentation Meetings

  • Our group met on 12/14/15 to complete the slides by midnight.