Lena Project Notebook

From LMU BioDB 2013

Week 12

Gabriel and I performed an import/export cycle on 11/14/2013.

Export Information

Uniprot: 7.12 minutes
Version: UniProt release 2013_10 - October 16, 2013
File:UniprotXML Leishmania 05112013 Gabe Lena.xml
GO OBO: 6.32 minutes
Version: Monday, November 04, 2013, 2:03:38 AM
File:Leishmania 05112013 Gabe Lena.obo-xml.gz
GOA: 4.54 minutes
Version: 14 November, 2013
12-Nov-2013 11:47 3.0M
http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/21780.L_major.goa
File:LeishmaniaGOA 19112013 Lena Gabe.goa
Name of .gdb file
Leishmania_05112013_Lena_Gabe.gdb
File:Leishmania 05112013 Lena Gabe.gdb

Tally Engine

Screen shot 2013-11-07 at 10.49.17 AM.png

Using XMLPipeDB match to Validate the XML Results from the TallyEngine

Original Row Counts Comparison

  • UniProt has 8041 hits, which is the same as the TallyEngine count.
  • There were 0 hits for ordered locus names, which is the same as the TallyEngine count.
  • There were 8315 hits for RefSeq, which is 2 fewer than the TallyEngine count.
  • There were 8315 hits for GeneID, which is 2 fewer than the TallyEngine count.

Note: Leishmania major does not have "ordered locus names"; instead, these IDs are tagged as "ORF."

File Management

  • To keep our file names consistent, we agreed to name files with 1) an identifier of what the file contains (such as "Uniprot"), 2) Leishmania, 3) the date in DDMMYYYY format, and 4) the names of the team members.
  • An example of our file titles is: UniprotXML Leishmania 05112013 Gabe Lena.xml
  • All files are stored on the main page for Leishmania major for ease of access. On the computer, the files are stored in the Downloads folder.
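The naming convention above can be sketched as a small helper. This is only an illustration: the function name build_filename and its parameters are hypothetical, and the day-first date stamp is an assumption based on the example file names in this notebook (05112013, 19112013).

```python
from datetime import date

def build_filename(content_id, organism, day, members, extension):
    """Join the agreed parts: content identifier, organism, date stamp, team members."""
    # Assumption: the stamp is day, month, year, as in "05112013" and "19112013".
    stamp = day.strftime("%d%m%Y")
    parts = [content_id, organism, stamp, *members]
    return " ".join(parts) + "." + extension

name = build_filename("UniprotXML", "Leishmania", date(2013, 11, 5), ["Gabe", "Lena"], "xml")
print(name)
```

With the inputs shown, the helper reproduces the example title listed above.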

System IDs

OrderedLocusPattern: LmjF##.#### or LmjF_##_#### or LmjF.##.####
Taxon ID: 5664
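
The three ID forms listed above can be written as one regular expression. The sketch below is a minimal illustration, not the pattern actually coded into the database export: the names LMJF_ID and is_system_id are hypothetical, and the example IDs are made up to exercise each form.

```python
import re

# Hypothetical pattern: one alternative per listed form
# (LmjF##.####, LmjF_##_####, LmjF.##.####).
LMJF_ID = re.compile(r"LmjF(?:\d{2}\.\d{4}|_\d{2}_\d{4}|\.\d{2}\.\d{4})")

def is_system_id(text):
    """Return True only if the whole string is one of the three listed forms."""
    return LMJF_ID.fullmatch(text) is not None

# Made-up IDs for illustration; the last two should be rejected
# (mixed separators, and a one-digit first group).
examples = ["LmjF01.0010", "LmjF_01_0010", "LmjF.01.0010", "LmjF.01_0010", "LmjF1.0010"]
results = {e: is_system_id(e) for e in examples}
print(results)
```

Spelling out each form as its own alternative keeps mixed-separator strings from matching, at the cost of a longer pattern.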

Week 13

  • I updated the export files. The files now have the proper identifying information so that they can be found again. The GOA file had to be re-downloaded as a more recent version and now has to be imported.
  • The reason why no Ordered Locus Names turned up on the tally engine is that Leishmania major's Ordered Locus Names are tagged with ORF instead. Gabe re-coded to account for this.
  • Built new database called Leishmania_major_18112013 and ran a new import/export cycle with the updated files.
  • My target for this week was to get to know the System IDs and to characterize regular expression patterns that detect them. I found the ID pattern to be: LmjF##.#### or LmjF_##_#### or LmjF.##.####
  • A customized database was built for Leishmania major: File:Leishmania major 19112013 Dist.zip

Export Information

Uniprot: 7.42 minutes
Version: UniProt release 2013_10 - October 16, 2013
File:UniprotXML Leishmania 05112013 Gabe Lena.xml
GO OBO: 5.96 minutes
Gene Ontology Processing: 4.48 minutes
Version: Monday, November 04, 2013, 2:03:38 AM
File:Leishmania 05112013 Gabe Lena.obo-xml.gz
  • NOTE: When we went to import this file, we got an error message. The file that should be used is Leishmania_05112013_Gabe_Lena.obo-xml. This file type cannot be uploaded to the wiki, so only the zipped version is available. Remember to unzip it before importing.
GOA: 0.04 minutes
Version: 14 November, 2013
12-Nov-2013 11:47 3.0M
http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/21780.L_major.goa
File:LeishmaniaGOA 19112013 Lena Gabe.goa
Name of .gdb file: LeishmaniaGDB_19112013_LenaLGabe/gdb
Leishmania_05112013_Lena_Gabe.gdb
File:Leishmania 05112013 Lena Gabe.gdb

11/21/13: The new database was labeled as generic when we tried to open it before. Today we tried to look up an OrderedLocusName in GenMAPP, but the gene was not found. The database had to be re-coded and re-exported. I edited the database in Eclipse. When we exported to GenMAPP again, it finally recognized the database. We saved it as Media:LeishmaniaGDB 21112013 Lena Gabe.gdb.

Week 14

  • Configured Leishmania_major-18112013 in GenMAPP Builder, and ran Tally engine.

Tally Engine

Capture.PNG

  • Finally, the numbers are matching, but the Ordered Locus names are still missing. This may be because Ordered Locus Names are called ORFs for Leishmania major.

XMLPipeDB Match

  • We ran an XMLPipeDB Match query to see if we could find the missing terms. "ORF" yielded 40 results; "ordered locus name" yielded 0 results.

Leishmania Postgres Query ORF 26112013 1.PNG

  • Used Match on the command line. Found 33 matches to lmjf_##_####. Now we have to figure out how to get the computer to accept either an underscore or a period in the name.

Commandline leishmania.png
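
One way to accept either separator is a character class with a backreference. The sketch below is written in Python rather than as an XMLPipeDB Match invocation, so it only illustrates the regular expression idea; the names PATTERN and count_by_separator and the sample text are hypothetical.

```python
import re
from collections import Counter

# (?i) ignores case, ([._]) captures an underscore or a period,
# and \1 requires the same separator before the four-digit group.
PATTERN = re.compile(r"(?i)\blmjf([._])\d{2}\1\d{4}\b")

def count_by_separator(text):
    """Tally matches by the separator character they use."""
    return Counter(m.group(1) for m in PATTERN.finditer(text))

# Made-up sample text; the last token mixes separators and is skipped.
sample = "LmjF.36.1940 lmjf_36_1940 LMJF_01_0010 LmjF.01_0010"
counts = count_by_separator(sample)
print(counts)
```

Tallying by separator makes it easy to see whether a source uses the underscore form, the period form, or both.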

Week 15

GenMAPP Expression Dataset Manager

Database: LeishmaniaGDB_26112013_Lena_Gabe.gdb
File: LeishmaniaCompiledStatAnalysis(A).txt
Errors: 14,000 and counting
Started over and cleaned up the gene IDs in the Excel report so that the names followed the pattern. Ran Expression Dataset Manager again.
Errors: 1820
Had to tweak the database with new customizations and re-export.
8354 Ordered Locus names in Microsoft Access. All IDs now have two formats: LmjF.##.#### and LmjF_##_####.
Reran Expression Dataset Manager and again found 1820 errors.
The exceptions file was posted to the Leishmania wiki page.
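
The ID cleanup described above could be sketched as follows. This is not the procedure we used in Excel or in the database customization; it is a minimal illustration, assuming an input ID may arrive in any of the three forms noted under System IDs, and the names ID_PARTS and both_formats are hypothetical.

```python
import re

# Optional separator after the prefix, then two digits, a separator, four digits.
ID_PARTS = re.compile(r"(?i)lmjf[._]?(\d{2})[._](\d{4})")

def both_formats(raw_id):
    """Return the period form and the underscore form of an ID, or None if it does not fit."""
    m = ID_PARTS.fullmatch(raw_id.strip())
    if m is None:
        return None
    first, second = m.groups()
    return (f"LmjF.{first}.{second}", f"LmjF_{first}_{second}")

# Made-up inputs for illustration.
print(both_formats(" lmjf36.1940 "))
print(both_formats("gene42"))
```

IDs that return None are the ones that would need manual attention before rerunning Expression Dataset Manager.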

Tally Engine

Links

Lena Hunt
Leishmania major
Quality Assurance