Running GenMAPP Builder

From LMU BioDB 2013
Revision as of 17:32, 16 June 2015 by Kdahlquist


This tutorial will take you through all of the steps for running GenMAPP Builder for the first time.


Pre-requisites

This tutorial assumes that you are working in a Windows environment. To run GenMAPP Builder under Mac OS X or Linux, you will need to use a Windows virtual machine. The end product, a GenMAPP-compatible Gene Database (.gdb), can only be used with the GenMAPP program, which runs only on Windows.

The Windows machines in the Keck Lab Annex have all of the software below loaded. If you wish to run GenMAPP Builder and perform the quality control tests on your own computer, you will need to set up your working environment with:

  1. Any tool that can unpack .gz and .zip files
    • We use 7-zip
    • Note: we have found that the built-in Windows utility cannot reliably unpack .gz files, or .zip files that contain .jar files.
  2. PostgreSQL on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
    • This tutorial was written using PostgreSQL 9.2.4.
  3. GenMAPP Builder (http://sourceforge.net/projects/xmlpipedb/files/GenMAPP%20Builder/)
    • Requires 32-bit Java JDK or JRE version 6 or higher (http://java.com/en/download/manual_v6.jsp)
    • GenMAPP Builder may be updated during the project if groups find issues with their specific datasets that require changes to the import/export process. It is therefore worth knowing how to download new versions of GenMAPP Builder as needed.
  4. GenMAPP 2 (http://genmapp.org)
    • GenMAPP 2 is now called “GenMAPP Classic” and can be downloaded from the GenMAPP website.
  5. XMLPipeDB match utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
  6. Microsoft Access or any other tool that can read .mdb files

Download and Extract Data Source Files

Follow these instructions to download the UniProt XML, GOA, and GO OBO-XML files.

UniProt XML

  1. Go to the UniProt Complete Proteomes page.
  2. Browse to the complete proteome download page for your species of interest. For example, to get to the Vibrio cholerae page, first click on the link to "List all Bacteria" under the Complete Proteome heading.
  3. Click through the results until you reach the page listing your organism.
  4. Click on the link for “complete proteome set” or “complete reference set” for the organism of interest, e.g. Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961).
  5. Click the orange Download link in the upper right-hand corner of the page.
  6. Click to download the complete proteome set in XML format (make sure that you are saving it to your local hard drive).

GOA

  1. Go to the UniProt-GOA Downloads page.
  2. The current and previous UniProt-GOA files can be downloaded from the UniProt-GOA ftp site.
  3. In the directory that appears, click the link to the “proteomes” directory.
  4. Find your organism of interest and right-click on the link to download the GO annotations and select “Save target as” or “Save link as” and save the GOA file. For example, this is the link for Vibrio cholerae.
    • Note: Since the GOA file is a text file, your browser will not automatically download it when you left-click on the link. Instead, it will try to open the file in your browser window. Because it is a large file, this could take a long time if your internet connection is slow.
    • The version information is displayed in the ftp directory listing under the “Last modified” column. Record the version information for “GOA Proteome Sets” and its release date, because this is your original data source.
  • Note: If the directions above do not work, the following steps have worked (shown here for S. meliloti; substitute your own species):
    • From this Running GenMAPP Builder page, click on the UniProt-GOA Downloads link.
    • If you get an error message, change "ftp" to "http" at the beginning of the URL.
    • This should take you to Index of /pub/databases/GO/goa. Click on the "proteomes" folder.
    • In Index of /pub/databases/GO/goa/proteomes, download the file for your species (e.g., 58.R_meliloti.goa).
    • Note: R. meliloti is an alternative name for S. meliloti.


Connectivity issue 2013-10-21: Direct downloads to the above site are currently not working, for as yet unknown reasons. As a temporary workaround, the specific V. cholerae GOA file has been uploaded to this wiki. For now, download from here instead: 46.V_cholerae_ATCC_39315.goa
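Before importing, it can help to sanity-check the GOA file you downloaded. GOA proteome files are tab-delimited Gene Association Files (we assume GAF 2.0 here): header lines start with `!`, column 2 holds the database object ID (the UniProt accession), and column 5 holds the GO ID. The Python sketch below counts annotation lines, distinct proteins, and distinct GO terms; the function name `summarize_goa` is hypothetical, and it assumes the file is plain text rather than gzipped.

```python
import sys
from collections import namedtuple

GoaSummary = namedtuple("GoaSummary", ["annotations", "proteins", "go_terms"])


def summarize_goa(lines):
    """Count annotations, distinct UniProt IDs, and distinct GO IDs in GAF lines."""
    annotations = 0
    proteins = set()
    go_terms = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip header/comment lines (they start with "!") and blank lines.
        if not line or line.startswith("!"):
            continue
        columns = line.split("\t")
        if len(columns) < 5:
            continue  # malformed line: ignore it rather than crash
        annotations += 1
        proteins.add(columns[1])  # column 2: DB Object ID (UniProt accession)
        go_terms.add(columns[4])  # column 5: GO ID
    return GoaSummary(annotations, len(proteins), len(go_terms))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_goa.py 46.V_cholerae_ATCC_39315.goa
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(summarize_goa(handle))
```

The numbers it prints are a useful baseline to write down now, so you can compare them against the database later.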

GO OBO-XML

  1. Download the GO OBO-XML formatted file from the Gene Ontology download page. Click on the link for obo-xml.gz.
    • Note that Gene Ontology has announced that they are making changes to the page listed above and that users should use a beta version of the page instead.
    • This file is updated daily. The release time used to be stated near the top of the page under “Current ontology statistics” in Pacific Standard Time (PST), but it does not appear there at the moment. You can get the date and time the file was created from its file properties after you have unzipped it.
  2. Extract the UniProt XML and GO OBO-XML .gz files using 7-zip or other utility.
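If 7-zip is not available, Python's standard `gzip` module can also decompress the downloaded .gz files. This is a minimal sketch, not part of the official workflow; the helper name `gunzip_file` is hypothetical.

```python
import gzip
import shutil
import sys
from pathlib import Path


def gunzip_file(gz_path, out_path=None):
    """Decompress gz_path to out_path (default: same name without '.gz')."""
    gz_path = Path(gz_path)
    if out_path is None:
        if gz_path.suffix != ".gz":
            raise ValueError(f"{gz_path} does not end in .gz; pass out_path explicitly")
        out_path = gz_path.with_suffix("")  # e.g. uniprot.xml.gz -> uniprot.xml
    out_path = Path(out_path)
    # Stream in chunks so a large UniProt file is never held in memory all at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gunzip_file.py <file1.gz> [<file2.gz> ...]
    for name in sys.argv[1:]:
        print("Extracted", gunzip_file(name))
```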

Create New Database in PostgreSQL

Note: if you have already performed this step and want to use GenMAPP Builder functions with a database you previously created in PostgreSQL, you can skip this step.

These steps may feel familiar: you did very similar things for the Week 6 assignment.

  1. Launch pgAdmin III.
  2. Double-click on PostgreSQL 9.2 (localhost:5432) on the upper left hand side of the window.
    • This connects you to the server, and you may be asked for a password at this point.
  3. Right-click on Databases and select New Database...
  4. Give the database a name in the Name field and click OK.
  5. Click on your new database name in the treeview on the left.
  6. Click on the SQL icon in the toolbar at the top of the window.
    • The SQL Editor tab will be open and there may be leftover query text in the upper pane. Delete this text. You are now going to use an XMLPipeDB query to create the tables in the database.
  7. Click on the Open File icon in the toolbar (the yellow folder with an arrow).
  8. Navigate to the folder in which you unzipped GenMAPP Builder.
  9. Open the sql folder and open the file gmbuilder.sql. You should see SQL code appear in the SQL Editor tab.
  10. Click the Execute Query icon which looks like a green “Play” triangle button.
  11. You should get a series of NOTICE messages in the Messages tab at the bottom of the window, ending with a message like “Query returned successfully with no result in 15583 ms”. The query has now created all of the tables in the database (although there is still no data in them).
  12. Close the query window (you don’t need to save the query because you have already run it).
  13. To double-check that everything worked, click the + sign for the database, then the + sign for Schemas, then finally the + sign for public. Under the Tables section, you should see a count of 159 in parentheses.
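The same setup can also be done from the command line with PostgreSQL's `createdb` and `psql` client programs, which ship with PostgreSQL (they must be on your PATH). The Python sketch below builds the commands so you can inspect them, and only runs them when given `--run`. The database name, username, and path to gmbuilder.sql are hypothetical placeholders you must change. The final query counts ordinary tables in the `public` schema, which should match the count of 159 shown in pgAdmin.

```python
import subprocess
import sys

# Placeholder connection settings: change these to match your installation.
HOST = "localhost"
PORT = "5432"
USER = "postgres"
DBNAME = "gmbuilder_db"  # hypothetical database name
SQL_FILE = r"C:\path\to\gmbuilder\sql\gmbuilder.sql"  # hypothetical path

COUNT_TABLES_SQL = (
    "SELECT count(*) FROM information_schema.tables "
    "WHERE table_schema = 'public' AND table_type = 'BASE TABLE';"
)


def setup_commands(host, port, user, dbname, sql_file):
    """Return createdb/psql command lines equivalent to the pgAdmin steps above."""
    conn = ["-h", host, "-p", port, "-U", user]
    return [
        ["createdb", *conn, dbname],  # steps 3-4: create the new database
        ["psql", *conn, "-d", dbname, "-f", sql_file],  # steps 6-10: run gmbuilder.sql
        # step 13: -t (tuples only) and -A (unaligned) make psql print just the number
        ["psql", *conn, "-d", dbname, "-t", "-A", "-c", COUNT_TABLES_SQL],
    ]


if __name__ == "__main__" and "--run" in sys.argv:
    for cmd in setup_commands(HOST, PORT, USER, DBNAME, SQL_FILE):
        print(" ".join(cmd))
        # check=True stops at the first command that exits with an error;
        # psql/createdb will prompt for a password if the server requires one.
        subprocess.run(cmd, check=True)
```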

Download or Update GenMAPP Builder

  1. Visit the GenMAPP Builder folder on SourceForge (http://sourceforge.net/projects/xmlpipedb/files/GenMAPP%20Builder/).
  2. If you do not yet have GenMAPP Builder, or if there is a more recent version of GenMAPP Builder than the one that you have, click on the release folder for that version.
  3. Download the .zip file for that version of GenMAPP Builder.
  4. Extract the GenMAPP Builder folder using 7-zip or other utility.
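As an alternative to 7-zip, Python's standard `zipfile` module can extract the GenMAPP Builder .zip, including the .jar files inside it, and verify the archive's checksums first. A minimal sketch; the function name `extract_release` and the example paths are hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def extract_release(zip_path, dest_dir):
    """Extract a GenMAPP Builder .zip into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None if all CRCs pass.
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python extract_release.py <downloaded.zip> <destination folder>
    names = extract_release(sys.argv[1], sys.argv[2])
    jars = [n for n in names if n.endswith(".jar")]
    print(f"Extracted {len(names)} entries, including {len(jars)} .jar files")
```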

Configure GenMAPP Builder to Connect to your PostgreSQL Database

  1. Launch gmbuilder-32bit.bat
    • If the program does not detect a database configuration, you will see a message window to this effect, and the configuration dialog will open automatically once you close the message window. Otherwise:
  2. Select the menu item File > Configure Database...
  3. Under the Database Connections tab the Database Driver defaults to PostgreSQL. Enter information in the following fields:
    • Host or address: localhost
    • Port number: 5432
    • Database name: <enter the name of the PostgreSQL database you created above>
    • Username: <enter the username of the PostgreSQL database you created above>
    • Password: <enter the password of the PostgreSQL database you created above>
  4. Click the OK button.
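If GenMAPP Builder cannot connect, it helps to confirm that PostgreSQL is actually listening at the host and port you entered. GenMAPP Builder is a Java program, so it presumably turns these fields into a standard PostgreSQL JDBC URL of the form `jdbc:postgresql://host:port/database`; that mapping is our assumption, not something stated in this tutorial. The Python sketch below builds that URL and attempts a plain TCP connection. It does not check your username or password, and the function names are hypothetical.

```python
import socket
import sys


def jdbc_url(host, port, dbname):
    """Standard PostgreSQL JDBC URL for the values entered in the dialog."""
    return f"jdbc:postgresql://{host}:{port}/{dbname}"


def server_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host name not found.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_db.py <database name>   (assumes localhost:5432)
    host, port, db = "localhost", 5432, sys.argv[1]
    print(jdbc_url(host, port, db))
    print("reachable" if server_reachable(host, port) else "NOT reachable: is PostgreSQL running?")
```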

Import Data into the PostgreSQL Database

  1. Select File > Import UniProt XML...
    • Navigate to the UniProt XML file that you extracted previously and click the Open button.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  2. Select File > Import GO OBO-XML...
    • Navigate to the GO OBO-XML file that you extracted previously. Click the Open button.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  3. Click OK to the message asking you to process the GO data.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  4. Select File > Import GOA...
    • Navigate to the GOA file that you downloaded previously and click the Import button. This process should only take a minute or so.

Export a GenMAPP Gene Database (.gdb)

  1. Select File > Export to GenMAPP Gene Database...
  2. Type a name in the Owner field (or else it won’t let you export).
  3. GenMAPP Builder scans your PostgreSQL database to see what species are available. Click on the species that you would like to export, then click Next to continue.
  4. Create GenMAPP Database: click on the Save GenMAPP Database File As... button. A default folder and file name are provided; modify these as needed then click on Save.
  5. Click the Next button. This starts the export process.
    • Record the starting and ending times from the black console window. This will take 1-2 hours for a typical bacterial genome, depending on the size of the database, the processor speed, and available memory. Large eukaryotic genomes (like Arabidopsis thaliana) or genomes with many GO annotations (like Saccharomyces cerevisiae) can take much longer, in the range of 12-24 hours. Note: The progress bar is not accurate.
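Because a long export can run past midnight, subtracting clock times by hand is error-prone. The Python sketch below computes the elapsed time from the start and end times you recorded. It assumes you write each time down with its date in the form `YYYY-MM-DD HH:MM:SS`; the function names and the example times are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format for the times you write down


def elapsed(start, end, fmt=TIME_FORMAT):
    """Return the elapsed time between two recorded timestamps as a timedelta."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta.total_seconds() < 0:
        raise ValueError("end time is earlier than start time; check the dates")
    return delta


def describe(delta):
    """Format a timedelta as 'Xh Ym Zs' for the testing report."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Example: an overnight export that crosses midnight
run = elapsed("2013-10-21 22:15:00", "2013-10-22 01:45:30")
print(describe(run))  # 3h 30m 30s
```

Including the date also handles runs longer than 24 hours, which a time-of-day-only calculation would silently get wrong.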

Check the Quality of your Exported Gene Database

Now you need to check the quality of your exported Gene Database to make sure that all of the data from the XML files made it into the PostgreSQL database and was then exported to the GenMAPP Gene Database. We have created a Gene Database Testing Report Sample to help guide you through this process.
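Part of this check is comparing the number of records in the source XML with what ended up in PostgreSQL and in the .gdb. The XMLPipeDB match utility listed in the pre-requisites is the tool provided for counting IDs in XML files; the Python sketch below is an independent alternative built on the standard library's `xml.etree.ElementTree.iterparse`, which streams through large files. It counts elements by local tag name and ignores XML namespaces, so for UniProt XML you could count `entry` or `accession` elements. Which tags to count for each test is up to your testing report, and the function name is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def count_elements(source, local_name):
    """Count elements whose tag, minus any '{namespace}' prefix, equals local_name."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            count += 1
        # Discard each element's children and text once it has been seen,
        # which keeps memory use lower on multi-gigabyte UniProt files.
        elem.clear()
    return count


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python count_elements.py <uniprot.xml> entry
    print(count_elements(sys.argv[1], sys.argv[2]))
```

Note that a single UniProt entry can carry several accessions, so `entry` and `accession` counts are expected to differ.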
