Coder

Guild Members

Milestones

Milestone 1

Set up the working environment
- PostgreSQL on Windows (http://www.postgresql.org)
- GenMAPP Builder (http://sourceforge.net/projects/xmlpipedb)
  - Current version is gmbuilder2.0-b71
  - Requires Java 6 or 7 runtime environment
    - Note that you need to use 32-bit Java
- GenMAPP (http://genmapp.org)
  - We will be using GenMAPP and MAPPFinder version 2.1. This software is already installed on the Windows machines in the Keck lab annex and in the Seaver 120 computer lab.
    - This version is now called "GenMAPP Classic" and can be downloaded from this page.
    - Follow the instructions in the installer.
    - During installation, the installer will open a window called the GenMAPP Data Acquisition Tool. It will not function because it cannot connect to the server. This is OK.
- XMLPipeDB match utility for counting IDs in XML files
- Microsoft Access or any other tool that can read .mdb files
Set up the development environment
- Java developer tools: JDK 6 or 7 and ant
- Subversion (for checking code in/out of SourceForge)
- One or more members in the group need to get Java running
- Any tool that can unpack .gz and .zip files (we are using 7-zip on the Keck lab Windows machines)
- XMLPipeDB Match utility
- Development environment: while any will do, Eclipse is the specific one that most XMLPipeDB developers have used
(with QA) Get a full import-export cycle done.
(with QA) Decide on a file/version management scheme/system.
Document/log all work done, problems encountered, and how they were resolved.

Milestone 2

Get a SourceForge account and pass it to Dr. Dionisio so that you can be added as a developer of the XMLPipeDB project.
- Once you are set up as a developer, you can download the GenMAPP Builder source code.
- You can then use ant to build your own copy of GenMAPP Builder from scratch.
- As needed, coders should arrange for a walkthrough or other help session with Dr. Dionisio.
Additional milestones will depend on how the rest of the project goes, and the bugs/features generated by that work.

Milestone 3

Add a species profile to the GenMAPP Builder code base.
Customize the species profile with the species name in the OrderedLocusNames record of the Systems table.
Customize the Link field in the OrderedLocusNames record of the Systems table to hold a URL query with ~ standing in for the gene ID.
- The URL would need to be determined first, of course.
Based on observations from the GenMAPP User and QA, determine and document (as thoroughly as possible) any other modified export behavior that GenMAPP Builder will have to manifest for this species.
(probably more like milestone 4 than 3) Implement this export behavior.
When ready, commit your species profile to SourceForge and release a new version of GenMAPP Builder.

GenMAPP Builder Project Setup and Initial Build

This section is a guide for building new versions of GenMAPP Builder.

While there are many ways to update and maintain GenMAPP Builder code, for uniformity these instructions will assume the use of Eclipse for viewing, modifying, and updating GenMAPP Builder. The main benefit of Eclipse is that it is largely a one-stop shop for performing all of these tasks.

The instructions listed in this Setup section need only be performed once. Once done correctly, you will primarily be doing what is described in the Common Tasks section.

Software to Install

While you can only run GenMAPP Builder on Windows, you can build it from any platform: Windows, Linux, or Mac OS X. If you are using Windows, you need to download and install the Java Development Kit from http://www.oracle.com/technetwork/java/javase/downloads. You will want the Java Platform, Standard Edition JDK (which, at this writing, is JDK 7u45). Linux and Mac OS X computers typically already come with JDK included.
Download and install Eclipse from its download web site. Either Eclipse IDE for Java Developers or Eclipse IDE for Java EE Developers will work.
Download and install the Subclipse plug-in for Eclipse (this is the software that is needed for accessing the source code in SourceForge). Instructions for downloading and installing Subclipse can be found on the Subclipse home page.

Initial Code Checkout

Get an account from http://sourceforge.net and send your account name to Dr. Dionisio, for inclusion as an XMLPipeDB developer.
Run Eclipse.
From Eclipse’s menu bar, choose Window > Open Perspective, then choose SVN Repository Exploring, either directly from the Open Perspective submenu if it is listed there or from Other.... If you don't see it in either place, double-check your installation of Subclipse.
Define a new Subversion repository by clicking on the Add Repository button (this is the icon with the little yellow canister with small SVN and + badges to its right).
Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then click Finish.
- Remember to accept the security certificate for the SVN repository
https://svn.code.sf.net/p/xmlpipedb/code should now appear in the list. Double-click on it to see its contents.
Double-click on trunk.
Right-click on gmbuilder and choose Checkout....
Choose Check out as a project configured using the New Project Wizard then click Finish.
If you are asked for a username and password, enter your SourceForge username and password.
In the New Project dialog that opens, choose Java Project from the list.
Click Next > .
You may enter any name you like in the Project name: field; gmbuilder or xmlpipedb-gmbuilder would work, for example.
Click Finish. You should end up in the Java perspective, with your project appearing as a top-level folder in the Package Explorer tab.

Java Project Setup

Make sure that Eclipse is using a JDK (Java Development Kit) and not a JRE (Java Runtime Environment). To verify this, go to Window > Preferences, click on Java, click on Installed JREs, and make sure that the checked environment has JDK in it. If not, you may need to add the environment (on Windows, it lives in C:\Program Files\Java) then check on it.
Upon a successful checkout, you should have a gmbuilder Java project in Eclipse. Make sure that you are in the Java perspective by choosing Window > Open Perspective > Java (or choosing Java from Other... if Java is not already in the Open Perspective submenu).
Double-click on the gmbuilder folder (or whatever you called it) to view its contents.
Double-click on the lib folder.
Shift-click to select all files in the lib folder, then control-click (or, on a Mac, Command-click) every file inside the lib folder whose name does not end in .jar to deselect it, so that only the .jar files remain selected.
Once all of these files are selected, right-click on one of them and choose Build Path > Add to Build Path from the popup menu that comes up.
The src folder should look different from the other folders in that it has a little brown square badge on its upper-right corner. If not, right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
Do the same to the test folder: right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
If you see any red x icons appear, something has not been set up right. Contact other guild members or Dr. Dionisio for troubleshooting if you get stuck.

Adding a Species Profile to GenMAPP Builder

All of this work happens in the Java perspective, so switch to that first if you’re not already there.

Create the Species Profile

Expose the contents of the src folder.
Right-click on the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package and choose New > Class from the popup menu.
In the dialog that appears, enter the following:
- Name: name-of-your-species-without-spacesUniProtSpeciesProfile (no spaces, capitalizing the first letters of each word)
- Superclass: edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtSpeciesProfile (you can also click on Browse... to navigate to this if you don’t feel like typing)
Click Finish. There should now be a new .java file (the one you just created) within the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package.

Customize the Species Profile

Open the file that you have just created. It should appear in the editor area of Eclipse.
Supply the name of the species and the description of the profile: add the following constructor right below the public class line in the new file. Remember to customize it for your particular species; the portions that need to be customized are marked with asterisks.

public ***NameOfYourSpecies***UniProtSpeciesProfile() {
    super("***Genus species***",
        ***taxonIDOfYourSpecies***,
        "This profile customizes the GenMAPP Builder export for " +
            "***Genus species***" +
            " data loaded from a UniProt XML file.");
}

To customize the species profile with the species name in the OrderedLocusNames record of the Systems table as well as a link query for that same record, add the following method block right below the constructor block that you added above. Again, the key information to customize is highlighted in asterisks.

@Override
public TableManager getSystemsTableManagerCustomizations(TableManager tableManager, DatabaseProfile dbProfile) {
    super.getSystemsTableManagerCustomizations(tableManager, dbProfile);
    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Species", "|" + getSpeciesName() + "|" }
    });

    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Link", "***species-specific-database-link***" }
    });

    return tableManager;
}

Note the species-specific-database-link placeholder above. This is a species-specific URL that returns a web page describing a gene for that species. It should look like a standard URL, with the tilde (~) standing in for the gene ID. For example, the link for Vibrio cholerae is http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~. The link for Plasmodium falciparum is http://plasmodb.org/plasmo/showRecord.do?name=GeneRecordClasses.GeneRecordClass&project_id=PlasmoDB&source_id=~. Work with your GenMAPP User and/or QA to determine the appropriate URL for your species.
Your code may have a red error badge at this point; assuming you typed everything in exactly, the fix for this is to choose Organize Imports from the Source menu. If the red error badge persists, make sure that you typed everything in correctly.
Save the file and see if these changes worked (see below).
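
To make the role of the tilde in the Link field concrete, here is a minimal sketch of the substitution. It assumes the tilde is simply replaced by the gene ID when a link is followed; the class and method names are hypothetical and are not part of GenMAPP or GenMAPP Builder, and the gene ID is made up for the demonstration.

```java
/** Illustrative only: shows how a "~" placeholder in a Link URL can be filled in. */
class LinkTemplateSketch {

    /** Replaces every "~" in the link template with the given gene ID. */
    static String resolve(String linkTemplate, String geneId) {
        return linkTemplate.replace("~", geneId);
    }

    public static void main(String[] args) {
        // Link template for Vibrio cholerae, as quoted in the instructions above.
        String template = "http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~";
        // "VC0001" is a hypothetical gene ID used only for this demonstration.
        System.out.println(resolve(template, "VC0001"));
    }
}
```

A quick check like this can help confirm that the URL you chose still forms a valid address once a real gene ID takes the place of the tilde.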

Additional customization, particularly with regard to the exported data, will depend on the species. Communicate with your QA to see if additional customization is needed. If the additional customization is not too complicated, you might be able to do the work yourself with some instructions. However, if the customization is too difficult, Dr. Dionisio will probably be the one to do the work.

Customize the IDs that the Tally Engine Counts

This step is technically optional, in that it does not affect the overall import/export process. However, it does help you to get an idea of how well the IDs from the UniProt XML file are being brought into the relational database.

First, determine which IDs (outside of the defaults that the tally engine already counts) you would like to count. At a minimum, this includes the ordered locus IDs from the gene/name tag in the UniProt XML file. There may be more; consult with your QA.
For each of these IDs, determine the following:
- Where in the XML file they can be found, in terms of which XML tags
- Where in the relational database they can be found, in terms of which relational tables
Under edu.lmu.xmlpipedb.gmbuilder.resource.properties, open gmbuilder.properties.
Locate the block of text below (it’s near the bottom). You will insert the customizations described in the following steps immediately above this block.

#
# wizard.properties
#

First, mark out the section that denotes the customization for your species:

# Species name

Next, rewrite your species name without spaces and all lowercase (e.g., Plasmodium falciparum becomes plasmodiumfalciparum). Specify the number of additional custom IDs to count as follows, where speciesname is your no-space, all-lowercase species name, and # represents the actual number of IDs:

speciesname_level_amount=#
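
The two naming conventions used on this page (the no-space, all-lowercase property prefix and the capitalized species profile class name) can be sketched as below. This is only a convenience for double-checking your spelling; the helper class and its methods are hypothetical and do not exist in GenMAPP Builder.

```java
import java.util.Locale;

/** Illustrative helper for the naming conventions described on this page. */
class SpeciesNamingSketch {

    /** "Plasmodium falciparum" becomes "plasmodiumfalciparum". */
    static String propertyPrefix(String speciesName) {
        return speciesName.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    /** "Plasmodium falciparum" becomes "PlasmodiumFalciparumUniProtSpeciesProfile". */
    static String profileClassName(String speciesName) {
        StringBuilder name = new StringBuilder();
        for (String word : speciesName.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by a blank input
            }
            // Capitalize the first letter of each word and keep the rest as typed.
            name.append(Character.toUpperCase(word.charAt(0)));
            name.append(word.substring(1));
        }
        return name.append("UniProtSpeciesProfile").toString();
    }

    public static void main(String[] args) {
        String species = "Plasmodium falciparum";
        System.out.println(propertyPrefix(species) + "_level_amount");
        System.out.println(profileClassName(species));
    }
}
```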

Now, for each custom ID, you need to specify three things: an element, a query, and a name. Each of these items is numbered, starting from 0. Each item number is called a level.
1. The element states where you expect an ID to be found in the UniProt XML file. It starts with uniprot/entry, then continues with additional tags as needed. After the tag, you may specify, separated by ampersands (&s), any specific attributes that you would like to choose.
2. The query states the SQL query that you would use to count the IDs in the relational database. The query would be exactly as you would type it if you were entering it directly into the relational database.
3. The name is a simple label: this is how you would like to identify this ID in the final Tally Engine table.
You can write these in any order, though existing customizations group them by element, query, and name. For example, if your species is speciesname and you only need to count ordered locus IDs, you would add:

# Species name
speciesname_level_amount=1

speciesname_element_level0=uniprot/entry/gene/name&type&ordered locus

speciesname_query_level0=select count(*) from genenametype where type = 'ordered locus';

speciesname_name_level0=Ordered Locus

Note how the element ends with name&type&ordered locus, because the name tag in the UniProt XML file will have different types (e.g., “primary”, “ORF”, “synonym”, “ordered locus”, etc.). For ordered locus IDs, we only want to count the name IDs whose type is “ordered locus”.
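
As a rough way to sanity-check entries like these before building, the sketch below loads them with java.util.Properties and splits the element value at its ampersands. It is not how GenMAPP Builder itself reads the file (that code is not shown on this page); the class and method names are hypothetical, and only the level amount, element, and query keys are read.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Illustrative check of tally customization entries; not GenMAPP Builder code. */
class TallyPropertiesSketch {

    /** Parses .properties text, wrapping the checked exception for brevity. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the properties text", e);
        }
        return props;
    }

    /** Splits "path&attribute&value" into at most three parts. */
    static String[] splitElement(String element) {
        return element.split("&", 3);
    }

    public static void main(String[] args) {
        String text =
            "speciesname_level_amount=1\n"
            + "speciesname_element_level0=uniprot/entry/gene/name&type&ordered locus\n"
            + "speciesname_query_level0=select count(*) from genenametype"
            + " where type = 'ordered locus';\n";
        Properties props = load(text);
        int levels = Integer.parseInt(props.getProperty("speciesname_level_amount"));
        for (int i = 0; i < levels; i++) {
            String[] parts = splitElement(props.getProperty("speciesname_element_level" + i));
            System.out.println("level " + i + " XML path: " + parts[0]);
            if (parts.length == 3) {
                System.out.println("  attribute " + parts[1] + " must equal \"" + parts[2] + "\"");
            }
            System.out.println("  query: " + props.getProperty("speciesname_query_level" + i));
        }
    }
}
```

Because Properties keeps only one value per key, a check like this also makes it easy to notice when two lines accidentally share the same key.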

Once you are done with these customizations, you can test your work by building a new version of GenMAPP Builder, connecting to a relational database that already has imported data (or importing data first if needed), then running the Tally Engine. The resulting table should include, in addition to the defaults that you have seen before, the new IDs that you have added.
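
For an independent cross-check of the XML side of the tally, the sketch below counts name tags of a given type that sit directly under a gene tag. It is a simplified illustration using the standard Java XML parser, not the Tally Engine or the XMLPipeDB Match utility; the class name, the sample XML, and the IDs in it are all made up, and a real UniProt XML file is far larger.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Illustrative count of gene/name tags by type; not the Tally Engine itself. */
class GeneNameCountSketch {

    /** Counts name elements whose parent is gene and whose type attribute matches. */
    static int countGeneNames(String xml, String type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList names = doc.getElementsByTagName("name");
            int count = 0;
            for (int i = 0; i < names.getLength(); i++) {
                Element name = (Element) names.item(i);
                Node parent = name.getParentNode();
                // Only the immediate parent is checked here, not the full path.
                boolean underGene = parent != null && "gene".equals(parent.getNodeName());
                if (underGene && type.equals(name.getAttribute("type"))) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML sample", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; the organism name shows why the parent check matters.
        String xml = "<uniprot><entry>"
            + "<gene><name type='primary'>abcA</name>"
            + "<name type='ordered locus'>XX_0001</name></gene>"
            + "<organism><name type='scientific'>Examplus demonstrans</name></organism>"
            + "</entry></uniprot>";
        System.out.println(countGeneNames(xml, "ordered locus"));
    }
}
```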

Add the Species Profile to the Catalog of Known Species Profiles

The last step involves actually making GenMAPP Builder know that your new species profile exists. This involves a change in an existing file:

Under edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles, open UniProtDatabaseProfile.java.
Near the top of the file is a block that looks like this:

super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile() });

What you want to do is add the species profile that you just created to this block. If your species profile is called MySpecialUniProtSpeciesProfile, your modified code should look like this:

super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile(),
    new MySpecialUniProtSpeciesProfile() });

Essentially, you need to add an item to the comma-separated list, beginning with new, followed by the species profile name, finally followed by ().
Save your changes, do Organize Imports to eliminate any red errors, and try a test build!
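
Why this registration step matters can be illustrated with a small self-contained sketch. The classes below are simplified stand-ins written for this example only; they are not the real XMLPipeDB classes, and the lookup by species name is an assumption about how a catalog of this kind might be used. The species name is made up.

```java
/** Illustrative stand-ins showing why a profile must be listed in the catalog. */
class ProfileCatalogSketch {

    /** Simplified stand-in for a species profile (not the XMLPipeDB class). */
    static class SpeciesProfile {
        private final String speciesName;

        SpeciesProfile(String speciesName) {
            this.speciesName = speciesName;
        }

        String getSpeciesName() {
            return speciesName;
        }
    }

    /** Hypothetical new profile, following the constructor-calls-super pattern. */
    static class MySpecialSpeciesProfile extends SpeciesProfile {
        MySpecialSpeciesProfile() {
            super("Examplus demonstrans");
        }
    }

    /** Simplified stand-in for a database profile holding the known species. */
    static class DatabaseProfile {
        private final SpeciesProfile[] knownProfiles;

        DatabaseProfile(SpeciesProfile[] knownProfiles) {
            this.knownProfiles = knownProfiles;
        }

        /** Returns the matching profile, or null if the species is not listed. */
        SpeciesProfile find(String speciesName) {
            for (SpeciesProfile profile : knownProfiles) {
                if (profile.getSpeciesName().equals(speciesName)) {
                    return profile;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        DatabaseProfile catalog = new DatabaseProfile(new SpeciesProfile[] {
            new SpeciesProfile("Some other species"),
            new MySpecialSpeciesProfile() });
        System.out.println(catalog.find("Examplus demonstrans") != null);
    }
}
```

In this sketch, a profile class that exists but is never added to the array can never be found, which mirrors the reason for editing the list in UniProtDatabaseProfile.java.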

Build, Test, and Possibly Commit

Create a new distribution of GenMAPP Builder based on Creating a Distribution.
Perform a new export run with this version of GenMAPP Builder (you can skip the import steps and use the same PostgreSQL database if it’s available).
Check the Systems table in the resulting .gdb to see if it contains the custom information:
- Open the .gdb in Microsoft Access, then open the Systems table.
- Look for the record for OrderedLocusNames. Your species name should appear under the Species column and your link URL should appear under the Link column.
If all goes well, commit your code as described in Updating and Committing Code. You have now officially contributed to the XMLPipeDB project :)

Common Tasks

The tasks in this section reflect the typical development cycle.

Updating and Committing Code

Right-click on the gmbuilder project folder and choose Team Synchronize... from the popup menu.
You will be switched to the Team Synchronization perspective.
The presence of blue-arrowed files means that the server has new updates for you to download. Right-click on the gmbuilder project folder and choose Update from the popup menu.
It is good “developer etiquette” to build a new distribution from scratch when you’ve received updates prior to committing your own changes. Thus, after the update, return to the Java perspective, do a build.xml > clean followed by a build.xml > dist.
If everything works out, do Team Synchronize... again. If there are new updates (in the tiny amount of time since you last updated!), test things again.
Eventually, you will see a Team Synchronize... with no incoming code. At this point, go ahead and commit the gray-arrowed files by right-clicking on them and choosing Commit....
Just like with the wiki, it is good developer etiquette to describe briefly the nature of the changes that you are committing.
Even if you have nothing to commit, it is still a good idea to invoke Team Synchronize... regularly so that you are kept up-to-date with regard to files that others may be committing.

Creating a Distribution

To create your own version of GenMAPP Builder based on the code you have in Eclipse (which may contain some new changes/customizations that you would like to test), follow these steps:

Switch to Eclipse’s Java perspective.
Within the gmbuilder Java project is a file called build.xml. It should have an icon that appears to include an ant.
Right-click on build.xml and choose Run As > Ant Build... (the one with the ellipsis) from the popup menu that appears.
In the Edit Configuration dialog that appears, check on the clean and dist items in the Targets tab. The Target execution order section near the bottom of the dialog should say clean, dist.
Click the Run button. The computer will work for a bit.
When it is done, right-click on the gmbuilder project folder and choose Refresh (F5 is its keyboard shortcut).
You should see a dist folder appear inside the gmbuilder project folder.
This is your personally-built copy of GenMAPP Builder. Its contents correspond to the extracted contents of the gmbuilder-2.0b59.zip file that was downloaded in class.
Run pgAdmin III and start a database, then run this copy of GenMAPP Builder as you would the “released” copy. The program should behave just like the one that you downloaded and have been using.

Releasing to SourceForge

Build a new version of GenMAPP Builder (ant dist).
Rename the dist folder as gmbuilder-2.0b## where ## is the next available version number.
Compress the gmbuilder-2.0b## folder as a .zip file.
Log in to SourceForge.
Go to the xmlpipedb project site.
Go to file releases.
Create a folder for the new release.
Upload the .zip into the folder.
Upload the release notes.
Mark the release notes as belonging to the .zip.
Mark the current release as the default Windows download by selecting the small i button next to .zip.
