Difference between revisions of "Coder"

Revision as of 01:42, 2 November 2015

Gene Database Project Links
Overview	Deliverables	Reference Format	Guilds	Project Manager	GenMAPP User	Quality Assurance	Coder
Overview	Deliverables	Reference Format	Teams	Heavy Metal HaterZ	The Class Whoopers	GÉNialOMICS	Oregon Trail Survivors

The coder is the resident expert on the technology being used—assorted software, file management, version control, some troubleshooting, some programming. He or she coordinates with Drs. Dahlquist and Dionisio in extending GenMAPP Builder code and making new versions. GenMAPP Builder is written in Java and is built on open source pure-Java libraries. Source code is hosted on GitHub and built using Apache’s ant utility.

Guild Members

Species 1:
Species 2:
Species 3:
Species 4:

Milestones

Milestone 0: Working Environment Setup

Because the machines in the Seaver 120 computer lab have already been set up for this process, the information below is listed primarily for documentation and troubleshooting purposes.

PostgreSQL (http://www.postgresql.org)
GenMAPP Builder (https://github.com/lmu-bioinformatics/xmlpipedb)
- Current version is 3.0.0 build 5
- Requires Java 8 Runtime Environment
GenMAPP (http://genmapp.org)
- We will be using GenMAPP and MAPPFinder version 2.1.
  - This version is now called "GenMAPP Classic" and can be downloaded from GitHub.
  - Follow the instructions in the installer.
  - During installation, the installer will open a window called the GenMAPP Data Acquisition Tool. It will not function because it cannot connect to the server. This is OK.
XMLPipeDB match utility for counting IDs in XML files
Microsoft Access or any other tool that can read .mdb files

Milestone 1: Version Control Setup

Get a GitHub account and send your username to Dr. Dionisio so that you can be added as a developer of the XMLPipeDB project on GitHub.
- Once you are set up as a developer, you can clone and push your GenMAPP Builder source code.
Create a GitHub branch of xmlpipedb for your team.
- The easiest way to do this is via the Branch dropdown menu on the GitHub project website for XMLPipeDB.
(with QA) Commit and push relevant source data to the GenMAPP Gene Databases folder of your GitHub branch.
- You can always verify what is publicly visible on your branch by visiting the XMLPipeDB GitHub website, choosing your branch from the Branch dropdown menu, then inspecting the code that is visible there.

Milestone 2: “Developer Rig” Setup and Initial As-Is Build

Install core software for developing, building, and testing prototype versions of GenMAPP Builder:
- Java developer tools: JDK 8 (at this writing, the current release is JDK 8u65)
- A git client (for interacting with GitHub)
- Any tool that can unpack .gz and .zip files (we are using 7-zip on the Seaver 120 machines)
- XMLPipeDB Match utility
- Development environment: while any will do, Eclipse is the specific one that most XMLPipeDB developers have used:
  1. Download and install Eclipse from its download web site. Either Eclipse IDE for Java Developers or Eclipse IDE for Java EE Developers will work.
  2. Eclipse includes ant, so you do not need a separate ant installation unless you plan to build GenMAPP Builder outside of Eclipse.
  3. If you want to use ant outside Eclipse, please visit http://ant.apache.org.
Follow the instructions in the GenMAPP Builder Project Setup and Initial Build section of this wiki page in order to:
- Set up a functioning Eclipse development environment for your branch of GenMAPP Builder.
- Build your own copy of GenMAPP Builder from scratch.
(with QA) Get a full import-export cycle done.
(with QA) Decide on a file/version management scheme/system.

As needed, coders may arrange for a walkthrough or other help session with Dr. Dionisio if there are any issues with the procedures on this guild page.

Milestone 3: Species Profile Creation

Add a species profile to the GenMAPP Builder code base.
Customize the species profile with the species name in the OrderedLocusNames record of the Systems table.
Customize the Link field in the OrderedLocusNames record of the Systems table to hold a URL query with ~ standing in for the gene ID.
- (with QA) The URL would need to be determined first, of course.

Milestone 4: Species Export Customization

Based on observations from the GenMAPP User and QA, determine and document (as thoroughly as possible) any other modified export behavior that GenMAPP Builder will have to manifest for this species.
Implement this export behavior.
As needed, commit and push your work to your GitHub branch.
Additional milestones will depend on how the rest of the project goes, and the bugs/features generated by that work.
Document/log all work done, problems encountered, and how they were resolved.
When your work is complete, issue a GitHub pull request to merge your branch into the main development line.

GenMAPP Builder Project Setup and Initial Build

This section is a guide to building new versions of GenMAPP Builder. GenMAPP and MAPPFinder run only on Windows, but you can build and run GenMAPP Builder on any platform that supports PostgreSQL and JDK 8.

Although there are many ways to update and maintain GenMAPP Builder code, for uniformity these instructions will assume the use of Eclipse for viewing, modifying, and updating GenMAPP Builder. The main benefit of Eclipse is that it is largely a one-stop shop for performing all of these tasks.

The instructions listed in this Setup section need only be performed once. Once done correctly, you will primarily be doing what is described in the Common Tasks section.

Prerequisites

Make sure that you have already accomplished the version control setup milestone (Milestone 1).
Make sure that you have already downloaded and installed the software mentioned in Milestone 2 (first item).

GitHub Repository Clone Setup

Determine the desired location (on your development computer) for your local copy of the XMLPipeDB GitHub repository.
cd to this location.
Clone the repository:

git clone https://github.com/lmu-bioinformatics/xmlpipedb.git

Eclipse Workspace Setup

Run Eclipse.
Establish an Eclipse workspace for the XMLPipeDB repository.

Java Project Setup

Make sure that Eclipse is using a JDK (Java Development Kit) and not a JRE (Java Runtime Environment). To verify this, go to Window > Preferences, click on Java, click on Installed JREs, and make sure that the checked environment has JDK in its name. If not, you may need to add the JDK (on Windows, it lives under C:\Program Files\Java) and then check its box.
Upon a successful checkout, you should have a gmbuilder Java project in Eclipse. Make sure that you are in the Java perspective by choosing Window > Open Perspective > Java (or choosing Java from Other... if Java is not already in the Open Perspective submenu).
Double-click on the gmbuilder folder (or whatever you called it) to view its contents.
Double-click on the lib folder.
Shift-click to select all of the files in the lib folder, then control-click (or, on a Mac, Command-click) to deselect every file whose name does not end in .jar.
Once only the .jar files remain selected, right-click on one of them and choose Build Path > Add to Build Path from the popup menu that comes up.
The src folder should look different from the other folders in that it has a little brown square badge on its upper-right corner. If not, right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
Do the same to the test folder: right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
If you see any red x icons appear, something has not been set up right. Contact other guild members or Dr. Dionisio for troubleshooting if you get stuck.

Adding a Species Profile to GenMAPP Builder

All of this work happens in the Java perspective, so switch to that first if you’re not already there.

Create the Species Profile

Expose the contents of the src folder.
Right-click on the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package and choose New > Class from the popup menu.
In the dialog that appears, enter the following:
- Name: name-of-your-species-without-spacesUniProtSpeciesProfile (in camel case: no spaces, capitalizing the first letters of each word)
- Superclass: edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtSpeciesProfile (you can also click on Browse... to navigate to this if you don’t feel like typing)
Click Finish. There should now be a new .java file within the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package (the one you just created).
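
As a quick check on the naming rule above, the following minimal sketch derives the class name from a species name. It is a standalone illustration, not part of GenMAPP Builder; the ProfileNames class and its profileClassName method are hypothetical names introduced only for this example.

```java
import java.util.Locale;

// Hypothetical helper for illustration; GenMAPP Builder does not ship this class.
final class ProfileNames {

    /** Joins the capitalized words of a species name and appends the profile suffix. */
    static String profileClassName(String speciesName) {
        StringBuilder name = new StringBuilder();
        for (String word : speciesName.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields only the suffix
            }
            name.append(word.substring(0, 1).toUpperCase(Locale.ROOT));
            name.append(word.substring(1).toLowerCase(Locale.ROOT));
        }
        return name.append("UniProtSpeciesProfile").toString();
    }

    public static void main(String[] args) {
        // prints PlasmodiumFalciparumUniProtSpeciesProfile
        System.out.println(profileClassName("Plasmodium falciparum"));
    }
}
```

The result for Plasmodium falciparum matches the name of the existing profile class in the catalog of known species profiles.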

Customize the Species Profile

Open the file that you have just created. It should appear in the editor area of Eclipse.
Supply the name of the species and the description of the profile by adding the following constructor block right below the public class line in the new file. Remember to customize it for your particular species; the portions that need to be customized are marked with asterisks.

public ***NameOfYourSpecies***UniProtSpeciesProfile() {
    super("***Genus species***",
        ***taxonIDOfYourSpecies***,
        "This profile customizes the GenMAPP Builder export for " +
            "***Genus species***" +
            " data loaded from a UniProt XML file.");
}

To customize the species profile with the species name in the OrderedLocusNames record of the Systems table as well as a link query for that same record, add the following method block right below the constructor block that you added above. Again, the key information to customize is marked with asterisks.

@Override
public TableManager getSystemsTableManagerCustomizations(TableManager tableManager, DatabaseProfile dbProfile) {
    super.getSystemsTableManagerCustomizations(tableManager, dbProfile);
    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Species", "|" + getSpeciesName() + "|" }
    });

    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Link", "***species-specific-database-link***" }
    });

    return tableManager;
}

Note the species-specific-database-link placeholder above. This is a species-specific URL that returns a web page describing a gene for that species. It should look like a standard URL, with the tilde (~) standing in for the gene ID. For example, the link for Vibrio cholerae is http://bacteria.ensembl.org/Multi/Search/Results?species=all;idx=;q=~;site=ensemblunit. The link for Plasmodium falciparum is http://plasmodb.org/plasmo/showRecord.do?name=GeneRecordClasses.GeneRecordClass&project_id=PlasmoDB&source_id=~. Work with your GenMAPP User and/or QA to determine the appropriate URL for your species.
Your code may have a red error badge at this point; assuming you typed everything in exactly, the fix for this is to choose Organize Imports from the Source menu. If the red error badge persists, make sure that you typed everything in correctly.
Save the file and see if these changes worked (see below).
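
To make the tilde convention concrete, here is a minimal sketch of how a link template turns into a gene page URL. GenMAPP performs this substitution itself; the LinkTemplate class, its methods, and the GENE_ID_1 sample ID are hypothetical and exist only for this illustration. Checking that a template contains exactly one tilde before committing it may help catch copy-and-paste mistakes.

```java
// Hypothetical helper for illustration; not part of GenMAPP or GenMAPP Builder.
final class LinkTemplate {

    /** Counts the tilde placeholders in a link template. */
    static int countPlaceholders(String template) {
        int count = 0;
        for (int i = 0; i < template.length(); i++) {
            if (template.charAt(i) == '~') {
                count++;
            }
        }
        return count;
    }

    /** Replaces the single tilde placeholder with the given gene ID. */
    static String expand(String template, String geneId) {
        if (countPlaceholders(template) != 1) {
            throw new IllegalArgumentException("Expected exactly one ~ in: " + template);
        }
        return template.replace("~", geneId);
    }

    public static void main(String[] args) {
        String template =
            "http://bacteria.ensembl.org/Multi/Search/Results?species=all;idx=;q=~;site=ensemblunit";
        // GENE_ID_1 is a made-up sample ID, not a real gene.
        System.out.println(expand(template, "GENE_ID_1"));
    }
}
```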

Additional customization, particularly with regard to the exported data, will depend on the species. Communicate with your QA to see if additional customization is needed. If the additional customization is not too complicated, you might be able to do the work yourself with some instructions. However, if the customization is too difficult, Dr. Dionisio will probably be the one to do the work.

Customize the IDs that the Tally Engine Counts

This step is technically optional, in that it does not affect the overall import/export process. However, it does help you to get an idea of how well the IDs from the UniProt XML file are being brought into the relational database.

First, determine which IDs (beyond the defaults that the tally engine already counts) you would like to count. At a minimum, count the ordered locus IDs from the gene/name tag in the UniProt XML file. There may be more; consult with your QA.
For each of these IDs, determine the following:
- Where in the XML file they can be found, in terms of which XML tags
- Where in the relational database they can be found, in terms of which relational tables
Under edu.lmu.xmlpipedb.gmbuilder.resource.properties, open gmbuilder.properties.
Locate the block of text below (it’s near the bottom of the file). The customizations described in the following steps go immediately above this block.

#
# wizard.properties
#

First, mark out the section that denotes the customization for your species:

# Species name

Next, rewrite your species name without spaces and all lowercase (e.g., Plasmodium falciparum becomes plasmodiumfalciparum). Specify the number of additional custom IDs to count as follows, where speciesname is your no-space, all-lowercase species name, and # represents the actual number of IDs:

speciesname_level_amount=#

Now, for each custom ID, you need to specify three things: an element, a query, and a name. Each of these items is numbered, starting from 0. Each item number is called a level.
1. The element states where you expect an ID to be found in the UniProt XML file. It starts with uniprot/entry, then continues with additional tags as needed. After the tag, you may specify, separated by ampersands (&s), any specific attributes that you would like to choose.
2. The query states the SQL query that you would use to count the IDs in the relational database. The query would be exactly as you would type it if you were entering it directly into the relational database.
3. The name is a simple label: this is how you would like to identify this ID in the final Tally Engine table.
You can write these in any order, though existing customizations group them by element, query, and name. For example, if your species is speciesname and you only need to count ordered locus IDs, you would add:

# Species name
speciesname_level_amount=1

speciesname_element_level0=uniprot/entry/gene/name&type&ordered locus

speciesname_query_level0=select count(*) from genenametype where type = 'ordered locus';

speciesname_name_level0=Ordered Locus

Note how the element ends with name&type&ordered locus, because the name tag in the UniProt XML file will have different types (e.g., “primary”, “ORF”, “synonym”, “ordered locus”, etc.). For ordered locus IDs, we only want to count the name IDs whose type is “ordered locus”.
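
The following minimal sketch shows how entries in this format break down when read with the standard java.util.Properties class. It is an illustration only and is not GenMAPP Builder's own property-reading code. The TallyConfigSketch class and its methods are hypothetical, and the _name_level0 key is an assumption inferred from the element, query, and name description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

// Illustrative only: not GenMAPP Builder's own property-reading code.
final class TallyConfigSketch {

    /** "Plasmodium falciparum" becomes "plasmodiumfalciparum". */
    static String prefix(String speciesName) {
        return speciesName.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    /** Parses properties text; wraps the checked IOException for brevity. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits "tag/path&attribute&value" on ampersands. */
    static String[] splitElement(String element) {
        return element.split("&");
    }

    public static void main(String[] args) {
        String p = prefix("Species name"); // "speciesname"
        Properties props = load(String.join("\n",
            p + "_level_amount=1",
            p + "_element_level0=uniprot/entry/gene/name&type&ordered locus",
            p + "_query_level0=select count(*) from genenametype where type = 'ordered locus';",
            p + "_name_level0=Ordered Locus")); // key name is an assumption

        int levels = Integer.parseInt(props.getProperty(p + "_level_amount"));
        for (int level = 0; level < levels; level++) {
            String[] parts = splitElement(props.getProperty(p + "_element_level" + level));
            System.out.println("XML path:  " + parts[0]);
            if (parts.length == 3) {
                System.out.println("Attribute: " + parts[1] + " = " + parts[2]);
            }
            System.out.println("Query:     " + props.getProperty(p + "_query_level" + level));
            System.out.println("Name:      " + props.getProperty(p + "_name_level" + level));
        }
    }
}
```

Because a properties file keeps only the last value for a repeated key, each level needs three distinct keys; reusing a key would silently overwrite the earlier entry.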

Once you are done with these customizations, you can test your work by building a new version of GenMAPP Builder, connecting to a relational database that already has imported data (or importing data first if needed), then running the Tally Engine. The resulting table should include, in addition to the defaults that you have seen before, the new IDs that you have added.
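
As an independent cross-check on the XML side of the tally, the count of ordered locus names can be sketched with Java's built-in streaming XML parser. This is a minimal sketch under stated assumptions: it counts every matching name element (it does not remove duplicates), the OrderedLocusCounter class is hypothetical, and it may not count exactly the way the Tally Engine or the XMLPipeDB Match utility does. For a real UniProt file, you would pass a file reader instead of a string.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical cross-check; not the Tally Engine's implementation.
final class OrderedLocusCounter {

    /** Counts uniprot/entry/gene/name elements whose type attribute is "ordered locus". */
    static int count(Reader xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Deque<String> path = new ArrayDeque<>(); // open elements, outermost first
            int count = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                    boolean onNameTag = String.join("/", path).equals("uniprot/entry/gene/name");
                    if (onNameTag && "ordered locus".equals(reader.getAttributeValue(null, "type"))) {
                        count++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny made-up sample with two ordered locus names; the IDs are placeholders.
        String sample = "<uniprot>"
            + "<entry><gene><name type='primary'>a</name><name type='ordered locus'>X1</name></gene></entry>"
            + "<entry><gene><name type='ordered locus'>X2</name><name type='ORF'>o</name></gene></entry>"
            + "</uniprot>";
        System.out.println(count(new StringReader(sample))); // prints 2
    }
}
```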

Add the Species Profile to the Catalog of Known Species Profiles

The last step involves actually making GenMAPP Builder know that your new species profile exists. This involves a change in an existing file:

Under edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles, open UniProtDatabaseProfile.java.
Near the top of the file is a block that looks like this:

super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile() });

What you want to do is add the species profile that you just created to this block. If your species profile is called MySpecialUniProtSpeciesProfile, your modified code should look like this:

super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile(),
    new MySpecialUniProtSpeciesProfile() });

Essentially, you need to add an item to the comma-separated list, beginning with new, followed by the species profile name, finally followed by ().
Save your changes, do Organize Imports to eliminate any red errors, and try a test build!

Build, Test, and Possibly Commit

Create a new distribution of GenMAPP Builder by following the steps in Creating a Distribution.
Perform a new export run with this version of GenMAPP Builder (you can skip the import steps and use the same PostgreSQL database if it’s available).
Check the Systems table in the resulting .gdb to see if it contains the custom information:
- Open the .gdb in Microsoft Access, then open the Systems table.
- Look for the record for OrderedLocusNames. Your species name should appear under the Species column and your link URL should appear under the Link column.
If all goes well, commit your code as described in Updating and Committing Code. You have now officially contributed to the XMLPipeDB project :)

Common Tasks

The tasks in this section reflect the typical development cycle.

Updating and Committing Code

Right-click on the gmbuilder project folder and choose Synchronize Repository... from the popup menu.
You will be switched to the Team Synchronization perspective.
The presence of blue-arrowed files means that the server has new updates for you to download. Right-click on the gmbuilder project folder and choose Update from the popup menu.
It is good “developer etiquette” to build a new distribution from scratch when you’ve received updates prior to committing your own changes. Thus, after the update, return to the Java perspective, do a build.xml > clean followed by a build.xml > dist.
If everything works out, do Team Synchronize... again. If there are new updates (in the tiny amount of time since you last updated!), test things again.
Eventually, you will see a Team Synchronize... with no incoming code. At this point, go ahead and commit the gray-arrowed files by right-clicking on them and choosing Commit....
Just like with the wiki, it is good developer etiquette to describe briefly the nature of the changes that you are committing.
Even if you have nothing to commit, it is still a good idea to invoke Team Synchronize... regularly so that you are kept up-to-date with regard to files that others may be committing.

Creating a Distribution

To create your own version of GenMAPP Builder based on the code you have in Eclipse (which may contain some new changes/customizations that you would like to test), follow these steps:

Switch to Eclipse’s Java perspective.
Edit the GenMAPPBuilder.java source code to identify the distribution that you are about to create by setting the VERSION string (located at approximately line 83) to a sufficiently descriptive value.
Within the gmbuilder Java project is a file called build.xml. It should have an icon that appears to include an ant.
Right-click on build.xml and choose Run As > Ant Build... (the one with the ellipsis) from the popup menu that appears.
In the Edit Configuration dialog that appears, check on the clean and dist items in the Targets tab. The Target execution order section near the bottom of the dialog should say clean, dist.
Click the Run button. The computer will work for a bit.
When it is done, right-click on the gmbuilder project folder and choose Refresh (F5 is its keyboard shortcut).
You should see a dist folder appear inside the gmbuilder project folder.
This is your personally-built copy of GenMAPP Builder. Its contents correspond to the extracted contents of the gmbuilder-3.0.0-build-5.zip file that was downloaded in class.
Run pgAdmin III and start a database, then run this copy of GenMAPP Builder as you would the “released” copy. The program should behave just like the one that you downloaded and have been using.


