Difference between revisions of "Coder"

From LMU BioDB 2015

Revision as of 01:42, 2 November 2015

Gene Database Project Links
Overview Deliverables Reference Format Guilds Project Manager GenMAPP User Quality Assurance Coder
Teams Heavy Metal HaterZ The Class Whoopers GÉNialOMICS Oregon Trail Survivors

The coder is the resident expert on the technology being used—assorted software, file management, version control, some troubleshooting, some programming. He or she coordinates with Drs. Dahlquist and Dionisio in extending GenMAPP Builder code and making new versions. GenMAPP Builder is written in Java and is built on open source pure-Java libraries. Source code is hosted on GitHub and built using Apache’s ant utility.

Guild Members

  • Species 1:
  • Species 2:
  • Species 3:
  • Species 4:

Milestones

Milestone 0: Working Environment Setup

Because the machines in the Seaver 120 computer lab have already been set up for this process, the information below is listed primarily for documentation and troubleshooting purposes.

Milestone 1: Version Control Setup

  1. Get a GitHub account and send your username to Dr. Dionisio so that you can be added as a developer of the XMLPipeDB project on GitHub.
    • Once you are set up as a developer, you can clone and push your GenMAPP Builder source code.
  2. Create a GitHub branch of xmlpipedb for your team.
  3. (with QA) Commit and push relevant source data to the GenMAPP Gene Databases folder of your GitHub branch.
    • You can always verify what is publicly visible on your branch by visiting the XMLPipeDB GitHub website, choosing your branch from the Branch dropdown menu, then inspecting the code that is visible there.

Milestone 2: “Developer Rig” Setup and Initial As-Is Build

  1. Install core software for developing, building, and testing prototype versions of GenMAPP Builder:
    • Java developer tools: JDK 8 (which, at this writing, is JDK 8u65)
    • A git client (for interacting with GitHub)
    • Any tool that can unpack .gz and .zip files (we are using 7-zip on the Seaver 120 machines)
    • XMLPipeDB Match utility
    • Development environment: while any will do, Eclipse is the specific one that most XMLPipeDB developers have used:
      1. Download and install Eclipse from its download web site. Either Eclipse IDE for Java Developers or Eclipse IDE for Java EE Developers will work.
      2. Eclipse includes ant, so you do not need a separate ant installation unless you plan to build GenMAPP Builder outside of Eclipse.
      3. If you want to use ant outside Eclipse, please visit http://ant.apache.org.
  2. Follow the instructions in the GenMAPP Builder Project Setup and Initial Build section of this wiki page in order to:
    • Set up a functioning Eclipse development environment for your branch of GenMAPP Builder.
    • Build your own copy of GenMAPP Builder from scratch.
  3. (with QA) Get a full import-export cycle done.
  4. (with QA) Decide on a file/version management scheme/system.

As needed, coders may arrange for a walkthrough or other help session with Dr. Dionisio if there are any issues with the procedures on this guild page.

Milestone 3: Species Profile Creation

  1. Add a species profile to the GenMAPP Builder code base.
  2. Customize the species profile with the species name in the OrderedLocusNames record of the Systems table.
  3. Customize the Link field in the OrderedLocusNames record of the Systems table to hold a URL query with ~ standing in for the gene ID.
    • (with QA) The URL would need to be determined first, of course.
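The role of the ~ placeholder in the Link field can be sketched in a few lines of Java. This is only an illustration of the substitution idea, not code from GenMAPP or GenMAPP Builder; the class name, method name, URL, and gene ID below are all hypothetical.

```java
// Illustrative sketch only: not part of GenMAPP Builder or GenMAPP.
class LinkTemplateSketch {

    // Replaces every "~" placeholder in the Link template with the gene ID.
    static String expand(String linkTemplate, String geneId) {
        return linkTemplate.replace("~", geneId);
    }

    public static void main(String[] args) {
        // Hypothetical URL and gene ID, used only for illustration.
        String template = "http://www.example.org/gene?id=~";
        System.out.println(expand(template, "b0001"));
        // prints http://www.example.org/gene?id=b0001
    }
}
```

A quick check like this can help confirm, with QA, that the URL you chose resolves to the right page once a real gene ID is substituted.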

Milestone 4: Species Export Customization

  1. Based on observations from the GenMAPP User and QA, determine and document (as thoroughly as possible) any other modified export behavior that GenMAPP Builder will have to manifest for this species.
  2. Implement this export behavior.
  3. As needed, commit and push your work to your GitHub branch.
  4. Additional milestones will depend on how the rest of the project goes, and the bugs/features generated by that work.
  5. Document/log all work done, problems encountered, and how they were resolved.
  6. When your work is complete, issue a GitHub pull request to merge your branch into the main development line.

GenMAPP Builder Project Setup and Initial Build

This section is a guide to building new versions of GenMAPP Builder. You can only run GenMAPP and MAPPFinder on Windows, but you can build and run GenMAPP Builder on any platform that supports PostgreSQL and JDK 8.

Although there are many ways to update and maintain GenMAPP Builder code, for uniformity these instructions will assume the use of Eclipse for viewing, modifying, and updating GenMAPP Builder. The main benefit of Eclipse is that it is largely a one-stop shop for performing all of these tasks.

The instructions listed in this Setup section need only be performed once. Once done correctly, you will primarily be doing what is described in the Common Tasks section.

Prerequisites

  1. Make sure that you have already accomplished the version control setup milestone (Milestone 1).
  2. Make sure that you have already downloaded and installed the software mentioned in Milestone 2 (first item).

GitHub Repository Clone Setup

  1. Determine the desired location (on your development computer) for your local copy of the XMLPipeDB GitHub repository.
  2. cd to this location.
  3. Clone the repository:
git clone https://github.com/lmu-bioinformatics/xmlpipedb.git

Eclipse Workspace Setup

  1. Run Eclipse.
  2. Establish an Eclipse workspace for the XMLPipeDB repository.

Java Project Setup

  1. Make sure that Eclipse is using a JDK (Java Development Kit) and not a JRE (Java Runtime Environment). To verify this, go to Window > Preferences, click on Java, click on Installed JREs, and make sure that the checked environment has JDK in it. If not, you may need to add the environment (on Windows, it lives in C:\Program Files\Java) then check on it.
  2. Upon a successful checkout, you should have a gmbuilder Java project in Eclipse. Make sure that you are in the Java perspective by choosing Window > Open Perspective > Java (or choosing Java from Other... if Java is not already in the Open Perspective submenu).
  3. Double-click on the gmbuilder folder (or whatever you called it) to view its contents.
  4. Double-click on the lib folder.
  5. Shift-click to select all files in the lib folder, then control-click (or, on a Mac, Command-click) each file whose name does not end in .jar to deselect it, so that only the .jar files remain selected.
  6. Once all of these files are selected, right-click on one of them and choose Build Path > Add to Build Path from the popup menu that comes up.
  7. The src folder should look different from the other folders in that it has a little brown square badge on its upper-right corner. If not, right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
  8. Do the same to the test folder: right-click on it and choose Build Path > Use as Source Folder from the popup menu that comes up.
  9. If you see any red x icons appear, something has not been set up right. Contact other guild members or Dr. Dionisio for troubleshooting if you get stuck.

Adding a Species Profile to GenMAPP Builder

All of this work happens in the Java perspective, so switch to that first if you’re not already there.

Create the Species Profile

  1. Expose the contents of the src folder.
  2. Right-click on the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package and choose New > Class from the popup menu.
  3. In the dialog that appears, enter the following:
    • Name: name-of-your-species-without-spacesUniProtSpeciesProfile (in camel case: no spaces, capitalizing the first letters of each word)
    • Superclass: edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtSpeciesProfile (you can also click on Browse... to navigate to this if you don’t feel like typing)
  4. Click Finish. There should now be a new .java file within the edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles package (the one you just created).
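The naming convention for the class can be sketched as a small helper, assuming the species name is plain words separated by spaces. The helper and its class name are hypothetical and exist only to illustrate the camel-case rule; names containing punctuation (for example "sp.") would still need manual cleanup to be valid Java identifiers.

```java
// Illustrative sketch only: a hypothetical helper, not part of GenMAPP Builder.
class ProfileNameSketch {

    // Builds a class name such as PlasmodiumFalciparumUniProtSpeciesProfile
    // by capitalizing the first letter of each word and removing the spaces.
    static String profileClassName(String speciesName) {
        StringBuilder name = new StringBuilder();
        for (String word : speciesName.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            name.append(Character.toUpperCase(word.charAt(0)));
            name.append(word.substring(1));
        }
        return name.append("UniProtSpeciesProfile").toString();
    }

    public static void main(String[] args) {
        System.out.println(profileClassName("Plasmodium falciparum"));
        // prints PlasmodiumFalciparumUniProtSpeciesProfile
    }
}
```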

Customize the Species Profile

  • Open the file that you have just created. It should appear in the editor area of Eclipse.
  • Supply the name of the species and the description of the profile: add the following constructor right below the public class line in the new file. Remember to customize it for your particular species; the portions that need to be customized are marked with asterisks.
public ***NameOfYourSpecies***UniProtSpeciesProfile() {
    super("***Genus species***",
        ***taxonIDOfYourSpecies***,
        "This profile customizes the GenMAPP Builder export for " +
            "***Genus species***" +
            " data loaded from a UniProt XML file.");
}
  • To customize the species profile with the species name in the OrderedLocusNames record of the Systems table as well as a link query for that same record, add the following method right below the constructor that you added above. Again, the key information to customize is marked with asterisks.
@Override
public TableManager getSystemsTableManagerCustomizations(TableManager tableManager, DatabaseProfile dbProfile) {
    super.getSystemsTableManagerCustomizations(tableManager, dbProfile);
    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Species", "|" + getSpeciesName() + "|" }
    });

    tableManager.submit("Systems", QueryType.update, new String[][] {
        { "SystemCode", "N" },
        { "Link", "***species-specific-database-link***" }
    });

    return tableManager;
}

Additional customization, particularly with regard to the exported data, will depend on the species. Communicate with your QA to see if additional customization is needed. If the additional customization is not too complicated, you might be able to do the work yourself with some instructions. However, if the customization is too difficult, Dr. Dionisio will probably be the one to do the work.

Customize the IDs that the Tally Engine Counts

This step is technically optional, in that it does not affect the overall import/export process. However, it does help you to get an idea of how well the IDs from the UniProt XML file are being brought into the relational database.

  1. First, determine which IDs (outside of the defaults that the tally engine already counts) you would like to count. At a minimum, this includes the ordered locus IDs from the gene/name tag in the UniProt XML file. There may be more; consult with your QA.
  2. For each of these IDs, determine the following:
    • Where in the XML file they can be found, in terms of which XML tags
    • Where in the relational database they can be found, in terms of which relational tables
  3. Under edu.lmu.xmlpipedb.gmbuilder.resource.properties, open gmbuilder.properties.
  4. Locate the block of text below (it’s near the bottom of the file). The customizations described in the following steps go directly above this block.
#
# wizard.properties
#
  • First, mark out the section that denotes the customization for your species:
# Species name
  • Next, rewrite your species name without spaces and all lowercase (e.g., Plasmodium falciparum becomes plasmodiumfalciparum). Specify the number of additional custom IDs to count as follows, where speciesname is your no-space, all-lowercase species name, and # represents the actual number of IDs:
speciesname_level_amount=#
  • Now, for each custom ID, you need to specify three things: an element, a query, and a name. Each of these items is numbered, starting from 0. Each item number is called a level.
    1. The element states where you expect an ID to be found in the UniProt XML file. It starts with uniprot/entry, then continues with additional tags as needed. After the tag, you may specify, separated by ampersands (&s), any specific attributes that you would like to choose.
    2. The query states the SQL query that you would use to count the IDs in the relational database. The query would be exactly as you would type it if you were entering it directly into the relational database.
    3. The name is a simple label: this is how you would like to identify this ID in the final Tally Engine table.
  • You can write these in any order, though existing customizations group them by element, query, and name. For example, if your species is speciesname and you only need to count ordered locus IDs, you would add:
# Species name
speciesname_level_amount=1

speciesname_element_level0=uniprot/entry/gene/name&type&ordered locus

speciesname_query_level0=select count(*) from genenametype where type = 'ordered locus';
speciesname_name_level0=Ordered Locus
  • Note how the element ends with name&type&ordered locus, because the name tag in the UniProt XML file will have different types (e.g., “primary”, “ORF”, “synonym”, “ordered locus”, etc.). For ordered locus IDs, we only want to count the name IDs whose type is “ordered locus”.
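The two string conventions above (the no-space, all-lowercase species prefix and the ampersand-separated element) can be sketched in Java as follows. This is not how GenMAPP Builder itself reads these properties, which is not shown on this page; the class and method names are hypothetical and only illustrate how the pieces of each string relate.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch only: not GenMAPP Builder's actual parsing code.
class TallyElementSketch {

    // "Plasmodium falciparum" becomes "plasmodiumfalciparum".
    static String propertyPrefix(String speciesName) {
        return speciesName.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // Splits an element on ampersands: the first part is the tag path,
    // the remaining parts are the attribute name and the value to match.
    static String[] splitElement(String element) {
        return element.split("&");
    }

    public static void main(String[] args) {
        String prefix = propertyPrefix("Plasmodium falciparum");
        System.out.println(prefix + "_element_level0");
        // prints plasmodiumfalciparum_element_level0

        String[] parts = splitElement("uniprot/entry/gene/name&type&ordered locus");
        System.out.println(Arrays.toString(parts));
        // prints [uniprot/entry/gene/name, type, ordered locus]
    }
}
```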

Once you are done with these customizations, you can test your work by building a new version of GenMAPP Builder, connecting to a relational database that already has imported data (or importing data first if needed), then running the Tally Engine. The resulting table should include, in addition to the defaults that you have seen before, the new IDs that you have added.
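Before building, it may help to check that every level has all three of its properties. The sketch below is a hypothetical stand-alone checker, not part of GenMAPP Builder. It assumes the key patterns speciesname_element_levelN, speciesname_query_levelN, and speciesname_name_levelN, inferred from the element, query, and name items described above, and it reads the properties from an inline string rather than from gmbuilder.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only: a hypothetical checker, not part of GenMAPP Builder.
class TallyPropertiesCheckSketch {

    // Returns the expected keys that are absent for the given species prefix.
    // The key naming pattern is an assumption based on the description above.
    static List<String> missingKeys(Properties props, String prefix) {
        List<String> missing = new ArrayList<String>();
        String amountKey = prefix + "_level_amount";
        String amount = props.getProperty(amountKey);
        if (amount == null) {
            missing.add(amountKey);
            return missing;
        }
        int levels = Integer.parseInt(amount.trim());
        for (int level = 0; level < levels; level++) {
            for (String kind : new String[] { "element", "query", "name" }) {
                String key = prefix + "_" + kind + "_level" + level;
                if (props.getProperty(key) == null) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String text = "speciesname_level_amount=1\n"
            + "speciesname_element_level0=uniprot/entry/gene/name&type&ordered locus\n"
            + "speciesname_query_level0=select count(*) from genenametype where type = 'ordered locus';\n"
            + "speciesname_name_level0=Ordered Locus\n";
        Properties props = new Properties();
        props.load(new StringReader(text));
        // An empty list means every level has its element, query, and name.
        System.out.println(missingKeys(props, "speciesname"));
    }
}
```

A duplicated key (for example, two query entries and no name entry for the same level) would show up here as a missing name key, which is easier to spot than a blank row in the Tally Engine table.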

Add the Species Profile to the Catalog of Known Species Profiles

The last step involves actually making GenMAPP Builder know that your new species profile exists. This involves a change in an existing file:

  • Under edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles, open UniProtDatabaseProfile.java.
  • Near the top of the file is a block that looks like this:
super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile() });
  • What you want to do is add the species profile that you just created to this block. If your species profile is called MySpecialUniProtSpeciesProfile, your modified code should look like this:
super("org.uniprot.uniprot.Uniprot",
    "This profile defines the requirements "
        + "for any UniProt centric gene database.",
    new SpeciesProfile[] {
    new EscherichiaColiUniProtSpeciesProfile(),
    new ArabidopsisThalianaUniProtSpeciesProfile(),
    new PlasmodiumFalciparumUniProtSpeciesProfile(),
    new VibrioCholeraeUniprotSpeciesProfile(),
    new MySpecialUniProtSpeciesProfile() });
  • Essentially, you need to add an item to the comma-separated list, beginning with new, followed by the species profile name, finally followed by ().
  • Save your changes, do Organize Imports to eliminate any red errors, and try a test build!

Build, Test, and Possibly Commit

  1. Create a new distribution of GenMAPP Builder based on Creating a Distribution.
  2. Perform a new export run with this version of GenMAPP Builder (you can skip the import steps and use the same PostgreSQL database if it’s available).
  3. Check the Systems table in the resulting .gdb to see if it contains the custom information:
    • Open the .gdb in Microsoft Access, then open the Systems table.
    • Look for the record for OrderedLocusNames. Your species name should appear under the Species column and your link URL should appear under the Link column.
  4. If all goes well, commit your code as described in Updating and Committing Code. You have now officially contributed to the XMLPipeDB project :)

Common Tasks

The tasks in this section reflect the typical development cycle.

Updating and Committing Code

  1. Right-click on the gmbuilder project folder and choose Synchronize Repository... from the popup menu.
  2. You will be switched to the Team Synchronization perspective.
  3. The presence of blue-arrowed files means that the server has new updates for you to download. Right-click on the gmbuilder project folder and choose Update from the popup menu.
  4. It is good “developer etiquette” to build a new distribution from scratch when you’ve received updates prior to committing your own changes. Thus, after the update, return to the Java perspective, do a build.xml > clean followed by a build.xml > dist.
  5. If everything works out, do Team Synchronize... again. If there are new updates (in the tiny amount of time since you last updated!), test things again.
  6. Eventually, you will see a Team Synchronize... with no incoming code. At this point, go ahead and commit the gray-arrowed files by right-clicking on them and choosing Commit....
  7. Just like with the wiki, it is good developer etiquette to describe briefly the nature of the changes that you are committing.
  8. Even if you have nothing to commit, it is still a good idea to invoke Team Synchronize... regularly so that you are kept up-to-date with regard to files that others may be committing.

Creating a Distribution

To create your own version of GenMAPP Builder based on the code you have in Eclipse (which may contain some new changes/customizations that you would like to test), follow these steps:

  1. Switch to Eclipse’s Java perspective.
  2. Edit the GenMAPPBuilder.java source code to identify the distribution that you are about to create by setting the VERSION string (located at approximately line 83) to a sufficiently descriptive value.
  3. Within the gmbuilder Java project is a file called build.xml. It should have an icon that appears to include an ant.
  4. Right-click on build.xml and choose Run As > Ant Build... (the one with the ellipsis) from the popup menu that appears.
  5. In the Edit Configuration dialog that appears, check on the clean and dist items in the Targets tab. The Target execution order section near the bottom of the dialog should say clean, dist.
  6. Click the Run button. The computer will work for a bit.
  7. When it is done, right-click on the gmbuilder project folder and choose Refresh (F5 is its keyboard shortcut).
  8. You should see a dist folder appear inside the gmbuilder project folder.
  9. This is your personally-built copy of GenMAPP Builder. Its contents correspond to the extracted contents of the gmbuilder-3.0.0-build-5.zip file that was downloaded in class.
  10. Run pgAdmin III and start a database, then run this copy of GenMAPP Builder as you would the “released” copy. The program should behave just like the one that you downloaded and have been using.