Jwoodlee Week 3

Electronic Lab Notebook

Use ssh to log in to my.cs.lmu.edu with your username, then enter your password.

Complement of a Strand

Navigate to ~dondi/xmlpipedb/data, which contains prokaryote.txt, and enter the following command:

cat prokaryote.txt | sed "y/actg/tgac/"

This will yield prokaryote.txt’s complementary DNA strand.
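As a quick sanity check, the same y command can be tried on a short made-up sequence before running it on the real file. The sequence and the variable names below are only illustrations, not part of the assignment data.

```shell
# Made-up 8-base sequence standing in for the contents of prokaryote.txt
seq="atgcatgc"

# y/actg/tgac/ maps each base to its partner: a->t, c->g, t->a, g->c
complement=$(echo "$seq" | sed "y/actg/tgac/")

echo "original:   $seq"
echo "complement: $complement"
```

Each base is swapped one-for-one, so the output is always the same length as the input.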

Reading Frames

These commands are more complicated than those for Complement of a Strand. This is essentially what I had to accomplish:

Take the sequence file, replace the t’s with u’s, break the sequence into groups of three, use genetic-code.sed as the translation “chart”, and then remove any leftover nucleotides. For the other reading frames, I just delete the first one or two nucleotides before translating.

After lots of googling I came up with this basic outline in terminal:

cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed

For the other reading frames, insert sed "s/^.//g" (frame +2) or sed "s/^..//g" (frame +3) as a pipeline stage right after cat prokaryote.txt; to translate a different DNA sequence, replace prokaryote.txt with that file's name. The commands below are written out in full and should return the correct output. Enter them in the terminal after navigating to nfs/home/dondi/xmlpipedb/data/.

+1
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

+2
cat prokaryote.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

+3 
cat prokaryote.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
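To see what each stage of the pipeline contributes, the frame offsets can be tried on a short made-up sequence. This is only a sketch: the contents of genetic-code.sed are not shown in this notebook, so the four-rule file below is a hypothetical stand-in that assumes one substitution per codon (aug, cau, gca, and ugc only).

```shell
# Hypothetical stand-in for genetic-code.sed (assumed format: one s/codon/letter/g rule per line)
rules=$(mktemp)
printf 's/aug/M/g\ns/cau/H/g\ns/gca/A/g\ns/ugc/C/g\n' > "$rules"

seq="atgcatgca"   # made-up 9-base sequence

# Frame +1: transcribe, split into codons, translate, drop leftover bases
frame1=$(echo "$seq" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules" | sed "s/[acug]//g")

# Frame +2: same pipeline, but delete the first base before anything else
frame2=$(echo "$seq" | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules" | sed "s/[acug]//g")

echo "+1: $frame1"
echo "+2: $frame2"
rm -f "$rules"
```

In frame +2 the last two bases (ca) do not form a full codon, and the final sed "s/[acug]//g" stage is what removes them. The space after each translated codon is left in place, so the output ends with a trailing space.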

For the negative reading frames, I took the complement of the strand using sed "y/actg/tgac/" and then reversed it with rev, so that the complementary strand is translated in the proper direction (5' --> 3').

-1
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

-2
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

-3
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
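The complement-then-reverse step used for the negative frames can be checked by hand on a tiny example. The sequence below is made up for illustration.

```shell
seq="atgcc"   # made-up 5-base sequence, read 5' to 3'

# Complement each base, then reverse the line so the new strand also reads 5' to 3'
revcomp=$(echo "$seq" | sed "y/actg/tgac/" | rev)

echo "strand:             $seq"
echo "reverse complement: $revcomp"
```

Without rev, the complemented strand would be read 3' to 5', and the codons would be translated backwards.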

I checked the output with the Expasy translation tool.

For the XMLPipeDB Match utility, I used the wiki provided on the course website. I found the first command on the wiki after scrolling down to the "Running Command-Line Java Programs" section. I entered the commands at the command prompt in the ~dondi/xmlpipedb/data directory, which allowed me to use the XMLPipeDB utility.

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- I found this command on the wiki page and ran it in ~dondi/xmlpipedb/data:
  - java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
- How many unique matches are there?
  - I read the output of the command and recorded it here.
  - Total Unique Matches: 3
- How many times does each unique match appear?
  - I read further down the output and recorded the counts here after clicking "edit" on this wiki page.
  - go:0007: 113
  - go:0006: 1100
  - go:0005: 1371
Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- One such occurrence: <dbReference type="GO" id="GO:0005777">
- Describe how you did this.
  - I entered the command grep "GO:000[567]" 493.P_falciparum.xml | more and picked out an occurrence at random. I then edited the wiki page and wrote it down.
- Based on where you find this occurrence, what kind of information does this pattern represent?
  - An identifier for a Gene Ontology (GO) term, stored in the file as a database reference. I found this out on the Match utility wiki page.
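A similar tally can be approximated with standard tools, which is a handy cross-check on the Match output. The sketch below runs on a small made-up sample rather than 493.P_falciparum.xml; the sample lines only imitate the dbReference format seen above. Note that grep is case-sensitive unless -i is given, so its behavior may not line up exactly with Match.

```shell
# Made-up sample imitating the dbReference lines in the XML file
sample=$(mktemp)
cat > "$sample" <<'EOF'
<dbReference type="GO" id="GO:0005777">
<dbReference type="GO" id="GO:0006412">
<dbReference type="GO" id="GO:0005737"> <dbReference type="GO" id="GO:0007165">
EOF

# -o prints each match on its own line; sort | uniq -c tallies identical matches
tally=$(grep -o "GO:000[567]" "$sample" | sort | uniq -c)
echo "$tally"

# Number of distinct matches
unique=$(grep -o "GO:000[567]" "$sample" | sort -u | wc -l)
echo "unique matches: $unique"
rm -f "$sample"
```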
What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- I entered this command in the terminal without changing directories:
- java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
- How many unique matches are there?
  - I read the output and recorded it here.
  - 3
- How many times does each unique match appear?
  - "yu b.": 1
  - "yu k.": 228
  - "yu m.": 1
- What information do you think this pattern represents?
  - I used grep on the same pattern to try to figure this out and based on what I found, I would say it is somebody's name.
Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
  - atg: 830101
  - Total unique matches: 1
- What answer does grep + wc give you?
  - 502410 502410 35671048 (from left to right: lines, words, bytes)
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
  - grep + wc counts the lines that contain at least one occurrence of "ATG", while the Match utility counts each individual instance of "ATG". A line with several occurrences is counted only once by grep + wc, so it reports the lower number (502410 lines versus 830101 matches).
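The difference between counting lines and counting occurrences can be reproduced on a three-line made-up sample. This sketch uses grep's -o option, which prints every match on its own line, so that wc -l counts matches instead of matching lines.

```shell
# Made-up three-line sample: the first line has two ATGs, the second none, the third one
sample=$(mktemp)
printf 'ATGATGCC\nCCGGTTAA\nTTATGCAT\n' > "$sample"

# Lines containing at least one ATG (what grep + wc counts)
lines=$(grep "ATG" "$sample" | wc -l)

# Individual ATG occurrences (closer to what Match counts)
hits=$(grep -o "ATG" "$sample" | wc -l)

echo "matching lines: $lines"
echo "occurrences:    $hits"
rm -f "$sample"
```

Even with -o, grep works one line at a time, so an ATG split across a line break would not be found this way.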

BIOL 367, Fall 2015, User Page, Team Page
