Malverso Week 3

From LMU BioDB 2015

The Genetic Code, By Computer

I used putty.exe and logged in to my account on my.cs.lmu.edu in order to access prokaryote.txt.

Complement of a Strand

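At first, I tried using the command cat prokaryote.txt | sed "s/atcg/tagc/g", which was incorrect. That command only replaces the literal four-letter string "atcg" wherever it appears; it does not swap each base individually. I revisited my notes and found that the correct command uses y (transliterate), which maps each character in the first list to the character in the same position in the second list: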

cat prokaryote.txt | sed "y/atgc/tacg/" 
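To make the difference between s and y concrete, here is a small sketch on a hypothetical toy sequence (not the contents of prokaryote.txt). The complement function name is made up for illustration.

```shell
#!/bin/sh
dna="atcgaatcg"   # toy sequence, not prokaryote.txt

# s/atcg/tagc/g swaps the literal four-letter string "atcg" as a unit;
# bases that are not part of such a run are left untouched.
echo "$dna" | sed "s/atcg/tagc/g"   # tagcatagc

# y/atgc/tacg/ maps each character on its own: a->t, t->a, g->c, c->g.
complement() {
  sed "y/atgc/tacg/"
}
echo "$dna" | complement            # tagcttagc
```

Only the y version changes the two a's in the middle, which is what a complement requires.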

Reading Frames

I checked my work using the ExPASy Translate Tool.

+1

Using prokaryote.txt again, I used sed "y/t/u/" to replace all the t's with u's. I then re-read Introduction to the Command Line and found that sed -f <file with rules> is the technique I should use to take advantage of the genetic-code.sed file.

I tried cat prokaryote.txt | sed "y/t/u/" | sed -f genetic-code.sed, but when I checked that answer on the ExPASy Translate Tool, it was wrong. Then I tried adding a space after each set of three letters with sed "s/.../& /g", which produced the correct answer but with some leftover bases on the end. I added sed "s/[augc]//g" to get rid of the leftover bases, and then sed "s/ //g" to get rid of the spaces between the letters. This looked like:

cat prokaryote.txt | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[augc]//g" | sed "s/ //g"

which had the output:

STIFQ-VRWPKKTILNLKRCLIPCSAYNPAASSAGGIL
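The contents of genetic-code.sed aren't reproduced on this page, so the sketch below substitutes a hypothetical three-rule file (aug->M, uuu->F, uaa->-) and assumes the real file uses the same s/codon/letter/g rule format. It runs the +1 pipeline on a toy sequence, with each step's intermediate result shown in the comments. The spaces presumably matter because, without them, a rule applied to an unbroken string can also match three bases that straddle two codons. Since the translated amino acids are uppercase, s/[augc]//g removes only the lowercase bases left over at the end.

```shell
#!/bin/sh
# Hypothetical stand-in for genetic-code.sed (the real file covers all
# 64 codons; its exact rule format is an assumption here).
rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
cat > "$rules" <<'EOF'
s/aug/M/g
s/uuu/F/g
s/uaa/-/g
EOF

translate_frame() {
  sed "s/.../& /g" |   # split into codons:  atg ttt taa gc
  sed "y/t/u/" |       # DNA -> RNA:         aug uuu uaa gc
  sed -f "$rules" |    # codons -> letters:  M F - gc
  sed "s/[augc]//g" |  # drop leftover lowercase bases
  sed "s/ //g"         # drop the spaces:    MF-
}

echo "atgttttaagc" | translate_frame   # MF-
```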

+2

For this frame, the only change I had to make in my code was to remove the very first character of prokaryote.txt. At first, I added the command sed "s/^[agtc]//g", but realized that sed "s/^.//g" also worked. I chose the second option, which looked like:

cat prokaryote.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[augc]//g" | sed "s/ //g"

which had the output:

LLYFNRYDGQRRQY-T-NVA-YHVPRITQPPVPLAAF- 
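Why the two deletions agree: ^ anchors each pattern to the start of the line, so each s command can fire at most once (the g flag changes nothing), and on a line made only of lowercase a, g, t and c, "any one character" and "one base" are the same thing. They would differ only if the line began with something else, such as an uppercase letter, which [agtc] would leave in place. A quick check on a hypothetical toy sequence:

```shell
#!/bin/sh
dna="atgttttaagc"   # toy sequence, not prokaryote.txt

# Both commands delete exactly one leading character on this input.
by_base=$(echo "$dna" | sed "s/^[agtc]//g")
by_any=$(echo "$dna" | sed "s/^.//g")
echo "$by_base"   # tgttttaagc
echo "$by_any"    # tgttttaagc
```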

+3

All I did was add a "." to the sed command so that it deletes the first two letters instead of just one:

cat prokaryote.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[augc]//g" | sed "s/ //g"

which proved to be successful:

YYISIGTMAKEDNIELETLPNTMFRV-PSRQFRWRHFN
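The +1, +2 and +3 commands differ only in how many leading bases they delete. The sketch below runs on a hypothetical toy sequence and stops before translation. It loops over the three offsets, using a sed interval expression, \{n\}, in place of typing one "." per base, and prints the codons each frame would translate.

```shell
#!/bin/sh
dna="atgttttaagc"   # toy sequence, not prokaryote.txt

# Delete 0, 1 or 2 leading bases, then split into codons.
# Expected lines: frame +1: atg ttt taa gc
#                 frame +2: tgt ttt aag c
#                 frame +3: gtt tta agc
for n in 0 1 2; do
  printf 'frame +%d: ' "$((n + 1))"
  echo "$dna" | sed "s/^.\{$n\}//" | sed "s/.../& /g"
done
```

Each shift of one base regroups every codon, which is why the three forward frames give completely different protein strings.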

-1

At first I thought all I had to do was add rev prokaryote.txt to the beginning of the previous three commands. I was mistaken: I also needed to change the bases to their complements, which I did with the code I wrote for the first homework question. My code:

cat prokaryote.txt | rev | sed "y/agtc/tcag/" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[aguc]//g" | sed "s/ //g"

which produced:

VKMPPAELAAGLYAEHGIRQRFKFNIVFFGHRTY-NIV
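A minimal sketch of the reverse-complement step on a hypothetical toy sequence. The revcomp function name is made up for illustration. When rev is given a file name it reads that file and ignores anything piped into it, so reading from standard input keeps every stage of the pipeline fed by the one before it. The y mapping must send a to t, g to c, t to a and c to g.

```shell
#!/bin/sh
dna="atgttttaagc"   # toy sequence, not prokaryote.txt

# rev (util-linux or BSD) reverses each input line; sed then swaps
# every base for its complement: a->t, g->c, t->a, c->g.
revcomp() {
  rev | sed "y/agtc/tcag/"
}

echo "$dna" | rev       # cgaattttgta  (reversed only)
echo "$dna" | revcomp   # gcttaaaacat  (reverse complement)
```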

-2

Then I took off the first character of the line and used the following code:

cat prokaryote.txt | rev | sed "s/^.//g" | sed "y/agtc/tcag/" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[aguc]//g" | sed "s/ //g"

This was the result:

LKCRQRNWRLGYTRNMVLGNVSSSILSSLAIVPIEI--

-3

And finally, I took off two characters from the beginning of the line instead of one:

cat prokaryote.txt | rev | sed "s/^..//g" | sed "y/agtc/tcag/" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/[aguc]//g" | sed "s/ //g"

Which produced:

-NAASGTGGWVIRGTWY-ATFQVQYCLLWPSYLLKYSR
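The -1, -2 and -3 commands are the same reverse-complement pipeline with 0, 1 or 2 leading bases removed. The sketch below runs on a hypothetical toy sequence and stops before translation, printing the codons of each reverse frame. Because y works one character at a time, it makes no difference whether the leading bases are dropped before or after the complement step.

```shell
#!/bin/sh
dna="atgttttaagc"   # toy sequence, not prokaryote.txt

# Expected lines: frame -1: gct taa aac at
#                 frame -2: ctt aaa aca t
#                 frame -3: tta aaa cat
for n in 0 1 2; do
  printf 'frame -%d: ' "$((n + 1))"
  echo "$dna" | rev | sed "y/agtc/tcag/" | sed "s/^.\{$n\}//" | sed "s/.../& /g"
done
```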

XMLPipeDB Match Practice

I used putty.exe to access the match program and used the "Using the XMLPipeDB Match Utility" page as a resource.

#1

Here is the Match command:

java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml
  • There are 3 unique matches.
  • Match appearances:
    • go:0007: 113
    • go:0006: 1100
    • go:0005: 1371
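Why there are only 3 unique matches: [567] is a bracket expression that matches exactly one character, so the pattern can only ever match the seven-character strings GO:0005, GO:0006 and GO:0007. Every longer GO identifier that starts with one of these counts toward that prefix. A rough grep-based tally is sketched below on toy lines in the style of the file (illustrative IDs only). Whether its counts would equal Match's on the real file is an assumption; for example, Match appears to report its matches in lowercase.

```shell
#!/bin/sh
# Toy lines in the style of 493.P_falciparum.xml (illustrative IDs only).
xml='<dbReference type="GO" id="GO:0005875">
<dbReference type="GO" id="GO:0006412">
<dbReference type="GO" id="GO:0005737">
<dbReference type="GO" id="GO:0016020">'

# -o prints only the matched text, one match per line; sort | uniq -c
# then counts each distinct match (here: 2 x GO:0005, 1 x GO:0006).
printf '%s\n' "$xml" | grep -o "GO:000[567]" | sort | uniq -c
```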

#2

I used the command:

grep "GO:000[567]" 493.P_falciparum.xml

This gave me a list of every line containing the pattern. From this I can infer that the pattern matches Gene Ontology (GO) term identifiers, which appear in the file as database cross-references. An example of the pattern in a line of the file is:

<dbReference type="GO" id="GO:0005875"> 

#3

Match command:

java -jar xmlpipedb-match-1.1.1.jar "\"Yu.*\"" < 493.P_falciparum.xml
  • There are 3 unique matches.
  • Match appearances:
    • "yu b.": 1
    • "yu k.": 228
    • "yu m.": 1

Using grep to search, I found that the pattern appeared in lines such as:

<person name="Yu K."/>

From this I can infer that the matches are names of people, here people with the surname Yu.
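A note on the .* in "\"Yu.*\"": it is greedy, so it runs to the last quote on the line. That is harmless for lines like the one above, which contain only one quoted value. The sketch below uses one line in that style plus a hypothetical line with a second quoted attribute (the role attribute is made up) to show the difference, along with a stricter pattern that cannot cross a quote.

```shell
#!/bin/sh
lines='<person name="Yu K."/>
<person name="Yu M." role="author"/>'

# Greedy: ".*" runs to the LAST quote on each line.
printf '%s\n' "$lines" | grep -o '"Yu.*"'
# "Yu K."
# "Yu M." role="author"

# Stricter: [^"]* cannot cross a quote, so only the name is returned.
printf '%s\n' "$lines" | grep -o '"Yu[^"]*"'
# "Yu K."
# "Yu M."
```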

#4

Using Match with this code:

java -jar xmlpipedb-match-1.1.1.jar "ATG" < hs_ref_GRCh37_chr19.fa

I found that there is 1 unique match and 830101 matches total.

To use grep and wc, I used the code:

grep "ATG" hs_ref_GRCh37_chr19.fa | wc

This showed 502410 lines, 502410 words, and 35671048 characters, which is very different from what Match reported. The two tools give different answers because grep prints each line that contains the pattern once, no matter how many times the pattern occurs on it, so wc is counting lines rather than matches. Since the sequence lines contain no spaces, wc also counts each line as a single word. Match, on the other hand, counts every occurrence of the pattern rather than the number of lines it appears on.
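The line-versus-occurrence difference can be seen on a few toy FASTA lines (hypothetical, not chr19). grep -c counts matching lines directly. grep -o prints one line per occurrence, so piping it to wc -l counts occurrences, which is closer to what Match reports. One caveat: an ATG split across a line break is missed by grep. How Match handles line breaks is not stated here, so the counts on the real file are not guaranteed to agree exactly.

```shell
#!/bin/sh
fa='>toy
ATGCCATGAATG
GGGATGCCCCCC
CCCCCCCCCCCC'

printf '%s\n' "$fa" | grep "ATG" | wc -l      # 2 lines contain ATG
printf '%s\n' "$fa" | grep -c "ATG"           # 2, same count without wc
printf '%s\n' "$fa" | grep -o "ATG" | wc -l   # 4 occurrences in total
```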




Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries