Emilysimso Week 3

From LMU BioDB 2015

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.

Sequence used: 5'-gcattaggcaac-3'
  • Used sed "y/atgc/tacg/" to perform complementary base pairing
Resulting sequence: 3'-cgtaatccgttg-5'
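The step above can be written as a single pipe. This is a minimal sketch; the function name complement is made up for illustration and is not part of the assignment.

```shell
# complement (hypothetical helper): read a lowercase DNA sequence on stdin
# and swap each base for its partner (a<->t, g<->c).
# The output is the complementary strand read 3' to 5'.
complement() {
  sed "y/atgc/tacg/"
}

echo "gcattaggcaac" | complement
# prints cgtaatccgttg
```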


Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence.

Sequence used: 5'-gcattaggcaac-3'
  • Used sed "y/t/u/" to change all t's to u's
Resulting sequence: 5'-gcauuaggcaac-3'

+1 Reading Frame: 5'-gca uua ggc aac-3'

  • Used sed "s/gca/A/g" to replace the first codon with A
Resulting sequence: 5'-AuuagAac-3'
  • This is not the desired result, because the g flag also replaced a second gca that spans the last two codons (ggc aac)
  • Used sed "s/^gca/A/g" to replace only the first codon with A
Resulting sequence: 5'-Auuaggcaac-3'
  • Used sed "s/uua/L/g" then sed "s/ggc/G/g" then sed "s/aac/N/g" to get the final result
+1 Reading Frame Amino Acids: ALGN
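One way to avoid accidental matches across codon boundaries, like the second gca above, is to split the sequence into codons before substituting. The sketch below is an illustration only; the helper name split_codons and its offset argument are assumptions, not part of the assignment.

```shell
# split_codons OFFSET (hypothetical helper): drop OFFSET leading bases from
# the sequence on stdin, then insert a space after every group of three bases.
split_codons() {
  sed "s/^.\{$1\}//" | sed "s/.../& /g"
}

# +2 reading frame: skip one base, then group into codons.
echo "gcauuaggcaac" | split_codons 1
# prints cau uag gca ac
```

Once the codons are separated by spaces, a pattern that includes the trailing space (for example "gca ") can only match a whole in-frame codon.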

+2 Reading Frame: 5'-g cau uag gca ac-3'

  • Used sed "s/^g/ /g" then sed "s/cau/H/g" then sed "s/uaggcaac/ stop/g"
+2 Reading Frame Amino Acids: H stop

+3 Reading Frame: 5'-gc auu agg caa c-3'

  • Used sed "s/^gc/ /g" then sed "s/auu/I/g" then sed "s/agg/R/g" then sed "s/caa/Q/g" then sed "s/c$/ /g"
+3 Reading Frame Amino Acids: IRQ

-1 Reading Frame: 5'-guu gcc uaa ugc-3'

  • Used sed "y/t/u/" to change t's to u's in the complementary strand (3'-cgtaatccgttg-5' to 3'-cguaauccguug-5')
  • Used echo "cguaauccguug" | rev to reverse the strand
Resulting Strand: 5'-guugccuaaugc-3'
  • Used sed "s/guu/V/g" then sed "s/gcc/A/g" then sed "s/uaa/ stop/g" then sed "s/ugc/ /g"
-1 Reading Frame Amino Acids: VA stop

-2 Reading Frame: 5'-g uug ccu aau gc-3'

  • Used sed "s/^g/ /g" then sed "s/uug/L/g" then sed "s/ccu/P/g" then sed "s/aau/N/g" then sed "s/gc/ /g"
-2 Reading Frame Amino Acids: LPN

-3 Reading Frame: 5'-gu ugc cua aug c-3'

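  • Used sed "s/^gu/ /g" then sed "s/ugc/C/" (without the g flag, so that only the first ugc is replaced; a second ugc straddles the aug codon and the trailing c) then sed "s/cua/L/g" then sed "s/aug/M/g" then sed "s/c/ /g"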
-3 Reading Frame Amino Acids: CLM
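The six hand-built pipelines above can be cross-checked with one generic routine. This is a sketch under stated assumptions, not the method used in this journal: the function name translate and its skip argument are hypothetical, the lookup uses the standard genetic code, stop codons are printed as * and translation continues past them rather than halting.

```shell
# translate SEQ SKIP (hypothetical helper): translate a lowercase RNA sequence,
# ignoring the first SKIP bases (0, 1, or 2) and any incomplete trailing codon.
translate() {
  echo "$1" | awk -v skip="$2" '
    BEGIN {
      # Standard genetic code, codons ordered u, c, a, g at each position.
      bases = "ucag"
      aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      n = 0
      for (i = 1; i <= 4; i++)
        for (j = 1; j <= 4; j++)
          for (k = 1; k <= 4; k++) {
            n++
            code[substr(bases, i, 1) substr(bases, j, 1) substr(bases, k, 1)] = substr(aas, n, 1)
          }
    }
    {
      protein = ""
      # Step through the sequence three bases at a time.
      for (p = skip + 1; p + 2 <= length($0); p += 3)
        protein = protein code[substr($0, p, 3)]
      print protein
    }'
}

rna="gcauuaggcaac"
# Reverse complement: pair the bases, then reverse so the strand reads 5 prime to 3 prime.
rc=$(echo "$rna" | sed "y/augc/uacg/" | rev)

for skip in 0 1 2; do
  echo "+$((skip + 1)): $(translate "$rna" "$skip")"
done
for skip in 0 1 2; do
  echo "-$((skip + 1)): $(translate "$rc" "$skip")"
done
```

For this sequence the amino acids before each stop agree with the results recorded above; the only difference is that codons after a stop are still translated.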


XMLPipeDB Match Practice

What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?

  • Used the line: java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
There are 3 unique matches
go:0007 appears 113 times, go:0006 appears 1100 times, go:0005 appears 1371 times
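A roughly comparable tally can be produced with standard tools. This is only a sketch: the function name tally is made up, and the counts may not agree exactly with Match, since how Match treats letter case and repeated matches is not shown here.

```shell
# tally PATTERN FILE (hypothetical helper): print each distinct match of
# PATTERN in FILE, preceded by the number of times it occurs.
tally() {
  grep -o "$1" "$2" | sort | uniq -c | sed 's/^ *//'
}

# Run against the exercise file only if it is present in the current directory.
if [ -f 493.P_falciparum.xml ]; then
  tally "GO:000[567]" 493.P_falciparum.xml
fi
```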

Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.

  • Used grep "GO:0006" 493.P_falciparum.xml to find the lines containing the sequence
  • This did not give context
  • Used grep "...GO:0006" 493.P_falciparum.xml to find context
Result: id="GO:0006506
  • These are presumably ID numbers of some kind
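Two other ways to see the surroundings of a match are sketched below: grep -C prints neighboring lines, and grep -o with a bounded wildcard prints a few characters on either side within the same line. The helper names and default values are assumptions chosen for illustration.

```shell
# show_context PATTERN FILE [N] (hypothetical helper): print the first 20
# output lines of matches, each with N lines of context before and after
# (default 2). Matching lines are numbered "12:", context lines "11-".
show_context() {
  grep -n -C "${3:-2}" "$1" "$2" | head -n 20
}

# inline_context PATTERN FILE [N] (hypothetical helper): print only the match
# plus up to N characters on each side of it (default 20).
inline_context() {
  grep -o ".\{0,${3:-20}\}$1.\{0,${3:-20}\}" "$2"
}

# Run against the exercise file only if it is present in the current directory.
if [ -f 493.P_falciparum.xml ]; then
  show_context "GO:0006" 493.P_falciparum.xml 2
fi
```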

What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?

  • Used java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
There are 3 unique matches
yu b. appears 1 time, yu k. appears 228 times, yu m. also appears once
  • These are most likely names

Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.

  • Used java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
Total matches: 830101
  • This means that the pattern ATG occurs 830,101 times in the file
  • Used grep "ATG" hs_ref_GRCh37_chr19.fa | wc
Result: 502410  502410  35671048
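  • This means that ATG appeared on 502,410 lines, and those lines contained 35,671,048 characters in total

The two counts differ because grep reports each matching line only once, even when ATG appears several times on that line, whereas Match counts every occurrence. This explains why the grep line count (502,410) is lower than the Match total (830,101).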
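The difference between counting lines and counting occurrences can be seen directly with grep -o, which prints each match on its own line. This is a minimal sketch with made-up helper names; it is not guaranteed to reproduce the Match total, because grep works line by line and cannot see an ATG that is split across a line break.

```shell
# count_lines PATTERN FILE (hypothetical helper): number of lines that
# contain at least one match.
count_lines() {
  grep -c "$1" "$2"
}

# count_matches PATTERN FILE (hypothetical helper): number of individual,
# non-overlapping matches, found within single lines only.
count_matches() {
  grep -o "$1" "$2" | wc -l | tr -d ' '
}

# Run against the exercise file only if it is present in the current directory.
if [ -f hs_ref_GRCh37_chr19.fa ]; then
  count_lines "ATG" hs_ref_GRCh37_chr19.fa
  count_matches "ATG" hs_ref_GRCh37_chr19.fa
fi
```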

