Lenaolufson Week 3

From LMU BioDB 2015

The Genetic Code, by Computer

Complement of a Strand

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:

cat sequence_file.txt | sed "y/atgc/tacg/"

Result: tcgccatatg
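The same command can be tried without a sequence file. This is a minimal sketch that wraps the sed call in a helper named complement (a hypothetical name, not part of the assignment) and feeds it the sequence used on this page:

```shell
# Hypothetical helper: complement a lowercase DNA sequence read from stdin.
# The sed "y" command maps a->t, t->a, g->c, c->g one character at a time.
complement() {
    sed "y/atgc/tacg/"
}

echo "agcggtatac" | complement
```

This only swaps each base for its partner. It does not reverse the strand, which is why the negative reading frames add rev.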

Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame of the nucleotide sequence. In other words, fill in the question marks:

The sequence used was "agcggtatac"

  • +1

cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: SGI

  • +2

cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: AVY

  • +3

cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: RY

  • -1

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: VYR

  • -2

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: YTA

  • -3

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"

Result: IP
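To check which codons each frame produces before translation, the frame-splitting steps can be run on their own. The sketch below is an illustration only: the helper name codons and its arguments are assumptions, it stops before the genetic-code.sed step, and it uses cut to skip leading bases where the commands above use sed.

```shell
# Hypothetical helper: print the complete codons of one reading frame.
#   $1 = frame (+1, +2, +3, -1, -2 or -3)
#   $2 = lowercase DNA sequence
codons() {
    frame=$1
    dna=$2
    # Negative frames read the reverse complement of the strand.
    case $frame in
        -*) dna=$(echo "$dna" | sed "y/acgt/tgca/" | rev) ;;
    esac
    # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
    start=${frame#[+-]}
    # Split into triplets, then drop the trailing incomplete codon (if any).
    echo "$dna" | cut -c"$start"- | sed "s/.../& /g" | sed "s/ [acgt]*$//"
}

for f in +1 +2 +3 -1 -2 -3; do
    printf '%s: %s\n' "$f" "$(codons "$f" agcggtatac)"
done
```

For the sequence agcggtatac, the +1 frame gives agc ggt ata and the -3 frame gives ata ccg, which correspond to the SGI and IP results above.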

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
    • How many unique matches are there?
      • 3
    • How many times does each unique match appear?
      • GO:0007 : 113
      • GO:0006 : 1100
      • GO:0005 : 1371
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • example: <dbReference type="GO" id="GO:0005622">
    • Describe how you did this.
      • grep "GO:000[567]" 493.P_falciparum.xml | more
    • Based on where you find this occurrence, what kind of information does this pattern represent?
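      • The pattern matches the start of a Gene Ontology (GO) term ID. In the file it appears in the id attribute of a dbReference element of type "GO", so it identifies a GO term associated with that entry.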
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
    • How many unique matches are there?
      • 3
    • How many times does each unique match appear?
      • "Yu b." : 1
      • "Yu k." : 228
      • "Yu m." : 1
    • What information do you think this pattern represents?
      • Possibly a person's name: the surname Yu followed by an initial, enclosed in quotation marks.
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
      • java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
      • unique matches: 1
      • number of matches: 830101
    • What answer does grep + wc give you?
      • grep "ATG" hs_ref_GRCh37_chr19.fa | wc
      • lines: 502410
      • words: 502410
      • characters: 35671048
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
      • From my understanding, grep prints every line that contains ATG at least once, and wc then counts those lines, so a line with several ATGs is still counted only once (502410 lines). The word count equals the line count because the matching lines contain no spaces, so each line is a single word. Match, in contrast, counts every individual occurrence of ATG in the file (830101), which is why its number is larger.
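The difference between counting lines and counting occurrences can be reproduced on a small scale. This sketch uses a made-up three-line sample (not taken from hs_ref_GRCh37_chr19.fa) and hypothetical helper names. It relies on the -o option of grep, which GNU and BSD grep provide but POSIX does not require.

```shell
# Made-up sample: three sequence lines, two of which contain ATG.
sample() {
    printf '%s\n' "ATGCCATGA" "CCGGTTAA" "TTATGATGATG"
}

# Count lines that contain ATG at least once (what grep + wc reports).
count_lines() {
    grep "ATG" | wc -l | tr -d ' '
}

# Count every occurrence of ATG; -o prints each match on its own line.
count_hits() {
    grep -o "ATG" | wc -l | tr -d ' '
}

echo "lines with ATG: $(sample | count_lines)"
echo "ATG occurrences: $(sample | count_hits)"
```

Here the first line holds two ATGs and the third holds three, so the line count is 2 while the occurrence count is 5, the same kind of gap seen between grep + wc and Match.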