Laurmagee: Week 3

From LMU BioDB 2013

Complement of a Strand

  • The appropriate processing command is the following: cat sequence_file | sed "y/atgc/tacg/"
  • This turns a nucleotide sequence such as "agcggtatac" into its complement, "tcgccatatg".
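
As a quick check, the complement command can be wrapped in a small shell function and run on the example sequence. This is only a sketch: the function name complement is hypothetical, and it assumes lowercase input, as in the command above.

```shell
# complement: hypothetical helper; reads a lowercase DNA sequence on stdin
# and swaps a<->t and g<->c with sed's y (transliterate) command.
complement() {
  sed "y/atgc/tacg/"
}

# Prints the complement of the example sequence from the text.
echo "agcggtatac" | complement
```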

Reading Frames

  1. First Reading Frame (+1)
    • cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  2. Second Reading Frame (+2)
    • cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  3. Third Reading Frame (+3)
    • cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  4. Fourth Reading Frame (-1)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  5. Fifth Reading Frame (-2)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  6. Sixth Reading Frame (-3)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
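
The six pipelines differ only in which strand is read and in how many leading bases are dropped before the sequence is split into codons. A minimal sketch of that shared step follows. The function names reverse_complement and frame_codons are hypothetical, and translation is left out because it depends on the course's genetic-code.sed file.

```shell
# reverse_complement: hypothetical helper; reverses each input line and
# complements every base, giving the opposite strand read 5' to 3'.
reverse_complement() {
  rev | sed "y/atgc/tacg/"
}

# frame_codons N: hypothetical helper; drops the first N-1 bases (N = 1, 2, or 3),
# converts t to u, and puts a space after each complete codon.
frame_codons() {
  skip=$(( $1 - 1 ))
  sed "s/^.\{$skip\}//" | sed "s/t/u/g" | sed "s/.../& /g"
}

# Frame +2 of the example sequence.
echo "agcggtatac" | frame_codons 2
# Frame -1: take the reverse complement first, then read from its first base.
echo "agcggtatac" | reverse_complement | frame_codons 1
```

Piping either result into sed -f genetic-code.sed would then translate the codons into amino acids.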

XMLPipeDB Match Practice

  1. Use the Match command java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
    • The Match command finds two unique matches.
    • The pattern "go:0009165" appears twice and "go:0009168" appears once.
    • The pattern "GO:000916" appears to be an ID. It probably refers to a much longer record, because it includes a long identification number, 000916. I assume that other parts of the file are labeled with other identification numbers as well.
  2. Use the Match command java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
    • The Match command finds two unique matches.
    • The pattern "james k.d." appears 8238 times and "james a.a." appears once.
    • I think the pattern "\"James.*\"" matches a quoted string that begins with "James", which is most likely someone's name. The name could appear in a book, movie script, or any other piece of writing.
  3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • Match gives an answer of 830101.
    • grep and wc give an answer of 502410.
    • I think these answers make sense: grep and wc count only the number of lines that contain ATG, whereas Match counts every occurrence of ATG, including multiple occurrences on the same line.
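
The difference between counting lines and counting occurrences can be reproduced on a tiny input. The sketch below is an illustration under assumptions, not the commands used above: the function names are hypothetical, grep -o (print each match on its own line) is a common GNU/BSD extension rather than part of POSIX, and grep is only a rough stand-in for how Match counts.

```shell
# count_lines PATTERN: number of input lines that contain PATTERN at least once.
count_lines() {
  grep "$1" | wc -l
}

# count_matches PATTERN: number of non-overlapping occurrences of PATTERN.
# grep -o prints every match on a separate line, so wc -l counts matches.
count_matches() {
  grep -o "$1" | wc -l
}

# tally_matches PATTERN: each distinct matched string with its count,
# a rough analogue of a per-match summary such as the one for "GO:000916.".
tally_matches() {
  grep -o "$1" | sort | uniq -c
}

# Three lines, two of which contain ATG, with three occurrences in total.
printf 'ATGATG\nCCC\nATG\n' | count_lines "ATG"
printf 'ATGATG\nCCC\nATG\n' | count_matches "ATG"
```

On this input the line count is 2 and the occurrence count is 3, the same kind of gap seen between the grep/wc and Match results.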

Laurmagee (talk) 20:39, 12 September 2013 (PDT)
