Ajvree Week 3

From LMU BioDB 2013

Week 3 Individual Assignment

Notes:

sed review
& = "repeat what you found": in the replacement, & stands for the text the pattern matched, e.g. s/pattern/Wisconsin is still better than &/
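A small sketch of the & note above. The input word "Minnesota" is only an assumed example and was not part of the class notes.

```shell
# & in the replacement stands for whatever text the pattern matched.
# "Minnesota" is a made-up input used only for illustration.
result=$(echo "Minnesota" | sed "s/Minnesota/Wisconsin is still better than &/")
echo "$result"
# prints: Wisconsin is still better than Minnesota
```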

Shortcuts:

  • cd to change directories, ls to list the files in a directory
  • up and down arrows to step through command history, or type history, then !number to rerun that command
  • CTRL-R for reverse search: type part of a past command to recall it
  • tab to complete a file name
  • grep: text finder, looks for a pattern in a file: grep "ACTG" filename
  • grep is case sensitive
  • A followed by T with multiple things in between: . = single-character "wildcard", so "A......T" means exactly six characters in between
  • indicate beginning of line: ^ as in "^A......T"
  • end of line: $ as in "A......T$"
  • command | command: the pipe sends the output of the first command to the second
  • previous command | wc gives the counts for the output of that command
  • wc: word count; with no file name, enter lines, then CTRL-D
  • wc reports # lines, # words, # characters
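The pattern notes above can be tried on a tiny made-up sample. The three sequence lines below are assumptions for illustration, not course data. The tr step only strips the padding that some versions of wc add.

```shell
# Hypothetical three-line sample; a real file would be given to grep by name.
sample=$(printf 'ACCGGTTT\nGGACCGGTTT\nACCGGTTTGG\n')

# A, then any six characters, then T, anywhere on the line.
anywhere=$(printf '%s\n' "$sample" | grep "A......T" | wc -l | tr -d ' ')

# Same pattern, but the A must be the first character of the line.
at_start=$(printf '%s\n' "$sample" | grep '^A......T' | wc -l | tr -d ' ')

# Same pattern, but the T must be the last character of the line.
at_end=$(printf '%s\n' "$sample" | grep 'A......T$' | wc -l | tr -d ' ')

# grep is case sensitive, so the lowercase pattern finds nothing here.
lowercase=$(printf '%s\n' "$sample" | grep "a......t" | wc -l | tr -d ' ')

echo "$anywhere $at_start $at_end $lowercase"
# prints: 3 2 2 0
```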

To use XMLPipeDB Match, enter java -jar xmlpipedb-match-1.1.1.jar FIRST, then the pattern, then the file with a < sign in front of it:
java -jar xmlpipedb-match-1.1.1.jar "A......T" < hs_ref_GRCh37_chr19.fa

1) "What Match command..."
-2 unique matches
-2,1
-what does info represent?

2) double quote within a double quote: put a backslash in front of each inner quote, "\"James.*\""; asterisk = zero or more of the preceding character
-unique 2
-2,1
-what info?
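A quick way to see what the escaped quotes in note 2 turn into. This sketch uses grep -o as a stand-in for Match, and the sample line is made up.

```shell
# Inside double quotes, \" produces a literal double quote,
# so the program receives the pattern "James.*" with the quotes included.
pattern="\"James.*\""
echo "$pattern"
# prints: "James.*"

# Hypothetical sample line; grep -o prints only the part that matched.
matched=$(printf '%s\n' 'author "James Smith" wrote this' | grep -o "$pattern")
echo "$matched"
# prints: "James Smith"
```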

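Reading frames
-break into triplets: sed "s/.../& /g"
-change t to u: sed "s/t/u/g"
-convert into genetic code: sed -f genetic-code.sed, a file of rules such as s/cgu/R/g and s/aug/M/g; USE -f to read the rules from the file
-other frames: drop between 0-2 characters from the start, e.g. s/^.//
-3'-5' strand: reverse the sequence with rev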



Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:

+1:
cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+2:
cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+3:
cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-1:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-2:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-3:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
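To check the pipelines on something small, here is a sketch that runs three of the frames on a nine-base sequence. The sequence and the six-rule rules file are made up for illustration. The real genetic-code.sed from the course is not reproduced here and is assumed to cover all 64 codons in the same s/codon/letter/g style.

```shell
# Toy stand-in for genetic-code.sed (hypothetical, six codons only).
code=$(mktemp)
printf '%s\n' 's/aug/M/g' 's/cgu/R/g' 's/ugc/C/g' \
              's/gua/V/g' 's/cau/H/g' 's/acg/T/g' > "$code"

seq="atgcgtatg"   # made-up sequence

# +1: triplets, t -> u, then translate.
plus1=$(echo "$seq" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$code")

# +2: drop the first base, then the same steps; the two leftover bases stay as they are.
plus2=$(echo "$seq" | sed "s/^.//" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$code")

# -1: reverse, complement with y/atgc/tacg/, then the same steps.
minus1=$(echo "$seq" | rev | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$code")

echo "+1: $plus1"    # prints: +1: M R M
echo "+2: $plus2"    # prints: +2: C V ug
echo "-1: $minus1"   # prints: -1: H T H

rm -f "$code"
```

Each complete triplet is followed by a space, so the +1 and -1 results end with a trailing space.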


XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

1. What Match command tallies the occurrences of the pattern GO:000916. in the 493.P_falciparum.xml file?
java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml
How many unique matches are there?
-2
How many times does each unique match appear?
-2,1
What information do you think the pattern GO:000916. represents?
I'm not entirely sure, but it looks like an identification tag attached to a protein, most likely a Gene Ontology (GO) term ID.

2. What Match command tallies the occurrences of the pattern \"James.*\" in the 493.P_falciparum.xml file?
java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml
How many unique matches are there?
-2
How many times does each unique match appear?
-8231,1
What information do you think the pattern \"James.*\" represents?
It probably represents a reference to a person's name listed in the database.

3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
What answer does Match give you?
830101
What answer does grep/wc give you?
502410
Do the answers make sense? Explain your response.
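At first glance the answers don't make sense, since the two values are so different. The likely explanation is that the two tools count different things: grep prints each line that contains ATG at least once, so grep with wc counts matching lines, while Match tallies every occurrence, including several on the same line. That would make the Match count the larger one, which is what happened.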
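The gap between the two counts can be reproduced on a small made-up sample. This sketch assumes the grep/wc count was a count of lines, which the answer above does not state. It uses grep -o, which prints each match on its own line, as a rough stand-in for an occurrence tally.

```shell
# Hypothetical three-line sample: two ATGs on line 1, one on line 2, none on line 3.
sample=$(printf 'ATGATG\nCCATGC\nGGGGGG\n')

# grep prints each matching line once, so this counts lines that contain ATG.
lines_with_atg=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# grep -o prints every match separately, so this counts occurrences.
occurrences=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "$lines_with_atg $occurrences"
# prints: 2 3
```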

Ajvree (talk) 08:48, 12 September 2013 (PDT)
