Week 3 E-notes Eyanosch

From LMU BioDB 2015

To produce the nucleotide sequence of the complementary strand:

cat sequence_file | sed "y/atgc/tacg/"

This follows the rule y/<original characters>/<new characters>/

If sequence_file contains agcggtatac

the output would be tcgccatatg

  • Basically, I wanted the computer to read the file and replace each individual letter with its complementary base: A with T (and vice versa), and C with G (and vice versa).
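To check the rule without needing a file, here is a small sketch that pipes a test sequence straight into sed. The function name complement is just a label I made up for this example; the sed command itself is the one above.

```shell
# Complement a lowercase DNA sequence read from standard input.
# "complement" is a hypothetical helper name, not part of the assignment.
complement() {
  sed "y/atgc/tacg/"
}

echo "agcggtatac" | complement   # prints: tcgccatatg
```

Using echo instead of cat sequence_file makes it easy to try different sequences quickly.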

Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame of the nucleotide sequence.

When looking at this problem, there are a few things that need to be done.

The nucleotide sequence must be established and converted into RNA. This can be done by replacing the T's with U's.

Then the nucleotide sequence must be broken into its codon components, probably starting with the +1 reading frame.
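A rough sketch of these first two steps, assuming the sequence is lowercase and sits on one line. The names to_rna and split_codons are made up for this example, and s/.../& /g is one possible way to put a space after every three letters, not necessarily the intended one.

```shell
# Step 1: DNA -> RNA by replacing every t with u.
to_rna() { sed "s/t/u/g"; }

# Step 2: break the line into codons by adding a space after each
# group of three characters (& stands for the text that was matched).
split_codons() { sed "s/.../& /g"; }

echo "agcggtatac" | to_rna | split_codons   # prints: agc ggu aua c
```

The trailing c is a leftover nucleotide that does not fill a whole codon in the +1 frame.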

Next the codons must be read and converted into their specific amino acids. For this we need Dondi's ~dondi/xmlpipedb/data directory, in which genetic-code.sed has the conversions already written. I wasn't entirely sure how to invoke the command, so I took a look at my partner Brandon's page for help; this is the part that I have been stuck on. The code written there matches sed -f <file with rules>.
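Here is a sketch of how sed -f could fit into the +1 pipeline. Since genetic-code.sed lives in Dondi's directory, this example writes a tiny stand-in rules file with only three codons. The rule format (s/codon/LETTER/g) and the name translate_frame1 are my assumptions, not the actual contents of genetic-code.sed.

```shell
# Hypothetical stand-in for genetic-code.sed: only the codons used below.
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/agc/S/g
s/ggu/G/g
s/aua/I/g
EOF

# +1 frame: DNA -> RNA, split into codons, then apply the rules file.
translate_frame1() {
  sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules"
}

echo "agcggtatac" | translate_frame1   # prints: S G I c
```

With the real file, the last step would presumably point at genetic-code.sed in ~dondi/xmlpipedb/data instead of the temporary file.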

  • When using a +2 or +3 reading frame, some nucleotides might not be translated and are left behind. The amino acids are coded in uppercase letters, so if I were to search for all the leftover lowercase nucleotides with sed "s/[acgu]//g" (u rather than t, because the sequence is already RNA at this point), they would be erased and only the amino acids coded for would be left.

Also, when creating the +2 or +3 reading frame command line, just use the ^ to indicate the start of the line and erase the first letter (for +2) or the first two letters (for +3).
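The frame shift and the cleanup could look something like this. The function names are invented for the example, and the extra step that removes spaces is my own addition for tidier output.

```shell
# +2 frame: erase the first character of the line (^ anchors at the line start).
frame2() { sed "s/^.//"; }

# +3 frame: erase the first two characters of the line.
frame3() { sed "s/^..//"; }

# Erase untranslated lowercase nucleotides, then the spaces between codons.
# (The space removal is an extra step assumed here, not from the notes.)
strip_leftovers() { sed "s/[acgu]//g" | sed "s/ //g"; }

echo "agcggtatac" | frame2          # prints: gcggtatac
echo "agcggtatac" | frame3          # prints: cggtatac
echo "S G I c" | strip_leftovers    # prints: SGI
```

The shifted sequence would then go through the same RNA, codon, and sed -f steps as the +1 frame, with strip_leftovers at the very end.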


  • When creating the -1 through -3 reading frames, the complementary base of each nucleotide was produced through sed "y/actg/tgac/", and the strand was shifted by removing the first, or the first two, nucleotides of the line.
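A small sketch of the start of the minus frames, using the same test sequence. The helper name complement is made up for the example; the shifts reuse the ^ trick from the +2 and +3 frames.

```shell
# Complement each base, then shift the frame the same way as the + frames.
complement() { sed "y/actg/tgac/"; }

echo "agcggtatac" | complement                   # prints: tcgccatatg
echo "agcggtatac" | complement | sed "s/^.//"    # prints: cgccatatg
echo "agcggtatac" | complement | sed "s/^..//"   # prints: gccatatg
```

One caveat that is not in my notes: the complementary strand is normally read in the opposite direction, so the line probably also needs to be reversed (for example with rev) before translating. I am treating that as an assumption to check rather than part of what I actually ran.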