Task 11: Adding a new Workflow Step
To find out what a DNA sequence actually encodes, further analysis is needed. This includes determining which RNA is generated from a given DNA sequence and translating that RNA into amino acids.
To emulate this, we need a new kind of sequencer: an RNA sequencer. While it has the same capability to analyze samples as the DnaSequencer, how this analysis works internally is completely different.
If you are interested: Learn how DNA to RNA encoding works in principle
11a: Even more Generalization
Create a new class
Sequencer that acts as a superclass to the
All sequencers have the
serial_number attributes in common, as well as the
Hint: Sometimes it is not useful to give an implementation for a method in a superclass yet. The pass keyword can be used to indicate that no implementation will be given.
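As a minimal sketch of the hint, a superclass can declare a method and leave its body empty with pass. The method name analyze_sample and the serial number value are assumptions made up for illustration; only serial_number is taken from the task text.

```python
class Sequencer:
    """Common base class for all sequencers."""

    def __init__(self, serial_number):
        # serial_number is shared by every kind of sequencer
        self.serial_number = serial_number

    def analyze_sample(self, sample):
        # Hypothetical method name: no implementation is given here yet,
        # subclasses are expected to provide their own version
        pass


sequencer = Sequencer("SEQ-001")
result = sequencer.analyze_sample("any sample")
print(result)
```

A method whose body is only pass does nothing and returns None, so calling it on the superclass is harmless but also not useful.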
11b: A new Tool in the Box
Create a new class
This is a subclass of Sequencer and thus inherits all attributes and methods.
Consider how to properly implement the
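To illustrate what inheriting from Sequencer means, here is a small sketch. The class name RnaSequencer, the method name analyze_sample and the returned text are assumptions for illustration, not the names required by the task.

```python
class Sequencer:
    def __init__(self, serial_number):
        self.serial_number = serial_number

    def analyze_sample(self, sample):  # hypothetical method name
        pass


class RnaSequencer(Sequencer):  # hypothetical class name
    def analyze_sample(self, sample):
        # Overrides the inherited placeholder; the real analysis would go here
        return f"RNA analysis of {sample} on {self.serial_number}"


# __init__ is inherited, so the subclass needs no constructor of its own
rna_sequencer = RnaSequencer("RNA-042")
print(rna_sequencer.analyze_sample("s1"))
```

The subclass only defines what differs from the superclass. Everything else, including the constructor and the serial_number attribute, comes from Sequencer.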
11c: Find and Replace
An RNA sequence is made up of the bases A, C, G and U (instead of T).
Add a method dna_to_rna(…) to the RNA sequencer.
It should be given a template strand DNA sequence and create a matching RNA sequence by replacing the bases as follows:
The resulting RNA strand should be returned as a string similar to the DNA strand.
Make sure to test your new method a bit.
For example the input
"CATCATCAT" should give you
Hint: Discuss whether you could also use a static or class method.
The RNA we generate here is actually mRNA. We skip the following tRNA step for simplicity.
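One possible way to do the replacement is a translation table. This sketch assumes the standard base pairing for transcription from a template strand (A becomes U, T becomes A, C becomes G, G becomes C); adjust the table if your replacement rules differ. The class name RnaSequencer is also an assumption.

```python
class RnaSequencer:  # hypothetical class name
    # Assumed pairing for transcription from the template strand
    _DNA_TO_RNA = str.maketrans({"A": "U", "T": "A", "C": "G", "G": "C"})

    @staticmethod
    def dna_to_rna(template_strand):
        """Return the RNA string matching a template strand DNA string."""
        # translate() replaces all characters in a single pass
        return template_strand.translate(RnaSequencer._DNA_TO_RNA)


rna = RnaSequencer.dna_to_rna("CATCATCAT")
print(rna)
```

Under the assumed pairing, "CATCATCAT" becomes "GUAGUAGUA". A translation table is used instead of chained str.replace calls because chained replacements can overwrite each other: replacing T with A first and then A with U would turn every original T into U as well. The method is written as a static method here because it does not need any attribute of the sequencer, which is one possible answer to the hint above.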
11d: Three of a Kind
In the next step, the newly generated RNA sequence needs to be chopped into triplets.
Add a new method
extract_triplets(…) to the
The input is a string made up of the RNA bases.
It is to be cut into pieces that are each 3 bases (i.e. letters) long.
All pieces should be collected together in one list.
Bases that are left over in the end get dropped.
The output is this list of strings where each of those is exactly 3 bases (letters) long.
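The cutting rule described above can be sketched with a loop that steps through the string three letters at a time. It is written as a plain function here to keep the example short; in the task it would be a method.

```python
def extract_triplets(rna):
    """Cut an RNA string into 3-letter pieces, dropping leftover bases."""
    triplets = []
    # Stop early enough that every slice is a full triplet
    for start in range(0, len(rna) - 2, 3):
        triplets.append(rna[start:start + 3])
    return triplets


triplets = extract_triplets("GUAGUACC")
print(triplets)
```

Ending the range at len(rna) - 2 ensures that no start position is generated for an incomplete piece, so leftover bases never end up in the list.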
For testing, the input "GUAGUACC" should yield ["GUA", "GUA"] (the remaining "CC" is left out).
11e: Tying it all together
Add an analyze_rna(self, sample) method to the
It takes a Sample, obtains its template strand DNA and applies the helpers we created before:
The resulting list of triplets is to be stored again in the Sample, so you will have to add a new attribute
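Put together, the flow could look roughly like the following sketch. The Sample class shown is only a minimal stand-in for the existing one, and the names RnaSequencer, template_strand and rna_triplets are assumptions chosen for illustration. The base pairing in dna_to_rna is likewise assumed to be the standard template strand pairing.

```python
class Sample:  # minimal stand-in for the existing Sample class
    def __init__(self, template_strand):
        self.template_strand = template_strand  # hypothetical attribute name
        self.rna_triplets = None                # hypothetical new attribute


class RnaSequencer:  # hypothetical class name
    def dna_to_rna(self, template_strand):
        # Assumed template strand pairing: A->U, T->A, C->G, G->C
        table = str.maketrans({"A": "U", "T": "A", "C": "G", "G": "C"})
        return template_strand.translate(table)

    def extract_triplets(self, rna):
        # Full triplets only; leftover bases at the end are dropped
        return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]

    def analyze_rna(self, sample):
        # Step 1: transcribe the template strand DNA into RNA
        rna = self.dna_to_rna(sample.template_strand)
        # Step 2: cut the RNA into triplets and store them in the sample
        sample.rna_triplets = self.extract_triplets(rna)


sample = Sample("CATCATCATGG")
RnaSequencer().analyze_rna(sample)
print(sample.rna_triplets)
```

Initializing the new attribute to None in the constructor makes it easy to tell whether a sample has already been analyzed.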