One of the basic problems we want to address in this course is
finding a pattern (such as a word, a motif, a gene, or a protein
domain) in a larger text (such as a novel, a genome, or a
protein). For example, we would like to know where the word
Sancho occurs in the file Don_Quixote.txt.
Your mission is to write a function (in any reasonable
computer language) that takes two inputs, pattern and
text, and returns the set of locations where
pattern occurs in text.
For example, if pattern="BR" and
text="ABRACADABRA", then your function should return 2 and
9. (In some languages, such as C++, Java and Python, indices start at 0,
so in that case the result is 1 and 8.)
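
As a starting point, here is a minimal sketch of one possible approach in Python: slide a window over the text and compare it with the pattern at every position. The function name `find_occurrences` is hypothetical, positions are 0-based, and returning no matches for an empty pattern is an assumption rather than part of the assignment.

```python
def find_occurrences(pattern, text):
    """Return the 0-based start positions where pattern occurs in text."""
    positions = []
    if not pattern:
        return positions  # assumption: an empty pattern matches nowhere
    m = len(pattern)
    # The last position where the pattern can still fit is len(text) - m.
    for i in range(len(text) - m + 1):
        if text[i:i + m] == pattern:
            positions.append(i)
    return positions


print(find_occurrences("BR", "ABRACADABRA"))  # [1, 8]
```

Overlapping occurrences are reported too, since the window advances one character at a time. Add 1 to each position to get the 1-based answer.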
- Write the function, and test it with a FASTA file and with
  Don_Quixote. Try several patterns.
- How long does your function take to find all matching places? What factors affect the execution time?
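
One way to explore the timing question is sketched below, assuming Python. The helper names `find_with_str_find` and `time_search` are hypothetical, the search relies on the built-in `str.find` rather than any particular algorithm from the course, and the pattern is assumed to be non-empty. The text is synthetic (repeated copies of ABRACADABRA) so that the example runs without any input file.

```python
import time


def find_with_str_find(pattern, text):
    """Return the 0-based start positions of pattern in text via str.find."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        # Restart one character later so overlapping matches are counted.
        i = text.find(pattern, i + 1)
    return positions


def time_search(pattern, text):
    """Return (positions, elapsed seconds) for one search."""
    start = time.perf_counter()
    positions = find_with_str_find(pattern, text)
    elapsed = time.perf_counter() - start
    return positions, elapsed


# Grow the text by factors of 10 and watch how the elapsed time changes.
for copies in (1_000, 10_000, 100_000):
    text = "ABRACADABRA" * copies
    positions, elapsed = time_search("BR", text)
    print(len(text), len(positions), f"{elapsed:.6f} s")
```

To try it on real data, the synthetic text can be replaced by the contents of Don_Quixote.txt or of a FASTA file. Varying the text length, the pattern length, and the number of matches separately should help identify which factors matter.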