Sunday, April 30, 2017

DNA Matching in Prolog

Leave a Comment

I am attempting to learn basic Prolog. I have read some basic tutorials on the basic structures of lists, variables, and if/and logic. A project I am attempting to do to help learn some of this is to match DNA sequences.

Essentially I want it to match reverse compliments of DNA sequences.

Example outputs can be seen below:

?- dnamatch([t, t, a, c],[g, t, a, a]). true 

While it's most likely relatively simple, being newer to Prolog I am currently figuring it out.

I started by defining basic matching rules for the DNA pairs:

pair(a,t). pair(g,c). etc... 

I was then going to try to implement this into lists somehow, but am unsure how to make this logic apply to longer lists of sequences. I am unsure if my attempted start is even the correct approach. Any help would be appreciated.

3 Answers

Answers 1

Since your relation is describing lists, you could opt to use DCGs. You can describe the complementary nucleobases like so:

complementary(t) -->    % thymine is complementary to   [a].                  % adenine complementary(a) -->    % adenine is complementary to   [t].                  % thymine complementary(g) -->    % guanine is complementary to   [c].                  % cytosine complementary(c) -->    % cytosine is complementary to   [g].                  % guanine 

This corresponds to your predicate pair/2. To describe a bonding sequence in reverse order you can proceed like so:

bond([]) -->            % the empty sequence   [].                   % doesn't bond bond([A|As]) -->        % the sequence [A|As] bonds with   bond(As),             % a bonding sequence to As (in reverse order)   complementary(A).     % followed by the complementary nucleobase of A 

The reverse order is achieved by writing the recursive goal first and then the goal that describes the complementary nucleobase to the one in the head of the list. You can query this using phrase/2 like so:

   ?- phrase(bond([t,t,a,c]),S). S = [g,t,a,a] 

Or you can use a wrapper predicate with a single goal containing phrase/2:

seq_complseq(D,M) :-   phrase(bond(D),M). 

And then query it:

   ?- seq_complseq([t,t,a,c],C). C = [g,t,a,a] 

I find the description of lists with DCGs easier to read than the corresponding predicate version. Of course, describing a complementary sequence in reverse order is a relatively easy task. But once you want to describe more complex structures like, say the cloverleaf structure of tRNA DCGs come in real handy.

Answers 2

A solution with maplist/3 and reverse/2:

dnamatch(A,B) :- reverse(B,C), maplist(pairmatch,A,C). 

Answers 3

If you want to avoid traversing twice you can also maybe do it like this?

rev_comp(DNA, RC) :-     rev_comp(DNA, [], RC).  rev_comp([], RC, RC). rev_comp([X|Xs], RC0, RC) :-     pair(X, Y),     rev_comp(Xs, [Y|RC0], RC). 

Then:

?- rev_comp([t,c,g,a], RC). RC = [t, c, g, a]. 

This is only hand-coded amalgamation of reverse and maplist. Is it worth it? Maybe, maybe not. Probably not.

Now that I thought about it a little bit, you could also do it with foldl which reverses, but now you really want to reverse so it is more useful than annoying.

rev_comp([], []). rev_comp([X|Xs], Ys) :-     pair(X, Y),     foldl(rc, Xs, [Y], Ys).  rc(X, Ys, [Y|Ys]) :- pair(X, Y). 

But this is even less obvious than solution above and solution above is still less obvious than solution by @Capellic so maybe you can look at code I wrote but please don't write such code unless of course you are answering questions of Stackoverflow and want to look clever or impress a girl that asks your help for exercise in university.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment