Sequence Alignment using the Hirschberg Global Alignment Algorithm in C#

This post is about some work I did in 2006 regarding the Hirschberg Global Alignment algorithm for aligning large strings (of DNA). The main reason I’m blogging about this is because, even though I do not work in the field of bioinformatics, and I am far from being a proper bioinformatician, I’m curious to know if C# (and .NET or mono) is used by others in the field of bioinformatics. So, if you read this, drop me a line!

What is the Hirschberg Global Alignment algorithm?

Some descriptions can be found at the following sites:

What can the algorithm do?

The algorithm allows for finding the best alignment of sequences, such as DNA or protein sequences. More importantly, it can do it in n space, rather than in n^2 space, which obviously is important for long sequences. Some examples, that the implementation, referred to below, produces:

ATG-
-TGA
 
 
ACGCTG-A
-CGCTGAA
 
 
TTTTG—-GGG
TTTTAAAAGGGG
 
 
GTCGGGA-GACC-TTA—-GGACGT
AT–G-ATGACCCTTAAAAAG–C-C
 

Why did I do it?
In September 2006, I started an academic course on “Algorithms for Genomes” (A4G) at the IBIVU Amsterdam (http://www.ibi.vu.nl/teaching/a4g/). Part of this excellent course consisted of implementing the Hirschberg Global Alignment algorithm. As I had experience with C#, I chose to implement it in C# .NET. Afterwards, I got laughed at by my unix oriented bioinformatics teacher for choosing C#. I never got over it. 🙂

In order to get my implementation evaluated for grading, I had to make sure it ran on the mono framework. I used the 1.1.12 version at the time, but I haven’t checked since if the code is still compatible with the latest version of the mono framework.

Where can you find the code?
http://www.codeplex.com/HirschbergCSharp. It even comes with quite a number of unit tests. If you run those, you’ll get the output like above in the test result output window for all the unit tests. Have a look!

I recently converted the projects to .NET 4.0, and I changed the implementation to use the Parallel Extensions for .NET. The divide-and-conquer nature of the algorithm lends itself particularly well to use it. It was significant how easy it was to convert the existing implementation to use the Parallel Task Library:

// From the main function:
Task mainTask = AlignSubMatrix(TopLevelRectangle, topLeftScore, bottomRightScore);
mainTask.Wait();

// InternalAlignSubMatrix calls AlignSubMatrix
// Since we're using TaskCreationOptions.AttachedToParent, a structured task tree is built...
private Task AlignSubMatrix(Rectangle sequenceRectangle, AlignmentScore topLeftScore, AlignmentScore bottomRightScore)
{
  return Task.Factory.StartNew(() =>
      InternalAlignSubMatrix(sequenceRectangle, topLeftScore, bottomRightScore),
      TaskCreationOptions.AttachedToParent);
}

Note that the mainTask completes only when the entire tree has completed, so the Task.Wait() call will block until all tasks have completed.

Visual Studio 2010 also comes with great improvements in the area of visualizing and debugging threads and parallel tasks. See below for a couple of screenshots, taken during different times in execution of the above task tree:

Tasks1

Tasks2

You can see a number of tasks ready for execution (scheduled), a number of tasks currently executing, and the hierarchy relation between the tasks. Cool!

Another link worth mentioning here is the Microsoft Biology Foundation.

Advertisements

About Bram Veldhoen

I'm an independant senior software development consultant, working mostly in the area of integration using Microsoft BizTalk Server, Windows Communication Foundation and .NET.
This entry was posted in C# and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s