80-fold software performance improvement


aptagen

Description

2017 COMPUTER SCIENCE, SOFTWARE RE-ENGINEERING, BIOINFORMATICS

We identified and fixed the bottleneck in the client’s software to cut its execution time. As a result, the client achieved a 1,000x reduction in the execution time of an embedded algorithm and an 80x reduction for the software as a whole.

Client

A biotechnology company that offers synthetic antibody products and services, such as research reagents and diagnostic and biomarker discovery tools, for use in drug discovery, targeted delivery of therapeutics, and bioindustrial applications.

Challenge

DNA sequencing generates large volumes of data for the client. Previously, the client’s software took 48 hours to process that data. The company turned to Azati with a request to improve the software’s performance within a short time frame.

Solution

The client’s software processes data in multiple steps, one of which relies on an external package, FASTAptamer. FASTAptamer is a bioinformatics toolkit for high-throughput sequence analysis of combinatorial selections.

Analysis showed that one of FASTAptamer’s programs, the clustering step, accounted for most of the software’s total run time. Most of the clustering time, in turn, was spent calculating edit distance (a measure of how similar two biological sequences are). This calculation is based on the Levenshtein algorithm.

Although the algorithm itself is appropriate, its Perl implementation was inefficient and took a long time to process the submitted sequence data. Our team decided to rewrite the code in C++, because the differences in programming paradigm and execution model between the two languages strongly affect execution time.
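
For readers unfamiliar with the calculation, here is a minimal C++ sketch of the Levenshtein edit distance using the standard two-row dynamic programming formulation. It is an illustration only, not the code delivered to the client or merged into FASTAptamer; the function name `levenshtein` and the choice of `std::string` inputs are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Illustrative sketch (hypothetical name, not the FASTAptamer code).
// Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions that turn `a` into `b`.
// Only two rows of the DP table are kept, so memory is O(min(|a|, |b|)).
std::size_t levenshtein(const std::string& a, const std::string& b) {
    // The distance is symmetric, so make `b` the shorter string.
    if (a.size() < b.size()) return levenshtein(b, a);
    assert(b.size() <= a.size());

    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Row 0: turning an empty prefix of `a` into b[0..j) takes j insertions.
    std::iota(prev.begin(), prev.end(), std::size_t{0});

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // turning a[0..i) into an empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,           // deletion
                                curr[j - 1] + 1,       // insertion
                                prev[j - 1] + cost});  // match or substitution
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Calling `levenshtein("ACGT", "AGT")` returns 1, since a single deletion turns one sequence into the other. A compiled inner loop over contiguous integer arrays like this one avoids the per-operation interpreter overhead that a straightforward Perl version pays, which is one plausible reason for the speedup described here.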

Benefits

Our team optimized the software within one day. The client ran the updated pipeline on data they had recently analyzed and compared the results:

  • the outputs were identical
  • the Levenshtein algorithm ran 1,000x faster
  • results were generated in 30.5 minutes with our improvements, compared to 2460.5 minutes (~41 hours) with the original software
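
The figures above can be cross-checked with a short calculation. The sketch below computes the overall speedup from the two reported run times and, using Amdahl's law, estimates what share of the original run time the edit-distance step must have occupied. It assumes the reported 1,000x applies uniformly to that one step and that nothing else in the pipeline changed; the function names are hypothetical.

```cpp
#include <cassert>

// Overall speedup: old wall-clock time divided by new wall-clock time.
double overall_speedup(double old_minutes, double new_minutes) {
    assert(old_minutes > 0.0 && new_minutes > 0.0);
    return old_minutes / new_minutes;
}

// Amdahl's law: S = 1 / ((1 - p) + p / s), where p is the fraction of the
// original run time spent in the accelerated step and s is that step's
// local speedup. Solving for p gives p = (1 - 1/S) / (1 - 1/s).
// Assumption: only the one step was accelerated, by exactly `local`.
double accelerated_fraction(double overall, double local) {
    assert(overall > 1.0 && local > 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local);
}
```

With the reported times, `overall_speedup(2460.5, 30.5)` is a little over 80, matching the headline figure. Under the stated assumptions, `accelerated_fraction` with that value and a local speedup of 1000 gives roughly 0.99, which is consistent with the observation that edit-distance calculation dominated the run time.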

We proposed an accelerated version of the Levenshtein algorithm, and the FASTAptamer contributors added it to the official package in the subsequent release, v1.0.12.


Technologies:

  • C++
  • Perl

Some details are not disclosed due to NDA restrictions.