Bioinformatics Search Tools Creation


Description

2016 BIOINFORMATICS, ENGINEERING, SEQUENCE ANALYSIS

A life science portal was given extensive search capabilities to support the scientific community in its analysis and research work. The new functionality allows searching with multiple query sequences across various nucleotide and peptide databases.

Client

Biotechnology Corporation

Challenge

Biological scientists searching for complementarity-determining regions (CDRs), fusion/chimeric constructs, recombinant plasmid constructs and bispecific antibodies needed a reliable tool that could search multiple nucleotide (or protein) queries against a nucleotide (or protein) database. The main challenge was project feasibility: at that time, no tool for searching with several query sequences at once had been introduced.

Solution

The Azati team designed, developed and released a general-purpose Multiple Sequence Search (MSS) tool that can accept and search on up to six query sequences.

The MSS tool can be used to find and list documents that contain CDR sequences of interest. It can also search simultaneously for multiple CDRs contained within the same patent document, taking into account that the CDRs might appear in different claims. The tool uses our enhanced version of the Smith-Waterman algorithm, which produces accurate and comprehensive results in considerably less time (30 to 50 times faster than the standard Smith-Waterman algorithm). The interface makes it easy to see whether one or more similar sequences might have been claimed within a single document. An advanced scoring system assigns a higher score to documents that contain a greater number of CDRs matching the search query, which produces more relevant results. The system also provides an option to show a combined alignment. In addition, we implemented report generation (in four different file formats) based on the search results, allowing users to clearly identify which query sequences are aligned to which subject sequences.
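To make the two ideas above more concrete, here is a minimal C++ sketch of a textbook Smith-Waterman local alignment score together with a simple document ranking that favors documents matching more queries. It is not the enhanced algorithm or the scoring system used in the project, whose details are under NDA. The names `Scoring`, `smithWatermanScore`, `DocumentScore`, `scoreDocument` and `ranksHigher`, the match/mismatch/gap values and the `minScore` threshold are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters; the real ones are not disclosed.
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -2;  // linear gap penalty (assumption)
};

// Standard Smith-Waterman: best local alignment score of query vs. subject.
// Only two rows of the DP matrix are kept, so memory is O(|subject|).
int smithWatermanScore(const std::string& query, const std::string& subject,
                       const Scoring& s = Scoring{}) {
    std::vector<int> prev(subject.size() + 1, 0);
    std::vector<int> curr(subject.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= query.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= subject.size(); ++j) {
            const int diag = prev[j - 1] +
                (query[i - 1] == subject[j - 1] ? s.match : s.mismatch);
            const int up = prev[j] + s.gap;        // gap in subject
            const int left = curr[j - 1] + s.gap;  // gap in query
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}

// Illustrative per-document summary: how many queries matched, and how well.
struct DocumentScore {
    int matchedQueries = 0;
    int totalScore = 0;
};

// A query counts as matched if its best alignment against any sequence
// in the document reaches minScore (a hypothetical threshold).
DocumentScore scoreDocument(const std::vector<std::string>& queries,
                            const std::vector<std::string>& documentSequences,
                            int minScore) {
    assert(queries.size() <= 6);  // the MSS tool accepts up to six queries
    DocumentScore result;
    for (const auto& q : queries) {
        int bestForQuery = 0;
        for (const auto& seq : documentSequences) {
            bestForQuery = std::max(bestForQuery, smithWatermanScore(q, seq));
        }
        if (bestForQuery >= minScore) {
            ++result.matchedQueries;
            result.totalScore += bestForQuery;
        }
    }
    return result;
}

// Documents matching more queries rank first; total score breaks ties.
bool ranksHigher(const DocumentScore& a, const DocumentScore& b) {
    if (a.matchedQueries != b.matchedQueries) {
        return a.matchedQueries > b.matchedQueries;
    }
    return a.totalScore > b.totalScore;
}
```

With these assumed parameters, a document whose sequences contain both queries would rank above a document that contains only one of them, which mirrors the behaviour described for the scoring system. The quadratic inner loop is the part that a GPU implementation would typically parallelize, although how the project actually achieved its speed-up is not stated here.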

The resulting Multiple Sequence Search algorithm fully satisfies the stated requirements and gives researchers a more sophisticated search capability.

Technologies:

  • C/C++
  • NVIDIA® CUDA® Toolkit

Some details are not disclosed due to NDA restrictions.