Machine Learning in Bioinformatics: 4 Challenges to Solve in 2018

Machine learning is not a new technology, but only recently have we begun to see successful implementations of machine learning systems. This article describes how machine learning can change the bioinformatics industry.

Artificial intelligence in general, and machine learning in particular, helps scientists process data more accurately and deliver results faster. Azati has already solved several complex challenges in Life Sciences. Machine learning can help bioinformatics scientists in their routine work.

In 2013, a group of bioinformatics professors from across the globe held several meetings at Heidelberg University, Germany. During the meetings, they formulated the main bioinformatics challenges of the decade. The scientists decided to share their deliberations with the broader scientific community and published a series of reports (you can find these reports at the US National Library of Medicine).

This article is based on one of those reports. According to it, the main unsolved challenges in bioinformatics are:
– Data Deluge Issue
– Knowledge Management
– Predicting, not explaining
– Personalized medicine

Can machine learning address these critical issues and bring new life to the industry? Let’s find out!


Machine Learning to Solve Data Deluge Issue

Bioinformatics today is all about data, and it faces a huge data deluge. The main problem is data processing: scientists who cannot find the data they need end up rediscovering facts that are already known.

Today it’s vital to store only useful data, so reducing the volume of scientific information seems unavoidable. There are two ways to solve the data deluge issue:


First way: 

It’s possible to increase the number of data storage and data processing servers, enable compression, and develop custom data archiving algorithms.

But this creates another problem: as the number of servers grows, the time needed to find a particular piece of information increases as well. The good news is that machine learning could speed up search engine algorithms.

Huge corporations like Google, Facebook, and Amazon have been using custom search engine algorithms for years. For Google, the search engine algorithm is key to the company’s success, and it relies on machine learning as a core technology to process the large text datasets of the World Wide Web. By the way, we have already improved a search engine algorithm for one of our clients.


Scheme: how a machine learning search algorithm works


To improve search capabilities, data scientists and developers usually use vectorization. With this method, we calculate a vector for every scientific publication: for example, three numbers that correspond to the x, y, and z coordinate axes. The result is a vast number of points in a coordinate space. Finally, we can compare those points and find relationships between them.


A simple example of vectorization


Scientific articles need more dimensions, so the scheme would be somewhat more complicated. In practice, when we need to find publications similar to a given one, we calculate its vector and check the closest entities.
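The idea can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used in any particular search engine: it assumes the simplest possible vectorization (one dimension per word, word counts as coordinates) and cosine similarity as the measure of closeness. The miniature corpus and the names `vectorize`, `cosine_similarity`, and `most_similar` are invented for the example.

```python
import math
from collections import Counter


def vectorize(text, vocabulary):
    """Turn a text into a vector of word counts, one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 means no words in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return (title, score) pairs sorted from the closest document to the farthest."""
    all_texts = list(documents.values()) + [query]
    vocabulary = sorted({word for text in all_texts for word in text.lower().split()})
    query_vector = vectorize(query, vocabulary)
    scored = [
        (title, cosine_similarity(query_vector, vectorize(text, vocabulary)))
        for title, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical miniature corpus, invented for illustration only.
documents = {
    "paper_a": "protein folding prediction with neural networks",
    "paper_b": "gene expression analysis in cancer cells",
    "paper_c": "neural networks for protein structure prediction",
}

ranking = most_similar("protein structure prediction", documents)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

A production system would likely replace raw word counts with weighted or learned vectors, but the search step stays the same: compute the vector of the query and look for its nearest neighbors.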

Second way:

During the meetings, the scientists proposed a formula that calculates the “value” of a document. The purpose of the value calculation is to classify documents by their relevance and delete those of low importance.

The formula has to be calculated individually for every group of documents, which is close to impossible to do by hand. The chance of making a mistake is also high, especially when a document relates to a new topic that has not been described before. Such an approach requires a team of qualified experts and much of their precious time.

Machine learning may help with calculating the document “value” according to the formula. An algorithm could take several documents that a human has graded manually and grade another document on the same topic according to a number of factors.

Moreover, it could automatically check documents for overlapping topics and, using the vectorization method, find similar documents that can be merged into one. It could also give a high grade to the materials that perform well according to the formula and mark the others as “potentially” useless.
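Grading a new document from a handful of human-graded ones can be sketched as follows. The formula from the reports is not reproduced here; as a stand-in, this example assumes a k-nearest-neighbors approach that averages the grades of the most similar graded documents. The factors, grades, and threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical factors per document: (citations per year, years since publication,
# share of claims confirmed elsewhere). Grades from 0 to 10 were assigned by a human.
graded_documents = [
    ((12.0, 2.0, 0.9), 9.0),
    ((10.0, 3.0, 0.8), 8.0),
    ((0.5, 15.0, 0.2), 2.0),
    ((1.0, 12.0, 0.3), 3.0),
]


def distance(a, b):
    """Euclidean distance between two factor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_grade(factors, graded, k=2):
    """Average the human grades of the k graded documents closest to `factors`."""
    nearest = sorted(graded, key=lambda item: distance(factors, item[0]))[:k]
    return sum(grade for _, grade in nearest) / len(nearest)


def label(grade, threshold=4.0):
    """Flag low-grade documents for review instead of deleting them outright."""
    return "keep" if grade >= threshold else "potentially useless"


new_document = (11.0, 2.5, 0.85)
grade = predict_grade(new_document, graded_documents)
print(grade, label(grade))
```

The factors here are used unscaled to keep the sketch short; in practice they would usually be normalized so that no single factor dominates the distance.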

Finally, intelligent search is indispensable for bioinformatics: it helps us not only to perform fast and accurate searches but also to find and merge similar documents.


Machine Learning to Manage Knowledgebase

Today scientists face another problem: even when they find the document they need, extracting the information from it may be quite complicated.

Some projects attempt to solve the problem by developing new common standards to decrease the number of inconsistencies. However, ordinary scientists rarely use those standards in their daily work. In fact, newly established standards only add another layer of complexity.

Scientists need a solution for extracting correct data from multiple sources such as flat files, BioMart access, or the Distributed Annotation System.

A solution might be to accept the presence of parallel interfaces while ensuring that new resources are available in as many formats as possible. Users could then work with these resources according to their personal preferences.

The real problem is finding the necessary data in documents and processing it correctly. Machine learning is well suited to this task because it can find complex patterns.

Machine learning is improving the digitization of handwritten documents as well. Pattern recognition is a computer science method in which all incoming data is processed in search of patterns. For example, we could analyze a handwritten document in search of headings, content, footers, contact information, and so on. In general, that process is called text data mining. Here is a scheme of how it works:


Scheme: Text Data Mining


With machine learning, computer vision, and artificial intelligence, publications and archive documents can be processed automatically. Today there is a great opportunity to enhance bioinformatics systems with these technologies.
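The search for headings, content, footers, and contact information mentioned above can be illustrated with a small sketch. It assumes the handwritten page has already been converted to text, and it uses hand-written rules where a real system would rely on learned patterns. The sample page, the patterns, and the `classify_line` helper are hypothetical.

```python
import re

# Hypothetical text, as it might look after a handwritten page has been digitized.
page = """RESULTS OF SAMPLE ANALYSIS
The sample was collected on site and stored at 4 C.
Contact: j.doe@example.org, +1 555 0100
Page 3"""

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PAGE_FOOTER = re.compile(r"^Page \d+$")


def classify_line(line):
    """Assign a coarse structural label to a single line of text."""
    stripped = line.strip()
    if PAGE_FOOTER.match(stripped):
        return "footer"
    if EMAIL.search(stripped):
        return "contact"
    if stripped.isupper():
        return "heading"
    return "content"


labels = [(classify_line(line), line) for line in page.splitlines()]
for kind, line in labels:
    print(f"{kind:8} {line}")
```

Rules like these break down quickly on real documents, which is exactly where a trained model that learns the patterns from labeled examples becomes useful.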


Machine Learning to Predict Scientific Experiment Results

The traditional scientific order implies that you first formulate a hypothesis and then run an experiment to prove or disprove it. With modern methodologies, scientists sometimes develop hypotheses after the experiment. Bioinformaticians do not know the results of an experiment until they conduct it.

Machine learning can’t formulate a hypothesis on its own, but it may simulate an experiment before it happens. Moreover, if there were similar experiments in the past, artificial intelligence may use them as a starting point for the simulation. Bioinformaticians may then treat that simulation as a prediction. It may not be 100% accurate, but it is better than post-factum analysis.
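As a toy illustration of using past experiments to predict a new one, the sketch below fits a straight line to earlier results and extrapolates to an untested condition. It assumes a single input variable and a roughly linear relationship, which real experiments rarely offer; the dose and response values are invented, and `fit_line` is a hypothetical helper.

```python
# Hypothetical past experiments: (dose in mg, measured response). Values are invented.
past_experiments = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_line(points):
    """Ordinary least squares fit of response = slope * dose + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(past_experiments)

# Predict the outcome of an experiment that has not been run yet (5 mg dose).
predicted = slope * 5.0 + intercept
print(f"predicted response at 5 mg: {predicted:.2f}")
```

The prediction is only as good as the similarity between the past experiments and the new one, so it serves as a guide for planning rather than a replacement for the experiment itself.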

To better understand the importance of this problem, let’s look at what happened in pharmacology in the middle of the 20th century: the Thalidomide scandal.

Thalidomide was developed in West Germany in 1954 and was sold there from 1957 under the brand name Contergan until it was withdrawn in the early 1960s. To cut a long story short, the medicine was not tested enough, and this led to catastrophic consequences.

The use of Thalidomide during pregnancy led to birth defects. The drug taken by a pregnant woman could cross the placental barrier and harm the developing fetus. In the end, between 6,000 and 12,000 children were affected by the disaster.

It would have been possible to avoid that situation if the scientists had formulated and adequately tested their hypotheses before synthesizing the medicine, not vice versa.


Machine Learning to Design Personalized Medicine

Bioinformatics and pharmacology are moving towards personalized medicine for every disease. Personalized medication is prepared according to a person’s medical history, genetics, and predispositions.

Understanding a disease leads to its cure, and this understanding requires additional, systematic studies of molecular interactions. In general, scientists are optimistic about personalized medicine.

Such an approach has both pros and cons, but the cons mostly stem from the lack of data about its impact on the course of a disease. The trend towards personalized medicine keeps growing, and some research remains unpublished due to NDA restrictions.

Many assume that personalized medicine is the future of pharmacology. There are also issues to consider, such as ethics and the privacy of patients’ disease history data. For example, if the information that a client has a high risk of cancer were made public, it could lead insurance providers to change their rates.

Scientists need to process large amounts of data from large-scale open access databases of pharmaceutical side effects accurately, securely, and quickly. Machine learning is well suited to that challenge.


There are many opportunities to use machine learning in bioinformatics, from those we have discussed to those we have not. Machine learning is suitable both for typical, well-known challenges in bioinformatics and for recently emerged ones.

Still, machine learning is not widely adopted in bioinformatics, mainly because of misunderstandings and misconceptions about the technology: what stands behind it and how it works.

In conclusion, we can say that machine learning brings endless possibilities to bioinformatics and pharmacology.

Are we using it right now? Probably not.

Should we at least try it? Definitely yes.

We are sure that bioinformatics will embrace machine learning in the near future.


Are you interested in Machine Learning for Bioinformatics? Want to see a proof of concept or an MVP? Contact us, and we will implement your project as soon as possible. Call +1 (973) 597-1000.