Search Engine Algorithm Improvement

Overview

An employment platform is designed to help recruiters and job seekers in the IT community find each other.

The platform has two types of users: recruiters and job candidates. Each candidate’s profile contains the necessary data, such as personal information, education, work experience, skills, and contact details.

When looking for a candidate, a recruiter specifies the skills that the candidate must have for the job. The recruiter then gets a list of candidates ordered by how well each candidate’s profile matches the recruiter’s requirements.

One of the most important matching criteria is the candidate’s set of skills. The problem was that the original algorithm considered only perfectly matched skills. Job seekers quite often indicate skills very similar to what a recruiter is looking for, but because of the strict matching mechanism of the search engine, those candidates were excluded and didn’t appear in the search results at all. For example, if a recruiter looked for someone with strong knowledge of C++, and a job seeker had indicated that they excelled at “C++ embedded,” the job seeker wouldn’t be shown in the search results, although the skills are very much alike.

Another shortcoming was that the algorithm didn’t consider the connections that exist between skills, such as those between programming languages and their frameworks. For instance, a candidate with the skill “Django” is likely to have the skill “Python” (as Django is a Python framework), but the original algorithm was unable to take this into account, so relevant candidates were missed.
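The source does not show the original implementation, so the following is only a minimal sketch of what strict matching could look like, assuming skills are compared as exact strings. The function name `strict_match` is hypothetical. It reproduces both failure cases described above.

```python
def strict_match(required: set, candidate_skills: set) -> bool:
    """Return True only if every required skill appears verbatim in the profile."""
    return required <= candidate_skills


# An exact string match succeeds.
print(strict_match({"C++"}, {"C++", "Linux"}))           # True
# A near-identical skill name is rejected.
print(strict_match({"C++"}, {"C++ embedded", "Linux"}))  # False
# A framework does not imply its language.
print(strict_match({"Python"}, {"Django"}))              # False
```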

Our objective was to improve the skill matching algorithm so it would behave more like an experienced and knowledgeable human recruiter.

Solution

We enhanced the algorithm by implementing fuzzy matching of skills to make it consider similarity between related skills and the same skills under different names.

When checking whether a candidate satisfies the recruiter’s skill requirements, the algorithm compares each required skill with the candidate’s set of skills, so the full set of required skills does not have to match the candidate’s set exactly. This approach makes it possible to evaluate the similarity between the two skills being compared. The problem thereby reduces to finding the extent to which a recruiter-entered skill \({s}_{r}\) matches some candidate skill \({s}_{c}.\)
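The per-skill comparison can be sketched as follows. The source does not say how pairwise similarities are combined into a candidate’s overall score, so two choices here are assumptions made for illustration: taking the best-matching candidate skill for each required skill, and averaging those values. The `SIMILARITY` table and all names and numbers are hypothetical.

```python
# Hypothetical precomputed similarities: (required skill, candidate skill) -> [0, 1].
SIMILARITY = {
    ("Python", "Django"): 0.9,
    ("C++", "C++ embedded"): 0.8,
}


def skill_match(required, candidate_skill):
    """Similarity of one candidate skill to one required skill."""
    if required == candidate_skill:
        return 1.0
    return SIMILARITY.get((required, candidate_skill), 0.0)


def best_match(required, candidate_skills):
    """Assumption: a required skill is scored by its best-matching candidate skill."""
    return max((skill_match(required, s) for s in candidate_skills), default=0.0)


def candidate_score(required_skills, candidate_skills):
    """Assumption: the overall score is the mean of the per-skill best matches."""
    if not required_skills:
        return 0.0
    total = sum(best_match(r, candidate_skills) for r in required_skills)
    return total / len(required_skills)


candidates = {
    "alice": {"Django", "SQL"},
    "bob": {"C++ embedded"},
    "carol": {"Python", "C++"},
}
required = ["Python", "C++"]
ranked = sorted(
    candidates,
    key=lambda name: candidate_score(required, candidates[name]),
    reverse=True,
)
print(ranked)  # ['carol', 'alice', 'bob']
```

Unlike strict matching, alice and bob still appear in the results, ranked below the candidate who matches exactly.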

The platform allows candidates to import skills from their LinkedIn accounts instead of typing them manually. This import is possible because candidates enter the same skills as those that can be found on LinkedIn.

It’s worth noting that the similarity between the skills \({s}_{r}\) and \({s}_{c}\) is asymmetric. For example, the skill “AngularJS” usually implies the skill “JavaScript” (as AngularJS is a JavaScript framework), but the opposite is not true: knowing JavaScript doesn’t necessarily mean knowing AngularJS.

Let’s denote the set of skills of user \({u}\) as \({skills(u)}.\) Then, as an estimate of the match degree between a recruiter-entered skill \({s}_{r}\) and a candidate skill \({s}_{c},\) we can use the conditional probability that a user has skill \({s}_{r}\) given that the user has skill \({s}_{c},\) i.e. \(P\left({s}_{r} \in skills\left(u \right) \mid {s}_{c} \in skills\left(u \right) \right).\)

According to the definition of conditional probability, we have the following equation:
\(P\left({s}_{r} \in skills\left(u \right) \mid {s}_{c} \in skills\left(u \right) \right) = \frac{P\left({s}_{r} \in skills\left(u \right) \cap {s}_{c} \in skills\left(u \right)\right)}{P\left({s}_{c} \in skills\left(u \right)\right)} = \frac{P(\lbrace{s}_{r},{s}_{c}\rbrace\subseteq skills(u))}{P({s}_{c}\in skills(u))}.\)
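As an illustration with made-up numbers (they are not platform data), suppose that out of 1000 profiles, 200 list “JavaScript”, 50 list “AngularJS”, and 45 list both. Then

\(P\left(\text{JavaScript} \in skills(u) \mid \text{AngularJS} \in skills(u)\right) = \frac{45/1000}{50/1000} = 0.9,\)

\(P\left(\text{AngularJS} \in skills(u) \mid \text{JavaScript} \in skills(u)\right) = \frac{45/1000}{200/1000} = 0.225.\)

With these numbers, a candidate listing AngularJS would match a JavaScript requirement strongly, while a candidate listing only JavaScript would match an AngularJS requirement weakly, which is the asymmetry described above.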

Thus, to calculate the match degree between \({s}_{r}\) and \({s}_{c},\) we need to estimate two kinds of probabilities:

  • \(P\left(s \in skills\left(u \right)\right)\) for a given skill \(s.\)
  • \(P(\lbrace{s}_{1},{s}_{2}\rbrace\subseteq skills(u))\) for a given pair of skills \({s}_{1}\) and \({s}_{2}.\)

To estimate these probabilities, we used Existing Tags, a data source where each topic is labelled with one or more tags, which makes its topics relevant to skill matching. We were able to map a significant share of LinkedIn skills to Existing Tags tags by combining the following simple techniques:

  • case-insensitive comparison
  • ignoring punctuation marks (e.g. spaces, colons and dashes)
  • abbreviation expansion
  • word normalization using lemmatization
  • full-text search in the topic descriptions
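The first three techniques can be sketched with the Python standard library alone. This is a rough illustration and not the mapping code used on the platform: the abbreviation table, the function names `normalize` and `build_mapping`, and the sample skills and tags are all hypothetical. Lemmatization and full-text search are left out because they need an NLP library and a search index respectively.

```python
import re

# Hypothetical abbreviation table; a real one would be curated by hand.
ABBREVIATIONS = {
    "oop": "object oriented programming",
    "ml": "machine learning",
}

# Characters ignored when comparing names: whitespace, colons and dashes.
# Symbols such as "+" and "#" are kept so that "C", "C++" and "C#" stay distinct.
_IGNORED = re.compile(r"[\s:\-]+")


def normalize(name: str) -> str:
    """Build a comparison key: lower-case, expand abbreviations, drop separators."""
    words = [ABBREVIATIONS.get(w, w) for w in name.lower().split()]
    return _IGNORED.sub("", " ".join(words))


def build_mapping(skills, tags):
    """Map each skill to the tag with the same comparison key, if there is one."""
    by_key = {normalize(t): t for t in tags}
    return {s: by_key[normalize(s)] for s in skills if normalize(s) in by_key}


mapping = build_mapping(
    ["Angular JS", "OOP", "Cobol"],
    ["angularjs", "object-oriented-programming", "c++"],
)
print(mapping)
# {'Angular JS': 'angularjs', 'OOP': 'object-oriented-programming'}
```

Skills without a counterpart, such as “Cobol” in this sample, are simply left unmapped.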

The following considerations explain the calculation of the extent to which candidate skill \({s}_{c}\) matches the skill \({s}_{r}\) entered by a recruiter.

Let’s denote the set of tags of an Existing Tags topic \(t\) as \(tags(t).\)
Let \(T\) be the set of all topics on Existing Tags.

Given the mapping \(f\) from LinkedIn skills to Existing Tags tags, we can approximate \(P(s \in skills(u))\) by \(P(f(s) \in tags(t))\) for a random topic \(t \in T.\) Similarly, \(P(\lbrace{s}_{1},{s}_{2}\rbrace\subseteq skills(u))\) can be approximated by \(P(\lbrace f({s}_{1}),f({s}_{2})\rbrace \subseteq tags(t))\) for a random topic \(t \in T.\)

For a given Existing Tags skill \(s^{\prime}\) and a random topic \(t \in T,\) the probability \(P(s^{\prime} \in tags(t))\) is estimated to be

\(\frac{1}{|T|}\displaystyle \sum_{k \in T} [s^{\prime} \in tags(k)],\)

where \([\cdot]\) is the Iverson bracket, equal to 1 if the condition inside it holds and 0 otherwise. Similarly, for given Existing Tags skills \({s}_{1}^{\prime}\), \({s}_{2}^{\prime}\) and a random topic \(t \in T,\) the probability \(P(\lbrace {s}_{1}^{\prime}, {s}_{2}^{\prime} \rbrace \subseteq tags(t))\) is estimated to be

\(\frac{1}{|T|}\displaystyle \sum_{k \in T} [\lbrace{s}_{1}^{\prime},{s}_{2}^{\prime}\rbrace \subseteq tags(k)].\)
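Both estimates are relative frequencies over the topic set, which a short sketch makes concrete. It assumes the topics are available as a list of tag sets. The function names and the four sample topics are hypothetical.

```python
def tag_probability(tag, topics):
    """Fraction of topics whose tag set contains `tag`."""
    return sum(tag in tags for tags in topics) / len(topics)


def pair_probability(tag_a, tag_b, topics):
    """Fraction of topics whose tag set contains both tags."""
    return sum({tag_a, tag_b} <= tags for tags in topics) / len(topics)


# Made-up sample: each topic is represented by its set of tags.
topics = [
    {"python", "django"},
    {"python", "pandas"},
    {"javascript", "angularjs"},
    {"javascript"},
]
print(tag_probability("python", topics))             # 0.5
print(pair_probability("python", "django", topics))  # 0.25
```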

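When these estimates are substituted into the conditional probability formula, the \(\frac{1}{|T|}\) factors cancel. Therefore, the extent to which candidate skill \({s}_{c}\) matches the skill \({s}_{r}\) entered by a recruiter is calculated by the formula: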

\(\frac{\displaystyle\sum_{t \in T} [\lbrace f({s}_{r}),f({s}_{c})\rbrace \subseteq tags(t)]}{\displaystyle\sum_{t \in T} [f({s}_{c}) \in tags(t)]}.\)
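A minimal sketch of this formula, assuming the mapping \(f\) is available as a dictionary and the topics as a list of tag sets. The function name `match_degree` and the sample data are hypothetical. Returning 0.0 for unmapped skills, or for tags that appear in no topic, is also an assumption, since the source does not say how those cases are handled.

```python
def match_degree(s_r, s_c, mapping, topics):
    """Estimate how well candidate skill s_c matches required skill s_r.

    Counts the topics tagged with both f(s_r) and f(s_c) and divides by the
    number of topics tagged with f(s_c).
    """
    tag_r, tag_c = mapping.get(s_r), mapping.get(s_c)
    if tag_r is None or tag_c is None:
        return 0.0  # assumption: unmapped skills never match
    topics_with_c = [tags for tags in topics if tag_c in tags]
    if not topics_with_c:
        return 0.0  # assumption: avoid dividing by zero for unseen tags
    both = sum(tag_r in tags for tags in topics_with_c)
    return both / len(topics_with_c)


# Made-up mapping and topics, for illustration only.
mapping = {"AngularJS": "angularjs", "JavaScript": "javascript"}
topics = [
    {"javascript", "angularjs"},
    {"javascript", "angularjs"},
    {"javascript", "jquery"},
    {"javascript"},
    {"python"},
]
# Recruiter asks for JavaScript, candidate lists AngularJS.
print(match_degree("JavaScript", "AngularJS", mapping, topics))  # 1.0
# Recruiter asks for AngularJS, candidate lists JavaScript.
print(match_degree("AngularJS", "JavaScript", mapping, topics))  # 0.5
```

The two calls differ only in argument order, and the different results reflect the asymmetry of the measure.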

The resulting smart-matching algorithm provides more reasonable and precise search results. It has greatly improved the recruiters’ ability to find suitable candidates: on average, search results now contain 27% more relevant candidates than before.