Today we cover another interesting topic: how much does it cost to create a search engine? At Azati, we develop and deliver commercial search engines. These engines are entirely different from the familiar public ones such as Google, Yahoo, Bing, and Baidu. Many facts about commercial search engines are not obvious and are hard to grasp without a degree in computer science.
So in this one article we explain what commercial search engines actually cost, clearly and without the noise.
We describe several main aspects that affect the final price:
– How search engines differ from one another
– The development price is not the only thing we should care about
– The Average Costs of Search Engine Development
Interested? Let us discover the details!
How search engines differ from one another
It may not surprise you, but today’s web search engines are similar to those of the 1990s. There is a spider bot that crawls web pages and evaluates their content according to factors such as keywords, keyword density, meta tags, images, page load speed, and so on. Dozens of such factors serve as signals of page quality. By the way, the Google web crawler (Googlebot) takes about 20 seconds to process an entire page.
How modern web search engines work (simplified)
Commercial search engines differ from the public ones (Google, Yahoo, Bing, and others). Their crawlers also rank content in some way, but the process is more complicated. When we talk about commercial search engine development, we usually mean large-scale, complex data processing.
Search algorithms still look for patterns that describe a unit of data, but they do it differently. As mentioned above, we have developed and optimized several search engines, so we can explain how the data is processed.
There are several main ways to process data; let us start with the “traditional search approach”. The best way to explain it is to take one of the top Google search queries: “How to build the search engine for a website”.
Imagine a typical website with a blog of three hundred HTML pages. HTML is a readable text format that any text processor can analyze quickly. For simplicity, we call any file a document.
To find the data in a document that relates to the user’s query, we should:
– Determine the pattern
– Download the page from the database
– Analyze the page (in search of the pattern)
– Build the search engine results page (also known as the SERP)
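The four steps above can be sketched in a few lines of Python. This is only a minimal illustration of the traditional approach, not how any particular engine is implemented: the `naive_search` function, the `Result` class, and the sample `PAGES` dictionary (which stands in for the database) are all hypothetical names made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    hits: int


def naive_search(query: str, pages: dict[str, str]) -> list[Result]:
    """Scan every page for the query and return the matches, best first."""
    # Step 1: determine the pattern (here: a case-insensitive literal match).
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    # Step 2: "download" each page; a plain dict stands in for the database.
    for url, html in pages.items():
        # Step 3: analyze the page in search of the pattern.
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, assumption
        hits = len(pattern.findall(text))
        if hits:
            results.append(Result(url, hits))
    # Step 4: build the results page, pages with the most hits first.
    return sorted(results, key=lambda r: r.hits, reverse=True)


# Hypothetical sample data for illustration only.
PAGES = {
    "/blog/a": "<h1>Search engine basics</h1><p>A search engine crawls pages.</p>",
    "/blog/b": "<p>Nothing relevant here.</p>",
    "/blog/c": "<p>Build a search engine for a website.</p>",
}

serp = naive_search("search engine", PAGES)
print([r.url for r in serp])
```

Note that every query reads and scans every page, which is exactly why this approach gets slower as the number and size of documents grow.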
There are two bottlenecks here. Both are related to the page size:
– It might take some time to download the page (document)
– Finding the pattern usually takes a long time with standard search approaches
Typically, it takes about 2 ms to process one document (an HTML page) on a website running WordPress (written in PHP). Now imagine that a query has to scan about 200 pages: that takes 400 ms, a little under half a second. It seems fast enough.
Now imagine an e-book library with millions (!) of books, each with hundreds of pages. Processing a single page takes less time here, because we do not download each page from the database separately; we download the whole e-book at once. But processing each page still takes time.
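To see how quickly these numbers grow, here is a back-of-the-envelope calculation. The blog figures come from the text; the library figures (one million books, 300 pages per book, 0.2 ms per page) are assumptions chosen only to illustrate the scale, and `scan_time_seconds` is a hypothetical helper.

```python
def scan_time_seconds(documents: int, micros_per_document: int) -> float:
    """Total time to scan every document one after another, in seconds."""
    return documents * micros_per_document / 1_000_000


# Blog example from the text: about 200 pages at 2 ms (2,000 microseconds) each.
blog_seconds = scan_time_seconds(200, 2_000)

# Assumed library: 1,000,000 books x 300 pages, at an assumed 0.2 ms per page.
library_pages = 1_000_000 * 300
library_seconds = scan_time_seconds(library_pages, 200)

print(f"blog: {blog_seconds:.1f} s")
print(f"library: {library_seconds / 3600:.1f} hours")
```

Under these assumptions the blog scan takes 0.4 seconds, while a single sequential scan of the library takes roughly 16.7 hours, even though each page is processed ten times faster.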
Keep another fact in mind as well: there are many documents that a search engine cannot process quickly, such as images, videos, encrypted formats, documents with broken encoding, and so on.
Have you ever thought: “Why can’t Google show us everything that we want? Can it find relevant information?”
True, the quality of the results that search engines provide improves from year to year. As you may remember, in the 2000s more queries went unanswered than answered. Search quality keeps improving, but even today there are a lot of files that cannot be processed. Moreover, search engines are not even close to covering half of the existing documents, because the number of documents grows exponentially.
This is why custom and commercial search engines deserve a closer look: they use custom search algorithms to find relevant data. Customers usually require an accurate, fast search that serves a business goal.
By the way, we have an impressive case study: we improved the search engine of a talent acquisition system by tuning its search algorithm.
The development price is not the only thing we should care about
From our point of view, the development price is probably not the first thing customers should care about when estimating the final costs.
There is another thing to take into consideration: maintenance.
If we look at the bigger brother, Google, we see that it runs many servers (probably hundreds of thousands) that process data in real time.
Why do they do so? The World Wide Web changes fast. There are both static and dynamic pages, and all of those pages must be recrawled multiple times to track changes in the data (if there are any). So Google processes the same data over and over again to make the SERP fit the user query. It is the best and most effective way to monitor page changes, especially when there are trillions of pages.
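One simple way to decide whether a recrawled page needs to be processed again is to compare a hash of its content with the hash stored during the previous crawl. The sketch below shows that idea only; it is not a description of how Google detects changes, and `content_hash`, `pages_to_reindex`, and the sample crawl data are hypothetical.

```python
import hashlib


def content_hash(html: str) -> str:
    """Fingerprint of the page content; any change produces a new hash."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def pages_to_reindex(previous: dict[str, str], fetched: dict[str, str]) -> list[str]:
    """Return URLs whose content is new or differs from the last crawl.

    `previous` maps URL -> stored hash, `fetched` maps URL -> fresh HTML.
    """
    changed = []
    for url, html in fetched.items():
        if previous.get(url) != content_hash(html):
            changed.append(url)
    return changed


# Hypothetical data: hashes saved by the last crawl and pages fetched now.
last_crawl = {
    "/about": content_hash("<p>About us</p>"),
    "/news": content_hash("<p>Old headline</p>"),
}
this_crawl = {
    "/about": "<p>About us</p>",
    "/news": "<p>New headline</p>",
    "/contact": "<p>Call us</p>",
}

changed = pages_to_reindex(last_crawl, this_crawl)
print(changed)
```

Here only the changed `/news` page and the new `/contact` page are reported. The crawler still has to fetch every page to find that out, which is part of why recrawling at web scale needs so many servers.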
The Internet grows incredibly fast!
This is why big search companies use complex search algorithms that do not look for the answer directly in the document. Instead, they extract the key data from it in advance. For simplicity, let’s call the extracted data a “footprint”.
For example, we do not need to collect all the data about a book when we can capture its central theses in a summary. So we build the “footprint” mentioned above, which contains the necessary data (author, titles, summary, brief description, keywords, publication data), and store it in a separate database.
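As a rough illustration, a footprint for a book could be a small record like the one below. The `Footprint` class, its field names, the `build_footprint` helper, and the sample book are all hypothetical; a real system would choose the fields that matter for its own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Compact, searchable description of one book (illustrative fields)."""
    book_id: str
    author: str
    title: str
    summary: str
    keywords: tuple[str, ...] = ()
    published: str = ""

    def searchable_text(self) -> str:
        """All footprint fields joined into one lowercase string."""
        parts = [self.author, self.title, self.summary,
                 " ".join(self.keywords), self.published]
        return " ".join(parts).lower()


def build_footprint(book: dict) -> Footprint:
    """Extract the footprint from a full book record; the full text is left out."""
    return Footprint(
        book_id=book["id"],
        author=book["author"],
        title=book["title"],
        summary=book["summary"],
        keywords=tuple(book["keywords"]),
        published=book["published"],
    )


# Hypothetical book record with a long body of text.
book = {
    "id": "b-001",
    "author": "Jane Doe",
    "title": "Rivers of the North",
    "summary": "A field guide to northern river systems.",
    "keywords": ["rivers", "hydrology"],
    "published": "2015",
    "full_text": "Chapter 1. " + "The river bends and flows. " * 10_000,
}

fp = build_footprint(book)
print(len(fp.searchable_text()), "characters in the footprint")
print(len(book["full_text"]), "characters in the full text")
```

The footprint is a tiny fraction of the size of the full text, so scanning a database of footprints is far cheaper than scanning the books themselves.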
When the user asks the search engine for something, the engine first looks for the pattern in the “footprint” database. If it doesn’t find a satisfying answer, it performs a deep search. By the way, the deeper we go into Google results, the slower the pages are generated. You can check it yourself: run a complex query and compare the SERP generation time for different pages (the first page is usually generated far quicker than the thirtieth).
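The lookup order described above (footprints first, deep search only as a fallback) can be sketched as follows. This is a simplified illustration with plain substring matching; `two_tier_search` and the sample data are hypothetical, and a production engine would use a proper index instead of scanning strings.

```python
def two_tier_search(query: str,
                    footprints: dict[str, str],
                    full_texts: dict[str, str]) -> tuple[list[str], str]:
    """Search the small footprint store first, then fall back to full texts.

    Returns the matching document ids and which tier produced them.
    """
    q = query.lower()
    # Tier 1: cheap scan over the compact footprints.
    fast_hits = [doc_id for doc_id, fp in footprints.items() if q in fp.lower()]
    if fast_hits:
        return fast_hits, "footprint"
    # Tier 2: expensive deep search over the full documents.
    deep_hits = [doc_id for doc_id, text in full_texts.items() if q in text.lower()]
    return deep_hits, "deep"


# Hypothetical sample data for illustration only.
footprints = {
    "b-001": "Jane Doe Rivers of the North hydrology",
    "b-002": "John Roe Desert Nights travel",
}
full_texts = {
    "b-001": "The delta widens near the estuary.",
    "b-002": "The caravan crossed the dunes at dawn.",
}

print(two_tier_search("hydrology", footprints, full_texts))
print(two_tier_search("caravan", footprints, full_texts))
```

The first query is answered from the footprints alone, while the second one matches nothing there and falls through to the slower deep search.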
If there are hundreds of thousands of servers, how much does it all cost? Well, nobody knows the exact numbers. The only thing we know for sure is: a lot. Google keeps ordering new, previously unseen servers to process data faster, more accurately, and more securely. This way, even complicated, in-depth searches are precise and fast.
We have seen how Google works. Now let’s see the difference when we talk about commercial and private search engines.
Remember the e-library example? The difference between an e-library and Google is simple: books are mostly static, and their content does not change over short periods of time.
So we can use one of two approaches:
– Develop a lightning-fast search engine backed by solid mathematics, modern databases, and SSD drives, and written in a fast programming language such as C++
– Develop a “footprint” database
These two approaches affect search engine development costs: customers usually prefer the first one to the second. It is more accurate but slightly more expensive.
The Average Costs of Search Engine Development
If you want to build a search engine from scratch in Python or PHP to power, for example, your website search, you can do it almost for free after taking some courses on Udemy or edX. It requires some programming skills. In practice, it costs you up to $100 (if we count paid courses).
If you want to create a search engine like Google (with decent search quality) and are interested in the price, we would say it costs about $100M from scratch (for the prototype), including the costs of servers, bandwidth, colocation, electricity, and so on. By the way, renting servers is not cost-effective. Maintaining the existing cluster costs about $25M per year.
We would say there is no sense in creating another global search engine. There are plenty already, from the “independent” DuckDuckGo to the “big brother” Google. If you want to build a search engine, we recommend bringing something new to the industry.
If you want to create a custom commercial search engine for your business (insurance, bioinformatics, healthcare, e-commerce, finance, or another field), development costs from $10,000 to $60,000, with a low maintenance fee.
Want to develop your own search engine? Call +1 (973) 597-1000 or fill in the form below to get a free consultation or estimate!