What "Having Its Own Index" Means when it comes to Search Engines

If you're looking for a search engine to replace Google or Bing, one important factor is whether or not the search engine has its own index.

What "having its own index" means is that the search engine has its own database of the content that appears on the results page when you search for something, as opposed to just using somebody else's database. This can sound strange. After all, that should always be the case, right?

Yes, that's the case with search engines in general. However, when it comes to web search engines, things get a bit more complicated. That's because the web is just too big. There are trillions of webpages out there, and any of them can change at any moment without telling anybody. In order for a search engine to work, it has to keep its database, its index, up to date. Essentially, this means it has to download the entire web not just once, but periodically, and store all that data somehow.

Of course, not even Google or Bing do this. That's just too much.

What search engines do instead is that they only download and store THE TEXT. So if there are images, videos, audio, etc., they will generally ignore them completely, because text is very small in size as data, but images and other multimedia are not. This is more manageable, but it's still a very complex task at large scales.
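
To make the "text only" idea concrete, here is a minimal sketch in Python, using only the standard library. It is not how Google or Bing actually do it. The names TextExtractor and extract_text are made up for this illustration, and the sketch assumes that "the text" simply means whatever is visible on the page outside script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Images, videos, and other tags carry no text, so they add nothing.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped block.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text of an HTML page as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    '<h1>Cats</h1><img src="cat.jpg"><p>Cats are <b>small</b>.</p>'
    "</body></html>"
)
text = extract_text(page)
print(text)
```

The image tag and the stylesheet never reach the stored text, which is the whole point: the part worth keeping is a tiny fraction of what the page weighs.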

You will instantly notice that this gargantuan task is just impossible for any human being to do. Surely, there is no intern at Google whose job is to download webpages to feed into their index. Instead, search engines use a computer program called a web spider or web crawler. It automatically downloads webpages, identifies the links on them, and downloads the linked pages as well, crawling the "web" of links one by one. In this way it indexes everything that is linked from somewhere.
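
The crawling loop described above can be sketched in a few lines of Python. This is only a toy: FAKE_WEB is a made-up, in-memory "web" of four pages so that the example runs without touching the network, and the names crawl, LinkExtractor, and fetch are assumptions for this illustration. A real crawler would also need politeness delays, robots.txt handling, error handling, and much more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Follow links breadth-first from start_url and return {url: html}."""
    index = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # download the page (here: a dictionary lookup)
        if html is None:
            continue  # the page does not exist
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # turn relative links into full URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical four-page web. Nothing links to c.example.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": "No links here.",
    "http://c.example/": "Nobody links to this page.",
}

index = crawl("http://a.example/", FAKE_WEB.get)
print(sorted(index))
```

Starting from a.example, the crawler reaches three of the four pages. The page on c.example never gets indexed, because a crawler can only find what is linked from somewhere it has already been.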

This means that any search engine that has its own index also has its own crawler. After all, it couldn't create its own index without a program to create it. And since that crawler is a program that has to be run, that means electricity costs, plus bandwidth costs for downloading everything. For most would-be search engines, the cost is just too high.

Consequently, many web "search engines" have neither their own index nor their own spider. Instead, when you search on them, they just send the search query to a real search engine and show you its results.

For example, Kiddle is "powered by Google," which means it just uses Google's index to get its results. On top of that, Kiddle also applies its own kid-friendly filters, so it isn't as if Kiddle is just another name for Google. It has its own features, but you can't find anything on Kiddle that you wouldn't be able to find on Google. Similarly, Ecosia and Yahoo Search just use Bing results. If you search for something on Bing, on Ecosia, and on Yahoo, the results are going to be exactly the same, or at least extremely similar.
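
Here is a rough Python sketch of what a search engine without its own index boils down to. Everything in it is made up for illustration: upstream_search stands in for the real engine (it just returns canned results instead of calling any actual API), and BLOCKED_HOSTS is a pretend kid-friendly filter, not Kiddle's actual one.

```python
from urllib.parse import urlsplit

# Canned results standing in for the real engine's ranked answer.
UPSTREAM_RESULTS = {
    "volcano": [
        "https://science.example/volcano-facts",
        "https://casino.example/volcano-slots",
        "https://kids.example/how-volcanoes-work",
    ],
}

# Hypothetical filter list applied on top of the upstream results.
BLOCKED_HOSTS = {"casino.example"}


def upstream_search(query):
    """Pretend to ask the real search engine, which owns the index."""
    return list(UPSTREAM_RESULTS.get(query, []))


def filtered_search(query):
    """Forward the query upstream, then drop results from blocked sites."""
    results = upstream_search(query)
    return [url for url in results if urlsplit(url).hostname not in BLOCKED_HOSTS]


upstream = upstream_search("volcano")
filtered = filtered_search("volcano")
print(filtered)
```

Notice that filtered_search can only remove results. It has no way to add a page that the upstream engine didn't return, and the results it keeps stay in the upstream engine's order.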

That's the problem with a search engine not having its own index. You won't be able to find anything that you wouldn't be able to find on Google or Bing already.

The problem is even greater than that. What "not having its own index" truly means is that the search engine also doesn't have its own ranking algorithm. This means the first 10 results on Bing are also the first 10 results on Ecosia, and the first 10 results on Yahoo Search. To be honest, I tested it, and on Yahoo they are a little different because, from what I could tell, Yahoo avoids showing results from the same website multiple times. Apart from that, the results are exactly the same.
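
To illustrate the kind of difference I saw on Yahoo, here is a small Python sketch that keeps only the first result from each website. This is just my guess at the behavior, written as a toy. I have no idea how Yahoo actually implements it, and the function name one_result_per_site and the example URLs are made up.

```python
from urllib.parse import urlsplit


def one_result_per_site(ranked_urls):
    """Keep the first result from each host, preserving the original order."""
    seen_hosts = set()
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


# Hypothetical ranking as returned by the engine that owns the index.
ranked = [
    "https://a.example/page-1",
    "https://a.example/page-2",
    "https://b.example/",
    "https://a.example/page-3",
    "https://c.example/",
]
deduplicated = one_result_per_site(ranked)
print(deduplicated)
```

The order of what remains is still the original ranking. Some entries are dropped, but nothing is re-ranked, which is why the results still look so much like the ones from the engine that owns the index.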

If you don't like the way Google and Bing rank their results, if you don't like which pages frequently get on top of your queries, you need to use an alternative search engine that has its own index, because if you don't, you'll just get the same results anyway.

Examples

See Web Search Engines that have their Own Index for examples.
