Web Search Engines that have their Own Index

For reference, here is a list of the search engines I know of that have their own index (why is this important?). In other words, these are websites that are not "powered by Google" or "powered by Bing" (except for Google and Bing themselves):

  1. Google (www.google.com)
    Web, image, and reverse image index.
    Its spider is called Googlebot.[1]
  2. Bing (www.bing.com)
    Web, image, and reverse image index.
    Its spider is called Bingbot.[2]
  3. Yandex (yandex.com)
    Web, image, and reverse image index.
    Note: Russian language bias.
    Its spider is called YandexBot.[3]
  4. Brave Search (search.brave.com)
    Web and image index.
    Its spider doesn't advertise a distinct user agent, so that it isn't blocked by websites that only allow Googlebot.[4]
  5. Mojeek (www.mojeek.com)
    Web only.*
    Its spider is called MojeekBot.[5]
  6. DuckDuckGo (duckduckgo.com)
    Web only.**
    Its spider is called DuckDuckBot.[6]
  7. Kagi (kagi.com)
    Web, image, and reverse image index.***
    Spider name unknown. :(
  8. Wiby (wiby.me)
    Web only. Has an anti-modern-web mission, so its index only contains non-commercial, hobbyist, and indie websites.
    Its spider is called WibyBot.[7]
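
Crawlers like these typically identify themselves by including their name in the User-Agent header of their requests, as in the YandexBot example quoted in reference 3, so a site owner can get a rough idea of which search engines are crawling them by checking their server logs. Below is a minimal Python sketch of that idea. The function name `claimed_crawler` and the lowercase tokens are my own assumptions based on the spider names listed above, not an official list. Keep in mind that a User-Agent is self-reported and can be faked, that Brave's crawler can't be recognized this way because it doesn't advertise a distinct user agent, and that Kagi is left out because its spider's name is unknown.

```python
from typing import Optional

# Hypothetical token table: lowercase spider names (assumed from the list above)
# mapped to the search engine that runs them. Brave and Kagi are deliberately absent.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "yandexbot": "Yandex",
    "mojeekbot": "Mojeek",
    "duckduckbot": "DuckDuckGo",
    "wibybot": "Wiby",
}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the search engine whose spider name appears in the User-Agent.

    This only reports what the client *claims* to be; the header can be spoofed.
    Returns None when no known spider name is found.
    """
    ua = user_agent.lower()  # compare case-insensitively
    for token, engine in CRAWLER_TOKENS.items():
        if token in ua:
            return engine
    return None


# Example User-Agent quoted from Yandex's documentation (reference 3).
example_ua = (
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.268"
)
print(claimed_crawler(example_ua))  # Yandex
```

A plain substring match is used instead of parsing the header, since crawler User-Agent strings don't follow one fixed layout.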

*Mojeek does have an image search feature, but it doesn't search the web for images. It only searches images uploaded to Pixabay and Openverse.

**DuckDuckGo has its own spider, but it includes results from other search engines, especially Bing, when it doesn't have enough results in its own index. In particular, when you search for an image in image search, the results always seem to be identical to what you would get from Bing, so I'm assuming it has no web image index of its own and just uses Bing for its image search.

***As Kagi is a paid search engine and I'm not a customer, I can't confirm that it provides image and reverse image search, but according to its blog it seems to. More importantly, I haven't been able to confirm whether its bot is called KagiBot.

Besides these, there's also Baidu (www.baidu.com), which probably has its own index, but I can't really use it to verify this because the whole site is in Chinese. It's worth mentioning because Baidu is one of the most used search engines in the world, despite hardly anyone using it outside of China, simply because China has such a huge population.

Another search engine that may have its own index is Qwant (www.qwant.com), but whenever I try to access it, it just says it's not available in my country. There's no search box, so I can't even search for anything. As far as I'm concerned, Qwant is worse than Baidu.

Search Engines that do not have their Own Index

For the sake of reference and completeness, in this article I'll also list search engines that DON'T have their own index and just show results from Google or Bing. I'm including them because you might be wondering whether they have an index of their own that I just didn't know about.

  1. Kiddle (www.kiddle.co)
    Powered by Google.[8] A child-safe search engine.
  2. Swiggle (swiggle.org.uk)
    Another child-safe search engine, powered by Google.[9]
  3. Ecosia (www.ecosia.org)
    Powered by Bing and Google.[10] Plants trees with the revenue from your searches.
  4. Yahoo Search (search.yahoo.com)
    Powered by Bing.[11] It used to use Google results, but switched to Bing after a search deal with Microsoft.
  5. Swisscows (swisscows.com)
    Powered by Bing.[12] Privacy-focused.
  6. StartPage (www.startpage.com)
    Powered by Bing and Google.[13]* Privacy-focused.

*StartPage isn't very transparent with its users about its relationship with its search partners. Ecosia has two transparency links in its footer that disclose that its results come from Google and Bing. Yahoo Search has a disclaimer on every results page. StartPage's "About Us" and "Privacy Policy" pages, however, make no mention of using Google and Bing results, of the fact that this means sending your queries to them, or even of the fact that your queries are sent to anyone at all. Instead, the text just emphasizes that its service is anonymous and privacy-focused. I had to use Google to find the help article that discloses which search engines it uses.

Besides these, there are also countless "metasearch" engines that may or may not have legal agreements with Google and Bing to use their results in this manner. They're generally not worth noting, since the only thing they offer is a promise of "privacy" that can't really be verified.

So your searches don't go straight to Google, a huge company that faces tons of scrutiny every day. Instead, they have to pass through some random website that nobody has ever heard of, whose CEO or owner you have no way of knowing. In my honest opinion, that doesn't sound safer. It sounds much more dangerous.

One example is MetaGer (metager.org). It has a page that says it uses results from Bing, Yahoo (which is Bing), Mojeek, Yandex, Brave, and something called Scopia. I couldn't find any information on this Scopia search engine, but MetaGer apparently still delivers results from it. MetaGer merges the results from all these search engines whenever you search for something, which is why it's called a "meta" search engine.

A more problematic example is SearX. SearX is a project that lets anyone host their own metasearch engine website, so it's a bit like Mastodon, but for search. This means that instead of trusting a giant tech company, you have to trust some guy with your data. Most importantly, there's nothing stopping the owner of a SearX instance from modifying the SearX code they run to include whatever malware they want. A similar project is called Whoogle.

References

  1. https://developers.google.com/search/docs/crawling-indexing/googlebot (accessed 2024-04-29)
    "Googlebot is the generic name for Google's two types of web crawlers"
  2. https://blogs.bing.com/webmaster/april-2022/Announcing-user-agent-change-for-Bing-crawler-bingbot/ (accessed 2024-04-29)
    "Announcing user-agent change for Bing crawler bingbot"
  3. https://yandex.com/support/webmaster/robot-workings/check-yandex-robots.html (accessed 2024-04-29)
    "When the robot accesses the page, your server logs may display the User-agent and version of the browser used for crawling the site. For example, Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.268."
  4. https://safe.search.brave.com/help/brave-search-crawler (accessed 2024-04-29)
    "The Brave Search crawler does not advertise a differentiated user-agent because we must avoid discrimination from websites that allow only Google to crawl them."
  5. https://www.mojeek.com/bot.html (accessed 2024-04-29)
    "MojeekBot is the web crawler for the Mojeek search engine."
  6. https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/ (accessed 2024-04-29)
    "DuckDuckBot is the Web crawler for DuckDuckGo."
  7. https://wiby.me/submit/ (accessed 2024-04-29)
    "The WibyBot [...] is occasionally rejected by some web servers."
  8. Kiddle displays "Google Custom Search" on its homepage (seen 2024-04-29).
  9. Swiggle displays "Powered by Google Custom Search" on its homepage (seen 2024-04-29).
  10. https://ecosia.helpscoutdocs.com/article/579-search-results-providers#Which-sources-are-used-to-enhance-search-results-C8TWi (accessed 2024-04-29)
    "The search results and search related ads on Ecosia come from our search partners Microsoft Bing and Google"
  11. Whenever you search for something in Yahoo Search, it says "Powered by Bing™" in the footer of the SERP (seen 2024-04-29).
  12. https://swisscows.com/en/privacy (accessed 2024-04-29)
    "We are currently working with Bing and are very transparent about the cooperation."
  13. https://support.startpage.com/hc/en-us/articles/4522435533844-What-is-the-relationship-between-Startpage-and-your-search-partners-like-Google-and-Microsoft-Bing (accessed 2024-04-29)
    "Startpage submits your query to Google and Bing anonymously on your behalf, then returns the results to you"