site: Search Operator on Google

By default, Google returns results from every website in its index, but you can make it return results from a specific website in two ways. First, you could try just typing the name of the website, e.g. crafts pinterest, but that will also include results from other websites that mention Pinterest in their webpages. Second, you could use the site: operator, e.g. crafts site:pinterest.com: this will force Google to only show webpages from the pinterest.com domain name. In this article, I'll explain the various ways we can use this operator.

Image: the query nasa site:reddit.com/r/iama typed into Google's search results page, showing results only from that subreddit. Caption: Using the site: operator to search posts of a subreddit on Google.

Showing Only Results from One Specific Website on Google

In order to show results from only one specific website on Google, all we need to do is type the code site: immediately followed by the domain name of the website we want to search. For example:

crafts site:pinterest.com

If we search for the query above on Google, that will show us only results from the website at pinterest.com that contain the term "crafts."
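If you'd rather open such a search from a script or a bookmark, the query just needs to be URL-encoded into Google's search address. The snippet below is a minimal sketch of that idea; the helper name google_search_url is made up for this example.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Return a Google search URL for the given query text."""
    # urlencode turns spaces into '+' and percent-encodes the colon.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("crafts site:pinterest.com"))
# https://www.google.com/search?q=crafts+site%3Apinterest.com
```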

Common problems: make sure to type site in lower case, a colon (:), and the domain name, without any spaces between them. If you type site: pinterest.com with a space after site:, Google won't interpret this as the site: operator. If you type Site: with a capital letter, Google also won't do what we want. In both cases, Google will ignore the colon and search for the term site.

You can also type https://www.pinterest.com/ instead of just pinterest.com and it will work. This means you can copy a website's address from the address bar in your browser, type site:, and paste the address immediately after it to search that website's webpages.
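The rules above (lower-case site, a colon, and no space before the domain) are easy to get wrong when typing by hand. Here is a small sketch that assembles the query for us; site_query is a hypothetical helper written for this article, not part of any Google tool.

```python
def site_query(terms: str, site: str) -> str:
    """Build '<terms> site:<site>' with no space after the colon."""
    site = site.strip()  # drop stray spaces from a copy-paste
    if not site or " " in site:
        raise ValueError("site must be a non-empty value without spaces")
    # 'site:' is always written in lower case, directly followed by the site.
    return f"{terms.strip()} site:{site}"


print(site_query("crafts", "pinterest.com"))
# crafts site:pinterest.com
print(site_query("crafts", " https://www.pinterest.com/ "))
# crafts site:https://www.pinterest.com/
```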

Showing Results from Two or More Websites

It's possible to include results from two or more websites by combining the site: operator with the OR operator like this:

crafts site:pinterest.com OR site:youtube.com

The query above will show us webpages that come from pinterest.com or youtube.com, i.e. results that come from "both" sites.

Common problems: you must type OR in upper case for Google to understand it. Alternatively, you can use a vertical pipe (|) instead of the code word OR.

Curiously, it's not "both" in the logical sense. For example, if you remove the OR and search for:

crafts site:pinterest.com site:youtube.com

Google will show you zero results. That's because if a webpage is from pinterest.com, that webpage can't also be from youtube.com. Each webpage belongs to only one website, so there aren't any webpages that are BOTH from one site and the other at the same time. That's why we need the OR.
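To make the difference concrete, here is a rough sketch. The first function builds the OR query; the other two are a toy model of "any of these sites" versus "all of these sites" for a single page's host. The function names are invented for this example, and the model is only an illustration of the logic, not of how Google works internally.

```python
def multi_site_query(terms: str, sites: list[str]) -> str:
    """Join one site: clause per domain with an upper-case OR."""
    clause = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} {clause}"


def host_matches_any(host: str, sites: list[str]) -> bool:
    """With OR, a page qualifies if it is from at least one of the sites."""
    return any(host == site for site in sites)


def host_matches_all(host: str, sites: list[str]) -> bool:
    """Without OR, a page would have to be from every site at once."""
    return all(host == site for site in sites)


sites = ["pinterest.com", "youtube.com"]
print(multi_site_query("crafts", sites))
# crafts site:pinterest.com OR site:youtube.com
print(host_matches_any("pinterest.com", sites))  # True
print(host_matches_all("pinterest.com", sites))  # False
```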

Showing Results with URLs that Begin in a Specific Way

We can use the site: operator to search for webpages from one specific website that begin with a certain URL path prefix. A common use case for this nowadays is to search for threads posted in a given subreddit on Reddit. For example:

table site:reddit.com/r/woodworking

If we search for the above on Google, Google will show us all webpages from the reddit.com domain, with URL paths that begin with /r/woodworking, and that contain the term table. In practice, this means we're searching for all posts in the /r/woodworking subreddit that talk about tables. However, the way this works is a bit complicated, so let me explain it for a bit.

To begin with, let's see what kind of URL this matches. The first result when I searched for the query above had the URL https://www.reddit.com/r/woodworking/comments/1aqxe0s/my_first_table/. First, let's understand what each part of this URL means.

  1. https:// - this is the protocol.
  2. www - this is the subdomain.
  3. reddit.com - this is the root domain.
  4. /r/woodworking - this is the subreddit. All subreddits begin with /r/ in this website.
  5. /comments/1aqxe0s/my_first_table/ - this /comments/ part (called an endpoint) is what shows you a post, a thread, and its comments. Right after it we have the thread ID, 1aqxe0s. This is what identifies the thread inside Reddit's program. After that we have something called the "slug," my_first_table. This doesn't really do anything, we just put it there so we can tell the title of a webpage from its URL.
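The same breakdown can be reproduced with Python's standard urllib.parse module. This is just a sketch: the function name split_reddit_thread_url is made up, and it assumes the /r/subreddit/comments/id/slug/ layout described in the list above.

```python
from urllib.parse import urlsplit


def split_reddit_thread_url(url: str) -> dict:
    """Split a Reddit thread URL into the parts listed above."""
    parts = urlsplit(url)
    # Drop empty strings produced by the leading and trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    # Assumed layout: ['r', subreddit, 'comments', thread_id, slug]
    if len(segments) < 4 or segments[0] != "r" or segments[2] != "comments":
        raise ValueError("not a /r/<subreddit>/comments/<id>/ URL")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "subreddit": segments[1],
        "thread_id": segments[3],
        "slug": segments[4] if len(segments) > 4 else None,
    }


url = "https://www.reddit.com/r/woodworking/comments/1aqxe0s/my_first_table/"
print(split_reddit_thread_url(url))
```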

Here's what is interesting: if the website's program can look up a thread by its ID (in this case, 1aqxe0s), and that information includes which subreddit the thread belongs to, which is very likely, then the website doesn't need to get the subreddit from the URL, as it can obtain that information through other means.

The URL could have been just https://www.reddit.com/comments/1aqxe0s/ and Reddit would have been able to show exactly the same thing without problem. In fact, I tried it, and that URL actually works.

In most modern websites that display posts or articles, one single part of the URL uniquely identifies the post, and that part is enough for the website's program to figure out what post it should display. In social media, that's often a number, or a number mixed with letters, and it's the post ID. In some news websites, the slug is the unique identifier: there can't be two articles with the same slug.

For example, in the Ars Technica website, there are categories such as science and gadgets. If you take https://arstechnica.com/science/2024/04/moments-of-totality-how-ars-experienced-the-eclipse/ and replace science with gadgets, it will show you exactly the same page. More specifically, it will redirect you to the correct URL, because the part that identifies the article is the slug moments-of-totality-how-ars-experienced-the-eclipse.

Even though the unique ID is the only thing necessary, many websites put that unique ID after a category prefix, such as /science/ above, or /entertainment/, /news/, and so on. In Reddit's case, that category prefix is the subreddit.

In any website that uses a category prefix, we can search for all posts or articles in that category with the site: operator. For example:

eclipse site:https://arstechnica.com/science/

If we type the query above into Google, Google will only show us URLs that have the /science/ path prefix, which in this website's case means the "science" category.

Twitter and Instagram also work similarly: a user's post on Instagram has a URL that includes their username, for example: https://www.instagram.com/cristiano/p/C5MJp3yL6lO/. We can see here that the ID of the post is C5MJp3yL6lO, but Instagram's website is programmed to create URLs that also include the user's username. In this case, /cristiano/, which is Cristiano Ronaldo's username.

Therefore, we can search for all his posts on Instagram on Google by doing something like this:

world cup site:instagram.com/cristiano/

And that will give us results about the world cup posted on his Instagram.

Note that Google ignores the trailing slash (/) even in the site: operator, so the query above may include results from other users whose usernames begin with cristiano. If someone called themselves cristiano.something, that will show up on Google too. This is generally not a problem, but we could improve it a little by searching for /cristiano/p/, since all posts have a /p/ after the username.
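One way to picture this behavior is as a plain string-prefix comparison that drops the trailing slash. The sketch below is only a toy model built on that assumption, not a description of Google's real matching; the function name matches_prefix, the cristiano.something account, and the post ID abc are all hypothetical.

```python
def matches_prefix(url: str, prefix: str) -> bool:
    """Toy model: compare as a string prefix, ignoring a trailing slash."""

    def normalize(value: str) -> str:
        value = value.split("://", 1)[-1]  # drop the protocol, if any
        return value[4:] if value.startswith("www.") else value

    return normalize(url).startswith(normalize(prefix).rstrip("/"))


real_post = "https://www.instagram.com/cristiano/p/C5MJp3yL6lO/"
other_user = "https://www.instagram.com/cristiano.something/p/abc/"  # hypothetical

# The short prefix also lets the other username through in this model.
print(matches_prefix(real_post, "instagram.com/cristiano/"))     # True
print(matches_prefix(other_user, "instagram.com/cristiano/"))    # True

# Adding /p/ to the prefix rules the other username out.
print(matches_prefix(real_post, "instagram.com/cristiano/p/"))   # True
print(matches_prefix(other_user, "instagram.com/cristiano/p/"))  # False
```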

Common problems: site: can't be used to search for a term that appears in the middle of the URL on any random website. In that case, use the inurl: operator. For example, world cup inurl:forum will show you several websites that have webpages in a /forum/ path.

Searching Only a Specific Subdomain of a Website

We can use site: to search for webpages belonging to only a specific subdomain of a website. For example:

magia site:en.wikipedia.org

The query above will search only en.wikipedia.org. Wikipedia uses different subdomains for its different languages (e.g. pt.wikipedia.org is the Portuguese Wikipedia). The en subdomain is for the English Wikipedia, so this has the same effect as searching for English webpages on Wikipedia.

Some websites, like Tumblr, WordPress, Blogger (blogspot.com), and Neocities allow users to publish their webpages in a subdomain of theirs, so this could be useful for searching one of these as well.

Searching Only Websites with a Specific TLD

We can use site: to show only results from websites with a domain name that ends in a specific TLD, like .com, .net, .info, etc. For most TLDs, called unrestricted TLDs, this isn't useful, since anyone can get any unrestricted TLD they want. However, some TLDs like .edu and .gov are restricted and can only be acquired by specific entities (colleges and the government, for example). So you could search for:

how to do my taxes site:.gov

To get information straight from a governmental website.

The same code can also be used to search for ccTLDs, which are country-related. For example:

linux site:.br

Will search for the term linux on sites that have Brazil's ccTLD, .br. This doesn't mean the sites are Brazilian: the site could be owned by anyone from any country. Nor do all Brazilian websites have a .br ccTLD, since nothing stops Brazilians from acquiring domain names without .br. But generally it's an easy way to query national websites from all sorts of countries.

Excluding a Website from The Results

We can combine the site: operator with the minus (-) operator to exclude a website from search results. See the article on the minus operator for some cool ways to use this. An example:

crafts -site:pinterest.com

The query above will remove Pinterest from the results.
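If we want to leave out several websites, typing one -site: clause per website gets repetitive. A minimal sketch of a helper that does it for us follows; exclude_sites is a made-up name, and excluding more than one site at once is my own extension of the single-site example above.

```python
def exclude_sites(terms: str, sites: list[str]) -> str:
    """Append one '-site:<domain>' clause per website to leave out."""
    # The minus sign goes directly before 'site:', with no space between.
    return " ".join([terms] + [f"-site:{site}" for site in sites])


print(exclude_sites("crafts", ["pinterest.com"]))
# crafts -site:pinterest.com
print(exclude_sites("crafts", ["pinterest.com", "youtube.com"]))
# crafts -site:pinterest.com -site:youtube.com
```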

Searching ALL Subdomains of a Website

We can combine the site: operator with the asterisk operator to search ALL subdomains of a specific website. For example:

podcast site:*.wordpress.com

The query above will search for the term podcast on all subdomains of wordpress.com.

This is generally completely useless, as with most websites you could have just searched for site:wordpress.com for the same effect.

In WordPress's case, however, it's not completely useless because of the unusual way the website is set up.

Most websites have a www subdomain where they put all their webpages, but WordPress does not. Instead of www.wordpress.com, WordPress uses its root domain, which is just wordpress.com, for its pages.

They do this because this is a website that offers web hosting to its customers. When you sign up for a WordPress account, you get your own subdomain in the form something.wordpress.com, so all subdomains of wordpress.com belong to WordPress's customers, and only the root domain without a subdomain belongs to WordPress itself.

This means that if we search for site:*.wordpress.com, Google will show us results from ALL subdomains (i.e. all of WordPress's customers), but won't show us results from WordPress itself as those are in the root domain.

Note: WordPress is confusingly the name of a free CMS (at wordpress.org) and of a paid service (at wordpress.com).

Searching for Only the Root Domain of a Website

We can also search for only the root domain if we exclude all subdomains by combining the above with the minus (-) operator:

podcast site:wordpress.com -site:*.wordpress.com
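The distinction between the root domain and its subdomains can be sketched in a few lines. This is an illustration only: the function names and the something.wordpress.com host are invented for the example, and the check is a simple string test rather than anything Google documents.

```python
def is_subdomain_of(host: str, root: str) -> bool:
    """True if host is a subdomain of root, but not the root itself."""
    host, root = host.lower(), root.lower()
    return host != root and host.endswith("." + root)


def root_only_query(terms: str, root: str) -> str:
    """Search the root domain while excluding every subdomain."""
    return f"{terms} site:{root} -site:*.{root}"


print(is_subdomain_of("something.wordpress.com", "wordpress.com"))  # True
print(is_subdomain_of("wordpress.com", "wordpress.com"))            # False
print(root_only_query("podcast", "wordpress.com"))
# podcast site:wordpress.com -site:*.wordpress.com
```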

Searching for a Specific Subdomain on ANY Website

Perhaps one of the most useless tricks I found. You can search for a specific subdomain in any website if you put the asterisk where the second level domain would be. Observe:

down site:status.*

A lot of websites and online services have a status subdomain that they use to report the status of the website. So the query above will try searching for the word "down" on them, which will probably show some websites that were down when Google indexed them. The first result for me is from status.roblox.com, for example.

Searching for a Pattern in the URL Path

For completeness, you can also put an asterisk in the path part of the site: operator. For example:

photo site:wordpress.com/*/attachment

This will include results with paths like /anything/attachment/whatever.

Note that you can't use site: without the domain part (e.g. wordpress.com). If you want to search only the path part of various websites, use inurl: instead of site:. For example, for the query above, we would type just inurl:attachment.
