How to Remove AI Images from Google Results

In this article, we'll look at five ways to remove AI-generated images from Google search results, specifically from Google Images.

Images generated by AI, also called genAI or "AI art," keep appearing in the results when we want to find genuine things. There are a few tricks we can use to make Google display fewer AI results. None of them is perfect, however: they all have "false positives," meaning you'll also end up excluding some non-AI results. Fortunately, I have collected enough methods that at least some of them should work for whatever you're trying to search for.

[Image: results from Google Images are full of AI-generated images. A comparison between two queries: "light brown hair male portrait" returns mostly AI results, while "light brown hair male portrait -ai -prompt -generate" returns almost no AI-generated images.]

Method 1: Using the "before" keyword

The first method to get rid of AI image results is to simply include before:2022 in your query. For example, if you search for:

beethoven before:2022

That should show you only images from before 2022, which means there will be no AI images among the results, since those were made after 2022.

Explanation

When Google indexes images, it keeps track of the "date" of the image. Various search engines have the same ability, although I still don't know if the "date" is supposed to be when the image was indexed, or a guess at when it was published. Regardless, in Google Images, you can filter results by date using the search operators before and after. At some point, Google Images became full of AI-generated images, so all we need to do is tell Google to return only results from BEFORE the AI image apocalypse started.

A few well-known genAI tools are DALL-E (built into ChatGPT), which launched in 2021 [1]; Midjourney, which launched in 2022 [2]; and Stable Diffusion, which also launched in 2022 [3]. So 2022 sounds like a good year to use with this operator.

False Positives

Naturally, this method also excludes all non-AI results made after 2022, but it's perfect for searching for photos of famous historical figures.

Method 2: Exclude "ai"

Another way to get rid of AI image results is to type -ai in your query, like this:

illustration camel -ai

This should remove all results that contain the word "AI." We can also try similar keywords. For example, websites full of AI-generated images are likely to have a "generate" button or call-to-action somewhere, so we could try something like this:

illustration camel -ai -generate -"stable diffusion" -midjourney -"dall-e"

Warning: most people don't use the exclusion search operator, so if you use a lot of them like this, Google may find your behavior "unusual" and will give you some captchas to solve to prove that you aren't a bot trying to DDoS Google.

Observation: terms in Google are case-insensitive, so -ai and -AI mean the same thing.
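If you find yourself reusing a long list of exclusions, it can be handy to build the query string programmatically instead of retyping it. Here's a minimal Python sketch; the function name and the exclusion list are just illustrative, and only the -term and -"phrase" operators are real Google syntax (the tbm=isch parameter is the one Google's image search URLs have historically used):

```python
from urllib.parse import urlencode

def build_image_search_url(query, excluded_terms):
    """Compose a Google Images URL that excludes the given terms.

    Multi-word terms are quoted so the whole phrase is excluded.
    This only builds the query string; Google's operators do the work.
    """
    parts = [query]
    for term in excluded_terms:
        if " " in term:
            parts.append(f'-"{term}"')
        else:
            parts.append(f"-{term}")
    full_query = " ".join(parts)
    # tbm=isch switches Google to image search results.
    return "https://www.google.com/search?" + urlencode(
        {"q": full_query, "tbm": "isch"}
    )

url = build_image_search_url(
    "illustration camel",
    ["ai", "generate", "stable diffusion", "midjourney", "dall-e"],
)
print(url)
```

Paste the printed URL into your browser and you get the same results as typing the exclusions by hand.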

Explanation

Google's exclusion operator removes results that contain a word, and a lot of times the pages that contain AI results will also contain the word "AI." Unfortunately, this doesn't happen every time. For example, sometimes people upload AI-generated images to websites like Pexels, which do not really have any description about what the image is, so there is nowhere to write "this was made with AI."

False Positives

Many artists are against AI-generated images, so the webpage where their artwork is embedded may contain phrases such as "this isn't AI" or "AI can't do this"; they may get accused of using AI in the comments; or a sidebar may even feature a post promoting the "no-AI" movement.

Google isn't smart enough to tell "AI" apart from "no AI." Because the phrase "no AI" literally contains the word "AI," if you type -ai, Google will also exclude all pages that contain the phrase "no AI."

Method 3: Exclude Generation Parameters

A third method you can try is to exclude terms commonly found on webpages that feature AI-generated images, such as:

  • -prompt
  • -seed
  • -checkpoint
  • -steps
  • -model
  • -CLIP
  • -CFG
  • -sampling
  • -"sampling method"
  • -karras
  • -euler

False Positives

Beware that "model" can exclude all photos that contain a person who is a model, modelling for the photo. "Prompt" could exclude the phrase "he was prompted." "Seed" is probably only a problem with plant-related images. "CLIP" will exclude clip art. The safest seems to be "checkpoint," but it also appears on fewer AI pages than "prompt" does.

Explanation

Text-to-image generators use a text "prompt" as input to generate an image output. Normally, you would type something like this:

blonde man eating a hamburger

I do not know how every single one of these generators work. I've tried Stable Diffusion (SD) because I can run it locally with my mid-range graphics card.

In SD, to generate an image, you need something called a "model" or "checkpoint." This is a single file, typically a few gigabytes in size, that contains the parameters of the AI model, also called the "weights." It's called a "checkpoint" because you could take one model and train it further, changing the weights, and that would be a different "checkpoint" of the training.

The image generator works in a few stages. First, a text encoder called "CLIP" converts the text prompt into a numeric representation: first into tokens, then into numeric values. This lets related words like "eating" and "eat," or "man" and "men," be understood similarly by the next stage despite being spelled differently. Then the diffuser program takes a base image (such as a photo) and changes its pixels according to those values.

This change is scaled by a value called "CFG scale."

It does this over multiple sampling "steps," according to some "sampling method," such as "Karras" or "Euler."

Programs always generate the same output from the same input, so in order for a single prompt to generate different images of a "blonde man eating a hamburger," there must be some input that varies with each generation. In computer programs, "randomness" comes from a subprogram that outputs a single random number each time it's executed. The input of this subprogram is always the last number it generated, which means the "random" function always generates the same numbers in the same order. What decides the current random number is simply which number we started with, called the random "seed," and how many numbers we have generated so far. Since programs always perform the same tasks given the same input, the count of numbers generated is always the same for a given prompt; the only thing that matters, then, is what the random seed is. By default, the seed could be set based on the current time, since that sounds random enough. However, just like in roguelike games and in Minecraft world generation, it's possible to manually set the seed to a specific number. Do that, and the program will always generate the same image from the same prompt, since everything is the same.
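You can see this behavior with any seeded pseudorandom number generator. Here's a sketch using Python's random module; this isn't SD itself, just the same principle that makes a fixed seed reproduce the same image:

```python
import random

# Two generators started with the same seed produce the exact same
# sequence of numbers, in the same order.
gen_a = random.Random(42)
gen_b = random.Random(42)
seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]
print(seq_a == seq_b)  # True: same seed, same "image"

# A different seed gives a different sequence: a different "image."
gen_c = random.Random(43)
seq_c = [gen_c.random() for _ in range(5)]
print(seq_a == seq_c)  # False
```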

In SD's case, the random number is used to generate "noise" that is added to the base image. If you don't have a photo, you would use a blank image, and then the whole thing is noise, the whole thing is random.

You'll notice that in many websites that feature AI-generated images, you can find the parameters above: the prompt, the seed, the sampling method, etc. In some websites, you can also find the checkpoint used to generate the image.

In fact, popular SD frontends like AUTOMATIC1111's web UI or ComfyUI will include the parameters used to generate the image inside the PNG file they produce (in the PNG's metadata fields), so you can open such a PNG file with Notepad and read the prompt straight from the file's bytes as text.
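For the curious, PNG stores this kind of text metadata in "tEXt" chunks: a keyword, a NUL byte, then the value. The sketch below builds a tiny in-memory PNG with a tEXt chunk and parses it back, using only the standard library. The "parameters" keyword is the one I've seen AUTOMATIC1111 use, but treat the exact keyword as an assumption; it varies by frontend:

```python
import struct
import zlib

def chunk(ctype, data):
    # A PNG chunk is: 4-byte length, 4-byte type, data, CRC over type+data.
    return (
        struct.pack(">I", len(data))
        + ctype
        + data
        + struct.pack(">I", zlib.crc32(ctype + data))
    )

# Build a tiny in-memory PNG containing a tEXt chunk (keyword, NUL, value),
# the same mechanism SD frontends use to embed generation parameters.
prompt = b"blonde man eating a hamburger, seed 42"
png = (
    b"\x89PNG\r\n\x1a\n"                                        # PNG signature
    + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))  # 1x1 image
    + chunk(b"tEXt", b"parameters\x00" + prompt)
    + chunk(b"IEND", b"")
)

def read_text_chunks(data):
    """Walk the PNG chunks and collect keyword/value pairs from tEXt chunks."""
    pos, out = 8, {}  # skip the 8-byte PNG signature
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # length + type + data + CRC
    return out

meta = read_text_chunks(png)
print(meta["parameters"])
```

The same chunk-walking approach works on a real PNG downloaded from one of these sites, as long as the site hasn't stripped the metadata on upload (many do).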

For the most part, the reason they do this is that if you take all of these parameters, you can generate exactly the same image.

However, there is one huge caveat.

As I mentioned previously, I can run Stable Diffusion on my old, mid-range, 4 GB graphics card. But in order to generate anything, I need the checkpoint. You'll notice that many websites that offer the ability to generate images on the website, for a fee, will not tell you what checkpoint is being used to generate those images. It could be a custom checkpoint they trained themselves (unlikely), it could be SD's base checkpoint with some extra training on top, or they could just be using something someone else made. In any case, they don't want to tell you.

So you can find the word "prompt" in some of these websites, but "checkpoint" is less likely to be written.

Method 4: Exclude prompts

A fourth method you can try is to exclude terms common in prompts of AI-generated images. For example:

forest
-masterpiece -4k -8k -wallpaper
-lowres -watermark -blurry -signature

Explanation

Because SD is trained on an absurd number of images from the Internet, there is zero quality control over what images the model was trained with. They basically did what Google Images does: take an image from the Internet, take words from the webpage where it was found, and "train" the model with that pair. Just like in our "no AI" example, it's very easy for this to go wrong. Many images on the Internet sit on webpages that have nothing to do with the image, so the text is completely irrelevant.

However, because there are so many images, "on average," the text is going to be relevant, so SD's model becomes "approximately," "more or less" correct.

People who generate images do not want their images to look like something halfway between the best images on the Internet and the worst images on the Internet. SD has no idea what "blonde" means. It simply sampled a lot of images that had "blonde" written next to them, and statistically that means part of the image near the top tends to be yellow-ish. That's all SD knows. Think about what sort of webpage would contain a phrase like "blonde man": high-quality stock images, I guess, but also news reports with low-quality photos like "blonde man suspected of" or "seen doing," etc.

In order to control for quality, the positive prompt needs to include words that make SD deviate toward higher-quality images. SD also supports negative prompts, which can be used to make it deviate away from low-quality images.

Common keywords found in positive prompts are "masterpiece," "4k," "wallpaper," "high quality," "high resolution," "highres," and so on.

The problem with these is that they're very likely to be found in the non-AI images you want to find.

Common keywords found in negative prompts are "lowres," "low resolution," "low quality," and "blurry." These are great because you don't want to find images qualified by these terms either!

Another important set of negative keywords are those used to remove artifacts from images. For example, I found an image of a forest generated by AI on the Internet. It has this negative prompt:

ugly,
tiling,
poorly drawn hands,
poorly drawn feet,
poorly drawn face,
out of frame,
extra limbs,
body out of frame,
blurry,
bad anatomy,
blurred,
watermark,
grainy,
signature,
cut off,
draft

I'll repeat: this is for an image of a forest. No people in it.

Some of these make sense. No tiling images, like textures. No blurry or blurred images. No watermarks. No grainy images. No signature on the image. This will make SD ignore images that feature these things, assuming the webpages where the images were found actually contained these terms, which sounds strange, because I don't think anyone would write "my blurry, grainy, watermarked photo with my signature on it." Then again, I'm no prompt writer.

Why does this negative prompt for a forest have so many keywords that are obviously about people? Because the people generating the images are too lazy. Nobody is going to write a custom negative prompt for every single thing they are going to generate. That takes too much effort. They just saved a negative prompt that seems to work well, and then they reuse this prompt for everything. And in many cases, they don't even know how it works themselves, they are just copying what someone else did, and maybe even the person who wrote it doesn't know if it really works, and it may not even work with every single checkpoint.

For example, does excluding "poorly drawn hands" actually do anything? I find that hard to believe. Consider all the images on the Internet. How many of them have the phrase "poorly drawn hands" explicitly written beside them? Who even says "poorly"? Why not "badly drawn hands"? Can the CLIP program actually convert "poorly" to a token similar to "badly"?

It's a very complicated process. ComfyUI makes it a bit easier to understand what is happening with its nodes, but it's still mostly an undebuggable black box for me.

Method 5: Block Websites

A final method is to exclude websites that host AI-generated images or offer AI-generation services. In Google, this can be done with the site: operator, like this:

male light brown hair
-site:nightcafe.studio
-site:craiyon.com
-site:starryai.com
-site:lexica.art
-site:stablediffusionweb.com
-site:playground.com
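If you keep a personal blocklist, you can append these operators automatically. A tiny sketch, using the same (hypothetical) blocklist of sites from the example above:

```python
# Hypothetical blocklist of AI-image sites to exclude from results.
blocklist = [
    "nightcafe.studio",
    "craiyon.com",
    "starryai.com",
    "lexica.art",
    "stablediffusionweb.com",
    "playground.com",
]

def with_site_exclusions(query, sites):
    """Append a -site: operator for each blocked domain."""
    return " ".join([query] + [f"-site:{s}" for s in sites])

print(with_site_exclusions("male light brown hair", blocklist))
```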

Warning: as mentioned previously, using too many exclusion operators will make Google think you're a bot.

Bonus: Use Another Search Engine

If you depend a lot on search, an alternative would be to use a search engine that lets you block websites from appearing on your search results. The Kagi search engine offers this as a paid service. I think Brave Search, a free search engine, offers this functionality as well through its "goggles," but I'm not sure if there are any goggles for blocking AI websites.


References

  1. https://venturebeat.com/business/openai-debuts-dall-e-for-generating-images-from-text/ (accessed 2024-07-20)
  2. https://www.pcworld.com/article/820518/midjourneys-ai-art-goes-live-for-everyone.html (accessed 2024-07-20)
  3. https://stability.ai/news/stable-diffusion-announcement (accessed 2024-07-20)
