Soft 404

What is a Soft 404?

A soft 404 is a "not found" page that is served without the 404 Not Found HTTP status code that such a page should carry. Instead, it is served with 200 OK, the status code of a normal, successful page.

A soft 404 is a problem because bots such as search engine crawlers and archivers depend on the HTTP status code to tell whether the body of a response is the actual content of the resource at that URL, or a human-readable message from the web server explaining that an error occurred while fetching the resource.
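
To make the bot's side of this concrete, here is a minimal sketch in Python of a crawler deciding what a response body represents. The names `fetch_status` and `classify` are hypothetical, and the three-way split is an illustrative assumption, not how any particular crawler is implemented.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return err.code


def classify(status):
    """Decide what the response body represents, using only the status code."""
    if 200 <= status < 300:
        return "content"   # body is the resource itself
    if status in (404, 410):
        return "missing"   # body is only an error message for humans
    return "other"         # server errors and anything else


# A soft 404 arrives with status 200, so it is classified as real content.
print(classify(200))
print(classify(404))
```

Note that nothing in `classify` looks at the page text. A "not found" message sent with 200 OK goes down the "content" branch like any other page.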

If a page has 404 written all over it but is sent with 200 OK, then to a bot it looks just like the page you're reading right now. This page isn't a 404 page. It's a page about 404 pages. So bots can't reliably tell from the content alone whether a page is supposed to be a "not found" page. The only way to make this explicit is to send the correct HTTP status code.
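
On the server side, sending the correct status code might look like the following sketch, which uses Python's standard `http.server` module. The `PAGES` table, the `build_response` helper, and the `Handler` class are made up for illustration. A real site would do this inside its framework or CMS.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical site content: URL path -> HTML body.
PAGES = {
    "/": "<h1>Home</h1>",
    "/soft-404": "<h1>Soft 404</h1><p>A page about 404 pages.</p>",
}

NOT_FOUND_BODY = "<h1>404 Not Found</h1>"


def build_response(path):
    """Return (status, body) for a path; unknown paths get a real 404."""
    if path in PAGES:
        return HTTPStatus.OK, PAGES[path]
    # The human-readable error page is still sent, but with the 404 status.
    return HTTPStatus.NOT_FOUND, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path may include a query string; this sketch ignores that.
        status, body = build_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it, pass `Handler` to `http.server.HTTPServer(("127.0.0.1", 8000), Handler)` and call `serve_forever()`. The page at `/soft-404` mentions 404 in its text but is served with 200 OK because it exists, while any unknown path gets the error page together with a 404 status.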

This is generally not a problem you can solve unless you are a web developer who can modify the program that renders your website's pages. For example, if you're using WordPress with the default theme and no plugins, and Google Search Console tells you that you have "soft 404s," that's probably a mistake by Google. Since Google's algorithms can't reliably tell whether a page is a 404 from its content, they ALSO can't reliably tell whether it is a soft 404. They can only guess based on heuristics, and these heuristics sometimes yield false positives.
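
To see why guessing from content produces false positives, consider this toy heuristic. It is purely an assumption for illustration: Google's real signals are not public, and the phrase list and the name `looks_like_soft_404` are invented here.

```python
import re

# Hypothetical error phrases; not Google's actual signals.
ERROR_PHRASES = re.compile(
    r"\b(404|not found|no (article|post|result)s? found)\b",
    re.IGNORECASE,
)


def looks_like_soft_404(status, body_text):
    """Guess whether a 200 OK page is really an error page."""
    if status != 200:
        return False  # a real error status is not a *soft* 404
    return bool(ERROR_PHRASES.search(body_text))


# A page about 404 pages is flagged, even though it is real content.
print(looks_like_soft_404(200, "Soft 404 refers to a 404 NOT FOUND page"))
# An ordinary page is not flagged.
print(looks_like_soft_404(200, "Welcome to my blog"))
```

Any rule of this kind flags legitimate pages that happen to talk about errors or that are merely empty. That is the sort of false positive described above.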

For example, I currently have a URL that is flagged as a "soft 404" on Google. This is a URL for a category of posts. I created the category and then never added any posts to it, so it's empty. Google found this category's URL through sitemap.xml, since WordPress publishes a sitemap of the categories automatically. When Googlebot accessed the category URL, it saw a page that said "category X. No article found." Google interpreted this to mean that this should have been a 404 page, but it got that wrong: it shouldn't be. It just looks like it should be. This is a perfectly valid 200 OK response for the URL. The URL is for the category page. The category does exist, so you get a 200 OK back. The fact that the category has no actual content in it doesn't mean the category doesn't exist at the given URL, so it shouldn't be a 404. Although, to be fair, I shouldn't have created a category if I'm not adding any posts to it, and if I'm not adding any posts to it, perhaps I should delete it. Despite that, right now, it does exist, so it's a 200 OK.
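
The distinction between "empty" and "nonexistent" can be sketched as follows. This is not WordPress's actual code. The `CATEGORIES` table and the `category_response` function are hypothetical stand-ins for whatever the site uses to look up categories.

```python
# Hypothetical data: category slug -> list of post titles.
CATEGORIES = {
    "news": ["Hello world"],
    "category-x": [],  # created, but no posts were ever added
}


def category_response(slug):
    """Return (status, body) for a category page."""
    if slug not in CATEGORIES:
        # The category itself does not exist at this URL: a real 404.
        return 404, "Category not found."
    posts = CATEGORIES[slug]
    if not posts:
        # The category exists but is empty: still a valid 200 OK.
        return 200, f"Category {slug}. No article found."
    return 200, f"Category {slug}. " + ", ".join(posts)


print(category_response("category-x"))
print(category_response("no-such-category"))
```

The status code answers the question "does this category exist?", not "does this category have posts?". The empty category and the missing one show similar-looking messages, but only the missing one is a 404.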
