While learning more about RSS, I noticed that besides RSS feeds for the homepage and for categories, some websites also make RSS feeds available for search pages. This includes WordPress. While this feature could be very useful for somebody, search is generally resource intensive compared to merely listing posts by category or tag, especially on WordPress, so I think it's a good idea to disable it.
Removing RSS Autodiscovery
The first thing we need to do is to remove the RSS autodiscovery declaration from search pages. This removes the <link> element from search pages, but doesn't disable the feed for people who already know its URL.
WordPress checks the result of the filter feed_links_extra_show_search_feed to decide whether to print the declaration. This happens in wp-includes/general-template.php, in the function feed_links_extra. All we need to do is return false for it to stop showing the links. We can do this by passing the name of the utility function __return_false instead of writing our own.
add_filter('feed_links_extra_show_search_feed', '__return_false');
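For comparison, here is a sketch of the same filter with a hand-written callback, which is roughly what __return_false saves us from writing. The function name hide_search_feed_link is made up for this example; use either this or the one-liner above, not both.

```php
<?php
// Hypothetical callback name, for illustration only.
// WordPress passes the current value of the filter (true by default).
function hide_search_feed_link( $show ) {
    // Returning false tells feed_links_extra() to skip the
    // <link> element for the search feed.
    return false;
}
add_filter( 'feed_links_extra_show_search_feed', 'hide_search_feed_link' );
```

Like the one-liner, this would go in the theme's functions.php or in a small plugin.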
Disabling RSS Feeds
Next we need to make WordPress stop rendering the RSS feed for the search page. Here we have a bit of a problem.
Most filters on WordPress are called after WordPress performs the default behavior. WordPress has a separate hook for RSS 0.92, RSS 1.0 (RDF), RSS 2.0, and Atom feeds. We could add a handler to each one of them that makes WordPress exit, but by the time these handlers are called, WordPress has already searched the database and generated the XML code for the feed, so the resources are already spent.
The code path seems to be wp-includes/template-loader.php checking is_feed() and calling do_feed(), which is in wp-includes/functions.php.
This function invokes the action do_feed_rss2, do_feed_atom, etc., according to which feed was requested. Each of these actions has a default handler that generates the feed. They're registered in wp-includes/default-filters.php.
We could remove all the default handlers, but then we would have no feeds at all. I suggest not doing this. RSS is a great feature that enables everyone to follow websites without social media. It's a better idea to just disable the RSS feed for the search pages specifically.
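Just to illustrate what is being rejected here, removing the default handlers would look roughly like the sketch below. It assumes the handlers are registered at the default priority of 10 under the same names as the actions, which is how I remember them in default-filters.php. As far as I can tell, do_feed() then answers every feed request with an error, which is why I don't recommend it.

```php
<?php
// NOT recommended: this disables every feed on the site,
// not just the search feeds.
// Assumption: the default handlers use priority 10.
remove_action( 'do_feed_rdf', 'do_feed_rdf', 10 );   // RSS 1.0 (RDF)
remove_action( 'do_feed_rss', 'do_feed_rss', 10 );   // RSS 0.92
remove_action( 'do_feed_rss2', 'do_feed_rss2', 10 ); // RSS 2.0
remove_action( 'do_feed_atom', 'do_feed_atom', 10 ); // Atom
```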
Before calling do_feed, template-loader.php checks wp_using_themes and, if it returns true, invokes the action template_redirect, which means we can stop WordPress before it renders the feed by handling this action. I'm not sure when wp_using_themes would be false, but I don't see any other way to do this, so I guess this must be it.
An example of the code we would have to write:
// Removes RSS feeds for search pages.
// Test with (your site.com)/search/the/feed/
function remove_search_rss__template_redirect() {
    // We're only handling the feed requests.
    if(!is_feed()) {
        return;
    }
    // We're only handling the feed requests
    // that are also search requests!
    if(!is_search()) {
        return;
    }
    // Set the HTTP status header to 410 GONE.
    // This ensures search engines won't index this page
    // and will forget about this URL if they had
    // already indexed it before.
    status_header(410);
    // This part isn't necessary but we're doing it anyway.
    header("Content-Type: text/html");
    ?><!doctype html>
<html lang="en-us">
<meta charset="utf-8">
<title>Gone.</title>
<h1><code>410 GONE</code></h1>
<p>We do not support RSS feeds for search.
<p>Sorry!<?php
    // Invoking exit is necessary to stop WordPress
    // from processing this request normally.
    exit;
}
// Removes RSS feeds for search pages.
// template_redirect is an action, so we register with add_action.
add_action('template_redirect', 'remove_search_rss__template_redirect');
With the code above, RSS feeds for search pages will be disabled in WordPress, but all other feeds will continue working normally.
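One caveat: as far as I can tell, WordPress runs the main query before it loads template-loader.php, so the handler above skips rendering the feed but probably not the database search itself. If that matters, a possible alternative is to do the same check earlier, on the pre_get_posts action, which fires before the SQL for a query is executed. This is only a sketch under that assumption; the function name is made up, and I'd test it on a staging site first.

```php
<?php
// Hypothetical alternative to the template_redirect handler.
// Runs before WordPress executes the SQL for a query.
function remove_search_rss__pre_get_posts( $query ) {
    // Leave the admin area and secondary queries alone.
    if ( is_admin() || ! $query->is_main_query() ) {
        return;
    }
    // Only handle requests that are both a feed and a search.
    if ( ! $query->is_feed() || ! $query->is_search() ) {
        return;
    }
    // Same response as before: 410 GONE, then stop.
    status_header( 410 );
    header( 'Content-Type: text/plain; charset=utf-8' );
    echo 'We do not support RSS feeds for search.';
    exit;
}
add_action( 'pre_get_posts', 'remove_search_rss__pre_get_posts' );
```

The checks use the methods of the query object passed to the callback rather than the global is_feed() and is_search() functions, since the global query may not be fully set up at this point.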
Observations
Yoast notes that there is also the potential for SEO spam, because search feeds are indexed by search engines by default; this would be done by using the search URL to search for "example.com" and having it appear in the feeds [1]. WordPress asks search engines not to index search pages by default, declaring a robots meta tag with noindex in their HTML code, which would prevent this. Because RSS feeds aren't HTML, that isn't possible, and the HTTP header X-Robots-Tag is necessary. It seems there is an open issue in the WordPress tracker about this [2].
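If you would rather keep search feeds enabled, a minimal sketch of sending that header yourself could look like the code below. The function name is made up for this example, and "noindex, follow" is just the value I would pick, not something WordPress prescribes.

```php
<?php
// Hypothetical handler: asks search engines not to index search feeds.
function search_feed_noindex__template_redirect() {
    // Only touch feed requests that are also search requests.
    if ( ! is_feed() || ! is_search() ) {
        return;
    }
    // Headers can only be sent before any output has started.
    if ( ! headers_sent() ) {
        header( 'X-Robots-Tag: noindex, follow' );
    }
}
add_action( 'template_redirect', 'search_feed_noindex__template_redirect' );
```

This only makes sense on a site that does not return 410 for search feeds, since a feed that is gone has nothing left to index.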
It's possible to make search pages less resource intensive. The way WordPress works by default is to just perform a full table scan. There is no index it can use to perform searches, because by default it looks for the search terms anywhere inside the title, excerpt, and content of every post. This can be especially resource intensive when searching the body content in an RDBMS, since the body is arbitrarily large and so may have to be stored off-table. Short texts like titles tend to be stored inside the table, which should speed up sequential reading from HDDs. See [How to Make WordPress Search Only Titles of Posts Instead of Their Whole Content] for how to change the default filter of searches.
Listing by category is always faster, because the database keeps a list (an index) of which articles each category or tag has associated with it. When you filter by category, the database just looks at this list instead of searching for words inside every article's content.
References
[1] https://yoast.com/internal-site-search-spam/ (accessed 2024-11-04)
[2] https://core.trac.wordpress.org/ticket/52536 (accessed 2024-11-04)