How Adding RSS Feeds from Webpage URLs Works


2024-10-23

Some RSS clients support adding an RSS feed using a URL that locates a webpage, while others don't. This means that some RSS clients will accept the URL https://nasa.tumblr.com/ and it will work, while others will tell us it's invalid and only work if we give them the URL https://nasa.tumblr.com/rss. In this article, we'll learn why.

An RSS feed that you add to an RSS client is a file containing XML code. When you give an RSS client a URL, it downloads an XML file from that URL, and that's the RSS feed that gets added. If the URL you give the RSS client isn't a URL for an XML file, the RSS client generally won't know what to do with it, and will tell you that the URL is invalid.
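As a rough illustration of that check, the Python sketch below decides whether a downloaded document looks like a feed by trying to parse it as XML and looking at the root element. The function name looks_like_feed and the two sample documents are made up for this example, and real RSS clients likely do something more lenient and more thorough.

```python
import xml.etree.ElementTree as ET

# Root element names used by common feed formats: RSS 2.0 uses <rss>,
# Atom uses <feed>, and RSS 1.0 uses <rdf:RDF>.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body: str) -> bool:
    """Return True if body parses as XML and its root element looks like a feed."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, which is where most HTML pages end up.
        return False
    # Namespaced tags look like "{namespace}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


# Made-up sample documents standing in for what a client might download.
rss_sample = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
html_sample = "<!DOCTYPE html><html><head><meta charset=utf-8></head></html>"

print(looks_like_feed(rss_sample))
print(looks_like_feed(html_sample))
```

A client that only does this kind of check has no choice but to reject a webpage URL, because the HTML it downloads is not a feed.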

However, some RSS clients support adding RSS feeds from URLs that locate webpages instead of XML files. How is this possible?

A webpage is a file containing HTML code. When you access a website, your web browser downloads an HTML file located by the URL in your address bar, interprets the HTML code, and from that it shows you something on the screen that we call a "webpage." This HTML code can contain all sorts of metadata that isn't displayed by web browsers but can be used by other programs that download the same HTML code via the same URL. One such piece of metadata declares which RSS feeds are associated with that webpage.

This means that a smart RSS client can download the HTML code of a webpage and interpret it to find the URLs of the RSS XML files associated with it.

For example, https://nasa.tumblr.com/ is a URL for a webpage, which means the HTTP response you get from the web server serving this website will be an HTML file for the URL path /. Inside this HTML file, we can find the following HTML code:

<link
 rel="alternate"
 type="application/rss+xml"
 href="https://nasa.tumblr.com/rss"
>

This HTML code declares a relationship between the webpage and another URL that it references. The relationship is that the target URL, https://nasa.tumblr.com/rss, is an alternate version of the webpage. The only thing that specifies what sort of alternate version it is, is the declared MIME type, application/rss+xml. In a MIME type, the +xml suffix means the file contains XML code that can be parsed by any application that can parse XML.

A smart RSS client will search for any <link> element with a rel="alternate" attribute and a type="application/rss+xml" attribute, and use the URL in its href attribute as the URL of the RSS feed that it should download.
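That search can be sketched in Python with the standard library's HTML parser. This is only a minimal illustration, not how any particular RSS client is implemented: the names FeedLinkFinder and find_feed_urls are made up for this example, and resolving relative href values against the page URL is an extra assumption about what a careful client would do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate" type="application/rss+xml">."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value are reported as None; treat them as empty.
        attributes = {name: (value or "") for name, value in attrs}
        # rel holds a space-separated list of keywords.
        rel_keywords = attributes.get("rel", "").lower().split()
        mime_type = attributes.get("type", "").strip().lower()
        href = attributes.get("href", "")
        if "alternate" in rel_keywords and mime_type == "application/rss+xml" and href:
            # href may be relative, so resolve it against the page URL.
            self.feed_urls.append(urljoin(self.base_url, href))


def find_feed_urls(html: str, base_url: str) -> list:
    """Return the RSS feed URLs declared in the given HTML code."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feed_urls


# The <link> element quoted earlier in this article.
page = """<link
 rel="alternate"
 type="application/rss+xml"
 href="https://nasa.tumblr.com/rss"
>"""

print(find_feed_urls(page, "https://nasa.tumblr.com/"))
# prints ['https://nasa.tumblr.com/rss']
```

In a real client, the HTML would come from an HTTP request to the URL the user typed, and the first URL returned would then be downloaded as the feed.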

With that, you can just give the RSS client the URL https://nasa.tumblr.com/ and it will automatically figure out what to do with it.

Without that, you'll have to figure out the RSS URL yourself, which can be very troublesome. Normally, the webmaster would place an RSS logo icon somewhere on the page that links to the RSS feed URL. However, that seldom happens, especially nowadays, as more and more people rely on social media to subscribe to things and fewer and fewer know about RSS.

There are also cases where a single webpage may have multiple RSS feeds associated with it, especially if it's a homepage, e.g. a feed for all articles and a feed for the most recent comments posted on the entire website. When a page declares several feeds, a title attribute in each <link> element can be used to distinguish them. Some RSS clients, like Akregator, will only load the first RSS feed they can find, and will silently ignore the rest.
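To illustrate how a client could present that choice instead of silently picking the first feed, here is a small Python sketch that collects every declared feed together with its title. The class name TitledFeedFinder, the example.com URLs, and the titles are all hypothetical.

```python
from html.parser import HTMLParser


class TitledFeedFinder(HTMLParser):
    """Collects (title, href) pairs for every RSS <link> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        is_alternate = "alternate" in attributes.get("rel", "").lower().split()
        is_rss = attributes.get("type", "").strip().lower() == "application/rss+xml"
        if is_alternate and is_rss and attributes.get("href"):
            # Fall back to the URL itself when the page gives no title.
            title = attributes.get("title") or attributes["href"]
            self.feeds.append((title, attributes["href"]))


# Hypothetical page head with two feeds; the URLs and titles are made up.
page = """
<head>
<link rel="alternate" type="application/rss+xml" title="All articles" href="https://example.com/feed">
<link rel="alternate" type="application/rss+xml" title="Recent comments" href="https://example.com/comments/feed">
</head>
"""

finder = TitledFeedFinder()
finder.feed(page)
# List the feeds so that a user could pick one by number.
for number, (title, url) in enumerate(finder.feeds, start=1):
    print(f"{number}. {title}: {url}")
```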

It's worth noting that WordPress, which is used to create millions of websites on the Internet, supports RSS feeds by default. You can access the main feed via /feed on any WordPress website, and there are also RSS feeds for all categories and tags in the website (e.g. /movies/feed). However, that doesn't mean the website will have a visible link to its RSS feed that you can right-click to copy its URL. It's possible the website owner doesn't even realize that they have RSS feeds, and they might not even know what RSS feeds are. This means you'll be able to find the URL in the source code of the webpage (accessible via Ctrl+U on Chrome), but not by looking at the webpage itself.
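If you suspect a website runs on WordPress, the /feed pattern described above can be applied mechanically. The Python sketch below is just a guess built on that pattern: the function name wordpress_feed_url and the example.com addresses are made up, and a given website may have disabled or moved its feeds, so the resulting URL still needs to be tried.

```python
from urllib.parse import urlsplit, urlunsplit


def wordpress_feed_url(page_url: str) -> str:
    """Guess a WordPress feed URL by appending "feed" to the path of a page URL."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + "/feed"
    # Keep the scheme and host; drop any query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical WordPress addresses: a homepage and a category page.
print(wordpress_feed_url("https://example.com/"))
print(wordpress_feed_url("https://example.com/movies/"))
```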

I'm sure there are some browser extensions that can tell you when you're visiting a webpage that has a feed associated with it. Some web browsers, like Vivaldi, have RSS support built in. When you access a webpage that has an RSS feed in Vivaldi, it shows the RSS logo in the address bar.

Image: a web browser's tab titled "virtual curiosities." Under it, the address bar showing a slashed shield icon, a padlock icon, the address www.virtualcuriosities.com, an RSS icon, a bookmark icon, and a downward arrow. Caption: Vivaldi's address bar with RSS and bookmark icons at the right of the accessed URL.

I recommend using Vivaldi or a browser extension that provides similar functionality, because you'll end up discovering RSS support on websites where you would never have noticed it otherwise. See [How to Use RSS to Subscribe to Search Results on Bing] for one of my findings.
