Content of an Article in an RSS Feed

2024-10-26

In an RSS feed, each declared article has a piece of content associated with it that can be the entire article or a snippet. This means that when your RSS client fetches the RSS feed's XML file from the URL you gave the client, the XML code may contain the whole article, all the paragraphs, links, and images, just as you would see on the webpage associated with the RSS article, or it may contain only a summary of the whole webpage.

This is decided by the website that generates the RSS feed. Different websites have different reasons to support RSS, so sometimes they include everything, sometimes they don't.
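To make the difference concrete, here is a minimal Python sketch of how a client might pick the body of each article. It rests on assumptions not taken from any particular website: the feed below is invented, and it assumes the full article, when present, is carried in `content:encoded` (an element from the RSS Content module that many feeds use) while `<description>` holds the summary. Real feeds vary, and some put the whole article in `<description>` itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration only.
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Snippet only</title>
      <link>https://example.com/a</link>
      <description>A short summary of the article.</description>
    </item>
    <item>
      <title>Full article</title>
      <link>https://example.com/b</link>
      <description>A short summary of the article.</description>
      <content:encoded><![CDATA[<p>First paragraph.</p><p>Second paragraph.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

# Namespace of the RSS Content module (assumed to be what the feed uses).
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def item_bodies(feed_xml):
    """Return (title, body) pairs, preferring the full content over the summary."""
    root = ET.fromstring(feed_xml)
    bodies = []
    for item in root.iter("item"):
        title = item.findtext("title")
        full = item.findtext("content:encoded", namespaces=NS)
        summary = item.findtext("description")
        bodies.append((title, full if full is not None else summary))
    return bodies
```

Calling `item_bodies(FEED)` yields the summary for the first item and the HTML paragraphs for the second, which is all a client has to work with unless it opens the webpage.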

Snippets as Article Content

If a website is monetized with ads, it makes money when people view webpages on the website; it doesn't earn anything from people reading articles in the RSS feed. This means that it has no reason to include the entire article in the RSS feed.

Note that generating the RSS feed and transmitting it to RSS clients costs money. If everyone read the article in an RSS client, the website wouldn't be able to pay for the web server that hosts and serves its content. It would become unprofitable and shut down. Then nobody would have articles to read anymore.

Some RSS clients appear to antagonize the webmasters of such websites by building in ad blockers, or even by directly scraping the content of articles to display it inside the RSS client. It's worth noting that if someone scraped the content of your articles to republish them on a different website for everyone to read, you would have the legal right to sue them for copyright infringement. However, because the RSS client is an application, it seems to get a pass because there is no third party involved in reproducing the content: it's your PC accessing the website, and that's it. Personally, I don't think this makes a lot of sense. The fact that you're "able" to scrape the content is a technicality. You're also "able" to punch people in the face. The laws of physics don't prevent you from doing that, just like HTTP doesn't prevent you from scraping. That doesn't mean it should be legal. The website has a business model that includes displaying ads, and its content is only available through HTTP because there is no other way to distribute it. Exchanges are made between two people, not between two computers.

Websites that Publish Full Articles as RSS

Websites that are not monetized with ads will gladly include all their content inside the RSS feed. For example, if it's an RSS feed with announcements by some institution or project, their objective is to make that information reach as many people as possible, so they will include the whole article.

Akregator's window with three vertical window panes, the rightmost being its built-in web browser. — Akregator showing an article published by Planet KDE, an RSS feed that aggregates articles from various KDE contributors' blogs.

Image, Audio, and Video in RSS Content

There are at least four different ways to include an image, a piece of audio, or video in an RSS feed's XML code. Let's understand how they work.

References vs. Embeddings

Before we continue, it's worth noting that multimedia files are practically always referenced in RSS by a URL; they aren't embedded.

This means if you see an image, audio player, or video player in your RSS client, that's not part of the content of the article that comes in the XML file. This is important to understand because RSS clients can archive the article content permanently on your PC, but the referenced content won't be subject to this archival.

For example, let's say you subscribed to a podcast via RSS. The RSS client may display an audio player to play the audio. This audio isn't in the XML saved on your PC; it's on the Internet, on some web server. When you click play, your RSS client downloads the audio data to play it.

This means if the web server that hosts the audio shuts down, you won't be able to play the audio even though you have an article for that episode of the podcast in your "local" RSS client. Similarly, images that are part of the article are downloaded from the website when you try to view an article.

Accidental Scraping

Generally speaking, websites don't like people downloading their images from the images' URLs without seeing the webpages of their websites. That's because images can be pretty heavy, so serving them can be costly, especially if lots of people are downloading the image. For video the problem is naturally bigger because video is just bigger. This is especially true when you consider that many websites are monetized with ads, and if you just look at the image directly, you don't see any ads that would appear only on the webpage that contains the image.

The practice of embedding an image from one website in another website is called hotlinking, and it's generally frowned upon. If a large website or forum hotlinks an image hosted on a small website, the large volume of users of the large website can overwhelm the small website's server.

We can imagine the same problem may occur with RSS. If everyone used RSS, and they were all downloading images directly from their RSS clients, the website would see a lot of cost with no way to make revenue from it. To be fair, not many people use RSS, and websites have their images downloaded all the time by all sorts of bots, like Google's image search spider, so I doubt this would ever become an actual problem in practice. However, this simply doesn't feel very sustainable to me.

Your RSS client could download all the image, audio, and video data of all articles, even the ones you didn't open to read, in order to archive everything permanently, but doing so would be a massive cost for the website. Note that the website is created with the assumption that humans will view the webpages, and humans only view a small number of pages per hour. A scraping program can download the entire website automatically. Nobody creates a website thinking people are just going to do that.

Avoid exploiting the websites that you subscribe to.

As Enclosures

RSS supports referencing arbitrary files within the feed via URL. These are called "enclosures" in the RSS 2.0 specification, and are declared with the <enclosure> element in XML code.

Some RSS clients call them "attachments."
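As a rough illustration, the Python sketch below reads the enclosures of a single item. The item and its URL are made up; the `url`, `length`, and `type` attributes are the ones RSS 2.0 defines for `<enclosure>`. Note that it only reads the reference, it doesn't download the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical podcast item, invented for illustration only.
ITEM = """<item>
  <title>Episode 1</title>
  <enclosure url="https://example.com/episode1.mp3" length="12345678" type="audio/mpeg"/>
</item>"""


def enclosures(item_xml):
    """Return (url, MIME type, size in bytes) for each enclosure of an item."""
    item = ET.fromstring(item_xml)
    found = []
    for enc in item.findall("enclosure"):
        # A missing length attribute is treated as 0 here (a choice of this sketch).
        size = int(enc.get("length", "0"))
        found.append((enc.get("url"), enc.get("type"), size))
    return found
```

An RSS client could use the MIME type to decide whether to show an audio player, a video player, or just a download link.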

As HTML

The content of an RSS feed may be in HTML format, and if it is, it may include images with the <img> tag, whose src attribute can be, and often is, a URL to an image. The same applies to audio and video, with <audio> and <video> tags.
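To see which files an article's HTML would make a client download, you could list the `src` URLs it references. The following is a minimal sketch using Python's standard `html.parser`; the class name, function name, and sample HTML are hypothetical, and it only looks at the `src` attribute of `<img>`, `<audio>`, `<video>`, and `<source>` tags.

```python
from html.parser import HTMLParser


class MediaURLCollector(HTMLParser):
    """Collects (tag, src) pairs for media tags found in an HTML fragment."""

    MEDIA_TAGS = ("img", "audio", "video", "source")

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag in self.MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                self.urls.append((tag, src))


def media_urls(html):
    """Return the media references found in an article's HTML content."""
    collector = MediaURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical article content, invented for illustration only.
SAMPLE = (
    '<p>Hello</p><img src="https://example.com/a.png" alt="A">'
    '<video src="https://example.com/v.mp4" controls></video>'
)
```

Here `media_urls(SAMPLE)` returns the image and the video URL, in that order. Nothing is fetched; the list only shows what is referenced rather than embedded.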

Vivaldi's window displaying a list of RSS feeds, a list of articles with one article selected, and the contents of the selected article in three vertical panes. — Vivaldi displaying an RSS article's content. The image shown was declared with an `<img>` tag.

Special Extensions

Because RSS feeds are written in XML, the eXtensible Markup Language, it's possible to eXtend them with more vocabulary than they initially had by declaring additional XML namespaces with xmlns. Let's take a look at how this is done to embed multimedia in RSS feeds.

Media RSS Extension

Some RSS feeds declare media content using the Media RSS extension (http://search.yahoo.com/mrss/).

Notably, Peertube and Vimeo use it in their video feeds.

Awkwardly, none of the RSS clients I've tested support it. It seems the only desktop RSS client that supported it was QuiteRSS, and it only added support in its last version. "Last" because after that a component it used became deprecated for being insecure, and it was too much effort to replace the component, so QuiteRSS became unmaintained.
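For a client that did want to support Media RSS, reading it might look roughly like the Python sketch below. The item is invented and is not copied from a real Peertube or Vimeo feed; it assumes the media is declared with `media:content` and its `url` and `type` attributes, which are part of the Media RSS vocabulary.

```python
import xml.etree.ElementTree as ET

# The namespace URL of the Media RSS extension.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical item, invented for illustration only.
ITEM = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Example video</title>
  <media:content url="https://example.com/video.mp4" type="video/mp4" medium="video"/>
  <media:thumbnail url="https://example.com/thumb.jpg"/>
</item>"""


def media_contents(item_xml):
    """Return (url, MIME type) for each media:content element of an item."""
    item = ET.fromstring(item_xml)
    # ".//" also matches media:content nested in a wrapper such as media:group.
    return [
        (media.get("url"), media.get("type"))
        for media in item.iterfind(".//media:content", MEDIA_NS)
    ]
```

The `media` prefix in the code is arbitrary; what matters is that it maps to the same namespace URL the feed declares with xmlns.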

Youtube Extension

Youtube has RSS feeds for Youtube channels; see [How to Use RSS to Subscribe to a Youtube Channel] for details. In these RSS feeds, there is an additional XML namespace that you won't find in your typical RSS feed.

<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:yt="http://www.youtube.com/xml/schemas/2015"
>
  <entry>
    <yt:videoId>dQw4w9WgXcQ</yt:videoId>
    <yt:channelId>UCuAXFkgsw1L7xaCfnd5JJOw-</yt:channelId>
  </entry>
</feed>

An RSS client that supports this Youtube extension could use the provided data to embed a Youtube video player into the RSS client, or simply generate a link to the Youtube video from the declared video ID.

https://www.youtube.com/watch?v=dQw4w9WgXcQ
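The link generation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RSS client does it; the function name is hypothetical, and the feed is a trimmed version of the snippet shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespaces declared by the feed: Atom as the default one, plus Youtube's extension.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

# Trimmed-down feed for illustration only.
FEED = """<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:yt="http://www.youtube.com/xml/schemas/2015"
>
  <entry>
    <yt:videoId>dQw4w9WgXcQ</yt:videoId>
  </entry>
</feed>"""


def watch_urls(feed_xml):
    """Build a watch link from the yt:videoId of every entry in the feed."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", namespaces=NS)
        if video_id:
            urls.append("https://www.youtube.com/watch?v=" + video_id)
    return urls
```

Because the feed declares Atom as its default namespace, `<entry>` has to be looked up as `atom:entry`; searching for a bare `entry` would find nothing.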

It seems that Vivaldi does support it. My reason for believing this is that when you add a Youtube feed to Vivaldi, it shows a video player before the "Open article" link it includes in every article, while when you add a Peertube feed to Vivaldi, it displays a "linked video" after the "Open article" link. Peertube uses <enclosure> for its video content.

Youtube also provides an enclosure with the MIME type application/x-shockwave-flash that just links to the video player instead of an actual .swf file. It seems Liferea uses this to display the player at the bottom of the content, where podcasts' audio players (also <enclosure>) normally appear.