Categories Declared in an RSS Feed


It's possible for an RSS feed to declare in its XML code what categories are associated with each article. These would be the categories of the website. For example, if a website has an "Entertainment" category, then every article in this category would have a declaration in the RSS feed that says it belongs to the "Entertainment" category.

RSS only lets you declare "categories", which means that if a website uses tags or hashtags, it has to present them as "categories" in its RSS feed. In fact, a website is free to put anything in a category declaration, even if it doesn't match the categories the article has on the website itself.

Unfortunately, support for this feature appears to be abysmal.

RSS Client Support

To my knowledge, the only RSS client that can automatically tag articles with their categories from an RSS feed is Thunderbird, the e-mail client. Note that if two different websites both have an "Entertainment" category, their articles would get the same tag in Thunderbird. To avoid this, Thunderbird lets you specify a "prefix" that is added to all categories imported from the RSS feed.

No space character is added automatically after this prefix, so it's a good idea to avoid a prefix that is meant to end in whitespace, because it's hard to tell whether you typed the trailing space or not. For example, if you want to prefix with "coolsite", then instead of [coolsite] followed by a space, a better prefix would be /coolsite/. It's also important that all prefixes start with the same character (in this case, /), because when tags are sorted alphabetically (as they are in Thunderbird), you can quickly tell where the imported tags start. Thunderbird sorts tags such that those starting with / appear AFTER those starting with letters, so your manual tags appear first and the imported tags appear last.
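To illustrate why the prefix matters, here is a minimal Python sketch of the idea. It is not Thunderbird's code; the function name `imported_tag` and the site names are made up for illustration, and the only behaviour it assumes is the one described above: the prefix is glued directly onto the category name with no space added.

```python
def imported_tag(prefix, category):
    # Assumption: the prefix is prepended verbatim, with no separator added.
    return prefix + category


# Hypothetical feeds: each prefix maps to the categories declared in that feed.
feeds = {
    "/coolsite/": ["Entertainment", "News"],
    "/othersite/": ["Entertainment"],
}

imported = [
    imported_tag(prefix, category)
    for prefix, categories in feeds.items()
    for category in categories
]

for tag in imported:
    print(tag)
```

Without the prefixes, the two "Entertainment" categories would collapse into a single tag. With them, each site keeps its own.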

It's worth noting that Liferea can display the categories (they appear after "Filed under"), but it can't make any other use of them.

XML Code

For reference, the XML code that tells us what the category of an article is in RSS would be this:

<category><![CDATA[Category name goes here]]></category>

Each <item> element in RSS 2.0 may have multiple of these <category> elements.
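As a rough sketch of how a client could read these declarations, the following Python snippet uses the standard library's `xml.etree.ElementTree` to collect every `<category>` of every `<item>`. The sample feed and the function name `rss_item_categories` are my own inventions for illustration, not taken from any real client.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with one item that declares two categories.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <item>
      <title>Example article</title>
      <category><![CDATA[Entertainment]]></category>
      <category><![CDATA[Movies]]></category>
    </item>
  </channel>
</rss>"""


def rss_item_categories(xml_text):
    """Return a list of (item title, [category names]) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # The parser unwraps CDATA sections, so .text is the plain name.
        categories = [c.text or "" for c in item.findall("category")]
        result.append((title, categories))
    return result


for title, categories in rss_item_categories(RSS_SAMPLE):
    print(title, categories)
```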

In the Atom format, the code is different:

<category scheme="https://www.example.com" term="category-id-123" label="Category name goes here" />

Every <entry> may have multiple <category> elements.

This may look a bit more complicated. According to the Atom specification, RFC 4287 [https://www.ietf.org/rfc/rfc4287.txt], term is required, while scheme and label are optional.
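Here is a hedged Python sketch of reading these attributes with the standard library's `xml.etree.ElementTree`. The sample feed and the function name `atom_entry_categories` are invented for illustration. Falling back to term when label is missing is my own assumption about what a client might reasonably display, not something I'm quoting from the spec.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# A made-up Atom feed: one category with all three attributes, one with only term.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Site</title>
  <entry>
    <title>Example article</title>
    <category scheme="https://www.example.com" term="category-id-123" label="Entertainment" />
    <category term="movies" />
  </entry>
</feed>"""


def atom_entry_categories(xml_text):
    """Return one dict per <category> found in the feed's entries."""
    root = ET.fromstring(xml_text)
    categories = []
    for entry in root.iter(ATOM_NS + "entry"):
        for cat in entry.findall(ATOM_NS + "category"):
            term = cat.get("term")
            if term is None:
                continue  # term is required, so skip malformed categories
            categories.append({
                "scheme": cat.get("scheme"),       # optional, may be None
                "term": term,
                "label": cat.get("label") or term  # assumption: display term if no label
            })
    return categories


for category in atom_entry_categories(ATOM_SAMPLE):
    print(category)
```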

You can imagine that, if an RSS client that actually supported it existed, you could have two different websites with an "Entertainment" category but they would be treated as different categories if they belonged to different categorization schemes.

It would also be possible for you to change the spelling of a category in a website without breaking the relationship with tags in users' RSS clients by using the label attribute for user-facing text and the term attribute for identifiers.

I'm not sure you would actually want either of these things to happen, but I assume this is what the spec was intended for.
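To make both scenarios concrete, here is a small Python sketch of how a hypothetical client could store tags. It identifies a category by its (scheme, term) pair and treats label purely as display text. The names `CategoryKey` and `remember` and the example URLs are all made up. I don't know of any client that works this way.

```python
from typing import Dict, NamedTuple, Optional


class CategoryKey(NamedTuple):
    """Identity of a category: the scheme plus the term, never the label."""
    scheme: Optional[str]
    term: str


def remember(known_tags: Dict[CategoryKey, str], scheme, term, label=None):
    """Store (or update) the display text for a category and return its key."""
    key = CategoryKey(scheme, term)
    known_tags[key] = label or term
    return key


known_tags = {}

# Two sites use the same term and label, but different schemes.
a = remember(known_tags, "https://site-a.example", "entertainment", "Entertainment")
b = remember(known_tags, "https://site-b.example", "entertainment", "Entertainment")

# Site A later changes only the user-facing label.
a_renamed = remember(known_tags, "https://site-a.example", "entertainment", "Arts & Entertainment")

print(a != b)          # the two sites stay separate
print(a == a_renamed)  # the rename keeps the same identity
print(known_tags[a])   # the display text follows the new label
```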

Unfortunately, the way it's implemented in WordPress is that instead of using a category's numeric ID for term, WordPress uses the category's name for term. I wonder if this is because clients don't actually support label?
