Plain Text

What is Plain Text?

Plain text is data on a computer, such as a file, that encodes text as a simple sequence of characters. Plain text files typically have a .txt file extension. The term "plain text" contrasts with "rich text," which includes formats such as Microsoft Word (.doc) or LibreOffice Writer (.odt) files. A notable difference is that plain text doesn't support "bold" or "italic," nor does it really have the concept of paragraphs, pages, margins, bullet lists, etc.

For example, "Hello world!" is a sequence of characters: the first character is "H," then "e," then "l," and the last is "!". In the computer, this text must exist as bits and bytes, so there must be a rule that tells us which pattern of bits is associated with which character. This rule is called a text encoding. ASCII and UTF-8 are two examples.
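As a rough illustration of the idea, here is a small Python sketch that encodes the same text as ASCII and shows the numbers and bit patterns behind the first characters. The variable names are made up for this example.

```python
text = "Hello world!"

# Encode the characters into bytes using the ASCII encoding.
encoded = text.encode("ascii")

# Each byte is a number from 0 to 255; show it in decimal and as 8 bits.
byte_values = list(encoded)
bit_patterns = [format(b, "08b") for b in encoded]

print(byte_values[:2])   # [72, 101] -> "H" and "e"
print(bit_patterns[0])   # 01001000 -> the bits stored for "H"

# Decoding with the same encoding gives the original text back.
print(encoded.decode("ascii"))  # Hello world!
```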

On Windows, you can open and edit most plain text files with Notepad, which supports "ANSI" (the Windows name for its legacy code pages), UTF-8, and UTF-16. Notepad++ supports a lot of other encodings. Generally this doesn't matter because almost everyone uses UTF-8 nowadays. In the past, however, you could have a situation where a text was encoded in "Latin1" encoding, or the Japanese "SHIFT-JIS" encoding, and a text editor would display the wrong characters for the bytes in the file. For the record, plain text files don't have metadata that tells what character set they were encoded with, so a program that supports multiple encodings has to detect the encoding heuristically.
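To see what "the wrong characters" looks like, the sketch below decodes the same bytes with two different encodings. The `decode_with_fallback` function is a hypothetical and very crude heuristic (try UTF-8 first, fall back to Latin1), not how any particular editor actually detects encodings.

```python
original = "caf\u00e9"  # "café", with an accented e

# The same text stored as bytes in two different encodings.
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = original.encode("latin-1")  # b'caf\xe9'

# Reading UTF-8 bytes as if they were Latin1 shows the wrong characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©


def decode_with_fallback(data):
    """Hypothetical heuristic: try UTF-8, fall back to Latin1 if it fails."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin1 maps every possible byte to a character, so this never fails.
        return data.decode("latin-1"), "latin-1"


print(decode_with_fallback(utf8_bytes))    # ('café', 'utf-8')
print(decode_with_fallback(latin1_bytes))  # ('café', 'latin-1')
```

A guess like this can still be wrong: bytes written in some other legacy encoding would fall through to Latin1 and be displayed incorrectly.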

In UTF-8, assuming all characters are valid in ASCII, a sequence of 12 characters like Hello world! requires exactly 12 bytes to be encoded, one for each character. If there's an accented letter, or a character from CJK languages (Chinese, Japanese, Korean), then the number of bytes required to encode a single character increases: most accented Latin letters take two bytes, and most CJK characters take three.
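A quick way to check these sizes yourself is to compare the number of characters with the number of UTF-8 bytes. The sample strings below were picked just for this sketch.

```python
samples = {
    "ascii": "Hello world!",
    "accented": "\u00e9",  # é
    "cjk": "\u65e5",       # 日
}

# Map each sample name to (number of characters, number of UTF-8 bytes).
sizes = {
    name: (len(text), len(text.encode("utf-8")))
    for name, text in samples.items()
}

for name, (chars, nbytes) in sizes.items():
    print(name, chars, nbytes)
# ascii 12 12
# accented 1 2
# cjk 1 3
```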

In plain text, spaces are characters. More specifically, there are multiple whitespace characters: the space character ( ), the tab character, the carriage return character, and the line feed character. Each of these characters is encoded as one byte in UTF-8. The line feed character is also called the "new line" character. When you press the Return key to add a new line in text, what you're usually doing is inserting a new line character into the sequence (on Windows, typically a carriage return followed by a line feed).
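The short sketch below makes these invisible characters visible, using Python's escape sequences for them (`\t`, `\r`, `\n`). The example text is invented for illustration.

```python
whitespace = {
    "space": " ",
    "tab": "\t",
    "carriage return": "\r",
    "line feed": "\n",
}

# Every one of these characters takes a single byte in UTF-8.
byte_counts = {name: len(ch.encode("utf-8")) for name, ch in whitespace.items()}
print(byte_counts)

# A two-line text is one sequence with a line feed character in the middle.
text = "first line\nsecond line"
print(repr(text))         # 'first line\nsecond line'
lines = text.split("\n")
print(lines)              # ['first line', 'second line']
```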

Source Code

With some esoteric exceptions, all source code is plain text. This means when programmers program programs, they're writing plain text source code files.

Source code files normally don't have a .txt file extension like other plain text files, but instead an extension that describes the language of the source code. For example, .html is used for HTML (HyperText Markup Language) files, .py is used for Python scripts, etc.

These files can be opened and edited in any plain text editor. This means you could theoretically program anything ever programmed using the Notepad that comes with Windows. That is an extremely terrible idea, but it's possible.

If you're dealing with source code files a lot, it's recommended to use a specialized source code editor like Notepad++ or Visual Studio Code, or a specialized IDE. These programs have features like syntax highlighting, the ability to display whitespace characters, and folding of code blocks, which make working with source code much easier.

Code in Social Media Posts

On many social media websites, you are only allowed to write plain text in a post, but the plain text you write isn't just text. It's actually source code.

If you write #HappyNewYear in a post and that turns into a link to the hashtag, that's no longer plain text, because links don't exist in plain text.

What happened is that the social media site has a program that interprets the plain text in your post and formats certain text codes it finds within it. So the post's contents are actually code. You're officially a coder now!
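A program like that could look roughly like the sketch below. It is only a guess at the general idea: the `link_hashtags` function and the `/hashtag/` URL are invented for this example and don't describe how any real site works.

```python
import re
from html import escape

# A hashtag here is "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def link_hashtags(post):
    """Turn every #hashtag in a plain text post into an HTML link."""
    # Escape &, < and > first so the rest of the post is shown as typed.
    safe = escape(post, quote=False)
    # The /hashtag/ URL is a made-up address, used only for illustration.
    return HASHTAG.sub(r'<a href="/hashtag/\1">#\1</a>', safe)


print(link_hashtags("Cheers! #HappyNewYear"))
# Cheers! <a href="/hashtag/HappyNewYear">#HappyNewYear</a>
```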

Markdown

A common source code language used on social media is called markdown. It's a markup language based on how people formatted text in plain text e-mails. For example, Reddit and Discord support markdown, although in different flavors (allowing different features).

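In markdown, you would have *italic* surrounded by one asterisk, **bold** surrounded by two, and ***italic and bold*** surrounded by three. Some flavors also accept underscores (_) in place of asterisks. Tildes make text ~~struck through~~. Most flavors, including those of Reddit and Discord, require two tildes on each side.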
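To give a feel for how a program might interpret the asterisks, here is a deliberately naive sketch that turns them into HTML tags. The `render_emphasis` function is hypothetical. Real markdown parsers handle many more cases (nesting, escaping, underscores) than these three regular expressions do.

```python
import re


def render_emphasis(text):
    """Naively convert markdown asterisks into HTML bold and italic tags."""
    # Longest markers first, so *** is not mistaken for ** plus *.
    text = re.sub(r"\*\*\*(.+?)\*\*\*", r"<b><i>\1</i></b>", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"\*(.+?)\*", r"<i>\1</i>", text)
    return text


print(render_emphasis("*italic* and **bold**"))
# <i>italic</i> and <b>bold</b>
print(render_emphasis("***italic and bold***"))
# <b><i>italic and bold</i></b>
```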

A link is written like this:

[Link text](https://example.com/)
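As a sketch of how that syntax could be interpreted, the snippet below rewrites a markdown link as an HTML link. The `render_links` function is invented for this example and ignores details that real parsers deal with, such as link titles or brackets inside the link text.

```python
import re

# [text](url): the text may not contain "]", the URL may not contain ")" or spaces.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render_links(text):
    """Replace markdown links with HTML anchor tags."""
    return LINK.sub(r'<a href="\2">\1</a>', text)


print(render_links("[Link text](https://example.com/)"))
# <a href="https://example.com/">Link text</a>
```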

A bullet list is written with asterisks at the start:

* First item.
* Second item.

And quote blocks start with a greater-than sign (>). If you have ever seen a mailing list, they're full of these.

>>Do you know markdown?
>No.
You do now.
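
A program reading that conversation could work out who is quoting whom by counting the greater-than signs at the start of each line. The `quote_depth` function below is a hypothetical sketch that only handles markers written back to back (like >>), not ones separated by spaces.

```python
def quote_depth(line):
    """Return (depth, text): how many ">" start the line, and what follows."""
    stripped = line.lstrip()
    text = stripped.lstrip(">")
    depth = len(stripped) - len(text)
    return depth, text.strip()


conversation = [
    ">>Do you know markdown?",
    ">No.",
    "You do now.",
]

for line in conversation:
    print(quote_depth(line))
# (2, 'Do you know markdown?')
# (1, 'No.')
# (0, 'You do now.')
```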
