How File Sizes Work

In a computer, the size of a file is measured in bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes, and so on, or, alternatively, in kibibytes, mebibytes, gibibytes, and so on. One byte is equivalent to 8 bits of data, or 2^3 bits, the bit being the smallest unit we have to measure data. One kilobyte is equivalent to 1000 bytes of data, but, because data sizes generally work in powers of 2, we also have the kibibyte, which is equivalent to 1024 bytes, or 2^10 bytes. Besides these, we also have kilobits and kibibits, which are multiples of bits instead of bytes, so they are 8 times smaller than the corresponding byte units, and they create a lot of confusion when used to measure download and upload speeds.
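As a rough illustration of the two unit families, here is a minimal Python sketch. The constant names and the describe helper are made up for this example and are not part of any standard library.

```python
# Decimal (SI) and binary (IEC) unit sizes, expressed in bytes.
# All names below are illustrative, chosen only for this example.
BITS_PER_BYTE = 8        # 2**3 bits
KILOBYTE = 1000          # kB
MEGABYTE = 1000 ** 2     # MB
KIBIBYTE = 1024          # KiB, 2**10 bytes
MEBIBYTE = 1024 ** 2     # MiB, 2**20 bytes


def describe(size_in_bytes: int) -> str:
    """Express the same size in kilobytes and in kibibytes."""
    return (
        f"{size_in_bytes} bytes = "
        f"{size_in_bytes / KILOBYTE:.2f} kB = "
        f"{size_in_bytes / KIBIBYTE:.2f} KiB"
    )


print(describe(4096))  # 4096 bytes = 4.10 kB = 4.00 KiB
```

The same number of bytes gives a slightly larger figure in kilobytes than in kibibytes, and the gap grows with each bigger unit.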

One bit can only store a binary value: whether something is true or false, a signal on or off, and so on. But with multiple bits it's possible to interpret them as binary numbers. With 8 bits, or 1 byte, we can count from 0 to 255. It's possible to use numbers to represent text characters. For example, if we say 0 is A, 1 is B, 2 is C, and so on, we only need numbers from 0 to 25 to represent the whole alphabet. One character encoding system like this is called ASCII. It defines 128 characters, and each one is normally stored in one byte. This means a plain ASCII text file with 500 bytes has 500 characters (a character can be a letter, a digit, a space, a punctuation mark, or a line break).
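The counting and the one-byte-per-character idea can be checked with a short Python sketch. The helper name max_unsigned_value is invented for this example. Note that real ASCII does not start the alphabet at 0 as in the simplified scheme above: it assigns 65 to "A" and 72 to "H".

```python
def max_unsigned_value(bits: int) -> int:
    """Largest number that fits in the given number of bits."""
    return 2 ** bits - 1


print(max_unsigned_value(1))   # 1   (one bit: 0 or 1)
print(max_unsigned_value(8))   # 255 (one byte: 0 to 255)

# Encode a short text as ASCII: every character becomes one byte.
text = "Hello, world!\n"
data = text.encode("ascii")

print(len(text), len(data))    # 14 14
print(data[0], chr(data[0]))   # 72 H
print(format(data[0], "08b"))  # 01001000 (the 8 bits of that byte)
```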

Meanwhile, for images, the most basic format is called the bitmap. In a bitmap, an image is stored as a grid of tiny squares called pixels. Each pixel has a color, which is stored as values for the three light channels: red, green, and blue. Each channel has a value of 0 to 255, or one byte, so each RGB color requires 3 bytes of data. Transparent images have an extra channel for opacity, called alpha, which also takes one byte, so the RGBA color format takes 4 bytes per pixel. An RGB bitmap that's 300 pixels by 300 pixels would take 3 bytes (RGB) times 300 times 300, or 270000 bytes of data (270 kilobytes). This is a very small image. A typical photo has millions of pixels, so it would take many megabytes to store it this way. Instead, what normally happens is that an image is stored on disk not as a simple bitmap but in a format that supports compression, such as JPG. The compression makes the file much smaller but loses some detail that's generally not perceptible to the human eye. When an image is displayed on the screen, however, it can't stay compressed: your computer has to have it in memory as a full bitmap. For example, to load a transparent, square image that's 2048 pixels per side, you need 2048 times 2048 times 4 bytes, or 16 mebibytes of RAM.
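The bitmap arithmetic above can be reproduced with a small Python sketch. The function name bitmap_bytes is made up for this example, and it assumes one byte per channel with no header or row padding, which real bitmap file formats may add.

```python
def bitmap_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed for an uncompressed bitmap at one byte per channel.

    Assumes no file header and no row padding (a simplification).
    """
    return width * height * channels


# 300 x 300 RGB image: 3 bytes per pixel.
small = bitmap_bytes(300, 300)
print(small, small / 1000)           # 270000 270.0 (kilobytes)

# 2048 x 2048 RGBA image: 4 bytes per pixel.
large = bitmap_bytes(2048, 2048, channels=4)
print(large, large / 1024 ** 2)      # 16777216 16.0 (mebibytes)
```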

When data is stored in RAM or on disk, it's common for it to take more space than it needs because the space reserved is rounded up. It's typically not possible for a program to use just 1 bit of RAM, because RAM is addressed per byte. If you need to store data in memory, you end up taking at least one whole byte of RAM no matter how few bits you need. For files on disk, the idea is the same but the extra space reserved is much bigger. It's normal for a file to take several kilobytes more of disk space than it needs to store its data. For example, if the file system splits the disk into 4 kibibyte spaces, and you have a 13 kibibyte file, it can't be stored in 3 spaces and one quarter. It has to take 4 whole spaces (16 kibibytes), so 3 kibibytes are reserved for the file but not used. When you have thousands of files, it adds up and you can end up with many megabytes or even gigabytes of wasted space.
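The rounding up can be sketched in Python as follows. The function name allocated_size and the 4096-byte default are assumptions for this example. The fixed-size spaces are called blocks here, and real file systems use different sizes and sometimes store very small files differently.

```python
def allocated_size(file_size: int, block_size: int = 4096) -> int:
    """Disk space reserved for a file, rounded up to whole blocks.

    block_size is a hypothetical default of 4 kibibytes.
    """
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


file_size = 13 * 1024                     # a 13 kibibyte file
reserved = allocated_size(file_size)

print(reserved // 1024)                   # 16 (kibibytes reserved)
print((reserved - file_size) // 1024)     # 3  (kibibytes reserved but unused)
```

A file that is exactly a multiple of the block size wastes nothing, while a 1-byte file still reserves a whole block under this model.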

Finally, when talking about download and upload speeds, the unit of measure used is often, confusingly, based on bits instead of bytes. For example, if your internet provider said you have 80 Mbps of download speed, you might expect that to mean 80 megabytes per second, but if those are 80 megabits, then you only have 10 megabytes per second. Technically, the bit unit is supposed to be written with a lower case b when abbreviated, while the byte unit becomes an upper case B, so 80 Mbps is megabits per second, while 80 MBps (or 80 MB/s) is megabytes per second. But even if internet providers followed this rule and used the term megabit per second, the average person wouldn't know the difference, and so confusion seems unavoidable.
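The bit-versus-byte conversion is just a division by 8, as this minimal Python sketch shows. Both function names are invented for this example, and the download time is an idealized figure that ignores protocol overhead and real-world fluctuations in speed.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(mbps: float) -> float:
    """Convert a speed in megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE


def download_seconds(file_size_bytes: int, mbps: float) -> float:
    """Idealized time to download a file at the given megabit speed."""
    bits_to_transfer = file_size_bytes * BITS_PER_BYTE
    bits_per_second = mbps * 1_000_000
    return bits_to_transfer / bits_per_second


print(megabits_to_megabytes(80))          # 10.0 (megabytes per second)
print(download_seconds(100_000_000, 80))  # 10.0 (seconds for a 100 MB file)
```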
