What is Code?
A code is something that has meaning only to those who can understand it. For example, #FF0000 is a code for the color red, and 0xFF is a code for the number 255. More specifically, a code takes information and encodes it in a medium, such as text or an image, so that anyone can see the image or text, but only those who understand the code can decode the information encoded in it. In our example, anyone can read and, most importantly, write the text codes #FF0000 and 0xFF without having any idea of what they mean.
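As a quick illustration, here's a sketch in Python of decoding both example codes. The decode_hex_color helper is a name made up for this example, not a standard function:

```python
# A sketch of decoding the two example codes using Python's
# built-in int(), which can parse hexadecimal digits.

def decode_hex_color(code):
    """Split a #RRGGBB color code into its red, green, and blue values."""
    digits = code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(decode_hex_color("#FF0000"))  # red at full intensity: (255, 0, 0)
print(int("0xFF", 16))              # the number 255
```

If you don't know the convention, #FF0000 is just six characters; once you do, it's three intensity values packed into text.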
There are many types of codes like this that you can read, write, and copy-paste, but that people barely even bother to call codes: URLs are code (e.g. https://www.example.com/), e-mail addresses are code (e.g. [email protected]), filepaths are code (e.g. C:/examples/example.jpg), and keyboard shortcuts like Ctrl+C are code. I mean, what does the "+" even mean? It makes no sense to someone who has never used a keyboard shortcut.
We also have code as images, like bar codes and QR codes.
With computers, there's a third type of medium, perhaps even more fundamental: bits and bytes. We can encode information as bits and bytes, such that it's possible to inspect what bits and bytes we have without having any idea of the meaning they represent. For example, the bits 01111010 can mean different things depending on what we have encoded. If we're encoding a 1-byte integer that can range from 0 to 255, that's probably¹ the decimal number 122. On the other hand, if these bits encode text using the ASCII encoding, then 01111010 means the letter z. For a more complex example, the color data of a transparent pixel in RGBA format typically occupies 4 bytes in memory (1 byte for each channel: R, G, B, A). A typical integer also occupies 4 bytes in memory. On old, 32-bit CPUs, an address in memory is also 4 bytes. This means a random combination of 32 bits could be any of these 3 things.
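To see this ambiguity in action, here's a small Python sketch; the struct module is used to reinterpret the same raw bytes, and the choice of four 0x7A-ish bytes is arbitrary:

```python
import struct

byte = 0b01111010                      # the bits 01111010
print(byte)                            # as a 1-byte integer: 122
print(bytes([byte]).decode("ascii"))   # as ASCII text: z

# Four such bytes could just as well be an RGBA pixel, a 4-byte
# integer, or a 32-bit memory address:
raw = bytes([0x7A, 0x7A, 0x7A, 0x00])
print(struct.unpack("<I", raw)[0])  # as one little-endian 32-bit integer
print(tuple(raw))                   # as four (R, G, B, A) channel values
```

The bytes never change; only the decoder's idea of what they mean does.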
Just like humans, the computer isn't born knowing what a code means, be it in text format, image format, or binary format. There is one exception, for computers at least. CPUs work by executing a binary code called machine code. Generally, this machine code is a long sequence of bytes: the first byte is an operation code (or opcode) that tells the CPU what it should do, and the next bytes are details about the operation. For example, adding two numbers together is one operation. It's this machine code that programs the computer. Originally, human programmers read and wrote machine code, but as computers became more and more complex, this became more and more difficult, so they switched to using assembly code. Assembly is a text representation of machine code. Being textual, it's easier for humans to understand, but it can't be executed directly by the CPU, because the CPU only understands machine code. So it's code that the CPU and most people can't understand. To make the CPU understand it, a program called an assembler takes this assembly code and encodes it as machine code. Or are we decoding assembly back to its original machine code form? I don't even know anymore.
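To make the opcode idea concrete, here's a toy Python sketch of the encoding step an assembler performs. It only knows two real x86 instructions with one-byte encodings (nop is 0x90, ret is 0xC3); everything else about a real assembler (operands, labels, thousands of instructions) is left out:

```python
# A toy "assembler": a lookup table from assembly mnemonics to
# their machine-code bytes, applied line by line.

OPCODES = {
    "nop": b"\x90",  # x86: do nothing
    "ret": b"\xc3",  # x86: return from the current function
}

def assemble(lines):
    """Encode a list of assembly mnemonics as machine code bytes."""
    return b"".join(OPCODES[line.strip()] for line in lines)

machine_code = assemble(["nop", "nop", "ret"])
print(machine_code.hex())  # → 9090c3
```

The text on the left is for humans; the bytes on the right are the only thing the CPU can actually execute.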
We call this human-readable code that is fed into a program to create executable machine code (i.e. a program) the source code of the program. Most programmers don't write their programs' source code in assembly language; they use higher-level programming languages like C, C++, Rust, Java, C#, PHP, Javascript, Python, and dozens of others. The source code written in these languages always has to pass through some program before it can be executed by the CPU. For lower-level languages, like C, that program is called a compiler, and it generates machine code. For higher-level languages, like Python, an executable file isn't generated; instead, the source code is sent to the user's computer, and a program called an interpreter reads commands from the textual source code and executes them without turning them into machine code. This interpreter works like a virtual CPU between the source code and the real CPU. It's also possible to translate source code into an intermediary type of code that is neither human-readable nor directly CPU-executable, called byte code. In this case, the developer turns their source code into byte code, and the user has a program that can run byte code. Finally, it's also possible to compile source code or byte code into machine code on the user's computer to make things faster, a technique called JIT (Just-in-Time) compilation.
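Python itself is a handy place to see byte code, since its standard dis module can disassemble it. A small sketch:

```python
import dis

# Python compiles source text into byte code first, then its
# virtual machine executes that byte code.
source = "x = 1 + 2"
code = compile(source, "<string>", "exec")

print(type(code.co_code))  # the raw byte code: <class 'bytes'>
dis.dis(code)              # a human-readable disassembly of it
```

The disassembly listing is to byte code roughly what assembly is to machine code: a textual view of instructions that were never meant for human eyes.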
In any case, it's important to note that a C compiler doesn't understand Python code, and the Python interpreter doesn't understand C code. Even if you wrote the exact same algorithm in both languages, they wouldn't be able to understand it, because the code looks completely different. In other words, you could say the algorithm is encoded in a programming language, and you have to know the programming language to know what the code means. This isn't true only for programs, but for programmers as well.
$foo = $bar.BAZ;
Above we have PHP code. If you asked a programmer who has never touched PHP what the code above does, they would likely guess wrong. That's because while in most programming languages—including C, C++, C#, D, Go, Java, Javascript, Pascal, Python, Ruby, and Rust—the dot (.) means member access, in PHP it means string concatenation. Everywhere else this would be $bar + BAZ. Truly a code beyond anybody's comprehension!
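For contrast, here's the same-looking line in Python, where the dot really is member access; the names bar and BAZ are invented for the illustration:

```python
# In Python, the dot reads a member. The class and names here are
# made up purely to mirror the PHP example.

class Bar:
    BAZ = "world"

bar = Bar()
foo = bar.BAZ  # member access: reads the BAZ attribute of bar
print(foo)     # → world

# In PHP, $foo = $bar . BAZ; would instead *concatenate* the
# string in $bar with the constant BAZ.
```

Same characters on the screen, two entirely different operations, depending on which code you're reading it in.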
References
1. It's how it works in the C programming language, and everyone uses C, so that's probably the standard way of encoding an integer, but it really depends on who is encoding it. I mean, I don't know what kind of psycho would encode it in any other way, but it's possible to do it, e.g. https://en.wikipedia.org/wiki/Gray_code (accessed 2024-05-06).