How computers represent text using numbers.
At the most fundamental level, computers only understand numbers (specifically binary: 0s and 1s). To represent text, we need a system that maps each character to a unique number. This mapping is called character encoding.
Different encoding systems have been developed over time to handle different languages and symbols. The two most important ones are ASCII and UTF-8.
ASCII is one of the earliest character encoding standards, developed in the 1960s. It uses 7 bits to represent 128 different characters (0-127), including uppercase and lowercase English letters, the digits 0-9, common punctuation marks, and a set of non-printing control characters.
For example, the letter 'A' is represented by the number 65, 'B' is 66, and so on. Lowercase 'a' is 97.
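You can check these values yourself with Python's built-in ord() and chr() functions, which convert between a character and its code point:

```python
# ord() returns a character's numeric code point; chr() is the inverse.
print(ord('A'))  # 65
print(ord('B'))  # 66
print(ord('a'))  # 97
print(chr(65))   # A
```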
Limitation:
ASCII can only represent English letters and basic symbols. It cannot handle characters from other languages like Chinese, Arabic, or even accented European letters (é, ñ, ü).
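To see this limitation in practice, here is a small Python check: asking the ASCII codec to encode an accented character raises an error, because its code point falls outside the 0-127 range.

```python
# ASCII covers only code points 0-127, so 'é' (code point 233) cannot be encoded.
try:
    'é'.encode('ascii')
except UnicodeEncodeError as err:
    print(err)  # 'ascii' codec can't encode character '\xe9' in position 0 ...
```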
UTF-8 is a modern character encoding standard that can represent every character in all the world's writing systems. It's the most widely used encoding on the web today.
UTF-8 is a variable-length encoding, meaning characters can use 1 to 4 bytes: 1 byte for ASCII characters, 2 bytes for most Latin-script, Greek, Cyrillic, Hebrew, and Arabic characters, 3 bytes for most Chinese, Japanese, and Korean characters, and 4 bytes for emoji and other less common characters.
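A quick way to see the variable lengths is to encode a few sample characters in Python and count the resulting bytes:

```python
# Byte counts grow with the character's code point:
# ASCII (1 byte), accented Latin (2), CJK (3), emoji (4).
for ch in ['A', 'é', '中', '😀']:
    encoded = ch.encode('utf-8')
    print(ch, len(encoded), encoded.hex())
# A 1 41
# é 2 c3a9
# 中 3 e4b8ad
# 😀 4 f09f9880
```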
Advantage:
UTF-8 is backward compatible with ASCII (same encoding for ASCII characters) while supporting over 1 million possible characters. It's efficient and universal.
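The backward compatibility is easy to verify: a string containing only ASCII characters produces exactly the same bytes under both encodings.

```python
# Pure-ASCII text encodes to identical bytes in ASCII and UTF-8.
text = "Hello"
print(text.encode('ascii') == text.encode('utf-8'))  # True
print(text.encode('utf-8'))  # b'Hello'
```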
Enter text below to see how each character is encoded in both ASCII and UTF-8. Non-ASCII characters will show "N/A" for ASCII encoding.
| Character | ASCII Decimal | ASCII Hex | UTF-8 Bytes (Hex) | UTF-8 Bytes (Binary) |
| --- | --- | --- | --- | --- |
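If you don't have the interactive demo handy, the following Python sketch produces the same kind of table for any input string (the column layout here is illustrative, not part of any standard API):

```python
def encoding_table(text):
    """Print the ASCII and UTF-8 encodings of each character in text."""
    print(f"{'Char':<6}{'ASCII Dec':<11}{'ASCII Hex':<11}{'UTF-8 Hex':<15}UTF-8 Binary")
    for ch in text:
        code = ord(ch)
        utf8 = ch.encode('utf-8')
        # Characters outside 0-127 have no ASCII encoding.
        ascii_dec = str(code) if code < 128 else "N/A"
        ascii_hex = f"0x{code:02X}" if code < 128 else "N/A"
        utf8_hex = ' '.join(f"{b:02X}" for b in utf8)
        utf8_bin = ' '.join(f"{b:08b}" for b in utf8)
        print(f"{ch:<6}{ascii_dec:<11}{ascii_hex:<11}{utf8_hex:<15}{utf8_bin}")

encoding_table("Aé中")
```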