Character Encoding

How computers represent text using numbers.

How Computers Represent Characters

At the most fundamental level, computers only understand numbers (specifically binary: 0s and 1s). To represent text, we need a system that maps each character to a unique number. This mapping is called character encoding.

Different encoding systems have been developed over time to handle different languages and symbols. The two most important ones are ASCII and UTF-8.

ASCII (American Standard Code for Information Interchange)

ASCII is one of the earliest character encoding standards, developed in the 1960s. It uses 7 bits to represent 128 different characters (0-127), including:

  • Uppercase letters A-Z (65-90)
  • Lowercase letters a-z (97-122)
  • Digits 0-9 (48-57)
  • Punctuation and common symbols
  • Control characters such as newline and tab (0-31, plus 127)

For example, the letter 'A' is represented by the number 65, 'B' by 66, and so on. Lowercase 'a' is 97.
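This mapping is easy to inspect in most languages. A minimal sketch in Python, using the built-in `ord` and `chr` functions:

```python
# ord() returns the numeric code of a character; chr() is the inverse.
print(ord('A'))   # 65
print(ord('a'))   # 97
print(chr(66))    # B

# Upper- and lowercase letters differ by exactly 32 in ASCII,
# which is why flipping one bit changes a letter's case.
print(chr(ord('A') + 32))  # a
```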

Limitation:

ASCII can only represent English letters and basic symbols. It cannot handle characters from other languages like Chinese, Arabic, or even accented European letters (é, ñ, ü).
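This limitation is easy to demonstrate: asking the ASCII codec to encode a non-ASCII character fails outright. A short Python sketch:

```python
# 'é' has no ASCII code, so the ascii codec rejects it.
try:
    'é'.encode('ascii')
except UnicodeEncodeError as err:
    print('cannot encode:', err)

# Plain English text encodes without issue.
print('Hello'.encode('ascii'))  # b'Hello'
```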

UTF-8 (Unicode Transformation Format - 8-bit)

UTF-8 is a modern character encoding standard that can represent every character in all the world's writing systems. It's the most widely used encoding on the web today.

UTF-8 is a variable-length encoding, meaning characters can use 1 to 4 bytes:

  • 1 byte: ASCII characters (U+0000 to U+007F)
  • 2 bytes: accented Latin letters, Greek, Cyrillic, Hebrew, Arabic (U+0080 to U+07FF)
  • 3 bytes: most Chinese, Japanese, and Korean characters, and the rest of the Basic Multilingual Plane (U+0800 to U+FFFF)
  • 4 bytes: emoji and other supplementary characters (U+10000 to U+10FFFF)

Advantage:

UTF-8 is backward compatible with ASCII: any valid ASCII text is also valid UTF-8, byte for byte. At the same time it can represent every Unicode code point (over a million), making it both efficient for English-heavy text and universal.
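Both properties can be checked directly. A quick Python sketch, using nothing beyond the standard `str.encode`:

```python
# Byte length grows with the code point: 1 byte for ASCII, up to 4 for emoji.
for ch in ['A', 'é', '世', '🌍']:
    encoded = ch.encode('utf-8')
    print(ch, len(encoded), encoded.hex())

# Backward compatibility: ASCII and UTF-8 produce identical bytes for ASCII text.
assert 'Hello'.encode('ascii') == 'Hello'.encode('utf-8')
```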

Common Examples

ASCII Characters

  • 'A' → 65 (0x41)
  • 'a' → 97 (0x61)
  • '0' → 48 (0x30)
  • ' ' (space) → 32 (0x20)

UTF-8 Multi-byte

  • 'é' → 0xC3 0xA9 (2 bytes)
  • '世' → 0xE4 0xB8 0x96 (3 bytes)
  • '🌍' → 0xF0 0x9F 0x8C 0x8D (4 bytes)
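The leading byte of each sequence tells a decoder how many bytes the character occupies: `0xxxxxxx` means 1 byte, `110xxxxx` means 2, `1110xxxx` means 3, and `11110xxx` means 4. A sketch that reads the length from the bit pattern and checks it against the examples above (the helper name `utf8_length` is just for illustration):

```python
def utf8_length(first_byte: int) -> int:
    """Return the total byte count of a UTF-8 sequence from its leading byte."""
    if first_byte < 0x80:           # 0xxxxxxx: ASCII, single byte
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("continuation byte or invalid leading byte")

# The declared length matches the actual encoded length for each example.
for ch in ['A', 'é', '世', '🌍']:
    data = ch.encode('utf-8')
    assert utf8_length(data[0]) == len(data)
    print(ch, len(data), data.hex())
```

This self-describing structure is what lets a decoder resynchronize mid-stream: continuation bytes always start with `10`, so they can never be mistaken for the start of a character.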