Character Encoding

How computers represent text using numbers.

How Computers Represent Characters

At the most fundamental level, computers only understand numbers (specifically binary: 0s and 1s). To represent text, we need a system that maps each character to a unique number. This mapping is called character encoding.

Different encoding systems have been developed over time to handle different languages and symbols. The two most important ones are ASCII and UTF-8.

ASCII (American Standard Code for Information Interchange)

ASCII is one of the earliest character encoding standards, developed in the 1960s. It uses 7 bits to represent 128 different characters (0-127), including:

  • Uppercase letters A-Z (65-90)
  • Lowercase letters a-z (97-122)
  • Digits 0-9 (48-57)
  • Punctuation and common symbols
  • Control characters such as newline and tab (0-31, plus 127)

For example, the letter 'A' is represented by the number 65, 'B' by 66, and so on. Lowercase 'a' is 97.
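This mapping is easy to inspect in most languages. A minimal sketch in Python, using the built-in `ord` and `chr` functions:

```python
# ord() returns the numeric code of a character; chr() is the inverse.
print(ord('A'))   # 65
print(ord('a'))   # 97
print(chr(66))    # B

# Upper- and lowercase letters differ by exactly 32 in ASCII,
# which is why flipping one bit changes a letter's case.
print(chr(ord('A') + 32))  # a
```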

Limitation:

ASCII can only represent English letters and basic symbols. It cannot handle characters from other languages like Chinese, Arabic, or even accented European letters (é, ñ, ü).
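This limitation is easy to demonstrate: asking the ASCII codec to encode a non-ASCII character fails outright. A short Python sketch:

```python
# 'é' has no ASCII code, so the ascii codec rejects it.
try:
    'é'.encode('ascii')
except UnicodeEncodeError as err:
    print('cannot encode:', err)

# Plain English text encodes without issue.
print('Hello'.encode('ascii'))  # b'Hello'
```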

UTF-8 (Unicode Transformation Format - 8-bit)

UTF-8 is a modern character encoding standard that can represent every character in all the world's writing systems. It's the most widely used encoding on the web today.

UTF-8 is a variable-length encoding, meaning characters can use 1 to 4 bytes:

  • 1 byte: ASCII characters (U+0000 to U+007F)
  • 2 bytes: accented Latin letters, Greek, Cyrillic, Hebrew, Arabic (U+0080 to U+07FF)
  • 3 bytes: most Chinese, Japanese, and Korean characters, and the rest of the Basic Multilingual Plane (U+0800 to U+FFFF)
  • 4 bytes: emoji and other supplementary characters (U+10000 to U+10FFFF)

Advantage:

UTF-8 is backward compatible with ASCII: any valid ASCII text is also valid UTF-8, byte for byte. At the same time it can represent every Unicode code point (over a million), making it both efficient for English-heavy text and universal.
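Both properties can be checked directly. A quick Python sketch, using nothing beyond the standard `str.encode`:

```python
# Byte length grows with the code point: 1 byte for ASCII, up to 4 for emoji.
for ch in ['A', 'é', '世', '🌍']:
    encoded = ch.encode('utf-8')
    print(ch, len(encoded), encoded.hex())

# Backward compatibility: ASCII and UTF-8 produce identical bytes for ASCII text.
assert 'Hello'.encode('ascii') == 'Hello'.encode('utf-8')
```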

Common Examples

ASCII Characters

  • 'A' → 65 (0x41)
  • 'a' → 97 (0x61)
  • '0' → 48 (0x30)
  • ' ' (space) → 32 (0x20)

UTF-8 Multi-byte

  • 'é' → 0xC3 0xA9 (2 bytes)
  • '世' → 0xE4 0xB8 0x96 (3 bytes)
  • '🌍' → 0xF0 0x9F 0x8C 0x8D (4 bytes)
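The leading byte of each sequence tells a decoder how many bytes the character occupies: `0xxxxxxx` means 1 byte, `110xxxxx` means 2, `1110xxxx` means 3, and `11110xxx` means 4. A sketch that reads the length from the bit pattern and checks it against the examples above (the helper name `utf8_length` is just for illustration):

```python
def utf8_length(first_byte: int) -> int:
    """Return the total byte count of a UTF-8 sequence from its leading byte."""
    if first_byte < 0x80:           # 0xxxxxxx: ASCII, single byte
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("continuation byte or invalid leading byte")

# The declared length matches the actual encoded length for each example.
for ch in ['A', 'é', '世', '🌍']:
    data = ch.encode('utf-8')
    assert utf8_length(data[0]) == len(data)
    print(ch, len(data), data.hex())
```

This self-describing structure is what lets a decoder resynchronize mid-stream: continuation bytes always start with `10`, so they can never be mistaken for the start of a character.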