What Is Character Encoding?
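Computers store text as numbers. A character encoding defines how letters, digits, and symbols map to numeric values, and how those values are stored as bytes. When the sender and receiver use the same encoding, text is displayed correctly; when they use different encodings, some characters can appear garbled.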
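
The effect of matching and mismatched encodings can be sketched in a few lines of Python. This is only an illustration: the sample word and the choice of Latin-1 as the "wrong" encoding are assumptions made for the example, and only the built-in `str.encode` and `bytes.decode` are used.

```python
# Text becomes bytes when encoded and turns back into text when decoded.
text = "Kärenlampi"
data = text.encode("utf-8")        # the bytes a sender would transmit

# Receiver uses the same encoding: the text round-trips unchanged.
same = data.decode("utf-8")

# Receiver assumes a different encoding (Latin-1 here): the two bytes
# that make up "ä" are read as two separate characters.
garbled = data.decode("latin-1")

print(same)     # Kärenlampi
print(garbled)  # KÃ¤renlampi
```

The bytes are the same in both cases; only the receiver's interpretation of them changes.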
ASCII
ASCII (American Standard Code for Information Interchange) is a 7-bit encoding with values from 0 to 127. It covers English letters, digits, punctuation, and control codes.
- Each character is usually stored in 1 byte, but only the lower 7 bits are defined.
- Great for basic English text and legacy systems.
- Cannot represent characters like ä, ö, å, €, or emoji.
| Character | ASCII Decimal | Binary (7-bit) | Hex |
|---|---|---|---|
| A | 65 | 1000001 | 0x41 |
| a | 97 | 1100001 | 0x61 |
| 0 | 48 | 0110000 | 0x30 |
| Space | 32 | 0100000 | 0x20 |
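
The rows of the table above can be reproduced with Python's built-in `ord` and `format`. The helper name `ascii_row` is hypothetical and exists only for this sketch; the final lines show one way a non-ASCII character is rejected by the `ascii` codec.

```python
def ascii_row(ch):
    """Return (decimal, 7-bit binary, hex) for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the ASCII range 0-127")
    return code, format(code, "07b"), f"0x{code:02X}"


for ch in ["A", "a", "0", " "]:
    print(repr(ch), *ascii_row(ch))

# Characters outside ASCII cannot be encoded with the "ascii" codec.
try:
    "ä".encode("ascii")
except UnicodeEncodeError:
    print("'ä' cannot be represented in ASCII")
```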
UTF-8
UTF-8 is a variable-length encoding for Unicode. It can represent every Unicode character, which covers nearly every writing system in use today. Each character takes 1 to 4 bytes.
- ASCII characters keep the same byte values in UTF-8.
- Common Latin characters with accents often use 2 bytes.
- Most other characters up to U+FFFF, such as €, use 3 bytes.
- Most emoji and other characters above U+FFFF use 4 bytes.
- It is the standard encoding for the web.
| Character | Unicode Code Point | UTF-8 Bytes (Hex) | Byte Count |
|---|---|---|---|
| A | U+0041 | 41 | 1 |
| ä | U+00E4 | C3 A4 | 2 |
| € | U+20AC | E2 82 AC | 3 |
| 😀 | U+1F600 | F0 9F 98 80 | 4 |
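
To make the variable-length idea concrete, here is a hand-rolled sketch of how a code point can be turned into 1 to 4 UTF-8 bytes. The function name `utf8_encode` is hypothetical and the code is for illustration only; in practice the built-in `str.encode("utf-8")` should be used, and the loop below cross-checks the sketch against it for the four characters in the table.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes (illustrative only)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aä€😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # cross-check against the built-in
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X}", hex_bytes, len(encoded))
```

The leading bits of the first byte tell a decoder how many bytes the character occupies, which is why ASCII bytes (leading bit 0) stay unchanged.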
ASCII vs UTF-8 in Practice
For plain English text, ASCII and UTF-8 bytes are identical. Differences appear when text includes accents, non-Latin scripts, symbols, or emoji.
| Text | ASCII Possible? | UTF-8 Byte Length |
|---|---|---|
| Hello | Yes | 5 |
| Kärenlampi | No (contains ä) | 11 |
| Price: 10 € | No (contains €) | 13 |
| Hi 😀 | No (contains emoji) | 7 |
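
The comparison above can be checked with a short Python sketch. The helper name `describe` is hypothetical; `str.isascii` (Python 3.7 and later) and `str.encode` are built-ins.

```python
def describe(text):
    """Return (fits_in_ascii, utf8_byte_length) for a string."""
    return text.isascii(), len(text.encode("utf-8"))


for sample in ["Hello", "Kärenlampi", "Price: 10 €", "Hi 😀"]:
    fits, size = describe(sample)
    print(f"{sample!r}: ASCII possible = {fits}, UTF-8 bytes = {size}")
```

Note that the byte length differs from the character count as soon as a non-ASCII character appears: "Kärenlampi" has 10 characters but takes 11 bytes.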
Interactive Encoder
The interactive encoder shows per-character encoding details for any text you enter. Characters outside ASCII are shown as not representable in ASCII, while the UTF-8 bytes are always provided.
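
A per-character breakdown of this kind can also be sketched in Python. The function name `encode_details` and the row layout are assumptions made for this example, not the page's actual implementation. The sketch iterates over code points, so a character built from several code points (for example, some emoji sequences) would appear as several rows.

```python
def encode_details(text):
    """Return one row per code point:
    (index, character, code point, ASCII value or None, UTF-8 bytes in hex).
    """
    rows = []
    for index, ch in enumerate(text, start=1):
        code = ord(ch)
        ascii_value = code if code < 128 else None  # None: not representable
        utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        rows.append((index, ch, f"U+{code:04X}", ascii_value, utf8_hex))
    return rows


for row in encode_details("A€"):
    print(row)
```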