ASCII and UTF-8 Education

What Is Character Encoding?

Computers store text as numbers. A character encoding defines how letters, digits, and symbols map to numeric values and bytes. When sender and receiver use the same encoding, text is displayed correctly; when they disagree, the same bytes are read as different characters and the text appears garbled.
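The effect of a mismatch can be seen in a minimal Python sketch. The choice of Latin-1 as the "wrong" encoding is an assumption made purely for illustration; any encoding that differs from the sender's would show a similar result.

```python
text = "ä"

# Sender encodes the text as UTF-8, producing the bytes C3 A4.
data = text.encode("utf-8")

# Receiver decodes with the same encoding: the text round-trips.
same = data.decode("utf-8")

# Receiver assumes Latin-1 instead (illustrative assumption):
# each byte is read as its own character, so two wrong characters appear.
wrong = data.decode("latin-1")

print(same)
print(wrong)
```

The bytes never change here; only the interpretation does, which is why agreeing on the encoding matters.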

ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit encoding with values from 0 to 127. It covers English letters, digits, punctuation, and control codes.

Each character typically occupies 1 byte in storage, but only 7 of its 8 bits are defined.
Great for basic English text and legacy systems.
Cannot represent characters like ä, ö, å, €, or emoji.

Character	ASCII Decimal	Binary (7-bit)	Hex
A	65	1000001	0x41
a	97	1100001	0x61
0	48	0110000	0x30
Space	32	0100000	0x20
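The rows of the table above can be reproduced with Python's built-in `ord` and `format`. The helper name `ascii_row` is hypothetical and exists only for this sketch.

```python
def ascii_row(ch):
    """Return (decimal, 7-bit binary, hex) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        # Outside the 0-127 range, so ASCII has no value for it.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code, format(code, "07b"), f"0x{code:02X}"


for ch in ["A", "a", "0", " "]:
    print(repr(ch), *ascii_row(ch))
```

Calling `ascii_row("ä")` raises `ValueError`, which matches the limitation noted above: characters outside the 7-bit range have no ASCII value.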

UTF-8

UTF-8 is a variable-length encoding for Unicode. It can represent every Unicode character, which covers almost every writing system in use today. It uses 1 to 4 bytes per character.

ASCII characters keep the same byte values in UTF-8.
Common Latin characters with accents often use 2 bytes.
Most emoji use 4 bytes; many other symbols, such as €, use 3 bytes.
It is the standard encoding for the web.

Character	Unicode Code Point	UTF-8 Bytes (Hex)	Byte Count
A	U+0041	41	1
ä	U+00E4	C3 A4	2
€	U+20AC	E2 82 AC	3
😀	U+1F600	F0 9F 98 80	4
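As a quick check of the table above, the same values can be obtained from Python's built-in `str.encode`. The helper name `utf8_row` is hypothetical.

```python
def utf8_row(ch):
    """Return (code point, UTF-8 bytes as hex, byte count) for one character."""
    data = ch.encode("utf-8")
    code_point = f"U+{ord(ch):04X}"
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    return code_point, hex_bytes, len(data)


for ch in ["A", "ä", "€", "😀"]:
    print(ch, *utf8_row(ch))
```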

UTF-8 on the Binary Level

UTF-8 uses specific high bits to show whether a byte starts a new character or continues one. The first byte tells how many bytes the character uses, and every following byte uses the continuation pattern.

A Unicode code point is the unique number assigned to a character. It is written as U+ followed by a hexadecimal number, such as A = U+0041, € = U+20AC, and 😀 = U+1F600. UTF-8 converts that number into one to four bytes; the code point identifies the character, while the UTF-8 bytes describe how it is stored or transmitted.

Code Point Range	Byte Pattern	Meaning
U+0000–U+007F	0xxxxxxx	Single-byte character (ASCII range).
U+0080–U+07FF	110xxxxx 10xxxxxx	2-byte character.
U+0800–U+FFFF	1110xxxx 10xxxxxx 10xxxxxx	3-byte character (the surrogate range U+D800–U+DFFF is not encodable).
U+10000–U+10FFFF	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx	4-byte character.
—	10xxxxxx	Continuation byte (never valid as a first byte).

The prefix 10 marks a byte that continues a character, and the leading byte prefix (0, 110, 1110, or 11110) tells how many total bytes belong to that character.
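The prefix rules can be sketched as a small classifier that inspects only the high bits of a byte. The function names are hypothetical. The sketch checks the prefix alone, so it is not a full validator: a few byte values that match a lead pattern (for example C0 or F5) never occur in well-formed UTF-8.

```python
def sequence_length(byte):
    """Return the total byte count announced by a lead byte,
    or 0 if the byte is a continuation byte (prefix 10)."""
    if byte >> 7 == 0b0:
        return 1
    if byte >> 6 == 0b10:
        return 0
    if byte >> 5 == 0b110:
        return 2
    if byte >> 4 == 0b1110:
        return 3
    if byte >> 3 == 0b11110:
        return 4
    raise ValueError(f"0x{byte:02X} does not match any UTF-8 prefix")


def count_characters(data):
    """Count characters by counting every byte that is not a continuation byte."""
    return sum(1 for b in data if sequence_length(b) != 0)


print(sequence_length(0xE2))
print(count_characters("Hi 😀".encode("utf-8")))
```

Because continuation bytes are always recognizable by their prefix, a decoder that starts in the middle of a stream can skip forward to the next lead byte and resynchronize.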

Example Breakdown

A (U+0041) = 01000001 (starts with 0, so 1 byte).
ä (U+00E4) = 11000011 10100100 (first byte starts with 110, next byte starts with 10).
€ (U+20AC) = 11100010 10000010 10101100 (first byte starts with 1110, then two continuation bytes).
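The breakdown above can be reproduced by hand with shifts and masks. The following is a sketch of the bit packing, not a replacement for the built-in encoder; the function name `encode_code_point` is hypothetical.

```python
def encode_code_point(cp):
    """Pack one Unicode code point into UTF-8 bytes using the bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point cannot be encoded in UTF-8")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


for ch in ["A", "ä", "€"]:
    encoded = encode_code_point(ord(ch))
    print(ch, " ".join(format(b, "08b") for b in encoded))
```

Each continuation byte carries 6 bits of the code point, and the lead byte carries whatever bits remain after its length prefix.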

ASCII vs UTF-8 in Practice

For plain English text, ASCII and UTF-8 bytes are identical. Differences appear when text includes accents, non-Latin scripts, symbols, or emoji.

Text	ASCII Possible?	UTF-8 Byte Length
Hello	Yes	5
Kärenlampi	No (contains ä)	11
Price: 10 €	No (contains €)	13
Hi 😀	No (contains emoji)	7
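The table above can be checked with a few lines of Python. The helper name `summarize` is hypothetical; it reports whether every character falls in the ASCII range and how many bytes the UTF-8 encoding takes.

```python
def summarize(text):
    """Return (ascii_possible, utf8_byte_length) for a string."""
    ascii_possible = all(ord(ch) < 128 for ch in text)
    return ascii_possible, len(text.encode("utf-8"))


for text in ["Hello", "Kärenlampi", "Price: 10 €", "Hi 😀"]:
    print(text, *summarize(text))
```

Note that `len(text)` counts code points, while `len(text.encode("utf-8"))` counts bytes, so the two differ as soon as a non-ASCII character appears.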

Interactive Encoder

Enter text below to see per-character encoding details. Characters outside ASCII are shown as not representable in ASCII, while UTF-8 is always provided.
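The per-character view described here can be approximated offline with a short Python sketch. The column layout (index, character, code point, ASCII, UTF-8) and the wording "not representable" are assumptions, as is the function name `encoding_rows`; this is not the page's actual implementation.

```python
def encoding_rows(text):
    """Build one row per character: (#, character, code point, ASCII, UTF-8)."""
    rows = []
    for index, ch in enumerate(text, start=1):
        cp = ord(ch)
        # ASCII only covers code points 0-127.
        ascii_hex = f"{cp:02X}" if cp < 128 else "not representable"
        utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        rows.append((index, ch, f"U+{cp:04X}", ascii_hex, utf8_hex))
    return rows


for row in encoding_rows("Hi 😀"):
    print(*row, sep="\t")
```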
