ASCII and UTF-8

A practical guide to two important character encoding standards used in computing.


What Is Character Encoding?

Computers store text as numbers. A character encoding defines how letters, digits, and symbols map to numeric values and how those values are stored as bytes. Text displays correctly only when the program that writes the bytes and the program that reads them use the same encoding.
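
A short Python sketch shows what happens when the two sides disagree. It encodes a string as UTF-8, then decodes the same bytes twice: once as UTF-8 and once as Latin-1 (ISO-8859-1). Latin-1 is not otherwise covered here and is used only to illustrate a mismatch.

```python
# Encode text to bytes with one encoding, then decode those bytes
# with a matching and with a mismatched encoding.
text = "ä"
data = text.encode("utf-8")     # the bytes that are stored or sent

same = data.decode("utf-8")     # reader uses the same encoding
wrong = data.decode("latin-1")  # reader assumes Latin-1 instead

print(data)   # b'\xc3\xa4'
print(same)   # ä
print(wrong)  # Ã¤  (each UTF-8 byte read as a separate Latin-1 character)
```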

ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit encoding with values from 0 to 127. It covers English letters, digits, punctuation, and control codes.

  • Each character is normally stored in one 8-bit byte; only the lower 7 bits carry the code, so the highest bit is 0.
  • Great for basic English text and legacy systems.
  • Cannot represent characters like ä, ö, å, €, or emoji.

Character   Decimal   Binary (7-bit)   Hex
A           65        1000001          0x41
a           97        1100001          0x61
0           48        0110000          0x30
Space       32        0100000          0x20
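
The rows of this table can be reproduced with Python's built-in `ascii` codec. This is a minimal sketch assuming Python 3.9 or later; `ascii_row` is a hypothetical helper name, not a standard function. The last lines show that encoding ä as ASCII fails with `UnicodeEncodeError`.

```python
def ascii_row(ch: str) -> tuple[int, str, str]:
    """Return (decimal, 7-bit binary, hex) for one ASCII character."""
    code = ch.encode("ascii")[0]  # raises UnicodeEncodeError above 127
    return code, format(code, "07b"), f"0x{code:02X}"


for ch in ["A", "a", "0", " "]:
    print(repr(ch), *ascii_row(ch))

# Characters outside the 0-127 range cannot be encoded as ASCII.
try:
    "ä".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.reason)
```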

UTF-8

UTF-8 is a variable-length encoding for Unicode. It can encode every Unicode character, which covers nearly every writing system in use today along with symbols and emoji. Each character takes 1 to 4 bytes.

  • ASCII characters keep the same byte values in UTF-8.
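  • Most accented Latin letters, such as ä, fall in the range U+0080 to U+07FF and use 2 bytes.
  • Characters from U+0800 to U+FFFF, such as € and most Chinese, Japanese, and Korean characters, use 3 bytes.
  • Characters above U+FFFF, including most emoji, use 4 bytes.
  • It is the dominant encoding on the web.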

Character   Code Point   UTF-8 Bytes (Hex)   Byte Count
A           U+0041       41                  1
ä           U+00E4       C3 A4               2
€           U+20AC       E2 82 AC            3
😀          U+1F600      F0 9F 98 80         4
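
The same rows can be generated with Python's built-in UTF-8 codec. A minimal sketch, assuming Python 3.9 or later (`bytes.hex` with a separator argument needs 3.8); `utf8_row` is a hypothetical helper name. The final check illustrates that an ASCII character has the same byte in both encodings.

```python
def utf8_row(ch: str) -> tuple[str, str, int]:
    """Return (code point, UTF-8 bytes in hex, byte count) for one character."""
    encoded = ch.encode("utf-8")
    return f"U+{ord(ch):04X}", encoded.hex(" ").upper(), len(encoded)


for ch in ["A", "ä", "€", "😀"]:
    print(ch, *utf8_row(ch))

# ASCII characters keep the same byte value in UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii")
```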

UTF-8 at the Binary Level

UTF-8 uses specific high bits to show whether a byte starts a new character or continues one. The first byte tells how many bytes the character uses, and every following byte uses the continuation pattern.

Byte Pattern                          Meaning
0xxxxxxx                              Single-byte character (U+0000 to U+007F, the ASCII range)
110xxxxx 10xxxxxx                     2-byte character (U+0080 to U+07FF)
1110xxxx 10xxxxxx 10xxxxxx            3-byte character (U+0800 to U+FFFF)
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   4-byte character (U+10000 to U+10FFFF)
10xxxxxx                              Continuation byte (never valid as a first byte)

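The prefix 10 marks a continuation byte, and the leading byte's prefix (0, 110, 1110, or 11110) tells how many bytes belong to the character. The remaining x bits, read in order from the first byte to the last, form the binary value of the character's code point.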
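
To make the patterns concrete, here is a hand-written encoder that places a code point's bits into the x positions. It is a sketch for illustration, not a replacement for the built-in codec: `encode_utf8` is a hypothetical name, and it assumes the code point ranges listed for each byte length. It also rejects the surrogate range U+D800 to U+DFFF, which UTF-8 does not encode. The loop at the end cross-checks the result against Python's `str.encode`.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point by hand using the UTF-8 bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable in UTF-8: U+{cp:04X}")
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in UTF-8 codec.
for ch in "Aä€😀":
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")
```

Each continuation byte carries 6 payload bits (`& 0x3F`), and the lead byte's prefix (`0xC0`, `0xE0`, `0xF0`) is combined with whatever high bits remain.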

Example Breakdown

  • A = 01000001 (starts with 0, so 1 byte; payload 1000001 = 0x41).
  • ä = 11000011 10100100 (first byte starts with 110, next byte starts with 10; payload 00011 100100 = 0xE4).
  • € = 11100010 10000010 10101100 (first byte starts with 1110, then two continuation bytes; payload 0010 000010 101100 = 0x20AC).
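
The same breakdown can be generated in Python. Both helpers in this sketch, `describe_bytes` and `payload_bits`, are hypothetical names and expect a single character. The first labels each byte by its prefix; the second strips the prefixes and joins the remaining bits, which should equal the character's code point.

```python
def describe_bytes(ch: str) -> list[str]:
    """Label each UTF-8 byte of one character by its high-bit prefix."""
    labels = []
    for b in ch.encode("utf-8"):
        bits = format(b, "08b")
        if bits.startswith("0"):
            kind = "single byte"
        elif bits.startswith("10"):
            kind = "continuation"
        elif bits.startswith("110"):
            kind = "lead of 2"
        elif bits.startswith("1110"):
            kind = "lead of 3"
        else:
            kind = "lead of 4"  # 11110xxx; the encoder emits no other pattern
        labels.append(f"{bits} ({kind})")
    return labels


def payload_bits(ch: str) -> str:
    """Join the x bits from every byte of one character's UTF-8 encoding."""
    data = ch.encode("utf-8")
    # Number of payload bits in the lead byte, keyed by sequence length.
    lead_payload = {1: 7, 2: 5, 3: 4, 4: 3}[len(data)]
    bits = format(data[0], "08b")[-lead_payload:]
    for b in data[1:]:
        bits += format(b, "08b")[2:]  # drop the 10 continuation prefix
    return bits


for ch in "Aä€":
    payload = payload_bits(ch)
    print(ch, describe_bytes(ch), payload, hex(int(payload, 2)))
```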

ASCII vs UTF-8 in Practice

For plain English text, ASCII and UTF-8 bytes are identical. Differences appear only when text includes characters above code 127, such as accented letters, non-Latin scripts, symbols, or emoji.

Text          ASCII Possible?        UTF-8 Length (Bytes)
Hello         Yes                    5
Kärenlampi    No (contains ä)        11
Price: 10 €   No (contains €)        13
Hi 😀         No (contains emoji)    7
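
The values in this table can be checked with a short Python sketch (Python 3.9 or later assumed). `summarize` and `per_character` are hypothetical helper names: the first reports whether a string is representable in ASCII and its UTF-8 length, and the second lists each character's code point, its ASCII byte (or `-` when it has none), and its UTF-8 bytes.

```python
def summarize(text: str) -> tuple[bool, int]:
    """Return (representable in ASCII?, UTF-8 length in bytes)."""
    return text.isascii(), len(text.encode("utf-8"))


def per_character(text: str) -> list[tuple[str, str, str, str]]:
    """Rows of (character, code point, ASCII hex or '-', UTF-8 hex bytes)."""
    rows = []
    for ch in text:
        cp = ord(ch)
        ascii_hex = f"{cp:02X}" if cp < 128 else "-"
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        rows.append((ch, f"U+{cp:04X}", ascii_hex, utf8_hex))
    return rows


for sample in ["Hello", "Kärenlampi", "Price: 10 €", "Hi 😀"]:
    print(repr(sample), *summarize(sample))

for row in per_character("Hi 😀"):
    print(*row)
```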
