ASCII and UTF-8

A practical guide to two important character encoding standards used in computing.

What Is Character Encoding?

Computers store text as numbers. A character encoding defines how letters, digits, and symbols map to numeric values and bytes. When sender and receiver use the same encoding, text is displayed correctly.
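
As a minimal sketch of why both sides need to agree on the encoding, the snippet below encodes one character to bytes and decodes it twice: once with the same encoding and once with a different one. The choice of "ä" and of Latin-1 as the mismatched encoding is an assumption made for illustration only.

```python
# Encode text to bytes, then decode with a matching and a mismatched encoding.
text = "ä"

data = text.encode("utf-8")          # sender: text -> bytes
same = data.decode("utf-8")          # receiver uses the same encoding
different = data.decode("latin-1")   # receiver assumes another encoding

print(data.hex(" "))   # c3 a4
print(same)            # ä
print(different)       # Ã¤
```

The bytes are identical in both cases; only the interpretation changes, which is why the mismatched decode shows two unrelated characters instead of one.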

ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit encoding with values from 0 to 127. It covers English letters, digits, punctuation, and control codes.

  • Usually stored as 1 byte per character, although only 7 bits are defined, so the highest bit is 0.
  • Great for basic English text and legacy systems.
  • Cannot represent characters like ä, ö, å, €, or emoji.

Character   ASCII Decimal   Binary (7-bit)   Hex
A           65              1000001          0x41
a           97              1100001          0x61
0           48              0110000          0x30
Space       32              0100000          0x20
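
The rows above can be reproduced with a few lines of Python. This is a small sketch; the helper name `ascii_row` is hypothetical and chosen only for this example.

```python
def ascii_row(ch):
    """Return (character, decimal, 7-bit binary, hex) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the ASCII range 0-127")
    return ch, code, format(code, "07b"), f"0x{code:02X}"


for ch in "Aa0 ":
    print(ascii_row(ch))
# ('A', 65, '1000001', '0x41')
# ('a', 97, '1100001', '0x61')
# ('0', 48, '0110000', '0x30')
# (' ', 32, '0100000', '0x20')
```

Passing a character such as "ä" raises `ValueError`, which mirrors the limitation that ASCII defines no value for it.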

UTF-8

UTF-8 is a variable-length encoding for Unicode. It can encode every Unicode character, which covers almost every written character in modern computing, using 1 to 4 bytes per character.

  • ASCII characters keep the same byte values in UTF-8.
  • Common Latin characters with accents often use 2 bytes.
  • Most emoji use 4 bytes, while many other symbols, such as €, use 3.
  • It is the standard encoding for the web.

Character   Unicode Code Point   UTF-8 Bytes (Hex)   Byte Count
A           U+0041               41                  1
ä           U+00E4               C3 A4               2
€           U+20AC               E2 82 AC            3
😀          U+1F600              F0 9F 98 80         4
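
To show where the byte counts come from, here is a hand-written sketch of the UTF-8 bit layout. In real code, Python's built-in `str.encode("utf-8")` does this work; the function name `utf8_encode` is hypothetical and exists only for illustration.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point to UTF-8 bytes by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:          # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


for ch in "Aä€😀":
    encoded = utf8_encode(ord(ch))
    # The hand-written result should match Python's built-in encoder.
    print(f"U+{ord(ch):04X}", encoded.hex(" ").upper(), len(encoded))
# U+0041 41 1
# U+00E4 C3 A4 2
# U+20AC E2 82 AC 3
# U+1F600 F0 9F 98 80 4
```

The first branch is the reason ASCII characters keep the same byte values in UTF-8: any code point below 0x80 is written out unchanged as a single byte.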

ASCII vs UTF-8 in Practice

For plain English text, ASCII and UTF-8 bytes are identical. Differences appear when text includes accents, non-Latin scripts, symbols, or emoji.

Text          ASCII Possible?       UTF-8 Byte Length
Hello         Yes                   5
Kärenlampi    No (contains ä)       11
Price: 10 €   No (contains €)       13
Hi 😀         No (contains emoji)   7
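
The comparison above can be checked with a short script. This is a sketch using only built-in string methods; the helper names `summarize` and `per_character` are hypothetical.

```python
def summarize(text):
    """Return (representable in ASCII?, UTF-8 byte length) for a string."""
    return text.isascii(), len(text.encode("utf-8"))


def per_character(text):
    """Return (character, code point, ASCII byte or '-', UTF-8 bytes) rows."""
    rows = []
    for ch in text:
        code = ord(ch)
        ascii_hex = f"{code:02X}" if code < 128 else "-"  # '-' means not in ASCII
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        rows.append((ch, f"U+{code:04X}", ascii_hex, utf8_hex))
    return rows


for sample in ["Hello", "Kärenlampi", "Price: 10 €", "Hi 😀"]:
    print(sample, summarize(sample))
# Hello (True, 5)
# Kärenlampi (False, 11)
# Price: 10 € (False, 13)
# Hi 😀 (False, 7)

for row in per_character("Hi 😀"):
    print(row)
# ('H', 'U+0048', '48', '48')
# ('i', 'U+0069', '69', '69')
# (' ', 'U+0020', '20', '20')
# ('😀', 'U+1F600', '-', 'F0 9F 98 80')
```

For the ASCII characters the two byte columns are identical, and the emoji has no ASCII value at all, which is the difference the table summarizes.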
