A Student's Guide to Numerical Concepts

Understanding the building blocks of digital information.

Why Learn This?

In our digital world, everything from your phone to your car runs on computers. At their core, computers speak a language of numbers. Understanding concepts like binary, signed integers, and floating-point numbers isn't just for programmers; it gives you a fundamental insight into how technology works. It helps demystify computers and empowers you to understand topics like networking, data storage, and digital color representation.

Bits, Bytes, and Words

The most basic unit of information in computing is a bit. A bit can only have one of two values: 0 or 1. Think of it like a light switch that can be either off (0) or on (1).

A byte is an ordered collection of 8 bits. By grouping bits together, we can represent more complex information. With 8 bits, there are 2^8 (or 256) possible combinations, from 00000000 to 11111111.

A word is a larger group of bits, and its size depends on the computer's architecture. Common word sizes are 16, 32, or 64 bits. A larger word size allows a computer to process more data at once.

Data Units: Kilo, Mega, Giga, Tera

When we talk about file sizes or storage capacity, we often use prefixes like kilo, mega, and giga. However, there's a common point of confusion: whether these prefixes refer to powers of 1000 or powers of 1024.

  • Decimal (SI) Prefixes: These are based on powers of 1000, used in general science and engineering.
    • 1 kilobyte (KB) = 10^3 bytes = 1,000 bytes
    • 1 megabyte (MB) = 10^6 bytes = 1,000,000 bytes
    • 1 gigabyte (GB) = 10^9 bytes = 1,000,000,000 bytes
    • 1 terabyte (TB) = 10^12 bytes = 1,000,000,000,000 bytes
    Hard drive manufacturers typically use these decimal prefixes.
  • Binary (IEC) Prefixes: These are based on powers of 1024 (2^10), which are more natural for computer memory addressing. They have slightly different names:
    • 1 kibibyte (KiB) = 2^10 bytes = 1,024 bytes
    • 1 mebibyte (MiB) = 2^20 bytes = 1,048,576 bytes
    • 1 gibibyte (GiB) = 2^30 bytes = 1,073,741,824 bytes
    • 1 tebibyte (TiB) = 2^40 bytes = 1,099,511,627,776 bytes
    Operating systems often report file sizes using these binary prefixes (though sometimes still using the "KB" or "MB" notation, which can be confusing).

The distinction explains why a "1 TB" hard drive might appear as "0.909 TB" or "931 GB" in your operating system – it's because the OS is using the 1024-based calculation.
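You can check that arithmetic yourself; a minimal Python sketch of the calculation above:

```python
drive_bytes = 10**12                    # "1 TB" as the manufacturer counts it
print(drive_bytes / 10**9)              # 1000.0  decimal "GB"
print(round(drive_bytes / 2**30, 1))    # 931.3   GiB (what the OS may label "GB")
print(round(drive_bytes / 2**40, 3))    # 0.909   TiB (what the OS may label "TB")
```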

Unsigned Binary Numbers (Base-2)

Computers use the binary system because it's easy to represent with physical hardware (on/off, high/low voltage). Each position in a binary number represents a power of 2, starting from the right (2^0, 2^1, 2^2, etc.). This example shows an unsigned integer, where all bits contribute to the magnitude.

Interactive Unsigned Byte

Click the switches below to turn bits on (1) or off (0) and see how the decimal and hexadecimal values change.

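If you don't have the interactive demo in front of you, a minimal Python sketch shows the same idea: each switched-on bit contributes its power of two (the pattern 10110011 is just an arbitrary example):

```python
bits = [1, 0, 1, 1, 0, 0, 1, 1]    # most significant bit first

value = 0
for bit in bits:
    value = value * 2 + bit        # shift the running total left, add the new bit

print(value)                       # 179 (decimal)
print(format(value, '02X'))        # B3  (hexadecimal)
print(int('10110011', 2))          # 179, the same conversion via int()
```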

Signed Integers (Two's Complement)

How do computers store negative numbers? The most common method is Two's Complement. In this system, the leftmost bit is the "sign bit". If it's 0, the number is positive and calculated normally. If it's 1, the number is negative. To find the value of a negative number, you invert all the bits (a NOT operation), add one, and then make the result negative.

Interactive Signed Byte

The leftmost bit determines the sign. An 8-bit signed integer can represent numbers from -128 to 127.

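In code, a convenient shortcut that is equivalent to the invert-and-add-one rule: read the byte as unsigned, then subtract 2^8 if the sign bit is set. A minimal Python sketch (the bit patterns are arbitrary examples):

```python
def signed8(pattern: str) -> int:
    """Interpret an 8-bit pattern as a two's-complement integer."""
    unsigned = int(pattern, 2)
    return unsigned - 256 if unsigned >= 128 else unsigned  # 256 = 2^8

print(signed8('00000101'))   #    5
print(signed8('11111011'))   #   -5  (invert 00000101, then add one)
print(signed8('10000000'))   # -128  (the most negative 8-bit value)
```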

Floating-Point Numbers (Half-Precision)

For numbers with fractions (like 3.14159), computers use a format called floating-point. It's similar to scientific notation, splitting the number into a sign, an exponent, and a fractional part (mantissa). This demo uses the 16-bit "half-precision" format (1 sign bit, 5 exponent bits, 10 mantissa bits). For normalized values, the formula is: (-1)^Sign x 2^(Exponent - 15) x (1 + Mantissa), where the mantissa bits are read as a binary fraction. The '15' is the exponent bias.

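The formula can be applied by hand in a few lines. This sketch handles only normalized values (it ignores zero, subnormals, infinity, and NaN, which use exponent fields of all zeros or all ones); the test pattern 0x4248 is the half-precision approximation of pi:

```python
def half_to_float(bits16: int) -> float:
    sign     = (bits16 >> 15) & 0x1     # 1 sign bit
    exponent = (bits16 >> 10) & 0x1F    # 5 exponent bits
    mantissa =  bits16        & 0x3FF   # 10 mantissa bits, read as a fraction
    return (-1)**sign * 2**(exponent - 15) * (1 + mantissa / 1024)

print(half_to_float(0x4248))   # 3.140625
```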

Understanding Number Bases

We're used to the decimal (base-10) system, which uses ten digits (0-9). Other bases are common in computing:

  • Binary (Base-2): Uses two digits (0 and 1). This is the native language of computers.
  • Hexadecimal (Base-16): Uses sixteen symbols: 0-9 and A-F, where A=10, B=11, C=12, D=13, E=14, and F=15. Hex is widely used because it's a very compact way to represent a byte. One hex digit can represent 4 bits (a "nibble"), so two hex digits perfectly represent one byte.
  • Octal (Base-8): Uses eight digits (0-7). One octal digit represents 3 bits. It was more common in the past as an easier way to read binary.

Universal Number Base Converter

This tool lets you convert a number from any of the common bases to all the others. Try converting your age to binary or your zip code to hexadecimal!

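The same conversions are one-liners in Python: int() parses a string in any base, and format() renders the other bases. A minimal sketch (the input 'FF' is just an example):

```python
text, base = 'FF', 16        # try your own number and base
n = int(text, base)

print(n)                     # 255       (decimal)
print(format(n, 'b'))        # 11111111  (binary)
print(format(n, 'X'))        # FF        (hexadecimal)
print(format(n, 'o'))        # 377       (octal)
```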

Endianness: Big-Endian vs. Little-Endian

When a number is larger than a single byte (e.g., a 32-bit integer), the computer needs to decide the order in which to store those bytes in memory. This ordering is called endianness.

  • Big-Endian: Stores the most significant byte (MSB) at the lowest memory address. This is like how we write numbers; the "big end" comes first. For example, the number 0x12345678 would be stored in memory as the bytes 12, 34, 56, 78.
  • Little-Endian: Stores the least significant byte (LSB) at the lowest memory address. The "little end" comes first. The same number 0x12345678 would be stored as 78, 56, 34, 12.

This is crucial in networking, file formats, and low-level programming. Most network protocols (like TCP/IP) use big-endian, which is why it's also called "network byte order". Most modern desktop CPUs (like Intel and AMD x86) are little-endian.

Interactive Endianness Demo

Enter a 32-bit hexadecimal number to see how it's stored in memory in both big-endian and little-endian systems.

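Python's struct module can pack the same 32-bit value in either byte order, and sys.byteorder reports what your own machine uses. A small sketch:

```python
import struct
import sys

n = 0x12345678
print(struct.pack('>I', n).hex(' '))   # 12 34 56 78  (big-endian / network order)
print(struct.pack('<I', n).hex(' '))   # 78 56 34 12  (little-endian)
print(sys.byteorder)                   # 'little' on typical x86 machines
```

(The separator argument to bytes.hex() requires Python 3.8 or later.)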

Logical (Bitwise) Operations

Logical operations work on individual bits. They are fundamental to how computers make decisions and perform calculations. Here you can see how they work by comparing two bytes bit-by-bit.

Operation Definitions

AND
The result bit is 1 (true) only if both input bits are 1.
OR
The result bit is 1 (true) if either of the input bits is 1.
XOR (Exclusive OR)
The result bit is 1 (true) if the input bits are different (one is 0 and one is 1).
NOT
This operation takes only one input. It simply inverts the bit: 1 becomes 0, and 0 becomes 1.


NOT Operation

The NOT operation is unique because it takes only one input and inverts all of its bits (0 becomes 1, 1 becomes 0).
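Here are all four operations in Python on two example bytes. Note that Python integers are not fixed to 8 bits, so the NOT result is masked with 0xFF to stay within a byte:

```python
a, b = 0b11001100, 0b10101010

print(format(a & b, '08b'))       # 10001000  (1 only where both bits are 1)
print(format(a | b, '08b'))       # 11101110  (1 where either bit is 1)
print(format(a ^ b, '08b'))       # 01100110  (1 where the bits differ)
print(format(~a & 0xFF, '08b'))   # 00110011  (NOT of a, masked to 8 bits)
```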

Even / Odd Test with Bitwise AND

A quick and efficient way to check if a number is even or odd in binary is by using the bitwise AND operator with the number 1. This works because:

When you perform a bitwise AND with 1 (which is 00000001 in an 8-bit representation), all bits except the least significant bit are effectively masked out. The result will be 0 if the number is even, and 1 if the number is odd.

Example:

Decimal 5 (Odd):   00000101
AND 1:             00000001
                   --------
Result:            00000001  (Decimal 1)

Decimal 4 (Even):  00000100
AND 1:             00000001
                   --------
Result:            00000000  (Decimal 0)

This is equivalent to checking the remainder when dividing by 2 (N % 2), and at the hardware level a single AND is often cheaper than a division.
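In Python the test is a one-liner (a trivial sketch):

```python
for n in (4, 5, 254, 255):
    print(n, 'odd' if n & 1 else 'even')   # n & 1 keeps only the last bit
```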

Bitwise Shifts and Rotations

Bitwise shifts and rotations move the bits of a binary number to the left or right. These operations are crucial for low-level programming, optimizing calculations, and manipulating data at the bit level.

Definitions:

Left Shift (N << K)
Moves all bits to the left by K positions. Zeros are filled in from the right. This is equivalent to multiplying the number by 2^K. Bits shifted off the left end are lost.
Right Shift (N >> K)
Moves all bits to the right by K positions. For unsigned numbers, zeros are filled in from the left (logical right shift). For signed numbers, the sign bit is usually replicated (arithmetic right shift) to preserve the sign. This is equivalent to dividing the number by 2^K (integer division), though for negative values the rounding can differ from your language's division operator.
Left Rotate
Moves all bits to the left by K positions. Bits shifted off the left end are "rotated" back in on the right end. No bits are lost.
Right Rotate
Moves all bits to the right by K positions. Bits shifted off the right end are "rotated" back in on the left end. No bits are lost.

Interactive Byte Shifter/Rotator

Manipulate an 8-bit number using shift and rotate operations. Click the bits to toggle them, then use the buttons to see the effects.

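Shifts are built into Python; rotation is not, but it can be composed from two shifts and an OR, masked back to 8 bits. A minimal sketch (rol8 is a hypothetical helper name):

```python
n = 0b00010110                 # 22

print(n << 1)                  # 44  (one left shift multiplies by 2)
print(n >> 2)                  # 5   (two right shifts divide by 4, remainder lost)

def rol8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions (hypothetical helper)."""
    k %= 8
    return ((x << k) | (x >> (8 - k))) & 0xFF

print(format(rol8(0b10000001, 1), '08b'))   # 00000011: the top bit re-enters on the right
```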

Binary-Coded Decimal (BCD)

Binary-Coded Decimal (BCD) is a way to represent decimal numbers where each decimal digit is represented by its own 4-bit binary code (a nibble). For example, the decimal number 123 would be represented as `0001 0010 0011` in BCD.

While it might seem less efficient than pure binary (since 4 bits can represent 0-15, but BCD only uses 0-9), BCD has some advantages:

  • Decimal values such as 0.1, which have no exact binary representation, can be stored exactly, digit by digit.
  • Converting to and from human-readable decimal digits (for a seven-segment display, for example) is trivial.

Is it used anymore? Yes, BCD is still used in specific applications where decimal precision is paramount, such as:

  • Financial and accounting systems, where decimal rounding errors are unacceptable.
  • Calculators, digital clocks, and real-time clock (RTC) chips, which work with decimal digits directly.

However, for general-purpose computing, pure binary and floating-point representations are far more common due to their efficiency in storage and computation.
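Encoding is straightforward: convert each decimal digit to its own 4-bit group. A minimal Python sketch (to_bcd is a hypothetical helper):

```python
def to_bcd(n: int) -> str:
    """Render a non-negative integer as BCD, one 4-bit nibble per digit."""
    return ' '.join(format(int(digit), '04b') for digit in str(n))

print(to_bcd(123))   # 0001 0010 0011
print(to_bcd(95))    # 1001 0101
```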

Weird Things with Bits

Swapping Two Values with Three XORs

This is a classic bitwise trick to swap the values of two variables without needing a temporary third variable. It leverages the properties of the XOR operation (A XOR A = 0 and A XOR B XOR B = A).

The Algorithm:

a = a ^ b;   (a now holds A XOR B)
b = a ^ b;   (b now holds (A XOR B) XOR B = A)
a = a ^ b;   (a now holds (A XOR B) XOR A = B)
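The same three steps work on Python integers. One caveat worth knowing: if the two operands are actually the same storage location (say, swapping arr[i] with arr[j] when i == j), the first XOR zeroes the value, so the trick is only safe for distinct locations:

```python
a, b = 13, 7

a ^= b    # a == 13 ^ 7
b ^= a    # b == (13 ^ 7) ^ 7 == 13
a ^= b    # a == (13 ^ 7) ^ 13 == 7

print(a, b)   # 7 13
```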

XOR CX, CX vs. MOV CX, 0 (Assembly Optimization)

In older Intel x86 assembly languages, a common optimization trick was to use XOR CX, CX to set the CX register to zero, instead of using MOV CX, 0.

Why was this faster?

  • Instruction Size: XOR CX, CX is a shorter instruction (fewer bytes) than MOV CX, 0. Smaller instructions can be fetched and decoded faster.
  • Execution Units: On older processors, the XOR instruction could often be executed on a dedicated arithmetic logic unit (ALU) that was already available and optimized for bitwise operations, potentially freeing up other units (like those handling memory moves) for other tasks.
  • No Immediate Value: MOV CX, 0 must encode the immediate value (the '0') as part of the instruction and fetch it through the instruction stream. XOR CX, CX operates directly on the register's current value, which is already available in the CPU.

While this optimization was significant in the days of limited cache and simpler pipeline architectures, modern CPUs have highly sophisticated pipelines, out-of-order execution, and predictive capabilities, and compilers are also much smarter. Even so, modern x86 processors recognize XOR REG, REG as a "zeroing idiom" and handle it specially (often eliminating it during register renaming), so it is still the preferred way to zero a register, as well as a fascinating example of low-level optimization.

Language Specifics: Numbers and Bitwise Operations

Different programming languages provide various ways to represent numbers in different bases and perform bitwise operations. Here's a quick overview for common languages like C, Java, JavaScript, and Python:

Number Literals

  • Decimal: Standard base-10 numbers (e.g., 123) are written without any special prefix in all these languages.
  • Binary: Typically prefixed with 0b or 0B (e.g., 0b1011). This is supported in Java 7+, JavaScript (ES6+), and Python; C standardized binary literals only in C23, though many compilers accepted 0b earlier as an extension.
  • Hexadecimal: Universally prefixed with 0x or 0X (e.g., 0xFF).
  • Octal:
    • In C and Java, octal numbers are prefixed with a single 0 (e.g., 0755).
    • In Python, they are prefixed with 0o or 0O (e.g., 0o755).
    • In JavaScript, legacy 0-prefixed octal literals are disallowed in strict mode; explicit octal literals use 0o or 0O (ES6+).
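For example, all four literal forms in Python denote ordinary integers:

```python
print(123)       # 123  (decimal)
print(0b1011)    # 11   (binary)
print(0xFF)      # 255  (hexadecimal)
print(0o755)     # 493  (octal)
```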

Binary Operations

The core bitwise operations use very similar syntax across C, Java, JavaScript, and Python:

  • AND: & (e.g., a & b)
  • OR: | (e.g., a | b)
  • XOR (Exclusive OR): ^ (e.g., a ^ b)
  • NOT (Bitwise Complement): ~ (e.g., ~a).
    • Note for Python: ~x evaluates to -x - 1; because Python integers have arbitrary precision, there is no fixed bit width to complement.
  • Left Shift: << (e.g., a << 2). This generally multiplies the number by powers of 2.
  • Right Shift: >> (e.g., a >> 2). This generally divides the number by powers of 2.
    • In Java and Python, this is an *arithmetic* right shift for signed numbers (preserving the sign bit); in C, the behavior for negative signed values is implementation-defined, though most compilers shift arithmetically.
    • JavaScript also has an *unsigned* right shift operator >>> which always fills with zeros from the left.
  • Rotations: Direct bitwise rotation operators are generally *not* available in these high-level languages. They are typically implemented manually using a combination of shift and OR operations, often requiring consideration of the specific bit width (e.g., 8-bit, 32-bit).
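A few of these Python-specific behaviors, demonstrated (the 32-bit mask emulates what a fixed-width language would do; an 8-bit rotate helper appears in the shifter section above):

```python
a = 0b1011                                  # 11

print(a & 0b0110, a | 0b0110, a ^ 0b0110)   # 2 15 13
print(~a)                                   # -12, since ~x == -x - 1 in Python
print(-8 >> 1)                              # -4  (arithmetic shift keeps the sign)
print((-8 & 0xFFFFFFFF) >> 1)               # 2147483644, emulating JavaScript's -8 >>> 1
```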