# Integers vs. Floating-point Numbers

Floating-point numbers offer some distinct advantages over integers: they can represent fractional values and span a far larger range. Integers, however, are exact, while floating-point numbers generally are not. To see why, we will explore the way computers store and manipulate the integer and floating-point types.

Computers store all data internally in binary form. The decimal system uses 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The binary system uses only two: 0 and 1. Figure 4.4 shows how the familiar base 10 place value system works.

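• The base 10 place value system
• $473{,}406 = 4 \times 10^5 + 7 \times 10^4 + 3 \times 10^3 + 4 \times 10^2 + 0 \times 10^1 + 6 \times 10^0$
• $= 400{,}000 + 70{,}000 + 3{,}000 + 400 + 0 + 6$
• $= 473{,}406$
• The base 2 place value system
• $100111_2 = 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
• $= 32 + 0 + 0 + 4 + 2 + 1$
• $= 39$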

With only two digits to work with, the binary number system distinguishes place values by powers of two. To be very clear, we will sometimes attach a subscript of 10 to a decimal number, as in $100_{10}$, and a subscript of 2 to a binary number, as in $100111_2$.

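In the decimal system, it is easy to add 3 + 5: $3 + 5 = 8$

The sum 3 + 9 is a little more complicated: $3 + 9 = 12$

We can say 3 + 9 is 2, carry the 1. The 2 stays in the ones column, and the carried 1 moves to the tens column, where $1 + 0 + 0 = 1$, giving 12.

Binary addition follows the same pattern with only two digits:

• $0_2 + 0_2 = 0_2$
• $0_2 + 1_2 = 1_2$
• $1_2 + 0_2 = 1_2$
• $1_2 + 1_2 = 10_2$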

#### Integer Implementation

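Standard C++ supports multiple integer types: int, short, long, and long long, along with their unsigned counterparts unsigned, unsigned short, unsigned long, and unsigned long long. The most commonly used integer type in C++ is int. The exact number of bits in an int is processor specific: the language guarantees only a minimum of 16 bits, and 32 bits is typical on current systems.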

| Binary Bit String | Decimal Value |
|---|---|
| 00000 | 0 |
| 00001 | 1 |
| 00010 | 2 |
| 00011 | 3 |
| 00100 | 4 |
| 00101 | 5 |
| 00110 | 6 |
| 00111 | 7 |
| 01000 | 8 |
| 01001 | 9 |
| 01010 | 10 |
| 01011 | 11 |
| 01100 | 12 |
| 01101 | 13 |
| 01110 | 14 |
| 01111 | 15 |
| 10000 | 16 |
| 10001 | 17 |
| 10010 | 18 |
| 10011 | 19 |
| 10100 | 20 |
| 10101 | 21 |
| 10110 | 22 |
| 10111 | 23 |
| 11000 | 24 |
| 11001 | 25 |
| 11010 | 26 |
| 11011 | 27 |
| 11100 | 28 |
| 11101 | 29 |
| 11110 | 30 |
| 11111 | 31 |

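With 32-bit unsigned integers, the values behave as if they were arranged around a circle, so arithmetic wraps around at the ends of the range. Adding 1 to 4,294,967,295 produces 0, one position clockwise from 4,294,967,295. Subtracting 4 from 2 yields 4,294,967,294, four places counterclockwise from 2.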

#### Example

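```cpp
#include <iostream>

int main() {
    int x = 2147483645;  // Almost the largest possible 32-bit int value (2147483647)
    std::cout << x << " + 1 = " << x + 1 << '\n';
    std::cout << x << " + 2 = " << x + 2 << '\n';
    // x + 3 exceeds the largest int. Signed overflow is undefined behavior in
    // C++; on many systems the result wraps around to -2147483648.
    std::cout << x << " + 3 = " << x + 3 << '\n';
}
```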

#### Floating-point Implementation

The standard C++ floating-point types consist of float, double, and long double. As with the integer types, the different floating-point types may be distinguished by the number of bits of storage required and the corresponding range of values. The type float stands for single-precision floating-point, and double stands for double-precision floating-point.

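Single-precision floating-point numbers (type float) occupy 32 bits. In the IEEE 754 format used on nearly all current platforms, they are distributed as follows:

• Mantissa 23 bits (24 bits of precision, counting an implicit leading 1 bit that is not stored)
• Exponent 8 bits
• Sign 1 bit
• Total 32 bits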

Double-precision floating-point numbers (type double) require 64 bits:

• Mantissa 52 bits (53 bits of precision, counting an implicit leading 1 bit that is not stored)
• Exponent 11 bits
• Sign 1 bit
• Total 64 bits

#### Code Example

```cpp
#include <iostream>
#include <iomanip>

int main() {
    double d1 = 2000.5;
    double d2 = 2000.0;
    std::cout << std::setprecision(16) << (d1 - d2) << '\n';
    double d3 = 2000.58;
    double d4 = 2000.0;
    std::cout << std::setprecision(16) << (d3 - d4) << '\n';
}
```
```
// Output
0.5
0.5799999999999272
```

#### Code Example

```cpp
#include <iostream>

int main() {
    double one = 1.0,
           one_fifth = 1.0/5.0,
           zero = one - one_fifth - one_fifth - one_fifth - one_fifth - one_fifth;
    std::cout << "one = " << one << ", one_fifth = " << one_fifth
              << ", zero = " << zero << '\n';
}
```
```
// Output
one = 1, one_fifth = 0.2, zero = 5.55112e-017
```