2.2. Miscellaneous Notation¶
This module collects together definitions for a number of mathematical terms and concepts, as a place for reference when needed.
Units of measure: OpenDSA modules use the following notation for units of measure. "B" will be used as an abbreviation for bytes, "b" for bits, "KB" for kilobytes \((2^{10} = 1024\) bytes), "MB" for megabytes \((2^{20}\) bytes) "GB" for gigabytes \((2^{30}\) bytes) and "ms" for milliseconds (a millisecond is 1/1000 of a second). Spaces are not placed between the number and the unit abbreviation when a power of two is intended. Thus a disk drive of size 25 gigabytes (where a gigabyte is intended as \(2^{30}\) bytes) will be written as "25GB". Spaces are used when a decimal value is intended. An amount of 2000 bits would therefore be written "2 Kb" while "2Kb" represents 2048 bits. 2000 milliseconds is written as 2000 ms. Note that in this book large amounts of storage are nearly always measured in powers of two and times in powers of ten.
Factorial function: The factorial function, written \(n!\) for \(n\) an integer greater than 0, is the product of the integers between 1 and \(n\), inclusive. Thus, \(5! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 = 120\). As a special case, \(0! = 1\). The factorial function grows quickly as \(n\) becomes larger. Because computing the factorial function directly is a time-consuming process, it can be useful to have an equation that provides a good approximation. Stirling's approximation states that \(n! \approx \sqrt{2\pi n}(\frac{n}{e})^n\), where \(e \approx 2.71828\) (\(e\) is the base for the system of natural logarithms) [1]. Thus we see that while \(n!\) grows slower than \(n^n\) (because \(\sqrt{2\pi n}/e^n < 1\)), it grows faster than \(c^n\) for any positive integer constant \(c\).
Permutations:
A permutation of a sequence \(\mathbf{S}\)
is simply the members of \(\mathbf{S}\) arranged in some order.
For example, a permutation of the integers 1 through \(n\) would
be those values arranged in some order.
If the sequence contains \(n\) distinct members, then there are
\(n!\) different permutations for the sequence.
This is because there are \(n\) choices for the first member in
the permutation; for each choice of first member there are \(n-1\)
choices for the second member, and so on.
Sometimes one would like to obtain a random permutation for a
sequence, that is, one of the \(n!\) possible permutations is
selected in such a way that each permutation has equal probability of
being selected.
A simple function for generating a random permutation is as
follows.
Here, the \(n\) values of the sequence are stored in
positions 0 through \(n-1\) of array A
,
function swap(A, i, j)
exchanges elements i
and j
in array A
,
and Random(n)
returns an integer value in the range 0 to
\(n-1\).
//Randomly permute the values in array A
void permute(int A[], int n) {
for (int i = n; i > 0; i--) // for each i
swap(A, i-1, int(Random(i))); // swap A[i-1] with a random
// position in the range 0 to i-1.
}
// Randomly permute the values in array A
void permute(Object[] A) {
for (int i = A.length; i > 0; i--) // for each i
swap(A, i-1, int(random(i))); // swap A[i-1] with a random
} // position in the range 0 to i-1.
Boolean variables:
A Boolean variable
is a variable that takes on one of the two values True
and
False
.
These two values are often associated with the values 1 and 0,
respectively, although there is no reason why this needs to be the
case.
It is poor programming practice to rely on the
correspondence between 0 and False, because these are logically
distinct objects of different types.
Logic Notation: We will occasionally make use of the notation of symbolic or Boolean logic. \(A \Rightarrow B\) means "\(A\) implies \(B\)" or "If \(A\) then \(B\)". \(A \Leftrightarrow B\) means "\(A\) if and only if \(B\)" or "\(A\) is equivalent to \(B\)". \(A \vee B\) means "\(A\) or \(B\)" (useful both in the context of symbolic logic or when performing a Boolean operation). \(A \wedge B\) means "\(A\) and \(B\)". \(\sim\!A\) and \(\overline{A}\) both mean "not \(A\)" or the negation of \(A\) where \(A\) is a Boolean variable.
Floor and ceiling: The floor of \(x\) (written \(\lfloor x \rfloor\)) takes real value \(x\) and returns the greatest integer \(\leq x\). For example, \(\lfloor 3.4 \rfloor = 3\), as does \(\lfloor 3.0 \rfloor\), while \(\lfloor -3.4 \rfloor = -4\) and \(\lfloor -3.0 \rfloor = -3\). The ceiling of \(x\) (written \(\lceil x \rceil\)) takes real value \(x\) and returns the least integer \(\geq x\). For example, \(\lceil 3.4 \rceil = 4\), as does \(\lceil 4.0 \rceil\), while \(\lceil -3.4 \rceil = \lceil -3.0 \rceil = -3\).
Modulus function:
The modulus (or mod) function returns the remainder of
an integer division.
Sometimes written \(n \bmod m\) in mathematical expressions,
the syntax in many programming languages is n % m
.
From the definition of remainder, \(n \bmod m\) is the integer
\(r\) such that \(n = qm + r\) for \(q\) an integer,
and \(|r| < |m|\).
Therefore, the result of \(n \bmod m\) must be between 0 and
\(m-1\) when \(n\) and \(m\) are positive integers.
For example, \(5 \bmod 3 = 2\); \(25 \bmod 3 = 1\),
\(5 \bmod 7 = 5\), and \(5 \bmod 5 = 0\).
There is more than one way to assign values to \(q\) and \(r\), depending on how integer division is interpreted. The most common mathematical definition computes the mod function as \(n \bmod m = n - m\lfloor n/m\rfloor\). In this case, \(-3 \bmod 5 = 2\). However, Java and C++ compilers typically use the underlying processor's machine instruction for computing integer arithmetic. On many computers this is done by truncating the resulting fraction, meaning \(n \bmod m = n - m (\mathrm{trunc}(n/m))\). Under this definition, \(-3 \bmod 5 = -3\). Another language might do something different.
Unfortunately, for many applications this is not what the user wants or expects. For example, many hash systems will perform some computation on a record's key value and then take the result modulo the hash table size. The expectation here would be that the result is a legal index into the hash table, not a negative number. Implementers of hash functions must either insure that the result of the computation is always positive, or else add the hash table size to the result of the modulo function when that result is negative.
[1] | The symbol "\(\approx\)" means "approximately equal." |