In this lecture we’re going to focus on finding fast algorithms to solve numerical problems like

  • Solve \(ax + by = c\)
  • Is \(n\) prime? If not, factorize it.
  • Solve \(f(x) \equiv 0 \pmod{p}.\)
  • Solve \(x^2 + 1 \equiv 0 \pmod{p}.\)
  • Compute \(a^k \pmod{m}.\)

Algorithmic Complexity

Let’s start with a simple computation.

How long does it take to compute \(a + b\)?

We measure how fast an algorithm is using the \(O\) notation.

Side note: as a reminder, a function \(T(n) = O(f(n))\) if there exists some constant \(C > 0\) and \(n_0 \geq 0\) such that $$ \begin{align*} T(n) \leq C \cdot f(n) \quad \text{for all } n \geq n_0 \end{align*} $$

Let \(N_1\) be the number of digits in \(a\) (the length of the decimal notation) and \(N_2\) be the number of digits in \(b\). If we let \(N = \max(N_1,N_2)\), then the number of operations we need to add \(a\) and \(b\) is \(O(N)\). That is, some constant \(C\) times the number of digits \(N\). Is this the best we can do? Yes, since we need at least this many operations just to read the digits of the input.
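As a sketch of why addition is linear time, here is the grade-school algorithm on lists of decimal digits (the function name and digit-list representation are just for illustration; this is not how big-integer libraries store numbers):

```python
# Sketch of grade-school addition: one pass over the digits, so O(N) time.
# Digits are stored least significant first, e.g. 472 -> [2, 7, 4].
def add_digits(a_digits, b_digits):
    result, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        d1 = a_digits[i] if i < len(a_digits) else 0
        d2 = b_digits[i] if i < len(b_digits) else 0
        carry, digit = divmod(d1 + d2 + carry, 10)  # carry is 0 or 1
        result.append(digit)
    if carry:
        result.append(carry)
    return result

# 472 + 39 = 511
print(add_digits([2, 7, 4], [9, 3]))  # [1, 1, 5]
```

Each digit position is visited exactly once, which is the \(O(N)\) bound above.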

In general, polynomial time algorithms \(O(n^k)\) are considered fast and anything worse than polynomial time (e.g. exponential time \(O(2^n)\)) is considered slow.

Warning about \(O\) notation: you’ll often come across time complexities like \(O(N\log(N)^3)\) or \(O(N\log\log(N))\). Factors like \(\log\log(N)\) are generally meaningless in practice: \(\log\log(N)\) is almost constant. It is in fact close to \(3\) for any realistic input size. So when you see something like

$$ \begin{align*} 100N \quad \text{ vs } \quad N\log\log(N) \end{align*} $$

You might think that \(100N\) is better since it’s linear. But \(100N\) is only less than \(N\log\log(N)\) if \(\log\log(N) > 100\). This means that \(N\) will need to be at least \(e^{e^{100}}\) which is ridiculously large.


Example: Multiplication

How long does it take to compute \(a \cdot b\)?

If we again let \(N_1\) be the number of digits in \(a\), \(N_2\) be the number of digits in \(b\), and \(N = \max(N_1,N_2)\), then the elementary way of doing multiplication takes about \(N_1 \times N_2\) steps times some constant. Using \(O\) notation, this is \(O(N^2)\).
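The elementary algorithm can be sketched as follows; note the nested loop touching all \(N_1 \times N_2\) digit pairs, which is where the \(O(N^2)\) comes from (a minimal illustration, not how big-integer libraries actually multiply):

```python
# Sketch of grade-school O(N^2) multiplication on digit lists,
# least significant digit first.
def multiply_digits(a_digits, b_digits):
    result = [0] * (len(a_digits) + len(b_digits))
    for i, d1 in enumerate(a_digits):
        for j, d2 in enumerate(b_digits):   # N1 * N2 digit products
            result[i + j] += d1 * d2
    for k in range(len(result) - 1):        # propagate carries
        result[k + 1] += result[k] // 10
        result[k] %= 10
    while len(result) > 1 and result[-1] == 0:
        result.pop()                        # drop leading zeros
    return result

# 16 * 24 = 384
print(multiply_digits([6, 1], [4, 2]))  # [4, 8, 3]
```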

Is this the best we can do? There are actually much faster algorithms based on the Fast Fourier Transform: you apply a Fourier transform to \(a\) and \(b\), which takes \(O(N\log(N))\), then do a pointwise multiplication in \(O(N)\). Finally, you apply an inverse Fourier transform to recover the product \(ab\); this step also takes \(O(N\log(N))\). We still haven’t explained what this mysterious Fourier transform step is, but we won’t discuss it in this class; instead, we will discuss another algorithm in a similar spirit.


Fast Multiplication: Chinese Remainder Theorem

Recall that if we know that

$$ \begin{align*} m \equiv r_1 \pmod{a} \quad \text{and} \quad n \equiv r_2 \pmod{a} \end{align*} $$

Then we get

$$ \begin{align*} mn \equiv r_1r_2 \pmod{a} \end{align*} $$

Similarly, suppose that

$$ \begin{align*} m \equiv s_1 \pmod{b} \quad \text{and} \quad n \equiv s_2 \pmod{b} \end{align*} $$

Then, by the same reasoning modulo \(b\), where we choose \(a\) and \(b\) to be coprime,

$$ \begin{align*} mn \equiv s_1s_2 \pmod{b} \end{align*} $$

So now the Chinese Remainder Theorem says that the following system

$$ \begin{align*} x &\equiv r_1r_2 \pmod{a} \\ x &\equiv s_1s_2 \pmod{b} \end{align*} $$

has a unique solution modulo \(ab\), since \(a\) and \(b\) are coprime. That unique \(x\) can be constructed using CRT, and it satisfies

$$ \begin{align*} x \equiv mn \pmod{ab} \end{align*} $$

So we basically found the product modulo \(ab\) without directly multiplying \(m\) and \(n\): instead, we deconstructed their product into smaller products modulo \(a\) and \(b\).


Example

Now let’s apply this to an example. Suppose \(m = 16\) and \(n = 24\), and suppose we pick the primes \(5, 7, 11\), so that

$$ \begin{align*} P = 5 \cdot 7 \cdot 11 = 385 > 384 = 16 \cdot 24 \end{align*} $$

So now we want to compute the remainders modulo each of these primes so

$$ \begin{align*} 16 \equiv 1 \pmod{5} \quad &\text{and} \quad 24 \equiv 4 \pmod{5} \\ 16 \equiv 2 \pmod{7} \quad &\text{and} \quad 24 \equiv 3 \pmod{7} \\ 16 \equiv 5 \pmod{11} \quad &\text{and} \quad 24 \equiv 2 \pmod{11} \\ \end{align*} $$

Then by properties of modular arithmetic

$$ \begin{align*} 16 \cdot 24 &\equiv 4 \pmod{5} \\ 16 \cdot 24 &\equiv 6 \pmod{7} \\ 16 \cdot 24 &\equiv 10 \pmod{11} \end{align*} $$

If we let \(x = 16 \cdot 24\), then the following system

$$ \begin{align*} x &\equiv 4 \pmod{5} \\ x &\equiv 6 \pmod{7} \\ x &\equiv 10 \pmod{11} \end{align*} $$

has a unique solution modulo \(5 \cdot 7 \cdot 11 = 385\) by CRT. This solution can be constructed using CRT and satisfies

$$ \begin{align*} x \equiv 384 \pmod{385} \end{align*} $$

For an example on how to construct the actual \(x\), see this.
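As a rough sketch of that construction (the helper `crt` here is hypothetical, not a standard-library function), one can combine the residues using modular inverses:

```python
# Sketch of CRT reconstruction for pairwise coprime moduli.
def crt(remainders, moduli):
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the inverse of Mi modulo m (Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

# The system from the example above:
print(crt([4, 6, 10], [5, 7, 11]))  # 384
```

Each term \(r_i \cdot M_i \cdot M_i^{-1}\) is congruent to \(r_i\) modulo \(m_i\) and to \(0\) modulo every other modulus, so the sum satisfies all three congruences at once.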


Example: Determinants

Suppose we want to compute the determinant of some \(10\times 10\) matrix

$$ \begin{align*} \begin{bmatrix} \ast & \ast & \ast & \cdots & \ast \\ \ast & \ast & \ast & \cdots & \ast \\ \ast & \ast & \ast & \cdots & \ast \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ast & \ast & \ast & \cdots & \ast \end{bmatrix} \end{align*} $$

but each entry has around \(1000\) digits. The easy method is to compute the determinant modulo \(p\) for lots of primes \(p\) (\(1000\) primes, for example) such that the product of these primes is greater than an estimate of the determinant’s size. Next, we compute each of these modular determinants using Gaussian elimination, for example. We can then reconstruct the actual determinant using the Chinese Remainder Theorem. This is pretty fast since we can use parallel processing to compute each of these \(1000\) determinants at the same time.
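A minimal sketch of the modular-determinant step, illustrated on a small matrix (the function name is mine, and a real implementation would pick the primes using an explicit bound on the determinant, such as Hadamard’s bound):

```python
# Determinant over Z/pZ via Gaussian elimination, O(n^3) arithmetic ops mod p.
def det_mod_p(matrix, p):
    m = [[x % p for x in row] for row in matrix]
    n, det = len(m), 1
    for col in range(n):
        # find a row with a nonzero pivot in this column
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0                      # singular mod p
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                    # row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], -1, p)     # modular inverse of the pivot
        for r in range(col + 1, n):       # eliminate below the pivot
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p

A = [[3, 1], [4, 2]]                      # det(A) = 2
print([det_mod_p(A, p) for p in (5, 7, 11)])  # [2, 2, 2]
```

Running this for each prime (in parallel, ideally) and feeding the residues into CRT recovers the true determinant.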


Fast Multiplication: Russian Peasant Algorithm

To compute \(a \times b\), follow these steps:

  1. Write \( a \) and \( b \) in two columns.
  2. Divide the number in the first column by \(2\), ignoring remainders.
  3. Multiply the number in the second column by \(2\). Stop when the first column reaches \(1\).
  4. Cross out all rows where the number in the first column is even.
  5. Add all the numbers remaining in the second column. The result is the product \( a \times b \).
| \(a\) | \(b\) | Remainder? |
| --- | --- | --- |
| 13 | 5 | Yes |
| 6 | 10 | No |
| 3 | 20 | Yes |
| 1 | 40 | Yes |

Adding the remaining rows of the second column: \(5 + 20 + 40 = 65 = 13 \times 5\).
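The steps above can be sketched as (a minimal illustration):

```python
# Russian peasant multiplication: halve a, double b,
# and keep b whenever a is odd (the rows that survive the crossing-out).
def russian_peasant(a, b):
    total = 0
    while a > 0:
        if a % 2 == 1:
            total += b
        a //= 2       # halve, ignoring the remainder
        b *= 2        # double
    return total

print(russian_peasant(13, 5))  # 65
```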

What we’re really doing here is converting the first number to binary and multiplying in binary. This is a terrible method for multiplying numbers: it’s much slower, since we’re converting to binary in one step and then doing all the additions in another. So why mention it? It turns out this Russian peasant method is really useful when it comes to fast exponentiation.


Fast Exponentiation

Recall our discussion when we wanted to compute \(a^b \pmod{m}\):

  • We can do it the stupid way, which is just multiplying out all the \(a\)s in \(a \cdot a \cdot a \cdots \pmod{m}\). This takes \(n-1\) multiplications, which is exponential in the number of digits of the exponent.
  • The second method improves on the first by reducing modulo \(m\) at each step, which keeps the intermediate numbers small.
  • Russian peasant exponentiation. Suppose we want to compute \(2^{13}\). In the first column, take the exponent and keep dividing by \(2\). In the second column, take the base and keep squaring it.
    | \(a\) (exponent) | \(b\) (power) | Remainder? |
    | --- | --- | --- |
    | 13 | 2 | Yes |
    | 6 | 4 | No |
    | 3 | 16 | Yes |
    | 1 | 256 | Yes |

    Multiplying the rows marked yes: \(2 \cdot 16 \cdot 256 = 8192 = 2^{13}\).
    How long does this algorithm take? Suppose \(a\) is the base and \(n\) is the exponent. Then let
    $$ \begin{align*} N &= \log(n) + 1 \quad \text{(the number of decimal digits in \(n\))}\\ M &= \log(a) + 1 \quad \text{(the number of decimal digits in \(a\))} \end{align*} $$
    Then
    • The number of divisions is \(O(\log(n)) = O(N)\).
    • Squaring the base in the second column depends on which multiplication algorithm we use. With a fast algorithm, this step takes \(O(M\log(M))\).
    • The last step is to multiply together all the rows marked yes. This also takes around \(O(N\log(N))\).
    So Russian peasant is really fast at exponentiation.
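The whole procedure, with the modular reduction built in at every step, can be sketched as (this is the standard square-and-multiply idea; the function name is mine):

```python
# Russian peasant (square-and-multiply) exponentiation mod m.
# Reducing mod m at every step keeps all intermediate numbers below m^2.
def power_mod(a, n, m):
    result = 1
    a %= m
    while n > 0:
        if n % 2 == 1:            # this binary digit of n contributes a factor
            result = result * a % m
        a = a * a % m             # square the base
        n //= 2                   # halve the exponent
    return result

print(power_mod(2, 13, 1000))  # 192  (2^13 = 8192)
```

This is the same algorithm Python’s built-in three-argument `pow(a, n, m)` implements.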

Is the Russian peasant method the best possible? No, we can actually do better. Suppose we want to calculate \(a^{15}\). In the stupid method, we do \(14\) multiplications. In the Russian peasant method, we square the base to get \(a, a^2, a^4, a^8\) (\(3\) multiplications), then multiply these to get \(a^9, a^{11}, a^{15}\) (\(3\) more). So \(6\) multiplications in total. But there is an even better method. … [TODO]

