Lecture 17: Factorization

How can we factor a number $n$ into $n = xy$ for some integers $x$ and $y$? The basic algorithm (trial division) iterates through every prime $p$ up to $\sqrt{n}$ and tests whether $p$ divides $n$. This approach is effective for small integers, but it becomes extremely slow as $n$ grows larger. Another method previously discussed (used by Fermat) involves expressing $n$ as the difference of two squares, $n = a^2 - b^2 = (a-b)(a+b)$. However, this method is only fast if the factors $x$ and $y$ are close to each other. Otherwise, the search for $a$ takes too long to be practical.
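
As a rough illustration of these two baseline methods, here is a minimal Python sketch. The function names `trial_division` and `fermat_factor` are made up for this example. For simplicity the trial division loops over all integers up to $\sqrt{n}$ rather than only the primes, and the Fermat search assumes $n$ is odd.

```python
from math import isqrt

def trial_division(n):
    """Return the smallest divisor d > 1 of n (n itself if n is prime)."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return n

def fermat_factor(n):
    """Write odd n as a^2 - b^2 and return the pair (a - b, a + b)."""
    a = isqrt(n)
    if a * a < n:
        a += 1                      # start at the ceiling of sqrt(n)
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:             # a^2 - n is a perfect square
            return a - b, a + b
        a += 1

print(trial_division(7913))   # smallest divisor of 7913
print(fermat_factor(7913))    # factor pair found by Fermat's search
```

The Fermat loop needs more iterations the further apart the two factors are, which is the weakness described above.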

Pollard's $\rho$ Factorization (Algorithm)

The mathematician John Pollard came up with a method for factorization called the Pollard $\rho$ method. Given some integer $n$, we want to find a factor $p$ of $n$, so we choose random numbers $u_1, u_2, \cdots, u_k$. Now, suppose that two of them, $u_i$ and $u_j$, are congruent modulo $p$ (we don’t know $p$, but this illustrates why the method works), so

$$ \begin{align*} u_i \equiv u_j \pmod{p} \end{align*} $$

This implies that $p \mid u_i - u_j$. But we also know that $p \mid n$. So $p$ divides both $n$ and $u_i - u_j$. Therefore, $p$ must also divide their gcd,

$$ \begin{align*} p \mid \gcd(u_i - u_j, n) \end{align*} $$

In particular, the gcd is greater than $1$. If the gcd is less than $n$, then we have found a non-trivial factor of $n$ (a multiple of $p$, often $p$ itself). If the gcd is $n$, then it’s not a non-trivial factor and we need to try again.

But now you might say that we don’t know $p$, so none of this makes sense. The algorithm itself only computes the gcd of $n$ and $u_i - u_j$ and then checks whether that gcd is a non-trivial factor of $n$. The above is the intuition for why this works. So now, how many $u$’s must we try before we have a reasonable chance of finding a factor $p$? To answer this, we will quickly go over the birthday paradox next.
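
The pairwise check described here can be sketched in a few lines of Python. This is only an illustration of the naive version (every pair is tested), and the helper name `factor_from_values` is an assumption of this example, not something from the lecture.

```python
from math import gcd
from itertools import combinations

def factor_from_values(us, n):
    """Return a non-trivial factor of n found from some pair in us, else None."""
    for ui, uj in combinations(us, 2):
        d = gcd(ui - uj, n)         # p | ui - uj and p | n imply p | d
        if 1 < d < n:
            return d
    return None

# 3 and 44 differ by 41, which divides 7913, so this pair reveals a factor.
print(factor_from_values([3, 10, 44], 7913))
```

Note that the code never needs to know $p$: it only looks at gcds with $n$.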

The Birthday Paradox

Suppose you have about $30$ people in a room. The chance of two of them having the same birthday is quite large. First, the number of pairs we can form is

$$ \begin{align*} \frac{30 \cdot 29}{2} \approx \frac{30^2}{2} = 450 \end{align*} $$

Observe now that we have $365$ days in a year. We want to compute the probability that at least one of the roughly $450$ pairs shares a birthday. That is

$$ \begin{align*} P(\text{at least one pair shares a birthday}) = 1 - P(\text{no pair shares a birthday}) \end{align*} $$

The probability of having no matches (no collisions) where each of the $30$ people has a distinct birthday is

$$ \begin{align*} \frac{365}{365} \cdot \frac{364}{365} \cdots \frac{336}{365} \approx 0.2937 \end{align*} $$

This is because the first person can have any birthday, the second person can only have one of the $364$ remaining days (otherwise they would share a birthday with the first person), and so on. Therefore, the probability that we get at least one pair with a matching birthday is $1 - 0.2937 = 0.7063$. So roughly a $70\%$ chance! To get a $50\%$ chance, we would need at least $23$ people. In general, to get a $50\%$ chance with $n$ possible days ($n = 365$ here) and $k$ people, we need $k \approx \sqrt{n}$, up to a small constant factor.
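
The numbers above are easy to reproduce. The following is a small Python sketch of the same product; the function name `prob_shared_birthday` is chosen just for this example.

```python
def prob_shared_birthday(k, days=365):
    """Probability that at least two of k people share a birthday."""
    p_distinct = 1.0
    for i in range(k):
        # person i must avoid the i birthdays already taken
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

print(round(prob_shared_birthday(30), 4))   # the 30-person case from the text
print(prob_shared_birthday(22) < 0.5 < prob_shared_birthday(23))
```

The second line checks that $23$ is the smallest group size for which the chance exceeds $50\%$.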

Pollard's $\rho$ Factorization (Analysis)

Based on the analysis of the birthday paradox, and since we’re working modulo $p$, we will need roughly $\sqrt{p}$ numbers to have about a $50\%$ chance of a collision. If we let $k = \sqrt{p}$, then we’ll have approximately $k^2$ pairs to check. For each pair, we need to check whether the greatest common divisor of their difference and $n$ is a non-trivial factor. But wait! We’re checking about $k^2 = p$ pairs, so this method is completely useless: we might as well just use trial division to find the divisors. Pollard came up with a method to speed this up significantly.

In Pollard’s improved version, we pick $u_0$ at random, and then generate the rest of the sequence by repeatedly applying a function $f$:

$$ \begin{align*} u_1 = f(u_0), u_2=f(u_1), \cdots \end{align*} $$

$f$ can be some polynomial. For example,

$$ \begin{align*} f(x) = x^2 + 1 \end{align*} $$

works really well. Observe here that if $u_i \equiv u_j \pmod{p}$, then

$$ \begin{align*} u_i + 1 \equiv u_j + 1 \pmod{p} \end{align*} $$

and also

$$ \begin{align*} u_i^2 \equiv u_j^2 \pmod{p} \end{align*} $$

Therefore

$$ \begin{align*} f(u_i) \equiv f(u_j) \pmod{p} \end{align*} $$

So when we check whether $u_i \equiv u_j \pmod{p}$, we are also checking other pairs at the same time, because a collision in an earlier pair propagates forward through $f$. For example, if we’re checking $u_4$ and $u_8$, then we’re automatically checking $(u_3,u_7), (u_2,u_6), (u_1,u_5),\cdots$. This is why it suffices to compare only $u_i$ with $u_{2i}$. So how many checks do we need to make now? The average running time turns out to be roughly the square root of the smallest factor $p$.
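
Putting the pieces together, a minimal Python sketch of the method might look as follows. It assumes $f(x) = x^2 + c$ reduced modulo $n$ and compares $u_i$ with $u_{2i}$ by advancing one copy of the sequence one step and another copy two steps per round. The function name `pollard_rho` and the default starting value are choices made for this example.

```python
from math import gcd

def pollard_rho(n, u0=2, c=1):
    """Return a non-trivial factor of n, or None if this (u0, c) fails."""
    def f(x):
        return (x * x + c) % n

    slow, fast = u0, u0
    while True:
        slow = f(slow)              # u_i
        fast = f(f(fast))           # u_{2i}
        d = gcd(slow - fast, n)
        if d == n:
            return None             # trivial gcd: retry with another u0 or c
        if d > 1:
            return d

print(pollard_rho(7913))
print(pollard_rho(119))
```

When the function returns `None` (for instance when $n$ is prime), the usual remedy is to retry with a different starting value or a different constant in $f$.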

Example

Factor $7913$ with $f(x) = x^2+1$

Let $u_1 = 2$. We generate more numbers, reducing modulo $7913$ each time so that they don’t get too large:

$$ \begin{align*} u_2 = f(u_1) = 5, \quad u_3 = f(u_2) = 26, \quad u_4 = 677, \quad u_5 = 7289, \quad u_6 = 1640, \cdots \end{align*} $$

Then, for $i = 1, 2, \ldots$ we compute the gcd of $u_{2i} - u_i$ and $7913$, until we get to $\gcd(u_{14} - u_{7},7913) = 41$. Indeed, $41$ is a prime divisor of $7913 = 41 \cdot 193$.
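
This example can be replayed with a short Python sketch that keeps the same indexing as the text ($u_1 = 2$, and `u[i]` holds $u_i$). The helper name `rho_trace` and the cap `max_i` are assumptions made for this illustration.

```python
from math import gcd

def rho_trace(n, u1, max_i=50):
    """Return (i, d, u) for the first i with gcd(u_{2i} - u_i, n) > 1."""
    u = [None, u1]                  # index 0 unused so that u[i] is u_i
    for i in range(1, max_i + 1):
        while len(u) <= 2 * i:      # extend the sequence up to u_{2i}
            u.append((u[-1] ** 2 + 1) % n)
        d = gcd(u[2 * i] - u[i], n)
        if d > 1:
            return i, d, u
    return None

i, d, u = rho_trace(7913, 2)
print(u[1:7])   # the first six terms of the sequence
print(i, d)     # the index i and the gcd found at that step
```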

Pollard's $p-1$ Method

An integer is $k$-smooth if it has no prime factors $> k$.
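
The definition can be checked directly by stripping out every factor up to $k$ and seeing whether anything is left. Here is a minimal Python sketch; the name `is_smooth` is chosen for this example, and it assumes $m \geq 1$ and $k \geq 2$.

```python
def is_smooth(m, k):
    """Return True if m has no prime factor greater than k."""
    for d in range(2, k + 1):
        while m % d == 0:           # remove every factor d <= k
            m //= d
    return m == 1                   # anything left has a prime factor > k

print(is_smooth(60, 5))    # 60 = 2^2 * 3 * 5
print(is_smooth(60, 3))    # fails because of the prime factor 5
```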

For example, $p = 65537$ is a prime for which $p-1 = 2^{16}$ is $2$-smooth. Pollard’s $p-1$ method is good for finding factors $p$ of $n$ where $p-1$ is smooth. The steps are

1. Pick $a$ at random.
2. Let $M$ be a smooth number such as $$ \begin{align*} M = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19 \end{align*} $$
3. Compute the gcd of $a^M - 1$ and $n$ $$ \begin{align*} d = \gcd(a^M - 1, n) \end{align*} $$ Now,
   - if $d = 1$ or $d = n$, try again with another $a$ or $M$.
   - if $1 < d < n$, we have a non-trivial factor of $n$. We're done!
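
The steps above can be sketched in Python as follows. This is a minimal single-attempt version: the function name `pollard_p_minus_1` is made up for this example, the default $a = 2$ is a fixed choice rather than a random one, and the default $M$ is the smooth number from step 2. Python's built-in three-argument `pow` computes $a^M \bmod n$ without ever forming $a^M$.

```python
from math import gcd

# The smooth exponent from step 2.
DEFAULT_M = 2**4 * 3**2 * 5 * 7 * 11 * 13 * 17 * 19

def pollard_p_minus_1(n, a=2, M=DEFAULT_M):
    """One attempt of the p-1 method: a non-trivial factor of n, or None."""
    d = gcd(pow(a, M, n) - 1, n)    # a^M is reduced modulo n throughout
    if 1 < d < n:
        return d
    return None                     # d is 1 or n: retry with another a or M

print(pollard_p_minus_1(119, a=2, M=60))
print(pollard_p_minus_1(119))       # here M is "too smooth" and d = n
```

The second call shows the $d = n$ failure case: with the larger $M$, both prime factors of $119$ divide $a^M - 1$, so a smaller $M$ or a different $a$ is needed.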

Pollard’s observation was that if $n$ has a prime factor $p$ and $p \nmid a$, then by Fermat’s little theorem

$$ \begin{align*} a^{p-1} \equiv 1 \pmod{p} \end{align*} $$

Now, if we choose $M$ such that $p-1 \mid M$, say $M = (p-1)m$, then raising both sides to the power $m$ gives

$$ \begin{align*} a^M \equiv 1 \pmod{p} \end{align*} $$

But this means that $p \mid a^M - 1$. So $p$ divides both $n$ and this constructed number $a^M - 1$. Then this means that

$$ \begin{align*} \gcd(a^M - 1, n) \geq p \end{align*} $$

So now the trick is to just compute this gcd $d$ and then see whether $1 < d < n$. If so, we have found a non-trivial factor of $n$.

We also never need to compute the huge number $a^M$ itself: we can reduce modulo $n$ throughout. To see this, suppose $n = 119$, $a = 2$, and let

$$ \begin{align*} M = 2^2 \cdot 3 \cdot 5 \end{align*} $$

To compute $a^M \pmod{n}$, we compute

$$ \begin{align*} (((a^2)^2)^3)^5 \pmod{n} \end{align*} $$

In each step where we raise to some power, we reduce modulo $n$:

$$ \begin{align*} a^4 &= 16 \\ (a^4)^3 &= (16)^3 \equiv 50 \pmod{119} \\ ((a^4)^3)^5 &= (50)^5 \equiv 50 \pmod{119} \end{align*} $$

Next we compute the gcd

$$ \begin{align*} \gcd(a^M - 1, 119) = \gcd(49, 119) = 7. \end{align*} $$

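Observe here that the factor found is $p = 7$, with $p-1 = 6 = 2 \cdot 3$. Both $2$ and $3$ are among the primes in $M$’s product, so $p-1 \mid M$. The other prime factor of $119$, namely $17$, does not divide $49$, which is why the gcd is $7$ rather than $119$.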
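
The step-by-step exponentiation in this example can be checked with a few lines of Python. The helper name `power_by_factors` is an assumption of this sketch. It raises to one factor of $M$ at a time and reduces modulo $n$ after each step, so no intermediate value exceeds $n$.

```python
from math import gcd

def power_by_factors(a, factors, n):
    """Compute a^(product of factors) mod n, one factor at a time."""
    x = a % n
    for q in factors:
        x = pow(x, q, n)            # raise to q, then reduce modulo n
    return x

n = 119
x = power_by_factors(2, [2, 2, 3, 5], n)   # M = 2^2 * 3 * 5 = 60
d = gcd(x - 1, n)
print(x, d)                                 # a^M mod n, then the gcd
```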

References

Math115a by Richard E Borcherds
