How can we factorize a number \(n\) into \(n = xy\) for some integers \(x, y > 1\)? The most basic algorithm, trial division, iterates through every prime \(p\) up to \(\sqrt{n}\) and tests whether \(p\) divides \(n\). This approach is effective for small integers, but it becomes extremely slow as \(n\) grows. Another method discussed previously (used by Fermat) expresses \(n\) as a difference of two squares, \(n = a^2 - b^2 = (a-b)(a+b)\). However, this method is only fast when the factors \(x\) and \(y\) are close to each other. Otherwise, it needs far too many steps to be practical.
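For comparison, here is a minimal Python sketch of the two baseline methods just mentioned. For simplicity, `trial_division` tests every integer \(d \le \sqrt{n}\) rather than only the primes (the first divisor it finds is automatically prime), and `fermat_factor` assumes \(n\) is odd. Both function names are our own.

```python
from math import isqrt
from typing import Optional, Tuple


def trial_division(n: int) -> Optional[int]:
    """Return the smallest prime factor of n (n >= 2), or None if n is prime."""
    # Testing every d (not just primes) is a simplification: the first d that
    # divides n cannot be composite, since its prime factors would divide n first.
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None


def fermat_factor(n: int) -> Tuple[int, int]:
    """Write an odd n as a^2 - b^2 = (a - b)(a + b).

    Starts at a = ceil(sqrt(n)) and increases a until a^2 - n is a perfect
    square. The number of steps grows with the gap between the two factors.
    For an odd prime n it ends at (1, n).
    """
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1


print(trial_division(7913))  # 41
print(fermat_factor(9991))   # (97, 103): close factors, found at the first a = 100
print(fermat_factor(7913))   # (41, 193): a has to climb from 89 to 117
```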
Pollard's \(\rho\) Factorization (Algorithm)
The mathematician John Pollard came up with a factorization method now called the Pollard \(\rho\) method. Given an integer \(n\), we want to find a prime factor \(p\) of \(n\), so we choose random numbers \(u_1, u_2, \ldots, u_k\). Now suppose that two of them, \(u_i\) and \(u_j\), are congruent modulo \(p\) (we don't know \(p\), but this illustrates why the method works), so
$$
\begin{align*}
u_i \equiv u_j \pmod{p}
\end{align*}
$$
This implies that \(p \mid u_i - u_j\). But we also know that \(p \mid n\). So \(p\) divides both \(n\) and \(u_i - u_j\), and therefore \(p\) also divides their gcd:
$$
\begin{align*}
p \mid \gcd(u_i - u_j, n)
\end{align*}
$$
Now, if the gcd is strictly between \(1\) and \(n\) (for example, \(p\) itself), then we have found a non-trivial factor of \(n\). If the gcd is \(n\) (which happens when \(u_i \equiv u_j \pmod{n}\)), then it is not a non-trivial factor and we need to try again.
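A minimal Python sketch of this first, naive version of the idea: draw \(k\) random residues and test the gcd of every pairwise difference with \(n\). The function name, the use of Python's `random` module, and the fixed seed are illustrative assumptions, not part of Pollard's method as stated.

```python
import random
from itertools import combinations
from math import gcd
from typing import Optional


def naive_random_pairs(n: int, k: int, seed: int = 0) -> Optional[int]:
    """Pick k random numbers mod n and return the first non-trivial
    gcd(u_i - u_j, n) found among all pairs, or None."""
    rng = random.Random(seed)              # seeded so runs are reproducible
    us = [rng.randrange(n) for _ in range(k)]
    for ui, uj in combinations(us, 2):     # k*(k-1)/2 pairs in total
        d = gcd(ui - uj, n)                # math.gcd accepts negative inputs
        if 1 < d < n:                      # d == n means u_i ≡ u_j (mod n): useless
            return d
    return None


d = naive_random_pairs(7913, k=20)
print(d)  # 41 or 193 if some pair collides modulo a prime factor, otherwise None
```

Note the nested pairwise loop: checking all pairs costs \(k(k-1)/2\) gcd computations, which is exactly the cost problem the analysis section runs into.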
But now you might object that we don't know \(p\), so none of this makes sense. The algorithm itself never uses \(p\): it computes the gcd of \(n\) and \(u_i - u_j\) and then checks whether that gcd is a non-trivial factor of \(n\). The argument above is only the intuition for why this works. So how many \(u\)'s must we try before we have a reasonable chance of finding a factor \(p\)? To answer that, we will quickly go over the birthday paradox next.
The Birthday Paradox
Suppose there are \(30\) people in a room. The chance that two of them share a birthday is surprisingly large. First, the number of pairs we can form is
$$
\begin{align*}
\binom{30}{2} = \frac{30 \cdot 29}{2} = 435
\end{align*}
$$
Observe now that there are \(365\) days in a year. We want to compute the probability that at least one of the \(435\) pairs shares a birthday. That is
$$
\begin{align*}
P(\text{at least one match}) = 1 - P(\text{no match})
\end{align*}
$$
The probability of having no matches (no collisions), where each of the \(30\) people has a distinct birthday, is
$$
\begin{align*}
P(\text{no match}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{336}{365} \approx 0.2937
\end{align*}
$$
This is because the first person can have any birthday, the second person can only have one of the \(364\) remaining days (otherwise they would share the first person's birthday), and so on. Therefore, the probability that at least one pair shares a birthday is \(1 - 0.2937 = 0.7063\), roughly a \(70\%\) chance! To get a \(50\%\) chance, we need only \(23\) people. In general, with \(n\) equally likely days, we need about \(k \approx \sqrt{n}\) people for a \(50\%\) chance (more precisely \(k \approx \sqrt{2n \ln 2} \approx 1.18\sqrt{n}\), which is about \(22.5\) for \(n = 365\), hence \(23\) people).
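These numbers are easy to reproduce. A short Python sketch (the function names are our own) that multiplies out the no-match product and searches for the smallest group size with at least a \(50\%\) chance of a shared birthday:

```python
import math


def prob_no_match(k: int, days: int = 365) -> float:
    """Probability that k people all have distinct birthdays among `days` days."""
    p = 1.0
    for i in range(k):
        p *= (days - i) / days   # person i+1 must avoid the i birthdays already taken
    return p


def people_for_half(days: int = 365) -> int:
    """Smallest k with at least a 50% chance of some shared birthday."""
    k = 1
    while 1 - prob_no_match(k, days) < 0.5:
        k += 1
    return k


print(round(prob_no_match(30), 4))                 # 0.2937
print(people_for_half())                           # 23
print(round(math.sqrt(2 * math.log(2) * 365), 2))  # 22.49, the ~1.18*sqrt(n) estimate
```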
Pollard's \(\rho\) Factorization (Analysis)
Based on the birthday paradox, and since we're working modulo \(p\), we need roughly \(\sqrt{p}\) random numbers to have about a \(50\%\) chance that two of them are congruent modulo \(p\). If we let \(k = \sqrt{p}\), then we have about \(k^2/2 = p/2\) pairs to check, and for each pair we must check whether the gcd of their difference and \(n\) is a non-trivial factor. But wait! That is on the order of \(p\) gcd computations, so this version is no faster than simply trying divisors one by one. Pollard came up with a way to speed this up significantly.
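In Pollard's improved version, we pick a starting value \(u_0\), but then we generate each subsequent number from the previous one:
$$
\begin{align*}
u_{i+1} \equiv f(u_i) \pmod{n}
\end{align*}
$$
Here \(f\) can be any polynomial with integer coefficients. For example,
$$
\begin{align*}
f(x) = x^2 + 1
\end{align*}
$$
works really well. Observe that if \(u_i \equiv u_j \pmod{p}\), then, because \(f\) is a polynomial and reducing modulo \(n\) does not change residues modulo \(p\) (since \(p \mid n\)),
$$
\begin{align*}
u_{i+1} \equiv f(u_i) \equiv f(u_j) \equiv u_{j+1} \pmod{p}
\end{align*}
$$
and also
$$
\begin{align*}
u_{i+2} \equiv f(u_{i+1}) \equiv f(u_{j+1}) \equiv u_{j+2} \pmod{p}
\end{align*}
$$
Therefore
$$
\begin{align*}
u_{i+t} \equiv u_{j+t} \pmod{p} \quad \text{for all } t \ge 0
\end{align*}
$$
So when we check whether \(u_i \equiv u_j \pmod{p}\), we are also checking other pairs at the same time. For example, if we check \(u_4\) and \(u_8\), we automatically cover \((u_3,u_7), (u_2,u_6), (u_1,u_5)\): if any of those pairs were congruent modulo \(p\), then \(u_4 \equiv u_8 \pmod{p}\) would follow. Since the sequence modulo \(p\) eventually repeats with some period \(\lambda\), it is enough to compare each \(u_i\) with \(u_{2i}\): once the sequence has entered its cycle and \(i\) is a multiple of \(\lambda\), we get \(u_i \equiv u_{2i} \pmod{p}\). So how many checks do we need now? Heuristically (treating \(f\) like a random function), the expected number of steps is roughly the square root of the smallest prime factor \(p\) of \(n\).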
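To see the \(\rho\) shape concretely, the Python sketch below walks the sequence modulo a single prime \(p\), assuming \(f(x) = x^2 + 1\) and a starting value of \(2\). It reports the tail (how many terms come before the cycle) and the cycle length, and finds the first \(i\) with \(u_i \equiv u_{2i} \pmod{p}\). The function names are hypothetical.

```python
from typing import Tuple


def step(x: int, p: int, c: int = 1) -> int:
    """One step of the assumed map f(x) = x^2 + c, reduced mod p."""
    return (x * x + c) % p


def rho_shape(p: int, u1: int = 2) -> Tuple[int, int]:
    """Return (tail, cycle) for u_1 = u1, u_{i+1} = f(u_i) mod p.

    tail = number of terms before the sequence enters its cycle,
    cycle = length of the repeating part.
    """
    first_seen = {}                # residue -> index where it first appeared
    u, i = u1 % p, 1
    while u not in first_seen:
        first_seen[u] = i
        u, i = step(u, p), i + 1
    start = first_seen[u]          # index of the first term inside the cycle
    return start - 1, i - start


def first_floyd_match(p: int, u1: int = 2) -> int:
    """Smallest i >= 1 with u_i ≡ u_{2i} (mod p)."""
    slow = u1 % p                  # u_i, starting at i = 1
    fast = step(slow, p)           # u_{2i}
    i = 1
    while slow != fast:
        slow = step(slow, p)              # u_i -> u_{i+1}
        fast = step(step(fast, p), p)     # u_{2i} -> u_{2i+2}
        i += 1
    return i


print(rho_shape(41))          # (0, 7): 2, 5, 26, 21, 32, 0, 1, then back to 2
print(first_floyd_match(41))  # 7: the first multiple of the cycle length
```

The first match modulo \(41\) at \(i = 7\) is the same index that appears in the example's \(\gcd(u_{14} - u_7, 7913)\).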
Example
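Let \(n = 7913\) and \(u_1 = 2\). Using \(f(x) = x^2 + 1\), we generate more numbers, reducing modulo \(7913\) each time so they don't get too large:
$$
\begin{align*}
u_1 = 2,\quad u_2 = 5,\quad u_3 = 26,\quad u_4 = 677,\quad u_5 = 7289,\quad \ldots
\end{align*}
$$
Then, for each \(i\), we compute \(\gcd(u_{2i} - u_i, 7913)\) until the result is greater than \(1\). This first happens at \(i = 7\), where \(u_7 = 7094\) and \(u_{14} = 3199\), giving \(\gcd(u_{14} - u_{7}, 7913) = 41\). Indeed, \(41\) is a prime divisor of \(7913 = 41 \cdot 193\).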
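The whole procedure fits in a few lines of Python. This is a minimal sketch assuming \(f(x) = x^2 + c\) with \(c = 1\) and \(u_1 = 2\), comparing \(u_i\) with \(u_{2i}\) as in the example. The function name is our own, as is the choice to return `None` when the gcd jumps straight to \(n\) (in that case one would retry with another \(c\) or starting value).

```python
from math import gcd
from typing import Optional, Tuple


def pollard_rho(n: int, u1: int = 2, c: int = 1) -> Optional[Tuple[int, int]]:
    """Return (factor, i) where factor = gcd(u_{2i} - u_i, n) is the first
    non-trivial gcd, or None if the gcd reaches n first."""
    def f(x: int) -> int:
        return (x * x + c) % n       # reduce mod n so the numbers stay small

    slow = u1 % n                    # u_i, starting at i = 1
    fast = f(slow)                   # u_{2i}
    i = 1
    while True:
        d = gcd(fast - slow, n)
        if d == n:                   # u_i ≡ u_{2i} (mod n): no information, retry
            return None
        if d > 1:
            return d, i
        slow = f(slow)
        fast = f(f(fast))
        i += 1


print(pollard_rho(7913))  # (41, 7), i.e. gcd(u_14 - u_7, 7913) = 41
```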
Pollard's \(p-1\) Method
A number is called smooth if all of its prime factors are small. For example, \(65537\) is a prime \(p\) for which \(p-1 = 2^{16}\) is smooth. Pollard's \(p-1\) method is good for finding factors \(p\) of \(n\) where \(p-1\) is smooth. The steps are
- pick \(a\) at random
- let \(M\) be a smooth number such as $$ \begin{align*} M = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19 \end{align*} $$
- compute the gcd of \(a^M - 1\) and \(n\)
$$
\begin{align*}
d = \gcd(a^M - 1, n)
\end{align*}
$$
Now,
- if \(d = 1\) or \(d = n\), try again with another \(a\) or \(M\).
- if \(1 < d < n\), we have a non-trivial factor of \(n\). We're done!
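A minimal Python sketch of these steps, using the \(M\) above and \(a = 2\) by default instead of a random \(a\). The helper `is_smooth` (our own name) checks whether every prime factor of a number is at most a given bound. The test number \(34943 = 421 \cdot 83\) is a hypothetical example chosen so that \(421 - 1 = 2^2 \cdot 3 \cdot 5 \cdot 7\) divides \(M\) while \(83 - 1 = 2 \cdot 41\) does not.

```python
from math import gcd
from typing import Optional

# The smooth exponent from the steps above.
M = 2**4 * 3**2 * 5 * 7 * 11 * 13 * 17 * 19   # = 232792560


def is_smooth(m: int, bound: int) -> bool:
    """True if every prime factor of m is <= bound (plain trial division)."""
    d = 2
    while d * d <= m:
        while m % d == 0:
            if d > bound:
                return False
            m //= d
        d += 1
    return m <= bound              # what is left is 1 or a single prime


def pollard_p_minus_1(n: int, a: int = 2, m: int = M) -> Optional[int]:
    """Return a non-trivial factor of n, or None (then retry with another a or M)."""
    g = gcd(a, n)
    if g > 1:                      # a already shares a factor with n
        return g if g < n else None
    d = gcd(pow(a, m, n) - 1, n)   # pow(a, m, n) computes a^m mod n efficiently
    return d if 1 < d < n else None


print(is_smooth(65537 - 1, 2))          # True: 65536 = 2^16
print(pollard_p_minus_1(421 * 83))      # 421
print(pollard_p_minus_1(7 * 23))        # None: 6 and 22 both divide M, so d = n
```

Note that this \(M\) contains only \(2^4\), so it will usually miss \(p = 65537\), whose \(p - 1 = 2^{16}\); a larger power of \(2\) in \(M\) would be needed.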
Pollard's observation was that if \(n\) has a prime factor \(p\) (with \(p \nmid a\)), then by Fermat's little theorem
$$
\begin{align*}
a^{p-1} \equiv 1 \pmod{p}
\end{align*}
$$
Now, if we raise \(a\) to a power \(M\) such that \(p-1 \mid M\), then by Fermat again
$$
\begin{align*}
a^M = \left(a^{p-1}\right)^{M/(p-1)} \equiv 1^{M/(p-1)} = 1 \pmod{p}
\end{align*}
$$
But this means that \(p \mid a^M - 1\). So \(p\) divides both \(n\) and the constructed number \(a^M - 1\), which means that
$$
\begin{align*}
p \mid \gcd(a^M - 1, n)
\end{align*}
$$
So the trick is simply to compute \(d = \gcd(a^M - 1, n)\) and check whether \(1 < d < n\). If it is, we have found a non-trivial factor of \(n\).
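A quick numeric check of this argument, as a Python sketch with hypothetical numbers: \(p = 421\) (so \(p - 1 = 420 = 2^2 \cdot 3 \cdot 5 \cdot 7\) divides the \(M\) above), \(a = 2\), and a second prime \(q\) to form \(n = pq\). The function name is our own.

```python
from math import gcd

M = 2**4 * 3**2 * 5 * 7 * 11 * 13 * 17 * 19


def check_argument(p: int, q: int, a: int = 2, m: int = M) -> dict:
    """Evaluate each step of the p - 1 argument for n = p * q."""
    n = p * q
    return {
        "p-1 divides M": m % (p - 1) == 0,
        "a^(p-1) = 1 mod p": pow(a, p - 1, p) == 1,   # Fermat's little theorem
        "a^M = 1 mod p": pow(a, m, p) == 1,           # follows because p-1 | M
        "gcd": gcd(pow(a, m, n) - 1, n),              # always a multiple of p
    }


print(check_argument(421, 83))   # all three checks True, 'gcd': 421, so 1 < d < n
print(check_argument(421, 23))   # 'gcd': 9683 = n, because 23 - 1 also divides M
```

The second call shows the \(d = n\) case: when \(q - 1\) also divides \(M\), both primes are caught at once and the gcd is useless.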
We also don't need to compute \(M\) itself as one huge number. To see this, suppose \(n = 119\), \(a = 2\), and suppose we let
To compute \(a^M \pmod{n}\), we compute
In each step where we raise \(a\) to some power, we reduce modulo \(n\).
Next we compute the gcd
Observe here that \(119 = 7 \cdot 17\), and for the factor \(p = 7\) we have \(p-1 = 6 = 2 \cdot 3\), where \(2\) and \(3\) are among the primes in \(M\)'s product.
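A Python sketch of computing \(a^M \bmod n\) one factor at a time, so the full exponent \(M\) is never formed and every intermediate value stays below \(n\). Because the exact \(M\) for this example is not shown above, the code assumes a hypothetical \(M = 2 \cdot 3 \cdot 5 = 30\). The function name is also our own.

```python
from math import gcd
from typing import List


def power_by_factors(a: int, factors: List[int], n: int) -> List[int]:
    """Raise a to each factor in turn, reducing mod n after every step.

    Returns the intermediate values; the last one is a^(product of factors) mod n.
    """
    x = a % n
    steps = []
    for f in factors:
        x = pow(x, f, n)        # (previous value)^f mod n
        steps.append(x)
    return steps


n, a = 119, 2
steps = power_by_factors(a, [2, 3, 5], n)   # hypothetical M = 2 * 3 * 5 = 30
print(steps)                                # [4, 64, 64]
print(gcd(steps[-1] - 1, n))                # 7, a factor of 119 = 7 * 17
```

With the larger \(M = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19\) from the steps above, the same computation gives \(\gcd = 119 = n\): both \(7 - 1 = 6\) and \(17 - 1 = 16\) divide that \(M\), so every \(a\) coprime to \(119\) fails the same way and a smaller \(M\) is needed.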