FACTORING DETAILS
In
dealing with Mersenne numbers, it is easy to prove
that if 2P-1 is prime, then P must be a prime. Thus, the first step in
the search is to create a list of prime exponents to test.
The next step is to eliminate exponents by finding a
small factor. There are very efficient algorithms for determining if a number
divides 2P-1. For example, let's see if 47 divides 223-1. Convert the exponent 23 to binary, you get
10111. Starting with 1, repeatedly square, remove the top bit of the exponent
and if 1 multiply squared value by 2, then compute the remainder upon division
by 47.
|
Remove |
Optional |
|
Square |
top bit |
mul by 2 |
mod 47 |
------------ |
------- |
------------- |
------ |
1*1 = 1 |
1 0111 |
1*2 = 2 |
2 |
2*2 = 4 |
0 111 |
no |
4 |
4*4 = 16 |
1 11 |
16*2 = 32 |
32 |
32*32 = 1024 |
1 1 |
1024*2 = 2048 |
27 |
27*27 = 729 |
1 |
729*2 = 1458 |
1 |
Thus,
223 = 1 mod 47. Subtract 1 from both sides. 223-1 = 0 mod
47. Since we've shown that 47 is a factor, 223-1 is not prime.
One very nice property of Mersenne numbers is that any
factor q of 2P-1 must be of the form 2kp+1. Furthermore, q must be 1
or 7 mod 8. A proof
is available. Finally, an efficient program can take advantage of the fact that
any potential factor q must be prime.
The GIMPS factoring code creates a modified sieve of
Eratosthenes with each bit representing a potential 2kp+1 factor.
The sieve then eliminates any potential factors that are divisible by prime
numbers below 40,000 or so. Also, bits representing potential factors of 3 or 5
mod 8 are cleared. This process eliminates roughly 95% of potential factors.
The remaining potential factors are tested using the powering algorithm above.
Now the only question remaining is how much trial
factoring should be done? The answer depends on three variables: the cost of
factoring, the chance of finding a factor, and the cost of a primality test.
The formula used is:
factoring_cost < chance_of_finding_factor * 2 * primality_test_cost
That
is, the time spent factoring must be less than the expected time saved. If a
factor is found we can avoid running both the first-time and double-check
primality tests.
Looking
at past factoring data we see that the chance of finding a factor between 2X
and 2X+1 is about 1/x. The factoring cost and primality test costs
are computed by timing the program. At present, the program trial factors to
these limits:
Exponents Trial
up to factored to
--------- -----------
3960000 260
5160000 261
6515000 262
8250000 263
13380000 264
17850000 265
21590000 266
28130000 267
35100000 268
44150000 269
57020000 270
71000000 271
79300000 272
There is another factoring method that GIMPS uses to
find factors and thereby avoid costly primality tests. This method is called
Pollard's (P-1) method. If q is a factor of a number, then the P-1 method
will find the factor q if q-1 is highly composite - that is it has nothing but
small factors. This method when adapted
to Mersenne numbers is even more effective. Remember, that the factor q is of
the form 2kp+1. It is easy to modify the P-1 method such that it will find the
factor q whenever k is highly composite.
The P-1 method is quite simple. First, pick a bound
B1. P-1 will find the factor q as long as all factors of k are less than B1 (k
is called B1-smooth). Second, compute E - the product of all primes less than
B1. Third, compute x = 3E*2*P. Finally, check the GCD (x-1, 2P-1) to
see if a factor was found.
There is an enhancement to Pollard's algorithm that
uses a second bound B2 and will find the factor q if k has just one factor
between B1 and B2 and all remaining factors are below B1.
GIMPS has used this method to find some impressive
factors. For example:
22944999-1 is divisible by 314584703073057080643101377.
314584703073057080643101377 is 2 * 53409984701702289312 * 2944999 + 1.
The value k, 53409984701702289312, is very smooth:
53409984701702289312 = 2^5 * 3 * 19 * 947 * 7187 * 62297 * 69061
So how does GIMPS intelligently choose B1 and B2? We
use a variation of the formula used in trial factoring. We must maximize:
chance_of_finding_factor * 2 * primality_test_cost - factoring_cost
The chance of finding a factor and the factoring cost
both vary with different B1 and B2 values. Dickman's function (see Knuth's Art
of Computer Programming vol. 2) is used to determine the probability of finding
a factor that is k is B1-smooth or B1-smooth with just one factor between B1
and B2. The program tries many values of B1 and if there is sufficient
available memory several values of B2, selecting the B1 and B2 values that
maximize the formula above.