
version 0.24.0
==============================
- merged development branches 0.22 and 0.23.x

new features:
- CRC32 checksums in JSON output to reduce invalid results
- improved performance on Pascal devices
- improved checkpoint compatibility between versions
  - file names now contain metadata

enhancements:
- merged in file locking code from mfakto to reduce concurrency (thanks to
  James Heinrich)
  - temporary files used to update worktodo.txt now have unique random names
    based on the template "__worktodo__.XXXXXX" (thanks to NStorm)
- updated mul_96() to use 64-bit integers (thanks to NStorm)
  - users should see a small performance increase with CUDA 12.x and above
- improved program output in various places

bug fixes:
- fixed typos and formatting in output
- fix some compilation warnings
- the maximum number of threads per SM should no longer exceed the limit for
  certain architectures
- CUDA versions are now correctly detected

build:
- made the build scripts more robust
- enabled more compiler optimizations by default for a slight performance gain
  (thanks to NStorm)
- various improvements to GitHub Actions workflows (thanks to NStorm)
  - support for additional CUDA versions
  - removed dependency on deprecated windows-2019 images
  - code analysis with CodeQL
- improved visual appearance and clarity of release information (thanks to
  Danny Chia)
- added CUDA 13.x support (thanks to NStorm)
  - breaking change: compute capability 7.2 and below are dropped

other changes:
- dropped support for compute capability 1.x and 32-bit builds
- replaced deprecated cudaThreadSynchronize() with cudaDeviceSynchronize()
- removed the debug option "USE_DEVICE_PRINTF"
- code has been revamped to improve formatting and maintain consistent style
  (thanks to NStorm)
- requested by James Heinrich: results are no longer saved to deprecated
  results.txt by default (thanks to NStorm)
  - please use results.json.txt for submissions
  - set LegacyResultsTxt=1 in mfaktc.ini to restore old behavior and re-enable
    logging to results.txt
- re-organized mfaktc.ini to be more user-friendly (thanks to James Heinrich)
- changed the key "subversion" to "kernel" in the JSON output
  - PrimeNet should still accept results with "subversion"
- renamed the source files cuda_basic_stuff.* to cuda_utils.*

version 0.23.7
==============================
- includes changes backported from mfaktc 0.24.0

new features:
- added two new configuration options
  - TimestampOnSameLine allows the timestamp to be printed on the same line
    as the result
  - ResultsFileTimestampInterval configures how many seconds must pass before
    mfaktc records a new timestamp, similar to Prime95

enhancements:
- improved program output in various places
- getOSJSON() has been made more robust

other changes:
- macOS is now detected as "macOS" rather than the kernel name "Darwin"
  - note: mfaktc is not currently supported on macOS

version 0.23.6
==============================
- includes changes backported from mfaktc 0.24.0

enhancements:
- updated mul_96() to use 64-bit integers (thanks to NStorm)
  - users should see a small performance increase with CUDA 12.x and above

bug fixes:
- fixed some typos in code comments and program output

build:
- added CUDA 13.x support (thanks to NStorm)
  - breaking change: compute capability 7.2 and below are dropped

other changes:
- code has been revamped to improve formatting and maintain consistent style
  (thanks to NStorm)
- updated print_dez96() to use int96 data type (thanks to NStorm)
  - converted some hard-coded values to macros

version 0.23.5
==============================
- includes changes backported from mfaktc 0.24.0

bug fixes:
- added build flags to enable support for Blackwell devices by default in
  Makefile.win (thanks to yellowbeeblackbee)

build:
- enabled more compiler optimizations by default for a slight performance gain
  (thanks to NStorm)
- various improvements to GitHub Actions workflows (thanks to NStorm)
  - support for additional CUDA versions
  - removed dependency on deprecated windows-2019 images
  - code analysis with CodeQL
- improved visual appearance and clarity of release information (thanks to
  Danny Chia)
- added CUDA 12.9 support in GitHub Actions (thanks to NStorm)

other changes:
- re-organized mfaktc.ini to be more user-friendly (thanks to James Heinrich)

version 0.23.4
==============================
- includes changes backported from mfaktc 0.24.0

bug fixes:
- fixed typos and formatting in output
- fix some compilation warnings
- the maximum number of threads per SM should no longer exceed the limit for
  certain architectures
- CUDA versions are now correctly detected

build:
- made the build helper script more robust

other changes:
- replaced deprecated CUDA function calls

version 0.23.3
==============================
bug fixes:
- fixed "random self-test offset" errors at startup; Blackwell devices are now
  correctly supported (thanks to Dmitry Bolshakov)

build:
- updated CUDA version to 12.8.1 in GitHub Actions

version 0.23.2
==============================
bug fixes:
- fixed a log file corruption issue introduced in 0.23.1

build:
- cleaned up makefiles
  - avoid hard-coded paths to specific CUDA installations
  - moved mfaktc.ini to 'src' so custom settings could be excluded from the
    repository
- continuous integration via GitHub Actions (thanks to NStorm)

version 0.23.1
==============================
bug fixes:
- on some Linux distributions, multiple factors per assignment weren't reported
  in the JSON output correctly
- fixed line ending bug

other changes:
- lowered default GPUSieveSize value to 256 to prevent issues on older GPUs

version 0.23.0
==============================
contributors:
- GP2
  - AID-saving code
- tybusby
  - other changes

new features:
- added results.json.txt to comply with PrimeNet results reporting standards,
  with the following new fields
  - aid (assignment ID)
  - os (operating system info)
  - timestamp (completion timestamp)
  - and more!
- added option for logging to mfaktc.log (off by default)
- found factors are now stored in the .ckp file
- elapsed time is now stored in the .ckp file for accurate progress reporting
- changed all timestamps to UTC to meet GIMPS results reporting standards

other changes
- switched to semantic versioning

version 0.22
==============================
- unreleased
  - code was privately shared with Tyler Busby
  - changes will be merged into mfaktc 0.24

new features:
- CRC32 checksums to reduce invalid results
- improved performance on Pascal devices
- improved checkpoint compatibility between versions
  - file names now contain metadata

other changes:
- dropped support for compute capability 1.x and 32-bit builds
- replaced deprecated cudaThreadSynchronize() with cudaDeviceSynchronize()

version 0.21 (2015-02-17)
==============================
contributors:
- Jerry Hallett
  - Windows compatibility
  - binaries
  - lots of testing

new features:
- added support for Wagstaff numbers: (2^p + 1) / 3
- added support for 'worktodo.add'

enhancements:
- enabled GPU sieving on compute capability 1.x GPUs
- dropped lower limit for exponents from 1,000,000 to 100,000
- reworked the '-st' and '-st2' self-tests
  - both options now run the full set of test cases
  - narrowed the '-st' search space to k_min < k_factor < k_max to save time
- some possible speedups
  - funnel shift for compute capability 3.5 and above
  - slightly faster integer division in barrett_{76|77|79} kernels
- print per-kernel stats for the '-st' and '-st2' self-tests

bug fixes:
- GPU sieve: mfaktc could run out of shared memory - this may be the cause of
  some reported crashes. It occurs when you...
  - have a GPU with a relatively small amount of shared memory
  - have a LOW value for GPUSievePrimes
  - have a HIGH value for GPUSieveSize
- there was a relatively small chance to miss a factor if GPUSieveProcessSize
  is set to 24 *and* GPUSieveSize is not a multiple of 3
- SievePrimesAdjust could cause SievePrimes to be lowered to SievePrimesMin for
  very short jobs

build:
- added missing dependencies to Windows makefiles

other changes:
- added a random offset to the self-test to detect potential bugs in the
  sieving code. A static offset always tests the same value and would miss such
  issues
- lots of cleanup and removal of duplicate code

version 0.20 (2012-12-30)
==============================
contributors:
- George Woltman (Prime95 author, http://mersenne.org)

enhancements:
- GPU sieving is now supported on GPUs with compute capability 2.0 and above;
  thank you very much, George!
  - enabled by default but can be configured with SieveOnGPU in mfaktc.ini
  - must be manually disabled for devices with compute capability 1.x
- new kernels, thanks to George:
  - barrett77_mul32 (derived from barrett79_mul32)
  - barrett87_mul32 (derived from barrett92_mul32)
  - barrett88_mul32 (derived from barrett92_mul32)
- minor performance improvement in the barrett76 kernel

other changes:
- moved some common code used in multiple places to tf_96bit_helper.cu
  and tf_barrett96_core.cu
- new default progress header and output lines
  - can be configured with ProgressHeader and ProgressFormat in mfaktc.ini

version 0.19 (2012-08-12)
==============================
contributors:
- Bertram Franz (mfakto author)
- Ethan / 'Ethan (EO)' on http://mersenneforum.org
- George Woltman (Prime95 author, http://mersenne.org)

new features:
- user-configurable status line (merged from mfakto)

enhancements:
- few (although not huge) optimizations for low-end Kepler GPUs - only affects
  Barrett reduction kernels
- new barrett76_mul32 kernel thanks to an idea by George Woltman: up to 23%
  faster than the previous fastest kernel barrett79_mul32 - the new kernel
  is good for factor candidates in the range of 64 to 76 bits
- improved squaring functions in the Barrett reduction and the 75-bit and
  95-bit kernels (George Woltman)
  - the Barrett kernels are up to 3% faster on compute capability 2.0 GPUs

other changes:
- a lot of cleanup and refactoring of the code
- SievePrimesMin lowered to 2000 - not as useful but requested quite often
- added the correct number of compute cores per multiprocessor in low-end
  Kepler GPUs
- removed debug option VERBOSE_TIMING
- micro-optimization of code that initializes the results array (suggested by
  Ethan)
- don't print an error message if deleting a checkpoint fails because there
  was no checkpoint
- moved base math functions from kernels that use "full 32-bit words" to
  tf_96bit_base_math.cu
  - add
  - subtract
  - compare
  - multiply and square

version 0.18 (2011-12-17)
==============================
contributors:
- Eric Christenson

new features:
- command-line option '-st2' runs an even longer self-test with additional
  test cases
- command-line option '-v' controls the verbosity and determines how much
  information is printed (suggested by 'aspen' on http://mersenneforum.org)
- added a (simple) signal handler for SIGINT and SIGTERM to allow mfaktc to
  exit gracefully on Ctrl + C
    1st time: mfaktc will exit after the currently processed class is finished
    2nd time: mfaktc will stop immediately
- added a minimum delay between two checkpoint file writes
  - can be configured with the CheckpointDelay option in mfaktc.ini

enhancements:
- replaced the cmp_72() and cmp_96() functions with cmp_ge_72() and cmp_ge_96()
  in all GPU kernels. Unlike the cmp_*() functions - which checked whether one
  value was equal to, or smaller or greater than the second - the cmp_ge_*()
  functions only check if one is greater than or equal to the other, resulting
  in a very small (around 1% or less) performance improvement across the board.
  Thanks to 'bdot' on http://mersenneforum.org for the suggestion!
- changed the lower limit of the barrett92 kernel to 79 bits for a very small
  performance enhancement :-)
- on finding a factor, mfaktc now saves additional information (such as program
  name, versions and bit levels) to the results line
  - James Heinrich is working to support this on the server side; PrimeNet
    should give more accurate credit for "factor found" results once this is
    fully implemented
- mfaktc can now load a checkpoint from either a Linux or Windows version on
  either OS as long as the executable and the checkpoint file have the same
  version number
- the barrett92_mul32 kernel is a little bit faster due to an improved squaring
  function
- added a new code path to the barrett79_mul32 and barrett92_mul32 kernels
  - CUDA and above 4.1 support multiply-accumulate with carry for compute
    capability 2.0 and higher
  - on my GeForce GTX 470 GPU, this yields up to 15% extra throughput for
    barrett92_mul32 and up to 7% for barrett79_mul32

bug fixes:
- edge case: if StopAfterFactor is set to 2 and a factor if found in
  the very last class, then the result output will not be "partially tested"

other changes:
- auto-adjustment of SievePrimes is now less dependent on the grid size and
  absolute speed. Instead of measuring the absolute (average) wait time per
  processing block as determined by GridSize, the relative time spent waiting
  for the GPU is now calculated
  - the per-class output now shows "CPU wait" instead of "avg. wait"
- added even more code for CHECKS_MODBASECASE debug option. So far, this new
  code runs without issues :-)
- cleanup: there is now only one function which checks if a kernel is possible
- some changes in preparation for automated PrimeNet interaction
  - two new functions in parse.c to calculate GHz-days: amount_of_work() and
    amount_of_work_in_worktodo()
  - second rewrite of code to handle worktodo.txt (thanks to Eric Christenson)
- minor cosmetic changes in the code, such as mfakt -> mfaktc in function names
- CUDA versions are now checked more rigorously
  - the runtime must match the version used to compile mfaktc
  - the driver must have the same or newer version
  - see the programming guide from Nvidia for more information:
    https://docs.nvidia.com/cuda/archive/9.1/pdf/CUDA_C_Programming_Guide.pdf
- reordered the columns in the per-class output

version 0.17 (2011-05-06)
==============================
new features:
- mfaktc can now allow the CPU to sleep instead of running a busy loop if all
  GPU streams are busy and all possible CPU streams are pre-processed. This can
  be enabled or disabled with the AllowSleep option in mfaktc.ini

enhancements:
- mfaktc now reports the architecture it is compiled for (either 32 or 64 bits)
- show all debug options enabled at compile-time (disabled options are no
  longer shown)

other changes:
- replaced the compile-time option THREADS_PER_GRID_MAX with the runtime option
  GridSize in mfaktc.ini
- aligned the elapsed runtime and estimated total runtime in the screen output
  for restarted runs

version 0.16p1 (2011-03-15)
==============================
bug fixes:
- replaced all type conversions from the 'unsigned int' to 'float' data types:
    old code: <float variable> = (float)<unsigned int variable>;
    new code: <float variable> = __uint2float_rn(<unsigned int variable>);
  - this resolves the failed constant computation at compile time with CUDA
    Toolkit 3.0 and 3.1
  - not tested on older CUDA Toolkit versions

version 0.16 (2011-03-13)
==============================
enhancements:
- the barrett92 kernel is up to 5% faster, and barrett79 is up to 18% faster
- changed priority of the kernels for compute capability 1.x because the
  barrett79 kernel is now faster than the 75-bit kernel
- inform the user and give the reason if a line from worktodo.txt is ignored

other changes:
- changed the layout of the per-class status lines. It is possible to select
  between the "new line" or "same line" modes with the new PrintMode option in
  mfaktc.ini
- minor adjustments to screen outputs
- thanks to James Heinrich for keeping an eye on screen outputs, and the
  initial idea for the new layout of the per-class status line!

version 0.15 (2011-02-14)
==============================
bug fixes:
- fixed a wrong format string introduced in mfaktc 0.14

other changes:
- some minor cleanups
  - removed unused parameters
- mfaktc now creates only one checkpoint file per exponent
  - new checkpoint files are named 'M<exponent>.ckp'
- complete rewrite of code to handle worktodo.txt files
- added lots of cudaGetLastError() calls; this will hopefully generate more
  useful error messages if a CUDA function fails
- moved the check for a valid assignment from mfaktc.c to parse.c

version 0.14 (2011-01-23)
==============================
bug fixes:
- make sure the biggest prime used in the sieve is smaller than the exponent
  itself
- various fixes in the debug code
  - warnings about unexpected high 'qi' values are ignored when the factor
    candidate is out of the specified range
  - working sets now have a fixed size

other changes:
- renamed tf_barrett92.* to tf_barrett96.* as the file names of GPU kernels
  contain the size of data types for the long integers, not the maximum
  supported factor candidate size...
- the barrett_79 kernel is no longer a stripped-down version of the
  barrett_92 kernel:
  - tf_barrett96.cu no longer needs to be compiled twice
  - faster barrett_79 kernel:
    - about 10% on my GeForce GTX 470 (GF100 chip, compute capability 2.0)
    - about 3-4% on my GeForce GTX 275 (GT200B chip, compute capability 1.3)
  - not limited to a single bit level anymore
- modified the screen output per class a little bit to show the total number of
  classes

version 0.13p1 (2010-12-05)
==============================
bug fixes:
- resolved an issue which prevented a proper build of a 32-bit Windows binary
  - there was a problem converting "long double" to "unsigned long long int" in
    calculate_k() in mfaktc.c because the conversion was limited to 2^63 - 1
    instead of the expected 2^64 - 1 for an "unsigned long long int"
  - the new code is all integer-based :-)

version 0.13 (2010-10-26)
==============================
contributors:
- Ethan / 'Ethan (EO)' on http://mersenneforum.org

changes:
- modified the stream scheduler (again) to allow more than one data set to be
  pre-computed
  Old behaviour:
    1) pre-compute one data set
    2) start one data set and wait for a stream if needed
    3) goto 1
  Current behaviour:
    1) if a free data set is available: pre-compute one dataset
    2) try to start as many data sets as possible *without* waiting for an
       empty stream
    3) goto 1
- print and check CUDA versions (compiled and current CUDA version)
- screen output is now aligned when mfaktc starts
- some code cleanup by Ethan:
  - use atomicInc() for synchronisation of access to RES[] (GPU code)
  - use cudaThreadSynchronize() to wait for all running streams instead of
    calling cudaStreamSynchronize() for each stream

version 0.12 (2010-09-28)
==============================
contributors:
- Dave / 'amphoria' on http://mersenneforum.org
- Kevin / 'kjaget' on http://mersenneforum.org

enhancements:
- added two new kernels; both perform Barrett reduction to avoid most
  of the costly long divisions. Great speed on newer GPUs with compute
  capability 2.0 and above :-)
- expanded the kernel selection code
- output is more human-readable. For example, 'M' and 'G' suffixes are used to
  keep numbers small
- each test case in the self-test is now run with all suitable GPU kernels
  instead of just the "optimal" kernel
- new RAW_GPU_BENCH debug option in params.h (disables sieving more or less)
- tweaked the automatic adjustment of SievePrimes
- use "launch bounds" to control the register usage in the GPU code. This
  allows code optimized for the sm_11 and sm_20 architectures to be included in
  the binary

build:
- added a new makefile
- moved source files into the 'src' folder
- added a makefile for Windows
  - run 'make -f Makefile.win' to compile
  - initially created by Kevin and modified for the latest mfaktc version by
    Dave - thank you!
  - Dave has also written some instructions to compile mfaktc on Windows. Take
    a look at README.txt for details

other changes:
- renamed the debug option HAS_DEVICE_PRINTF to USE_DEVICE_PRINTF

version 0.11 (2010-09-01)
==============================
enhancements:
- debugging code heavily modified to work on the GPU and not in device
  emulation mode on the CPU
  - this led to the discovery of a computational bug - see below for details!
- improved the sieving code. It is now about 20% faster on my Core i7
- the 75-bit kernel is about 2% faster

bug fixes:
- some minor fixes (such as printf() statements)
- lowered 'ff' in mfaktc_*() functions. It was too large in some cases and
  could result in a missed factor :-(
  - on compute capability 1.x GPUs:
      factor size    | chance to miss
      ---------------+--------------------
      24 ~ 24.2 bits | < 0.1%
      56 ~ 56.2 bits | < 0.1%
      88 ~ 88.2 bits | < 0.1%
      other ranges   | < 0.001%
  - on compute capability 2.x GPUs:
      factor size    | chance to miss
      ---------------+--------------------
      24 ~ 24.2 bits | < 0.1%
      56 ~ 56.2 bits | < 0.1%
      88 ~ 88.2 bits | < 0.1%
      other ranges   | very small if not 0%

version 0.10 (2010-07-26)
==============================
new features:
- two new options in the mfaktc.ini file: Stages and StopAfterFactor

other changes:
- modified the stream scheduler. mfaktc previously assumed the streams were
  executed in the order started
- the number of threads per grid is no longer a compile-time option
    - the maximum number of threads per grid is defined at compilation
    - the actual number of threads per grid is calculated at runtime based on
      the number of multiprocessors in the CPU and THREADS_PER_BLOCK
- GPUs with compute capability 1.0 are not officially supported
  - to my knowledge, the only GPU with compute capability 1.0 is the G80 chip
    in the GeForce 8800 Ultra / GTX / GTS 640 / GTS 320 and their Quadro and
    Tesla variants. An exception is the 8800 GTS 512 contains a G92 GPU
  - the issue seems to be related to the synchronisation of writes to *d_RES
  - this may be fixed in a future release, but it may not be worth the effort
    due to the relatively small number of G80 GPUs in circulation
- moved tf_*() from tf_72bit.cu and tf_96bit.cu to tf_common.cu
  - the code in these functions is very similar in both cases; there are only a
    few differences that are now controlled by #ifdef directives

version 0.9 (2010-07-09)
==============================
new features:
- added a basic test for timer resolution
  - can be launched with the command-line option '-tt'

enhancements:
- the self-test with "known factors" can now be started with the command-line
  option '-st'
- the self-test doesn't write factors to results.txt anymore
- the self-test now checks if a reported factor is the known factor
- a small self-test (currently 9 known factors) now always runs when mfaktc
  starts
- added cudaGetLastError() to check for errors
- added 10 known (composite) factors in the range of 90 to 95 bits to the
  self-test routine
- optimized the calculation of the factor candidate in the mfakt_95() and
  mfakt_75() functions
  - this gives a very small performance improvement for the 95-bit and 75-bit
    kernels, and saves one register (or four bytes) of l_mem

other changes:
- declared most GPU functions as static. This was needed because CUDA Toolkit
  3.1 builds the GPU functions as global by default now...

version 0.8 (2010-06-09)
==============================
new features:
- added a new GPU kernel for factors up to 95 bits
  - this also results in a new kernel for factors up to 75 bits
- added checkpoints
  - needs some fine-tuning but should work

enhancements:
- added more "known factors" above 71 bits to the self-test

bug fixes:
- fixed a signedness bug in parsing the exponent from the command line

build:
- added a makefile for Linux

other changes:
- renamed the function tf_class() to tf_class_71() and mfakt() to mfakt_71()
  in tf_72bit.cu as multiple GPU kernels are now available
- added two more hints to the self-test routine: k_min_hint and k_max_hint

version 0.7 (2010-05-27)
==============================
contributors:
- Luigi / 'ET_' on http://mersenneforum.org
- Kevin / 'kjaget' on http://mersenneforum.org

new features:
- integrated Luigi's functions to parse Prime95-style worktodo.txt files
- new runtime option: NumStreams (suggested by Kevin)
- added a (simple) command-line interface

enhancements:
- added code to check if an exponent and bit_{min|max} have supported sizes
- sieving code is now faster, at least on a Core i7 CPU

bug fixes:
- fixed a division by zero caused by a time measurement
- fixed a wrong data type in printf() in the debug code
- some minor fixes (compile warnings, return values, types)

build:
- mfaktc should now compile on Windows - thank you, Kevin!

other changes:
- removed code path for "non-async memory transfers" (compile-time option)

version 0.6 (2010-04-28)
==============================
enhancements:
- some parameters in mfaktc.ini can now be changed without recompiling mfaktc
- two CUDA streams are used now (as opposed to just one)
  - this allows memory transfers (offset tables in k_tab) and GPU computation
    at the same time on newer GPUs, resulting in a small "free" performance
    increase as the GPU doesn't need to be idle during the k_tab upload
- added some safety checks for compile-time and runtime parameters

other changes:
- split the code into several smaller files
- marked some compile-time parameters in params.h that should not be changed
  without a good reason

version 0.5 (2010-02-22)
==============================
enhancements:
- further unrolled the loop which creates the candidate list
- added a lot more self-tests
- query some device information
  - added some additional checks; for example, THREADS_PER_GRID should be a
    multiple of THREADS_PER_BLOCK * <number of multiprocessors on the device>

bug fixes:
- first attempt to fix a known issue that could affect multiple factors found
  "close together"

other changes:
- replaced the "ptx-hack compile script" with inline PTX assembly
  - easier access to addition and subtraction with carry
- some fine-tuning of offsets for steps in mod_basecase.cu
- saved one register
  - NVCC 2.3 reduces the number from 17 down to 16
  - this helps increase occupancy especially on devices with only 8,192
    registers per block
- reduced how often the copyright notice is printed

version 0.4 (2010-01-28)
==============================
enhancements:
- the new timer added in 0.3 now has its own compile-time option

bug fixes:
- mfakt() function: 'ff' was too large and sometimes overestimated part of the
  quotient

other changes:
- some cleanup
  - removed unused code
- a lot of changes in mod_basecase.cu
  - reduced the number of steps from 5 to 4
  - changed offsets
    - 20-bit difference per offset (down from 21 bits)
  - modified left shift (variable nn)
  - modified subtraction (q = q - nn)

version 0.3 (2010-01-20)
==============================
enhancements:
- allow exponents up to 32 bits
  - tested with some exponents in the M3321xxxxxx range
- siever: improved the loop which creates the candidate list (again)
  - loop unrolled
  - use a lookup table to parse 8 bits at once
- added 40 known factors from "Operation Billion Digits" in the M3321xxxxxx
  range to the self-test
- added another timer to help adjust SIEVE_PRIMES
  - needs to be enabled with the debug option VERBOSE_TIMING

version 0.2 (2010-01-13)
==============================
enhancements:
- allocate and free arrays only once (as opposed to per class)
- siever: improved the loop which creates the candidate list

bug fixes:
- fixed some printf() statements

other changes:
- added check of return values of most *alloc() calls
