When using Montgomery reduction, some advantage comes from "fusing" the
multiplication with the modular reduction and unrolling the loops.
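To illustrate what "fusing" means here, the following is a minimal sketch of
a textbook interleaved (CIOS-style) Montgomery multiplication, in which each
partial-product row is followed at once by one reduction step. It is not the
code that fastest.c generates. The limb size (16 bits), the limb count (2),
and the names mont_n0inv, mont_mul_fused and mont_mul32 are all assumptions
chosen for illustration, and the loops are left rolled for readability.

```c
#include <assert.h>
#include <stdint.h>

#define NLEN 2                       /* limbs per number (illustrative) */
#define BASEBITS 16                  /* bits per limb (illustrative) */
#define BMASK ((1u << BASEBITS) - 1u)

/* Returns -n0^(-1) mod 2^BASEBITS for odd n0, using Newton iteration. */
uint32_t mont_n0inv(uint32_t n0)
{
    uint32_t x = n0;                 /* n0*n0 = 1 mod 8, so 3 bits correct */
    for (int i = 0; i < 4; i++)
        x *= 2u - n0 * x;            /* each step doubles the correct bits */
    return (0u - x) & BMASK;
}

/* r = a*b*R^(-1) mod n, with R = 2^(NLEN*BASEBITS), a,b < n, n odd. */
void mont_mul_fused(uint32_t r[NLEN], const uint32_t a[NLEN],
                    const uint32_t b[NLEN], const uint32_t n[NLEN])
{
    uint32_t t[NLEN + 2] = {0};
    uint32_t nd = mont_n0inv(n[0]);
    uint64_t acc;
    int i, j;

    assert(n[0] & 1u);               /* Montgomery needs an odd modulus */

    for (i = 0; i < NLEN; i++) {
        /* multiplication step: t += a * b[i] */
        acc = 0;
        for (j = 0; j < NLEN; j++) {
            acc += (uint64_t)t[j] + (uint64_t)a[j] * b[i];
            t[j] = (uint32_t)acc & BMASK;
            acc >>= BASEBITS;
        }
        acc += t[NLEN];
        t[NLEN] = (uint32_t)acc & BMASK;
        t[NLEN + 1] = (uint32_t)(acc >> BASEBITS);

        /* reduction step: add m*n to clear the lowest limb, then shift
           the accumulator down by one limb */
        uint32_t m = (t[0] * nd) & BMASK;
        acc = ((uint64_t)t[0] + (uint64_t)m * n[0]) >> BASEBITS;
        for (j = 1; j < NLEN; j++) {
            acc += (uint64_t)t[j] + (uint64_t)m * n[j];
            t[j - 1] = (uint32_t)acc & BMASK;
            acc >>= BASEBITS;
        }
        acc += t[NLEN];
        t[NLEN - 1] = (uint32_t)acc & BMASK;
        t[NLEN] = t[NLEN + 1] + (uint32_t)(acc >> BASEBITS);
    }

    /* t < 2n at this point: subtract n once if t >= n
       (not constant-time, for illustration only) */
    int ge = (t[NLEN] != 0);
    if (!ge) {
        ge = 1;
        for (j = NLEN - 1; j >= 0; j--)
            if (t[j] != n[j]) { ge = (t[j] > n[j]); break; }
    }
    uint32_t borrow = 0;
    for (j = 0; j < NLEN; j++) {
        uint32_t d = t[j] - (ge ? n[j] : 0u) - borrow;
        borrow = (d >> 31) & 1u;     /* wrap-around signals a borrow */
        r[j] = d & BMASK;
    }
}

/* Convenience wrapper for 32-bit values (valid only for NLEN == 2). */
uint32_t mont_mul32(uint32_t a, uint32_t b, uint32_t n)
{
    uint32_t al[NLEN] = { a & BMASK, a >> BASEBITS };
    uint32_t bl[NLEN] = { b & BMASK, b >> BASEBITS };
    uint32_t nl[NLEN] = { n & BMASK, n >> BASEBITS };
    uint32_t rl[NLEN];
    mont_mul_fused(rl, al, bl, nl);
    return rl[0] | (rl[1] << BASEBITS);
}
```

With n = 0xFFFFFFFB, the value 2^32 mod n is 5, so mont_mul32(a, 5, n)
returns a for any a < n. Generated code of the kind described here would
additionally unroll both inner loops for the fixed limb count.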
For example, for a 32-bit build using 256-bit BIGs, a base of 2^28 and
the NIST256 curve, replace XXX with 256_28 and YYY with NIST256 in
fastest.c
Then compile and run fastest.c as follows (using the MinGW port of GCC
as an example), in the same directory as arch.h and fp_NIST256.h
gcc -O2 -std=c99 fastest.c -o fastest.exe
fastest > t.txt
Now extract the code fragment from t.txt and insert it where indicated
in fp_NIST256.c (look for FUSED_MODMUL)
Then make sure that
#define FUSED_MODMUL
appears somewhere in fp_NIST256.h
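As a rough sketch of how a compile-time switch of this kind typically works,
the fragment below selects between a separate multiply-then-reduce path and
a fused path depending on whether FUSED_MODMUL is defined. The function names
(modmul_separate, modmul_fused, field_mul) and the single-word arithmetic are
hypothetical stand-ins, not the library's actual code; the fused body here is
a simple interleaved shift-and-add placeholder rather than generated
Montgomery code.

```c
#include <stdint.h>

#define FUSED_MODMUL   /* assumption: in the library this sits in the header */

/* Hypothetical baseline: full product first, then a separate reduction. */
uint32_t modmul_separate(uint32_t a, uint32_t b, uint32_t m)
{
    uint64_t d = (uint64_t)a * b;
    return (uint32_t)(d % m);
}

#ifdef FUSED_MODMUL
/* Hypothetical placeholder for the generated fragment: reduction is
   interleaved with every step of the multiplication. */
uint32_t modmul_fused(uint32_t a, uint32_t b, uint32_t m)
{
    uint64_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) % m;                    /* double and reduce */
        if ((b >> i) & 1u)
            r = (r + a) % m;                 /* add and reduce */
    }
    return (uint32_t)r;
}
#endif

/* Callers use one entry point; the macro decides which path is built. */
uint32_t field_mul(uint32_t a, uint32_t b, uint32_t m)
{
#ifdef FUSED_MODMUL
    return modmul_fused(a, b, m);
#else
    return modmul_separate(a, b, m);
#endif
}
```

Removing the #define restores the baseline path without touching any
caller, which makes it easy to time both variants before committing to one.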
Finally, recompile and replace the fp_YYY module in the library
(fp_NIST256 in this example); you may see a speed-up of around 30%.
If there is no significant improvement, don't use this method!
NOTE: This method is experimental. It might affect numerical stability.