CUDA 2.3 compatible NVOPENCC compiler with extended intrinsics support
The compiler supports the following new intrinsics:
- __mul24hi / __umul24hi: return the 32 most significant bits of the 48-bit product of two 24-bit integer operands (signed and unsigned versions, respectively).
- __mad24hi / __umad24hi: 24-bit integer multiply-add that uses the 32 most significant bits of the 48-bit product (signed and unsigned versions, respectively).
- __addo / __uaddo: signed and unsigned 32-bit addition that sets the carry flag, for use by subsequent __addc / __uaddc operations.
- __addc / __uaddc: signed and unsigned addition with carry-in. The addition itself updates the carry flag.
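The intended results of the intrinsics listed above can be checked against a host-side reference model. The sketch below is plain C++, not device code, and it is an assumption based only on the descriptions in this list: the `_ref` function names are hypothetical, the real intrinsic signatures are not shown on this page, the multiply-add is assumed to add its third operand to the high part of the product, and the hardware carry flag is modelled as an explicit `bool` argument.

```cpp
#include <cassert>
#include <cstdint>

// Reference models only: every name ending in _ref is hypothetical and
// is not part of the modified compiler or its header file.

// Interpret the low 24 bits of x as a signed 24-bit value.
inline int64_t sign_extend24(uint32_t x) {
    uint32_t v = x & 0xFFFFFFu;
    return static_cast<int64_t>(v ^ 0x800000u) - 0x800000;
}

// Assumed meaning of __umul24hi: bits 47..16 of the unsigned 48-bit product.
inline uint32_t umul24hi_ref(uint32_t a, uint32_t b) {
    uint64_t p = static_cast<uint64_t>(a & 0xFFFFFFu) * (b & 0xFFFFFFu);
    return static_cast<uint32_t>(p >> 16);
}

// Assumed meaning of __mul24hi: the same bits of the signed 48-bit product.
inline int32_t mul24hi_ref(int32_t a, int32_t b) {
    int64_t p = sign_extend24(static_cast<uint32_t>(a)) *
                sign_extend24(static_cast<uint32_t>(b));
    // Shift the two's-complement bit pattern, then keep the low 32 bits.
    uint32_t bits = static_cast<uint32_t>(static_cast<uint64_t>(p) >> 16);
    return static_cast<int32_t>(bits);
}

// Assumed meaning of __umad24hi: high part of the product plus c,
// wrapping modulo 2^32.
inline uint32_t umad24hi_ref(uint32_t a, uint32_t b, uint32_t c) {
    return umul24hi_ref(a, b) + c;
}

// Model of __uaddo: a + b, recording the carry-out in an explicit flag.
inline uint32_t uaddo_ref(uint32_t a, uint32_t b, bool &carry) {
    uint32_t sum = a + b;
    carry = sum < a;  // unsigned wrap-around means a carry occurred
    return sum;
}

// Model of __uaddc: a + b + carry-in, then update the flag with the carry-out.
inline uint32_t uaddc_ref(uint32_t a, uint32_t b, bool &carry) {
    uint64_t wide = static_cast<uint64_t>(a) + b + (carry ? 1u : 0u);
    carry = (wide >> 32) != 0;
    return static_cast<uint32_t>(wide);
}

// 64-bit addition assembled from one uaddo step and one uaddc step.
inline uint64_t add64_ref(uint64_t x, uint64_t y) {
    bool carry = false;
    uint32_t lo = uaddo_ref(static_cast<uint32_t>(x),
                            static_cast<uint32_t>(y), carry);
    uint32_t hi = uaddc_ref(static_cast<uint32_t>(x >> 32),
                            static_cast<uint32_t>(y >> 32), carry);
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Small sanity checks for the model itself.
inline void self_check() {
    assert(umul24hi_ref(0xFFFFFFu, 0xFFFFFFu) == 0xFFFFFE00u);
    assert(mul24hi_ref(-1, 1) == -1);
    assert(add64_ref(0x00000001FFFFFFFFull, 1) == 0x0000000200000000ull);
}
```

The `add64_ref` function shows the usage pattern the list describes: the low words are added first with the carry-setting operation, and the high words are then added with the carry-consuming one. Longer integers would chain further `uaddc_ref` steps in the same way.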
Original compiler sources are available at: nvopencc compiler.
Modified sources: diff.
Installation instructions: TXT
Header file with intrinsics declaration: header
Compiler binaries built for the 32-bit Linux platform: ZIP (~5.0 MB)
You can find my contact details at the homepage.