Wednesday, July 21, 2021

Multiplying For Lights And 3D

Because two 32-bit numbers multiplied give a product below \(2^{32}*2^{32}=2^{64}\), that is, at most a 64-bit number with no carry, and

\(\left(\begin{matrix}d&e&f\\g&h&i\\j&k&l\end{matrix}\right)\left(\begin{matrix}a\\b\\c\end{matrix}\right)=\left(\begin{matrix}d*a+e*b+f*c\\g*a+h*b+i*c\\j*a+k*b+l*c\end{matrix}\right)\)

we have,

\(a*(d\_g\_j)\)

\(b*(e\_h\_k)\)

\(c*(f\_i\_l)\)

where \(d\_g\_j\), \(e\_h\_k\) and \(f\_i\_l\) are long packed registers that each hold three 32-bit numbers at the 0, 64 and 128 bit boundaries. For \(d\_g\_j\), the layout is,

\(\overbrace{\square...\square_{160}...d|_{128}}^{\text{64 bits}}\underbrace{\square...\square\square_{96}...g|_{64}}_{\text{64 bits}}\overbrace{\square...\square\square_{32}...j|_0}^{\text{64 bits}}\)
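The packing above can be modelled in software. The following is only a sketch, assuming Python's arbitrary-precision integers stand in for a 192-bit hardware register; the helper names `pack` and `unpack` are hypothetical and do not come from any real instruction set. It shows that one ordinary multiplication by \(a\) yields \(a*d\), \(a*g\) and \(a*j\) in their own 64-bit lanes.

```python
LANE_BITS = 64                      # each 32-bit value gets a 64-bit lane
LANE_MASK = (1 << LANE_BITS) - 1

def pack(hi, mid, lo):
    """Place three 32-bit values at bit offsets 128, 64 and 0 (hypothetical helper)."""
    for value in (hi, mid, lo):
        assert 0 <= value < (1 << 32), "each value must fit in 32 bits"
    return (hi << 128) | (mid << 64) | lo

def unpack(reg):
    """Read the three 64-bit lanes back out, highest lane first."""
    return ((reg >> 128) & LANE_MASK, (reg >> 64) & LANE_MASK, reg & LANE_MASK)

d, g, j = 7, 0xFFFFFFFF, 12345
a = 0xFFFFFFFF                      # largest 32-bit multiplier

d_g_j = pack(d, g, j)
product = a * d_g_j                 # one multiply covers all three lanes

# Each lane holds its own product because a 32x32-bit product is below 2**64.
print(unpack(product) == (a * d, a * g, a * j))
```

Because no product can reach \(2^{64}\), nothing spills from one lane into the next during the multiplication.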

All three multiplications are done simultaneously. The sums, \(d*a+e*b+f*c\) and those of the other two rows, are obtained by adding across the results of the simultaneous multiplications at the 0, 64 and 128 bit boundaries.

\(\square...\square_{160}...a*d|_{128}\square...\square\square_{96}...a*g|_{64}\square...\square\square_{32}...a*j|_0\\+\)

\(\square...\square_{160}...b*e|_{128}\square...\square\square_{96}...b*h|_{64}\square...\square\square_{32}...b*k|_0\\+\)

\(\square...\square_{160}...c*f|_{128}\square...\square\square_{96}...c*i|_{64}\square...\square\square_{32}...c*l|_0\)
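As a rough software check of the scheme above, the sketch below again lets Python integers play the role of the long packed registers. The function names `pack3`, `matvec_packed` and `matvec_direct` are made up for illustration, and the example assumes the lane sums stay below \(2^{64}\), so no carry crosses a lane boundary.

```python
MASK64 = (1 << 64) - 1

def pack3(hi, mid, lo):
    """Pack three 32-bit values at bit offsets 128, 64 and 0 (hypothetical helper)."""
    return (hi << 128) | (mid << 64) | lo

def matvec_packed(m, v):
    """3x3 matrix times 3x1 vector using three packed multiplies and packed adds."""
    (d, e, f), (g, h, i), (j, k, l) = m
    a, b, c = v
    # Each column of the matrix is one packed register, scaled by one vector entry.
    total = a * pack3(d, g, j) + b * pack3(e, h, k) + c * pack3(f, i, l)
    # Lane at bit 128 is row 1, lane at bit 64 is row 2, lane at bit 0 is row 3.
    return [(total >> 128) & MASK64, (total >> 64) & MASK64, total & MASK64]

def matvec_direct(m, v):
    """Reference result computed row by row."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [10, 20, 30]
print(matvec_packed(m, v))          # [140, 320, 500]
print(matvec_direct(m, v))          # [140, 320, 500]
```

In this model the two additions are sequential Python operations; the simultaneity described above is a property of the proposed hardware, not of the sketch.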

Each partial sum in the long packed registers may produce a summation carry out of bit 63, 127 or 191. Such a carry indicates overflow. All three additions are done simultaneously.
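One way to picture these carries is to add the packed registers lane by lane and record what leaves each lane. This is a sketch under the same assumption as before (Python integers standing in for 192-bit registers); `add_lanes_with_carry` is a hypothetical name, not a real hardware operation. Note that with three addends the carry out of a lane can be as large as 2.

```python
MASK64 = (1 << 64) - 1

def add_lanes_with_carry(regs):
    """Add packed 192-bit registers lane by lane.

    Returns (packed_sum, carries), where carries[n] is the carry out of
    lane n, i.e. out of bit 63, 127 and 191 for n = 0, 1, 2.
    """
    packed_sum = 0
    carries = []
    for lane in range(3):
        shift = 64 * lane
        lane_total = sum((reg >> shift) & MASK64 for reg in regs)
        packed_sum |= (lane_total & MASK64) << shift   # keep only the low 64 bits
        carries.append(lane_total >> 64)               # whatever left the lane
    return packed_sum, carries

MAX32 = 0xFFFFFFFF
# Lanes at bits 128 and 64 hold 1; the lane at bit 0 holds the largest 32-bit value.
column = (1 << 128) | (1 << 64) | MAX32
scaled = MAX32 * column             # lanes: MAX32, MAX32, MAX32 * MAX32

_, carries = add_lanes_with_carry([scaled, scaled, scaled])
print(carries)                      # [2, 0, 0]: only the lowest lane overflows
```

A nonzero entry in `carries` marks a lane whose 64-bit result is not the true sum.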

In hardware, with long packed registers, this \((3\times 3)(3\times 1)\) matrix multiplication can be implemented in one multiply clock cycle and one addition clock cycle.