• Listen to a special audio message from Bill Roper to the Hive Workshop community (Bill is a former Vice President of Blizzard Entertainment, Producer, Designer, Musician, Voice Actor) 🔗Click here to hear his message!
  • Read Evilhog's interview with Gregory Alper, the original composer of the music for WarCraft: Orcs & Humans 🔗Click here to read the full interview.

[C] [SSE] squaring complex numbers

Status
Not open for further replies.
Level 6
Joined
Aug 26, 2009
Messages
192
Hey guys, i gotta do some stuff with complex numbers and with SSE in C. It must be fucking fast, therefore i have to make as much optimizations as possible. I first have to square as many complex numbers z_i as possible then i have to add a constant complex number c_i to it and then check if its absolute value is within a given radius (Mandelbrot set). Therefore i have to calculate:

z_new = z * z + c and i have to see if |z_new| < r.

Right now I'm doing it like this:

C:
__m128 c = random stuff;
__m128 z = random stuff;

__m128 temp1 = _mm_moveldup_ps(z);
temp1 = _mm_mul_ps(temp1, z);

__m128 temp2 = _mm_movehdup_ps(z);
temp2 = _mm_mul_ps(temp2, z);

temp2 = _mm_shuffle_ps(temp2, temp2, _MM_SHUFFLE(2,3,0,1));

As you see, I'm working always with 2 complex numbers, both being in z and the constant in c. Something like:
c = (re(c1),im(c1),re(c2),im(c2)).
After the instructions above I have z and c as above and
temp1 = (re(z1)^2, re(z1)im(z1), re(z2)^2, re(z2)im(z2)),
temp2 = (im(z1)^2, re(z1)im(z1), im(z2)^2, re(z2)im(z2)).
Therefore, i can use _mm_addsub_ps() to obtain z^2, while i can use _mm_add_ps() to obtain |z|^2. The absolute value is now checked for its radius and then we can just use another _mm_add_ps() to add c to z^2.

Ofc, there's some more stuff going on, like calculating the colors etc, but I'm wondering, if anybody has an idea how to optimize this further.

Anotherthing is, that I have to calculate a color for the given iterations of the Mandelbrot set. We have to use YUV and convert it to RGb, but since this needs 3 values, I don't have any Idea how to do this for two complex numbers in the same time if there is any.
 
Last edited by a moderator:

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,258
Please define your complex number type... Is it a pair of single precision floats? Pair of double precision floats? Pair of signed ints? This makes a huge difference.

I am guessing a pair of single precision floats?

i can use _mm_addsub_ps() to obtain z^2
But that is SSE3? This violates your SSE requirements.

We have to use YUV and convert it to RGb, but since this needs 3 values, I don't have any Idea how to do this for two complex numbers in the same time if there is any.
R only needs YV and B only needs YU. This means both can be computed from 2 single float complex numbers in 128 bits efficiently. G is the problem since it needs all YUV to compute so the only option is to use 2 instruction for either the real and imaginary parts or run 2 computation cycles with the second only half utilized.

For R and B you can compute them both simultaneously by loading 2 Y into a register and then adding to it the result from a constant multiplication with V and U. This would mean the RB values are computed in 1 multiplication and 1 addition (with some data preparation). G is more tricky and will probably need you to load UV at once, scale them, sum them and then add with Y. Not sure how SUM is done but it will waste the final addition but will need 1 multiplication, 1 SUM and 1 addition.

However, if you compute 2 YUV to RGB values at a time, things become a lot more efficient overall since there will be no wastage and fewer data consolidation steps. You create 3 registers, each for the YUV channels with 2 values in them (both real and complex). You then create scaling registers for the U and V scales (or if they can be constants in the instructions that is even better). You can use the Y register to start to computation (as its scale is always 1) and then add to it the product of the other channels with their appropriate multipliers.

After setup, it would take 1 multiplication and 1 addition each for R and B and 2 multiplications and 2 additions for G. The end result would be complex numbers for 2 RGB values from 2 YUV values.
 
Level 6
Joined
Aug 26, 2009
Messages
192
Thanks for the answer, that might be an idea.

Please define your complex number type... Is it a pair of single precision floats? Pair of double precision floats? Pair of signed ints? This makes a huge difference.

I am guessing a pair of single precision floats?

Obv float. You see im using 4 numbers in a 128 register, therefore 4 32bit numbers and since im talking about the Mandelbrot set, integers don't make any sense at all.

But that is SSE3? This violates your SSE requirements.

We can use SSE, SSE2, SSE3.

edit: oh i forgot. We have already values
Y = 0.2
U = 2 * iterations/maxIterations - 1
V = 0.5 - iterations/maxIterations

Therefore we can calculate the vector except the iterations/maxIterations part, since these are variable and obtain the vector for RGB with quot = iterations/maxIterations
(bla - bla * quot, bla - bla * quot, -bla + bla * quot).
Thus, I just need 1 division (iterations/maxIterations), 1 SSE multiplication and one addition for every complex number. And I think it's not possible to optimize this any further, is it?
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,258
Obv float.
Usually mathematical things use doubles since singles are not accurate enough. Singles are only used where performance is required such as real time systems/programs.

And I think it's not possible to optimize this any further, is it?
Depends on how it is pipelined in the processor. This is why humans should not be optimizing.

If you were to alter the set of registers used and run 2 such complex number operations at once, it may pipe line them more efficiently and thus allow for faster execution. However it is very difficult to tell and entirely processor dependant.
 
Level 6
Joined
Aug 26, 2009
Messages
192
Usually mathematical things use doubles since singles are not accurate enough. Singles are only used where performance is required such as real time systems/programs.

This is for computer science, not mathematics. And since I used _4_ numbers in a 128 bit register, it must be float, not double.
Anyway, thanks for you answer, I think I can't do much more and i only have 45 minutes left :D!
 
Status
Not open for further replies.
Top