Hey guys, I have to do some work with complex numbers using SSE in C. It has to be really fast, so I need to optimize as much as possible. For as many complex numbers z_i as possible, I have to square each one, add a constant complex number c_i to it, and then check whether its absolute value is within a given radius (Mandelbrot set). So I have to calculate:
z_new = z * z + c, and then I have to check whether |z_new| < r.
Right now I'm doing it like this:
C:
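#include <pmmintrin.h>  /* SSE3: moveldup, movehdup, addsub */
__m128 c = random stuff;  /* (re(c1), im(c1), re(c2), im(c2)) */
__m128 z = random stuff;  /* (re(z1), im(z1), re(z2), im(z2)) */
__m128 temp1 = _mm_moveldup_ps(z);  /* (re(z1), re(z1), re(z2), re(z2)) */
temp1 = _mm_mul_ps(temp1, z);       /* (re^2, re*im) for each number */
__m128 temp2 = _mm_movehdup_ps(z);  /* (im(z1), im(z1), im(z2), im(z2)) */
temp2 = _mm_mul_ps(temp2, z);       /* (re*im, im^2) for each number */
temp2 = _mm_shuffle_ps(temp2, temp2, _MM_SHUFFLE(2,3,0,1));  /* swap within each pair: (im^2, re*im) */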
As you can see, I always work on 2 complex numbers at once: both are packed into z, and their constants into c. Something like:
c = (re(c1),im(c1),re(c2),im(c2)).
After the instructions above, z and c are unchanged and I have
temp1 = (re(z1)^2, re(z1)im(z1), re(z2)^2, re(z2)im(z2)),
temp2 = (im(z1)^2, re(z1)im(z1), im(z2)^2, re(z2)im(z2)).
So I can use _mm_addsub_ps(temp1, temp2) to obtain z^2, and _mm_add_ps(temp1, temp2) to obtain |z|^2 (in elements 0 and 2; the other two elements are not meaningful). The squared absolute value is then compared against r^2, and another _mm_add_ps() adds c to z^2.
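Put together, one iteration step could look roughly like the sketch below. The function name mandel_step2, the r2 parameter (r squared), the bitmask output and the SSE3_FN macro are assumptions made for illustration; they are not part of the code above. Note that the radius test is done on the z that goes in, not on z_new. Build with -msse3, or rely on the GCC/Clang target attribute the macro adds.

```c
#include <pmmintrin.h>  /* SSE3: _mm_moveldup_ps, _mm_movehdup_ps, _mm_addsub_ps */

/* Assumption: on GCC/Clang without -msse3, enable SSE3 for this one function. */
#if defined(__GNUC__) && !defined(__SSE3__)
#define SSE3_FN __attribute__((target("sse3")))
#else
#define SSE3_FN
#endif

/* Hypothetical helper: one Mandelbrot step for two packed complex numbers.
 * z and c are laid out as (re1, im1, re2, im2); r2 is the squared radius.
 * Returns z*z + c and writes a bitmask to *inside:
 * bit 0 set if |z1|^2 < r2, bit 1 set if |z2|^2 < r2 (tested on the input z). */
SSE3_FN static __m128 mandel_step2(__m128 z, __m128 c, float r2, int *inside)
{
    __m128 t1 = _mm_mul_ps(_mm_moveldup_ps(z), z);         /* (re^2, re*im) per number */
    __m128 t2 = _mm_mul_ps(_mm_movehdup_ps(z), z);         /* (re*im, im^2) per number */
    t2 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(2, 3, 0, 1));  /* (im^2, re*im) per number */

    __m128 zsq  = _mm_addsub_ps(t1, t2);  /* (re^2 - im^2, 2*re*im) = z^2 */
    __m128 norm = _mm_add_ps(t1, t2);     /* |z|^2 in elements 0 and 2 */

    /* movemask collects the sign bit of each element; only bits 0 and 2 matter. */
    int m = _mm_movemask_ps(_mm_cmplt_ps(norm, _mm_set1_ps(r2)));
    *inside = (m & 1) | ((m >> 1) & 2);

    return _mm_add_ps(zsq, c);            /* z^2 + c */
}
```

The caller would loop on this until both bits in the mask are clear or an iteration limit is hit, keeping a separate counter per number.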
Of course there's more going on, like calculating the colors, but I'm wondering whether anybody has an idea how to optimize this further.
Another thing is that I have to calculate a color from the iteration count of the Mandelbrot set. We have to use YUV and convert it to RGB, but since this needs 3 values, I don't have any idea how to do this for two complex numbers at the same time, if that's possible at all.
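For reference, the conversion itself is only a few multiply-adds per pixel. Below is a scalar sketch, assuming the common analog BT.601 YUV coefficients with Y in [0, 1] and U, V centered on 0; the assignment may specify different coefficients or ranges. The names rgb_t and yuv_to_rgb are hypothetical.

```c
/* Hypothetical result type: one RGB pixel as floats. */
typedef struct { float r, g, b; } rgb_t;

/* Hypothetical helper: convert one YUV pixel to RGB.
 * Assumes BT.601 analog YUV coefficients; no clamping is applied. */
static rgb_t yuv_to_rgb(float y, float u, float v)
{
    rgb_t out;
    out.r = y + 1.13983f * v;
    out.g = y - 0.39465f * u - 0.58060f * v;
    out.b = y + 2.03211f * u;
    return out;
}
```

Each pixel uses three of the four float slots in an SSE register, so the layout does not line up directly with the two-numbers-per-register scheme used for the iteration.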