
JASS Benchmarking Results

Level 19
Joined
Dec 12, 2010
Messages
2,069
Any time a unit performs any action (move/attack/order/selection/etc.), the game loops through all of its abilities to check which of them want to trigger on that action. So, for those who care about nanoseconds, another tip: reduce the number of abilities your units carry. It helps a lot compared to other micro-optimizations.
 
Level 8
Joined
Jan 23, 2015
Messages
121
Any time a unit performs any action (move/attack/order/selection/etc.), the game loops through all of its abilities to check which of them want to trigger on that action. So, for those who care about nanoseconds, another tip: reduce the number of abilities your units carry. It helps a lot compared to other micro-optimizations.
Do passive abilities count? It would be a dramatic decrease for bonus systems, since they overload a unit with passives.
 
Level 8
Joined
Jan 23, 2015
Messages
121
literally any. Every ability may have a callback for an action, and the game never knows in advance, so it re-checks every time.
Ok then.
Another reason to use Tomes instead, imo.
Can we? I don't remember any tome ability increasing, for example, armor or sight radius.
I think we can reduce the number of abilities a bonus system needs by making them multi-leveled. For example, one ability per hex digit: 4 times fewer abilities in the worst case.
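The hex-digit idea can be sketched in plain Python (illustrative only, not game code): instead of one on/off ability per bit, each hex digit of the bonus becomes one 16-level ability, so a 12-bit bonus needs 3 abilities instead of 12.

```python
# Illustrative sketch (not JASS): one 16-level ability per hex digit
# instead of one on/off ability per bit.
def hex_digits(value, digits=3):
    """Split a non-negative bonus into base-16 ability levels (low digit first)."""
    return [(value >> (4 * i)) & 0xF for i in range(digits)]

def reconstruct(levels):
    """Recover the bonus from the per-digit ability levels."""
    return sum(level << (4 * i) for i, level in enumerate(levels))

print(hex_digits(2748))           # [12, 11, 10] -> 0xABC
print(reconstruct([12, 11, 10]))  # 2748
```

Each multi-level ability's level is set to its digit, so a 12-bit value is represented with 3 abilities rather than 12 (the "4 times fewer" worst case mentioned above).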
 
Level 22
Joined
Feb 6, 2014
Messages
2,466
Do passive abilities count? It would be a dramatic decrease for bonus systems, since they overload a unit with passives.
I wouldn't call it "dramatic". DracoL1ch specifically mentioned "nanoseconds", and in a Bonus system that applies a binary approach to giving bonuses, the probability of having the maximum number of passive abilities at once is 1/(2^n), where n is the number of abilities. Say the bonus system covers -4096 to +4095 possible bonus damage with twelve positive abilities (+1 through +2048) plus one -4096 ability: the only time your unit would carry all twelve positive abilities is when you apply exactly +4095 bonus damage. Most of the time, we can estimate (or average) that the unit carries about half of the total abilities. With that said, I would still very much use a Bonus system after knowing this information. It's interesting info, but I don't think it affects much.
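For illustration, here is a hedged Python sketch of such a two's-complement layout (the exact ability values are an assumption for the example, not any particular system's): twelve +2^i abilities (+1 through +2048) plus one -4096 ability cover the range -4096..+4095.

```python
# Hypothetical two's-complement bonus layout: twelve positive abilities
# (+1, +2, ..., +2048) and one -4096 ability; 13-bit encoding.
POSITIVE = [1 << i for i in range(12)]  # +1 .. +2048

def active_abilities(bonus):
    """Which ability values a unit must carry to represent `bonus`."""
    assert -4096 <= bonus <= 4095
    encoded = bonus & 0x1FFF            # 13-bit two's complement
    active = [v for i, v in enumerate(POSITIVE) if encoded >> i & 1]
    if encoded >> 12 & 1:               # sign bit -> the -4096 ability
        active.append(-4096)
    return active

print(len(active_abilities(4095)))  # 12: worst all-positive case
print(len(active_abilities(-1)))    # 13: every ability at once
```

Note that small negative bonuses are the true worst case here: -1 requires every ability at once, one more than +4095 does.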
 
Level 8
Joined
Jan 23, 2015
Messages
121
I wouldn't call it "dramatic". DracoL1ch specifically mentioned "nanoseconds", and in a Bonus system that applies a binary approach to giving bonuses, the probability of having the maximum number of passive abilities at once is 1/(2^n), where n is the number of abilities. Say the bonus system covers -4096 to +4095 possible bonus damage with twelve positive abilities (+1 through +2048) plus one -4096 ability: the only time your unit would carry all twelve positive abilities is when you apply exactly +4095 bonus damage. Most of the time, we can estimate (or average) that the unit carries about half of the total abilities. With that said, I would still very much use a Bonus system after knowing this information. It's interesting info, but I don't think it affects much.
Well, -1 is worse: it needs one more ability than the maximum positive bonus does.
And I meant dramatic on that nano scale, and I didn't mean we shouldn't use Bonus; only that there are ways to improve it, even at that scale.
However, I think those nanoseconds can add up if there are too many orders being evaluated at once, so it's not an entirely useless thing to improve.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
If you really care about that, you should also remember my older post about abilities.

JASS Benchmarking Results

hashtables are awesome for that, although with vJASS's help it can be done via arrays too

JASS:
set b=not b
is a bit faster than
JASS:
set b=b==false
by a noticeable margin, like ~40%, probably because the game has to parse 3 ops (get b, get the == equivalent, cast false to 0) instead of 2 (get b, apply 'not').
Actually I was wrong, as usual when it comes to ops:
leandrotp said:
not X is just 2 operations:
Code:
GETVAR R1, X
NOT R1
X == false is 5 operations:
Code:
GETVAR R1, X
PUSH R1 //Why they compile this with a fucking PUSH?
LITERAL R2, 0
POP R3
EQUAL R2, R3, result
 
If you want to benchmark on a newer patch (1.29.2) then you can with this tool: Custom Natives on 1.29.2

There is a demo map there as well (https://www.hiveworkshop.com/attachments/stopwatch-demo-w3x.306453/).

Here are some of my results on 1.29.2. Maybe I will update the main thread if I do some more exhaustive tests.

Code:
===================================
5000 iterations of ExecuteFunc took 38 milliseconds to finish.
5000 iterations of Call took 3 milliseconds to finish.

Call was 1166.667% faster than ExecuteFunc

===================================
5000 iterations of UnitAlive took 3 milliseconds to finish.
5000 iterations of GetWidgetLife took 4 milliseconds to finish.

UnitAlive is 33.33% faster than GetWidgetLife

===================================
5000 iterations of SetUnitPosition took 166 milliseconds to finish.
5000 iterations of SetUnitX/Y took 10 milliseconds to finish.

SetUnitX/Y was 1560.000% faster than SetUnitPosition

===================================
5000 iterations of ForGroup took 112 milliseconds to finish.
5000 iterations of FirstOfGroup took 4 milliseconds to finish.

FirstOfGroup was 2700.000% faster than ForGroup
 
Could you please compare SetUnitX/Y with the new SetSpecialEffectPosition native?

I tested a few times and got consistent results.

Code:
5000 iterations of SetUnitX/Y took 11 milliseconds to finish.
5000 iterations of SetSpecialEffectPosition took 8 milliseconds to finish.

SetSpecialEffectPosition was 37.50% faster than SetUnitX/Y
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
Set*X/Y operates on all the same structures internally; the comparison is more about random delays and memory caching.
Only that it also has the overhead of enter-region events. I bet if the code were switching the unit between regions (with an event attached) you'd see a huge spike.
 
Level 8
Joined
Jan 23, 2015
Messages
121
TriggerHappy, could you test that case?
It would probably also be desirable to have more cases with different numbers of regions.

And one case to check whether enter/leave events are processed fully during the SetUnitX/Y call, like the damage event is.
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
It also has "unit in range" overhead (which might be the same as aura update overhead, though that does lag noticeably behind quickly-moving units).
In the instances being dealt with here, the dummy unit is likely to have the Locust ability and therefore not trigger auras or unit-in-range events.

@Trokkin I have forgotten to include that when using the Filter boolexpr of a TriggerRegisterEnterRegion, it does stop the current thread in order to interweave the enter-region event (that's what Unit Indexers use to detect the exact instant a unit is created). HOWEVER, any conditions or actions attached to the trigger will not run until that zero-second delay has elapsed.

I tested a few times and got consistent results.

Code:
5000 iterations of SetUnitX/Y took 11 milliseconds to finish.
5000 iterations of SetSpecialEffectPosition took 8 milliseconds to finish.

SetSpecialEffectPosition was 37.50% faster than SetUnitX/Y


It occurs to me after checking the common.j library that SetSpecialEffectPosition also includes a Z variable. Therefore, the test would make more sense if it also included a third line for the unit regarding SetUnitFlyHeight.

Either that or just strictly comparing SetUnitX/Y with BlzSetSpecialEffectX/Y.
 
Level 39
Joined
Feb 27, 2007
Messages
5,013
In the instances being dealt with here, the dummy unit is likely to have the Locust ability and therefore not trigger auras or unit-in-range events.
You're right that it wouldn't trigger any unit-in-range events nor be subject to any auras. But unless I am mistaken, even Locusted units propagate their own auras (granted, they wouldn't have any if they're units-as-SFX), and the game would still need to check whether any 'unit in range' events have been registered on the moved unit (there wouldn't be any, unless the game doesn't check/fire those triggers when the source of the event is moved). The checks for whether it has any auras or any unit-in-range events associated with it must add some overhead.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
 
Level 39
Joined
Feb 27, 2007
Messages
5,013
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
I'm not suggesting anywhere that they have anything to do with each other, and I don't think anyone here is suggesting that this optimization matters except in extreme scenarios. This discussion is about curiosity: the difference between the old natives we know and the new natives we don't, and what contributes to their difference in execution. I'm not arguing with anyone, just noting two further things that might add overhead if they're actually relevant to SetUnitX/Y calls. For the curiosity, not for obsessive code efficiency.
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
It is more likely to be a call every 0.10 seconds, like the UnitInRange events.
 
Level 23
Joined
Jan 1, 2009
Messages
1,610
just saying this kind of data is often used as "FASTER MEANS EVERYONE SHOULD CODE LIKE THAT" on the internet.

Imho, it's actually just certain people heavily propagating that idiom, especially here on Hive, with newer members following along.
Many members, including some of the oldest (perhaps more on other sites), have always preferred and recommended "proper" programming paradigms over minor performance gains.

One neat possibility of course is to have some of these optimizations be done by the compiler automatically, if we know them well enough.

Thus, @Topic: is there any way to benchmark on 1.30 currently? Any "confirmed" information coming from Blizzard? The first post seems a little out of date.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
The function name matters when it's called: the longer the name, the more microseconds the call takes. The difference is laughable, but 1+1+1+1+1+1... +1 is still many. If you don't use auto-obfuscation like Vexorian's optimizer, you might consider shortening names yourself.
In my tests, the average difference between a 5-character function name and a 192-character one is 2.2x: the 5-letter version executes 2.2 times faster. The result was about the same across 100 runs.

The same goes for local variables: declaring short names is faster, and the difference is huge here.
1000 locals with 10-character names vs 1000 locals with 192-character names: a 6-10x difference, with heavy spread.
1000 locals with 10-character names vs 1000 locals with 5-character names: the first takes 1.5x as long to execute as the second.

Tracking globals' initialization is impossible for me, but I believe the same holds there. Accessing globals with 192-character names takes 10-20 times longer than globals with 5-character names. At least they don't have to be created each time a function is called.

I couldn't detect any difference from the sheer number of globals: with 3k and 14k of them, operations on any variable performed at the same speed, which means it's not necessary to clean up variables that are left unused.
Same goes for the number of functions used: no difference between 8k and 18k.

All of this was said long ago, but I didn't believe it, because JASS pre-compiles everything to bytecode before we start; how could names ever affect it if names are never used, only variable indexes? Well, they are used, for some reason.
All tests were made on 1.26; maybe newer patches handle it better.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,198
On which NUMA nodes were you pinning your cores? Have you tried extracting a part of the API into SDL to run on a PCB?
What post is this aimed at?
All of this was said long ago, but I didn't believe it, because JASS pre-compiles everything to bytecode before we start; how could names ever affect it if names are never used, only variable indexes? Well, they are used, for some reason.
JASS is converted to bytecode to run, but the bytecode is not statically linked. All variables are referenced by name rather than directly by address, so a hashtable lookup is required, using a string hash algorithm with O(n) complexity in the name's length.
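As a rough illustration of why name length matters under that scheme, a typical string hash touches every character, so each by-name lookup costs time proportional to the name's length. This Python sketch is illustrative only; it is not the actual hash Warcraft III uses.

```python
# A classic multiplicative string hash: one step per character, so the
# hashing cost of each by-name variable lookup grows linearly with the
# name's length. Illustrative only, not WC3's real algorithm.
def string_hash(name):
    h = 0
    for ch in name:                       # one step per character
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

# Hashing a 192-character name does ~38x the per-character work of a
# 5-character one, every time the bytecode touches the variable.
print(192 / 5)  # 38.4
```

This matches the benchmark above: the lookup itself is cheap, but it happens on every variable access, so long names multiply a small per-character cost by a very large call count.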
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
JASS is converted to bytecode to run, but the bytecode is not statically linked. All variables are referenced by name rather than directly by address, so a hashtable lookup is required, using a string hash algorithm with O(n) complexity in the name's length.
Any explanation for dummies? In the bytecode all those vars are referred to by index, meaning they are pre-parsed, but then for some reason the VM restores the variable's name from its index.
In pseudo-code, literally every access looks like this:
Code:
        v12 = sub_6F44B2C0_VariableIDToStringName(*(_DWORD *)(v5 + 10328), v7);
        sub_6F44CBC0_JASSVM_GetVariableAddress((int)v11 + 144, v12);
 
Level 9
Joined
Jul 30, 2012
Messages
156
Any explanation for dummies? In the bytecode all those vars are referred to by index, meaning they are pre-parsed, but then for some reason the VM restores the variable's name from its index.
In pseudo-code, literally every access looks like this:
Code:
        v12 = sub_6F44B2C0_VariableIDToStringName(*(_DWORD *)(v5 + 10328), v7);
        sub_6F44CBC0_JASSVM_GetVariableAddress((int)v11 + 144, v12);

Most of the strange things can be explained by pure laziness. They probably needed a way to look up variables by name on some very specific occasions (like TriggerRegisterVariableEvent), and then, when implementing the JASS VM, it was just easier to restore the string from the id and call the by-name lookup instead of spending time implementing a direct lookup by string id. And speed was never a concern: when they created the JASS language and interpreter, they couldn't possibly have imagined that the modding community would become as big as it is.
 