
JASS Benchmarking Results

Level 19
Joined
Dec 12, 2010
Messages
2,069
Any time a unit performs any action (move/attack/order/selection/etc.), the game loops through all of its abilities to check which of them want to trigger on that action. So, for those who care about nanoseconds, another tip: reduce the number of abilities your units carry. It helps a lot compared to other micro-optimizations.
 
Level 8
Joined
Jan 23, 2015
Messages
121
Any time a unit performs any action (move/attack/order/selection/etc.), the game loops through all of its abilities to check which of them want to trigger on that action. So, for those who care about nanoseconds, another tip: reduce the number of abilities your units carry. It helps a lot compared to other micro-optimizations.
Do passive abilities count? It would be a dramatic decrease for bonus systems, since they overload a unit with passives.
 
Level 8
Joined
Jan 23, 2015
Messages
121
literally any. Every ability may have a callback for an action, and the game never knows in advance, so it re-checks every time.
Ok then.
Another reason to use Tomes instead, imo.
Can we? I don't remember any tome ability increasing, for example, armor or sight radius.
I think we can reduce the number of abilities a bonus system needs by making them multi-leveled. For example, one ability per hex digit: 4 times fewer abilities in the worst case.
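The hex-digit idea can be sketched in plain Python (illustrative only, not game code): instead of one on/off ability per bit, each hex digit of the bonus becomes one 16-level ability, so a 12-bit bonus needs 3 abilities instead of 12.

```python
# Illustrative sketch (not JASS): one 16-level ability per hex digit
# instead of one on/off ability per bit.
def hex_digits(value, digits=3):
    """Split a non-negative bonus into base-16 ability levels (low digit first)."""
    return [(value >> (4 * i)) & 0xF for i in range(digits)]

def reconstruct(levels):
    """Recover the bonus from the per-digit ability levels."""
    return sum(level << (4 * i) for i, level in enumerate(levels))

print(hex_digits(2748))           # [12, 11, 10] -> 0xABC
print(reconstruct([12, 11, 10]))  # 2748
```

Each multi-level ability's level is set to its digit, so a 12-bit value is represented with 3 abilities rather than 12 (the "4 times fewer" worst case mentioned above).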
 
Level 22
Joined
Feb 6, 2014
Messages
2,466
Do passive abilities count? It would be a dramatic decrease for bonus systems, since they overload a unit with passives.
I wouldn't call it "dramatic". DracoL1ch specifically mentioned "nanoseconds", and in a Bonus system that applies a binary approach to giving bonuses, the probability of having the maximum number of passive abilities at once is 1/(2^n), where n is the number of abilities. Say the bonus system covers -4096 to +4095 possible bonus damage with twelve positive abilities (+1 through +2048) plus one -4096 ability: the only time your unit would carry all twelve positive abilities is when you apply exactly +4095 bonus damage. Most of the time, we can estimate (or average) that the unit carries about half of the total abilities. With that said, I would still very much use a Bonus system after knowing this information. It's interesting info, but I don't think it affects much.
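For illustration, here is a hedged Python sketch of such a two's-complement layout (the exact ability values are an assumption for the example, not any particular system's): twelve +2^i abilities (+1 through +2048) plus one -4096 ability cover the range -4096..+4095.

```python
# Hypothetical two's-complement bonus layout: twelve positive abilities
# (+1, +2, ..., +2048) and one -4096 ability; 13-bit encoding.
POSITIVE = [1 << i for i in range(12)]  # +1 .. +2048

def active_abilities(bonus):
    """Which ability values a unit must carry to represent `bonus`."""
    assert -4096 <= bonus <= 4095
    encoded = bonus & 0x1FFF            # 13-bit two's complement
    active = [v for i, v in enumerate(POSITIVE) if encoded >> i & 1]
    if encoded >> 12 & 1:               # sign bit -> the -4096 ability
        active.append(-4096)
    return active

print(len(active_abilities(4095)))  # 12: worst all-positive case
print(len(active_abilities(-1)))    # 13: every ability at once
```

Note that small negative bonuses are the true worst case here: -1 requires every ability at once, one more than +4095 does.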
 
Level 8
Joined
Jan 23, 2015
Messages
121
I wouldn't call it "dramatic". DracoL1ch specifically mentioned "nanoseconds", and in a Bonus system that applies a binary approach to giving bonuses, the probability of having the maximum number of passive abilities at once is 1/(2^n), where n is the number of abilities. Say the bonus system covers -4096 to +4095 possible bonus damage with twelve positive abilities (+1 through +2048) plus one -4096 ability: the only time your unit would carry all twelve positive abilities is when you apply exactly +4095 bonus damage. Most of the time, we can estimate (or average) that the unit carries about half of the total abilities. With that said, I would still very much use a Bonus system after knowing this information. It's interesting info, but I don't think it affects much.
Well, -1 is worse: it needs one more ability than the maximum positive bonus does.
And I meant dramatic on that nano scale, and I didn't mean we shouldn't use Bonus; only that there are ways to improve it, even at that scale.
However, I think those nanoseconds can add up if there are too many orders being evaluated at once, so it's not an entirely useless thing to improve.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
If you really care about that, you should also remember my older post about abilities.

JASS Benchmarking Results

hashtables are awesome for that, although with vJASS's help it can be done via arrays too

JASS:
set b=not b
is a bit faster than
JASS:
set b=b==false
by a noticeable margin, like ~40%, probably because the game has to parse 3 ops (get b, get the == equivalent, cast false to 0) instead of 2 (get b, apply 'not').
Actually I was wrong, as usual when it comes to ops:
leandrotp said:
not X is just 2 operations:
Code:
GETVAR R1, X
NOT R1
X == false is 5 operations:
Code:
GETVAR R1, X
PUSH R1 //Why they compile this with a fucking PUSH?
LITERAL R2, 0
POP R3
EQUAL R2, R3, result
 
If you want to benchmark on a newer patch (1.29.2) then you can with this tool: Custom Natives on 1.29.2

There is a demo map there as well (https://www.hiveworkshop.com/attachments/stopwatch-demo-w3x.306453/).

Here are some of my results on 1.29.2. Maybe I will update the main thread if I do some more exhaustive tests.

Code:
===================================
5000 iterations of ExecuteFunc took 38 milliseconds to finish.
5000 iterations of Call took 3 milliseconds to finish.

Call was 1166.667% faster than ExecuteFunc

===================================
5000 iterations of UnitAlive took 3 milliseconds to finish.
5000 iterations of GetWidgetLife took 4 milliseconds to finish.

UnitAlive is 33.33% faster than GetWidgetLife

===================================
5000 iterations of SetUnitPosition took 166 milliseconds to finish.
5000 iterations of SetUnitX/Y took 10 milliseconds to finish.

SetUnitX/Y was 1560.000% faster than SetUnitPosition

===================================
5000 iterations of ForGroup took 112 milliseconds to finish.
5000 iterations of FirstOfGroup took 4 milliseconds to finish.

FirstOfGroup was 2700.000% faster than ForGroup
 
Could you please compare SetUnitX/Y with the new SetSpecialEffectPosition native?

I tested a few times and got consistent results.

Code:
5000 iterations of SetUnitX/Y took 11 milliseconds to finish.
5000 iterations of SetSpecialEffectPosition took 8 milliseconds to finish.

SetSpecialEffectPosition was 37.50% faster than SetUnitX/Y
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
Set*X/Y operates on all the same structures internally; the comparison is more about random delays and memory caching.
Only that it also has the overhead of enter-region events. I bet if the code were switching the unit between regions (with an event attached) you'd see a huge spike.
 
Level 8
Joined
Jan 23, 2015
Messages
121
TriggerHappy, could you test that case?
It would probably also be desirable to have more cases with different numbers of regions.

And one case to check whether enter/leave events are processed fully during the SetUnitX/Y call, like the damage event is.
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
It also has "unit in range" overhead (which might be the same as aura update overhead, though that does lag noticeably behind quickly-moving units).
In the instances being dealt with here, the dummy unit is likely to have the Locust ability and therefore not trigger auras or unit-in-range events.

@Trokkin I have forgotten to include that when using the Filter boolexpr of a TriggerRegisterEnterRegion, it does stop the current thread in order to interweave the enter-region event (that's what Unit Indexers use to detect the exact instant a unit is created). HOWEVER, any conditions or actions attached to the trigger will not run until that zero-second delay has elapsed.

I tested a few times and got consistent results.

Code:
5000 iterations of SetUnitX/Y took 11 milliseconds to finish.
5000 iterations of SetSpecialEffectPosition took 8 milliseconds to finish.

SetSpecialEffectPosition was 37.50% faster than SetUnitX/Y


It occurs to me after checking the common.j library that SetSpecialEffectPosition also includes a Z variable. Therefore, the test would make more sense if it also included a third line for the unit regarding SetUnitFlyHeight.

Either that or just strictly comparing SetUnitX/Y with BlzSetSpecialEffectX/Y.
 
Level 39
Joined
Feb 27, 2007
Messages
5,013
In the instances being dealt with here, the dummy unit is likely to have the Locust ability and therefore not trigger auras or unit-in-range events.
You're right that it wouldn't trigger any unit-in-range events nor be subject to any auras. But unless I am mistaken, even Locusted units propagate their own auras (granted, they wouldn't have any if they're units-as-SFX), and the game would still need to check whether any 'unit in range' events have been registered on the moved unit (there wouldn't be any, unless the game doesn't check/fire those triggers when the source of the event is moved). The checks for whether it has any auras or any unit-in-range events associated with it must add some overhead.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
 
Level 39
Joined
Feb 27, 2007
Messages
5,013
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
I'm not suggesting anywhere that they have anything to do with each other, and I don't think anyone here is suggesting that this optimization matters except in extreme scenarios. This discussion is about curiosity: the difference between the old natives we know and the new natives we don't, and what contributes to their difference in execution. I'm not arguing with anyone, just noting two further things that might add overhead if they're actually relevant to SetUnitX/Y calls. For the curiosity, not for obsessive code efficiency.
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,464
The unit-in-range event has nothing to do with auras; auras are handled by simple timers.
You're overthinking this. Remember the Collect Corpse ability on the Meat Wagon? It runs a corpse search every single time its coordinates change. Literally every time: roughly 10k calls per second while it is moving somewhere. And the game still runs just fine with that. Look in other directions.
It is more likely to be a call every 0.10 seconds, like the UnitInRange events.
 
Level 23
Joined
Jan 1, 2009
Messages
1,610
just saying this kind of data is often used as "FASTER MEANS EVERYONE SHOULD CODE LIKE THAT" on the internet.

Imho, it's actually just certain people heavily propagating that idiom, especially here on Hive, with newer members following along.
Many members, including some of the oldest (perhaps more on other sites), have always preferred and recommended "proper" programming paradigms over minor performance gains.

One neat possibility of course is to have some of these optimizations be done by the compiler automatically, if we know them well enough.

Thus, @Topic: is there any way to benchmark on 1.30 currently? Any "confirmed" information coming from Blizzard? The first post seems a little out of date.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
The function name matters when it's called: the longer the name, the more microseconds the call takes. The difference is laughable, but 1+1+1+1+1+1... +1 is still many. If you don't use auto-obfuscation like Vexorian's optimizer, you might consider shortening names yourself.
In my tests, the average difference between a 5-character function name and a 192-character one is 2.2x: the 5-letter version executes 2.2 times faster. The result was about the same across 100 runs.

The same goes for local variables: declaring short names is faster, and the difference is huge here.
1000 locals with 10-character names vs 1000 locals with 192-character names: a 6-10x difference, with heavy spread.
1000 locals with 10-character names vs 1000 locals with 5-character names: the first takes 1.5x as long to execute as the second.

Tracking globals' initialization is impossible for me, but I believe the same holds there. Accessing globals with 192-character names takes 10-20 times longer than globals with 5-character names. At least they don't have to be created each time a function is called.

I couldn't detect any difference from the sheer number of globals: with 3k and 14k of them, operations on any variable performed at the same speed, which means it's not necessary to clean up variables that are left unused.
Same goes for the number of functions used: no difference between 8k and 18k.

All of this was said long ago, but I didn't believe it, because JASS pre-compiles everything to bytecode before we start; how could names ever affect it if names are never used, only variable indexes? Well, they are used, for some reason.
All tests were made on 1.26; maybe newer patches handle it better.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,198
On which NUMA nodes were you pinning your cores? Have you tried extracting a part of the API into SDL to run on a PCB?
What post is this aimed at?
All of this was said long ago, but I didn't believe it, because JASS pre-compiles everything to bytecode before we start; how could names ever affect it if names are never used, only variable indexes? Well, they are used, for some reason.
JASS is converted to bytecode to run, but the bytecode is not statically linked. All variables are referenced by name rather than directly by address, so a hashtable lookup is required, using a string hash algorithm with O(n) complexity in the name's length.
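As a rough illustration of why name length matters under that scheme, a typical string hash touches every character, so each by-name lookup costs time proportional to the name's length. This Python sketch is illustrative only; it is not the actual hash Warcraft III uses.

```python
# A classic multiplicative string hash: one step per character, so the
# hashing cost of each by-name variable lookup grows linearly with the
# name's length. Illustrative only, not WC3's real algorithm.
def string_hash(name):
    h = 0
    for ch in name:                       # one step per character
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

# Hashing a 192-character name does ~38x the per-character work of a
# 5-character one, every time the bytecode touches the variable.
print(192 / 5)  # 38.4
```

This matches the benchmark above: the lookup itself is cheap, but it happens on every variable access, so long names multiply a small per-character cost by a very large call count.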
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
JASS is converted to bytecode to run, but the bytecode is not statically linked. All variables are referenced by name rather than directly by address, so a hashtable lookup is required, using a string hash algorithm with O(n) complexity in the name's length.
Any explanation for dummies? In the bytecode all those vars are referred to by index, meaning they are pre-parsed, but then for some reason the VM restores the variable's name from its index.
In pseudo-code, literally every access looks like this:
Code:
        v12 = sub_6F44B2C0_VariableIDToStringName(*(_DWORD *)(v5 + 10328), v7);
        sub_6F44CBC0_JASSVM_GetVariableAddress((int)v11 + 144, v12);
 
Level 9
Joined
Jul 30, 2012
Messages
156
Any explanation for dummies? In the bytecode all those vars are referred to by index, meaning they are pre-parsed, but then for some reason the VM restores the variable's name from its index.
In pseudo-code, literally every access looks like this:
Code:
        v12 = sub_6F44B2C0_VariableIDToStringName(*(_DWORD *)(v5 + 10328), v7);
        sub_6F44CBC0_JASSVM_GetVariableAddress((int)v11 + 144, v12);

Most of the strange things can be explained by pure laziness. They probably needed a way to look up variables by name on some very specific occasions (like TriggerRegisterVariableEvent), and then, when implementing the JASS VM, it was just easier to restore the string from the id and call the by-name lookup instead of spending time implementing a direct lookup by string id. And speed was never a concern: when they created the JASS language and interpreter, they couldn't possibly have imagined that the modding community would become as big as it is.
 