
JASS Benchmarking Results

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,456
FirstOfGroup is faster because ForGroup otherwise creates an entire thread per unit enumerated.

The problem is exacerbated by the fact that local variables can be referenced within a FirstOfGroup loop, which a ForGroup callback cannot do.

The only time it's slower is when someone is using a filter instead of null. A null filter and a FirstOfGroup loop is the fastest way to grab units in a circle and do something.
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
this one is interesting enough.
I have this code:
JASS:
function Tether_EnemiesEffectForGroup takes nothing returns nothing
    local unit u=GetEnumUnit()
    if not IsUnitInGroup(u,Tether_AffectedTempGroup) and IsUnitEnemy(u,Tether_WispPlayer) and AliveNonBuildOrMarker(u) and not IsUnitType(u,UNIT_TYPE_MAGIC_IMMUNE) and IsUnitVulnerable(u) then
        call GroupAddUnit(Tether_AffectedTempGroup,u)
        call UnitAddAbilityTimedExF(u,'A2TP',1,0.25+0.5*Tether_AffectedTempLvl,'B0GU')//A2TP -> Tether Slow, B0GU -> Tether
        call MyAssistsDebuffWrap(Tether_Wisp,u)
        call DestroyEffect(AddSpecialEffectTarget("Abilities\\Weapons\\DragonHawkMissile\\DragonHawkMissile.mdl",u,"chest"))
    endif
    set u=null
endfunction

function DefaultEnemyFilterWithAncientsNonImmune takes nothing returns boolean
    set tt_filterunit=GetFilterUnit()
    return IsUnitEnemy(tt_filterunit,tt_player1) and AliveNonBuildOrMarker(tt_filterunit) and not IsUnitType(tt_filterunit,UNIT_TYPE_MAGIC_IMMUNE)
endfunction

function Tether_EnemiesFilter takes nothing returns boolean
    return DefaultEnemyFilterWithAncientsNonImmune() and not IsUnitInGroup(tt_filterunit,Tether_AffectedTempGroup) and IsUnitVulnerable(tt_filterunit)
endfunction

function Tether_ProvideEffectsOnEnemiesBetween takes unit wisp,unit target,group g, real dist returns nothing
    local real wispX=GetUnitX(wisp)
    local real wispY=GetUnitY(wisp)
    local real targetX=GetUnitX(target)
    local real targetY=GetUnitY(target)
    local real angle=AngleBetweenXYRad(wispX,wispY,targetX,targetY)
    local real groupX
    local real groupY
    local group gg=GetAvailableGroup()
    local integer i=1
    local real dx=dist/8.*Cos(angle)
    local real dy=dist/8.*Sin(angle)
    local unit u2
    local player p=GetOwningPlayer(wisp)
    set Tether_AffectedTempGroup=g
    set Tether_Wisp=wisp
    set Tether_WispPlayer=p
    set Tether_AffectedTempLvl=GetUnitAbilityLevel(wisp,'A1TA')//A1TA -> Tether
    loop
        set groupX=wispX+dx*i
        set groupY=wispY+dy*i
        set tt_unit1=wisp
        set tt_player1=GetOwningPlayer(tt_unit1)
        call fStartTimer()
        call GroupEnumUnitsInRange2(gg,groupX,groupY,100,null)//Condition(function Tether_EnemiesFilter)
//        call ForGroup(gg,function Tether_EnemiesEffectForGroup)
        loop
            set u2=FirstOfGroup(gg)
            exitwhen u2==null
            call GroupRemoveUnit(gg,u2)
            if IsUnitEnemy(u2,Tether_WispPlayer) and AliveNonBuildOrMarker(u2) and not IsUnitType(u2,UNIT_TYPE_MAGIC_IMMUNE) and not IsUnitInGroup(u2,Tether_AffectedTempGroup) and IsUnitVulnerable(u2) then
                call GroupAddUnit(Tether_AffectedTempGroup,u2)
                call UnitAddAbilityTimedExF(u2,'A2TP',1,0.25+0.5*Tether_AffectedTempLvl,'B0GU')//A2TP -> Tether Slow, B0GU -> Tether
                call MyAssistsDebuffWrap(Tether_Wisp,u2)
                call DestroyEffect(AddSpecialEffectTarget("Abilities\\Weapons\\DragonHawkMissile\\DragonHawkMissile.mdl",u2,"chest"))
            endif
        endloop
        call echo("t="+I2S(fStopTimer()))
        set i=i+1
        exitwhen i>8
    endloop
    call KillGroup(gg)
endfunction

The thing is, ForGroup is faster in this case. The loop used to be faster when I passed a proper condition instead of "null" at the grouping stage, but now, with "null", it's better to use ForGroup. Both test cases had the same 12 units in the groups (one or the other). Speed is about 100ns for ForGroup and 150ns for the loop. I can't see the reason for this behavior.
With a filter, ForGroup is about equal to the loop at 150ns, with very small improvements in the loop's performance.


fStartTimer / fStopTimer are my benchmark timer functions.


Obviously, we have two overheads: both Condition() and ForGroup() create an extra thread for each unit in the AoE/group. By removing the Condition and passing "null", we skip the first one. But then ForGroup() becomes slightly faster, because... what? Is it easier to find free threads in the storage?

Edit:
Well, that was easy. The loop checked IsUnitInGroup against Tether_AffectedTempGroup much later than ForGroup did. After I moved that check to the start as well, the loop became faster again. Nothing new, then.
 
Level 15
Joined
Nov 30, 2007
Messages
1,202
I failed to install JNGP yesterday, and I'm wondering if someone could run a test on how much worse recursion is than iteration. I wrote a test suite for it which I hope compiles.

JASS:
scope Stopwatch initializer onInit
   
    globals
        private constant integer ITERATIONS = 1800
        private Hashtable ht = InitHashtable()
        private integer temp
    endglobals
   
    private function Setup takes nothing returns nothing 
        local integer i = 0 
        loop
            exitwhen i == ITERATIONS
            call SaveInteger(ht, 0, i, i) 
            set i = i + 1 
        endloop
    endfunction
   
    private function Iterative_1 takes nothing returns nothing 
        local integer i = ITERATIONS
        loop
            exitwhen i == 0
            call SaveInteger(ht, 0, i, LoadInteger(ht, 0, i - 1))
            set i = i - 1
        endloop
    endfunction
   
    private function Iterative takes nothing returns nothing 
        local intger first = 0
        local intger last = ITERATIONS - 1
        loop 
            exitwhen first >= last 
            set temp = LoadInteger(ht, 0, first)
            call SaveInteger(ht, 0, first, LoadInteger(ht, 0, last))
            call SaveInteger(ht, 0, last, temp)
            set first = first + 1 
            set last = last - 1
        endloop 
    endfunction
   
    private function Recursive takes integer first, integer last returns nothing // arg.length == 2
        if first < last then 
            set temp = LoadInteger(ht, 0, first)
            call SaveInteger(ht, 0, first, LoadInteger(ht, 0, last))
            call SaveInteger(ht, 0, last, temp)
        endif
    endfunction 
   
    private function Recursive_1 takes integer i returns nothing     // arg.length == 1
        if i > 0 then 
            call SaveInteger(ht, 0, i, LoadInteger(ht, 0, i - 1))
            call Recursive_1(i - 1)
        endif
    endfunction
   
    private function Actions takes nothing returns boolean
        local integer sw
        local integer i = 0
        local real array ticks
        local string output

        set sw = StopWatchCreate()
           
        call Iterative()
           
        set ticks[0] = StopWatchTicks(sw)
        set output = I2S(ITERATIONS) + " iterations of Test #1 took " + I2S(StopWatchMark(sw)) + " milliseconds to finish.\n"
        call StopWatchDestroy(sw)

        set sw = StopWatchCreate()    // recursive 
           
        call Recursive()
       
        set ticks[1] = StopWatchTicks(sw)
        set output = output + I2S(ITERATIONS) + " iterations of Test #2 took " + I2S(StopWatchMark(sw)) + " milliseconds to finish.\n\n"
        call StopWatchDestroy(sw)
           
        if (ticks[0] > ticks[1]) then
            set ticks[2] = 100 - (ticks[1]/ticks[0] * 100)
            set output = output + "Test #2 was " + R2S(ticks[2]) + "% faster than Test #1\n\n"
        else
            set ticks[2] = 100 - (ticks[0]/ticks[1] * 100)
            set output = output + "Test #1 is " + R2S(ticks[2]) + "% faster than Test #2\n\n"
        endif
       
        call DisplayTextToPlayer(GetLocalPlayer(), 0, 0, output)
       
        return false
    endfunction

    //===========================================================================
    private function onInit takes nothing returns nothing
        local trigger t = CreateTrigger()
        call TriggerRegisterPlayerEvent(t, Player(0), EVENT_PLAYER_END_CINEMATIC)
        call TriggerAddCondition(t, function Actions)
        call Setup()
    endfunction

endscope
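For anyone who would rather try the comparison outside JASS, the idea behind the suite can be sketched in plain Lua. This is only an illustration under assumptions: a Lua table stands in for the hashtable, and the names MakeData, ReverseIterative and ReverseRecursive are made up for this sketch. It is not a port of the suite above, and Lua timings say nothing definite about JASS.

```lua
-- Hypothetical helpers for illustration; a Lua table stands in for the JASS hashtable.
function MakeData(n)
    local data = {}
    for i = 1, n do
        data[i] = i
    end
    return data
end

-- Iterative reversal: swap the two ends and walk inward.
function ReverseIterative(data)
    local first, last = 1, #data
    while first < last do
        data[first], data[last] = data[last], data[first]
        first = first + 1
        last = last - 1
    end
    return data
end

-- Recursive reversal: same swap, then one nested call per remaining pair.
-- The call is deliberately not written as "return ReverseRecursive(...)",
-- because Lua would turn that into a tail call that does not grow the stack.
function ReverseRecursive(data, first, last)
    if first < last then
        data[first], data[last] = data[last], data[first]
        ReverseRecursive(data, first + 1, last - 1)
    end
    return data
end

-- Time one pass of each with os.clock() (CPU time in seconds).
local t0 = os.clock()
ReverseIterative(MakeData(1800))
local iterativeTime = os.clock() - t0

t0 = os.clock()
ReverseRecursive(MakeData(1800), 1, 1800)
local recursiveTime = os.clock() - t0

print("iterative:", iterativeTime, "recursive:", recursiveTime)
```

A single pass over 1800 elements is far too short to measure reliably, so in practice each call would be repeated many times and the totals compared.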
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,456
If that's the case it's probably the use of <= requiring more than >, or that creating new parameters is somehow cheaper than setting variables.

Either way your code won't compile, and no one can test it unless they're on the old patch which doesn't work with SharpCraft. You'll have to design this to be measured in frames per second.
 
Level 9
Joined
Jul 20, 2018
Messages
176
As far as I know, recursion has its own limit. I once designed a path-checking algorithm using recursion. It worked fine, but if a player with a lot of towers decided to close the path, the algorithm would not run to the end. I treated that situation as "no path". But then I implemented the same algorithm with a loop inside a loop, and that implementation did not have the problem.
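The loop-based version mentioned above is not shown, so the following is only a sketch of the general idea in Lua: an explicit queue replaces the call stack, so the search no longer depends on how deeply calls can nest. The grid layout, the HasPath name and all its parameters are assumptions made for this example.

```lua
-- Neighbour offsets (right, left, down, up).
local DX = { 1, -1, 0, 0 }
local DY = { 0, 0, 1, -1 }

-- Hypothetical grid: grid[y][x] == true means the cell is blocked (e.g. by a tower).
-- Returns true if (tx, ty) can be reached from (sx, sy) through free cells.
function HasPath(grid, width, height, sx, sy, tx, ty)
    local visited = {}
    local queueX, queueY = { sx }, { sy }
    local head = 1
    visited[sy * width + sx] = true

    -- Outer loop: take the next cell from the queue instead of recursing into it.
    while head <= #queueX do
        local x, y = queueX[head], queueY[head]
        head = head + 1
        if x == tx and y == ty then
            return true
        end
        -- Inner loop: enqueue every free, unvisited neighbour.
        for i = 1, 4 do
            local cx, cy = x + DX[i], y + DY[i]
            if cx >= 1 and cx <= width and cy >= 1 and cy <= height then
                local key = cy * width + cx
                if not visited[key] and not grid[cy][cx] then
                    visited[key] = true
                    queueX[#queueX + 1] = cx
                    queueY[#queueY + 1] = cy
                end
            end
        end
    end
    return false -- queue exhausted without reaching the target
end
```

With this shape, running out of queue entries is a real "no path" result, whereas a recursive version that stops early cannot tell "no path" apart from "gave up".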

I wrote a test suite for it which I hope compiles.
Your code contains calls to undeclared functions and some spelling mistakes.
 
Level 9
Joined
Mar 26, 2017
Messages
376
Just wondering if people have knowledge about using Player Unit Events versus Unit Events as far as efficiency is concerned.

For Player Unit Event the code is simpler, because you only have to create one single trigger per event, and then call TriggerRegisterPlayerUnitEvent for each player at game start. You can add a TriggerAction that runs down a set of if evaluations based on a SpellId/UnitId/etc., or checks an array and then dynamically calls a function. (I believe the latter can be done in Lua rather efficiently now.)
The downside of such a method is that triggers will fire for each unit, even if you only use the trigger for a small subset of your units.

With Unit Event, you gain the benefit of fewer trigger activations. This will give better performance, especially for events that get triggered often, like Attacked or Damaged. The downside is that you need to call TriggerRegisterUnitEvent every time a unit is created, making the code more complicated.
I'm also curious whether having a large number of units registered with TriggerRegisterUnitEvent somehow causes a large overhead. There might be internal checks that are done separately for each registered unit and that work differently from how checks are done for PlayerUnits. In that case Unit Events might not even perform better with high unit counts.
This overhead might increase during gameplay, if new units are created and dead registered units are removed but are still included in these checks.

Might this suggest that Unit Events with high unit counts perform worse than Player Unit Events, and that it could be worthwhile to destroy and rebuild triggers with Unit Events periodically?
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
no, I said the game will search for triggers which are supposed to react to the event, and in the case of attack (and I guess all other PLAYER_* events) it checks for both: first PLAYER, then individual. No matter which one you use, if the unit isn't in the list (isn't owned by the player you've made the trigger for, or doesn't have a personal trigger), it won't fire anything
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,456
whenever a unit is attacked, it searches for triggers which react to the PLAYER_UNIT_ATTACKED event, then for triggers on UNIT_ATTACKED. It doesn't matter which one you use; there's no difference at all, since both are called anyway

I think he's saying if you have a fixed number of units who should expect a unit event, and there is no need for a playerunitevent of that type in the entire map, that that unitevent will have a performance improvement due to not needing to evaluate to the user for every instance.
 
Level 9
Joined
Mar 26, 2017
Messages
376
I think he's saying if you have a fixed number of units who should expect a unit event, and there is no need for a playerunitevent of that type in the entire map, that that unitevent will have a performance improvement due to not needing to evaluate to the user for every instance.

This is exactly what I mean. If you don't use PlayerUnitEvent but instead use TriggerRegisterUnitEvent unit by unit, for only the units you need the trigger for, you will have fewer trigger evaluations.

obviously the less objects (triggers) you have, the better, but JASS is very slow unlike inner object handling, which turns the long if-then-else into a bad coding style as you won't have any win there

Regardless of using a UnitEvent or PlayerUnitEvent, you could use only one single trigger for something like ATTACKED in your entire map, and then use if-else or a dynamic call to proceed to the right code block.

I hope that with Lua, this efficiency will also improve. I believe that calling a function with _G[function name]() could already be quite a bit faster than the JASS way of dynamic calls.
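As a rough sketch of the "one trigger, dynamic call" idea in Lua: instead of looking a function up by name in _G, this version keeps the handlers in a plain table keyed by an integer id, which is a swapped-in variant of the same technique. RegisterSpellHandler, DispatchSpell and the ids are hypothetical names for illustration; in a map the id would be whatever the single trigger reads from the event (a spell id, unit type id and so on).

```lua
-- Hypothetical registry: integer id -> handler function.
local handlers = {}

-- Remember which function should run for a given id.
function RegisterSpellHandler(spellId, handler)
    handlers[spellId] = handler
end

-- Called from the single event trigger: one table lookup replaces
-- a long if/elseif chain. Extra arguments are passed through unchanged.
function DispatchSpell(spellId, ...)
    local handler = handlers[spellId]
    if handler then
        return handler(...)
    end
    return nil -- no handler registered for this id
end
```

The lookup cost stays the same no matter how many handlers are registered, which is the main difference from running down a set of if evaluations.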
 

Bribe

Code Moderator
Level 50
Joined
Sep 26, 2009
Messages
9,456
I do not think that Blizzard will improve JASS now that they have added Lua. In the future (a year or more from now) they could even remove support for it altogether.
They wouldn't (shouldn't) do it without having some kind of JASS -> Lua map converter which automatically converts a JASS map to a Lua map.
 
Apologies for the necropost.

In the most recent patch version (1.31+ - Classic), in the Lua environment, ForGroup iteration (when keeping the group) appears to be faster than the FirstOfGroup loop and swap operation, especially when dealing with a large group of units. Moreover, iteration via BlzGroupUnitAt is even faster than ForGroup up to a certain number of units (not more than 1600). Once a group reaches that size, ForGroup actually becomes the fastest of the three methods.

The testing procedure is as follows:
  1. Store the initial time stamp via os.clock()
  2. Perform the iterations
  3. Obtain the time elapsed by subtracting the initial time stamp from the current time stamp.
I suppose it might be something on my end, but I thought it might be interesting to post these results.
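The three steps above can be wrapped in a small helper. This is a minimal sketch, not the code from the attached map; the Benchmark name and its parameters are assumptions for illustration.

```lua
-- Hypothetical helper: runs body() the given number of times and
-- returns the elapsed os.clock() time in seconds.
function Benchmark(label, iterations, body)
    local start = os.clock()            -- 1. store the initial time stamp
    for _ = 1, iterations do            -- 2. perform the iterations
        body()
    end
    local elapsed = os.clock() - start  -- 3. current stamp minus initial stamp
    print(label .. ": " .. string.format("%.3f", elapsed))
    return elapsed
end
```

In a map it could be called as Benchmark("ForGroup", 10000, function() ForGroup(g, callback) end), where g and callback are whatever group and function are being tested. Wrapping the test body in a function adds one extra Lua call per iteration, so the same wrapper should be used for every method being compared.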
 

Attachments

  • NativeSpeedTest.w3x
    27.2 KB · Views: 124

~El

Level 17
Joined
Jun 13, 2016
Messages
551
The "math.floor" in your test is unnecessary. Lua 5.3 has support for integer semantics in number values. Unless BlzGroupGetSize returns a float instead of an integer.

There's also a fourth method of iteration that is now available - collecting the units into a native Lua table (via a GroupEnum Filter for example), and then iterating on that table instead of a JASS group.
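A minimal sketch of that fourth method, assuming it is run inside the game's Lua environment: CreateGroup, GroupEnumUnitsInRange, Filter, GetFilterUnit, DestroyFilter and DestroyGroup are the game natives, while CollectUnitsInRange is a name made up for this example. Whether destroying the filter afterwards is needed or safe in every patch is an assumption here, not something tested.

```lua
-- Hypothetical helper; only works where the Warcraft III natives exist.
-- Returns a plain Lua array of the units found within radius of (x, y).
function CollectUnitsInRange(x, y, radius)
    local units = {}
    local g = CreateGroup()
    -- The filter runs once per candidate unit; it stores the unit in the
    -- Lua table and returns false, so the JASS group itself stays empty.
    local filter = Filter(function()
        units[#units + 1] = GetFilterUnit()
        return false
    end)
    GroupEnumUnitsInRange(g, x, y, radius, filter)
    DestroyFilter(filter) -- assumption: a fresh filter per call should be cleaned up
    DestroyGroup(g)
    return units
end
```

The result can then be walked with an ordinary numeric for loop or ipairs, without any further native calls for the iteration itself.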

For anyone curious, here are the results from MyPad's tests + the table test:
upload_2020-1-16_8-7-30.png

Also worth noting, that if you add an actual payload to the loop, then the differences start to quickly equalize. I think it's safe to say that the method of iteration won't matter much in real world code unless the code inside the loop takes less time than the loop itself... Here are the results with one `GetUnitFacing` call on the returned unit.
upload_2020-1-16_8-9-17.png
 
Level 9
Joined
Mar 26, 2017
Messages
376
Interesting. Thank you, I was not aware of the use of BlzGroupUnitAt before you posted this.

I tried to reproduce these results, but I found FirstOfGroup to be not as far behind.

In your example, you are calling FirstOfGroup twice for each loop, making for an unfair comparison. A more efficient way of using FOG would be:
Code:
while true do
    u = FirstOfGroup(g)
    if u == nil then break else
        GroupRemoveUnit(g, u)
    end
end

Still, BlzGroupUnitAt comes out on top. Especially at more realistic group sizes.

In your test map I ran 400 units x 10000 iterations:
4.362 for FirstOfGroup
3.182 for BlzGroupUnitAt

And 10 units x 1000000
11.070 for FirstOfGroup
5.475 for BlzGroupUnitAt

Likely because BlzGroupUnitAt becomes heavier the more units in a group, but FirstOfGroup is unaffected by group size.

I also saw that GroupEnumUnitsOfPlayer is 4x faster than GroupEnumUnitsInRect and GroupEnumUnitsInRange (though this might be well known already).


Full results:
Code:
400 units (x10000)
GroupEnumUnitsInRect: 8.163
GroupEnumUnitsOfPlayer: 2.164
GroupEnumUnitsInRange: 8.245
ForGroup: 8.003
ForGroup with anonymous function: 7.992
BlzGroupUnitAt: 2.981
Table Loop: 0.127
FirstOfGroup (with GroupEnumUnitsOfPlayer and variable bind): 6.526 (-2.164 = 4.362)
BlzGroupUnitAt (with GroupEnumUnitsOfPlayer and variable bind): 5.346 (-2.164 = 3.182)

10 units (x1000000)
GroupEnumUnitsInRect: 6.149
GroupEnumUnitsOfPlayer: 2.435
GroupEnumUnitsInRange: 7.004
ForGroup: 20.282
ForGroup with anonymous function: 20.488
BlzGroupUnitAt: 5.186
Table Loop: 0.395
FirstOfGroup (with GroupEnumUnitsOfPlayer and variable bind): 13.505 (-2.435 = 11.070)
BlzGroupUnitAt (with GroupEnumUnitsOfPlayer and variable bind): 7.910 (-2.435 = 5.475)


Test script:
Code:
function f()
    GetEnumUnit()
end
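function TestNatives()
    g = CreateGroup()
    r = GetWorldBounds()
    p = Player(0)
    l = {}
    GroupEnumUnitsInRect(g, r, nil)
    t0 = os.clock()
    -- Loops written as "for i=1,0" run zero times, i.e. that test is switched off;
    -- raise the upper bound (as in the last loop below) to enable it.
    for i=1,0 do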
        GroupEnumUnitsInRect(g, r, nil)
    end
    t1 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        GroupEnumUnitsOfPlayer(g, p, nil)
    end
    t2 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        GroupEnumUnitsInRange(g, 0, 0, 99999, nil)
    end
    t3 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        ForGroup(g, f)
    end
    t4 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        ForGroup(g, function() GetEnumUnit() end)
    end
    t5 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        for x=0,BlzGroupGetSize(g)-1 do -- BlzGroupUnitAt is 0-indexed
            BlzGroupUnitAt(g, x)
        end
    end
    t6 = os.clock()-t0
    for x=0,BlzGroupGetSize(g)-1 do -- copy every unit into the Lua table
        l[#l+1] = BlzGroupUnitAt(g, x)
    end
    t0 = os.clock()
    for i=1,0 do
        for k,v in pairs(l) do
        end
    end
    t7 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        GroupEnumUnitsOfPlayer(g, p, nil)
        while FirstOfGroup(g) ~= nil do
            u = FirstOfGroup(g)
            GroupRemoveUnit(g, u)
        end
    end
    t8 = os.clock()-t0
    t0 = os.clock()
    for i=1,0 do
        GroupEnumUnitsOfPlayer(g, p, nil)
        while true do
            u = FirstOfGroup(g)
            if u == nil then break else
                GroupRemoveUnit(g, u)
            end
        end
    end
    t9 = os.clock()-t0
    t0 = os.clock()   
    for i=1,1000000 do
        GroupEnumUnitsOfPlayer(g, p, nil)
        for x=0,BlzGroupGetSize(g)-1 do -- BlzGroupUnitAt is 0-indexed
            u = BlzGroupUnitAt(g, x)
        end
    end
    t10 = os.clock()-t0
    print("GroupEnumRect:", t1)
    print("GroupEnumPlayer:", t2)
    print("GroupEnumRange:", t3)
    print("ForGroup:", t4)
    print("ForGroupAnon:", t5)
    print("BlzGroupUnitAt:", t6)
    print("TableLoop:",t7)
    print("FirstOfGroup(w/EnumPlayer):", t8)
    print("FirstOfGroupType2(w/EnumPlayer):", t9)
    print("BlzGroupUnitAt(w/EnumPlayer):", t10)
end
 
Level 12
Joined
Jan 30, 2020
Messages
875
Hello there.

Nice to see a proper way to test and compare native speeds using the OS time stamp !

Was just wondering about this :
I also saw that GroupEnumUnitsOfPlayer is 4x faster than GroupEnumUnitsInRect and GroupEnumUnitsInRange (though this might be well known already).

Seems nice, but I am not sure it can replace GroupEnumUnitsInRange, as you would then need to calculate the distance and compare it to your range.
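The extra distance check itself is small. A minimal sketch, with assumptions: points are plain {x = ..., y = ...} tables standing in for units (in game the coordinates would come from GetUnitX and GetUnitY), and IsWithinRange and FilterByRange are names made up for this example. Whether the added Lua work still leaves this ahead of GroupEnumUnitsInRange would have to be measured.

```lua
-- Compare squared distances, which avoids one square root per unit.
function IsWithinRange(x1, y1, x2, y2, range)
    local dx, dy = x2 - x1, y2 - y1
    return dx * dx + dy * dy <= range * range
end

-- Hypothetical stand-in for "enumerate a player's units, keep those in range".
function FilterByRange(points, cx, cy, range)
    local result = {}
    for _, pt in ipairs(points) do
        if IsWithinRange(cx, cy, pt.x, pt.y, range) then
            result[#result + 1] = pt
        end
    end
    return result
end
```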
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
for pure jass functions with params:
passing a constant (meaning an inlined literal, not a variable) integer/real/boolean is ~50% faster than passing a constant one-character string, and the difference grows with each additional character in the passed string

why? because I had a system which took a string as an argument to return the requested parameter, a la configuration
GetMyUnitData('hpeo',"StartMana")

it makes sense to switch it to constant integers instead
GetMyUnitData('hpeo',START_MANA_CONFIG_KEY)
to get slightly better performance in this particular case

Sooo.. strings are slow!


GroupEnumUnitsOfPlayer is faster because it doesn't check each unit's position to find out whether it is located somewhere inside the area; instead it moves through the list of all the player's units
also a correction: it goes through EVERY unit in the game and filters out those owned by other players. So the more junk units you have, the worse this function performs.
 