[Benchmarks] The truth is (out) there.

Bribe · Aug 24, 2011

I'm sure an Admin could sort it out, but a moderator has limits.

Troll-Brain · Aug 24, 2011

Ok, then i will just create a new thread and make enough reserved posts from the beginning ...

Just because 3 is clearly not enough to sort it.

You can graveyard this thread. (if the other posters don't care about it)

Bribe · Aug 24, 2011

No, keep the thread, I'll just ask Rui to add some more posts to the top
when necessary.

You can fit tens and tens of thousands of characters in a thread before it hits
the limits, you are still very far from it.

Troll-Brain · Aug 24, 2011

Bribe said:
No, keep the thread, I'll just ask Rui to add some more posts to the top
when necessary.

You can fit tens and tens of thousands of characters in a thread before it hits
the limits, you are still very far from it.

The thread limit was mostly a joke (but if i posted jass code i would hit it with only one benchmark).
No, the real problem is the sorting: i want to create categories, and such a mess doesn't motivate me.
And you made me lose 10*30+5 seconds of my life by deleting my reserved posts

only a bad joke ofc

And honestly i don't feel like waiting on an admin's mood to get such a thread. Too bad you don't have this power, too bad i didn't think about it in the first place.

Bribe · Aug 24, 2011

Better than wasting 9.5 hours of your life every week day at a
call center.

Troll-Brain · Aug 24, 2011

Bribe said:
Better than wasting 9.5 hours of your life every week day at a
call center.

I'm feeling sorry for you

But that explains a lot about your activity here

Back on topic, call me a moron or whatever else but i won't continue if i don't have at least the first ten posts.

Or i hope that i won't be the only one who wants to make benchmarks.
Then those people could create a sorted thread from the beginning, or would be kind enough to wait for an admin intervention and put up with the time between the request and the actual sort (i've heard that this ability is called patience but i'm out of mana)

Dr Super Good · Aug 24, 2011

The major time consumer with JASS, as far as I could see, was that it processed the argument strings many, many times with some form of hash. There is in fact a penalty tied to argument length...

One hash process iterates through every character, and thus has a time scale of O(n) with the length of the argument.
Another hash process iterates only through every 4th character; this has a time scale of O(n/4), rounded down or up (I think it always iterated through the first character).
This means the argument 'tes' will use noticeably less time than the argument 'test', as the latter requires an extra iteration of the every-4th-character hash.

So what does this actually mean? Every 4 characters an argument uses adds a linear performance impact that is applied suddenly, as a step. Keeping argument names under 4 characters will minimize the impact of this second hash process. The step penalty we are looking at is probably comparable to adding one extra character to the name.
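To make the step effect concrete, here is a rough Python model of the two passes described above. It only counts loop iterations. The formula `1 + n // 4` for the second pass is an assumption based on the "first character always, then every 4th" description, and `hash_steps` is a made-up name, not anything taken from the game.

```python
def hash_steps(name: str) -> int:
    """Count character visits for the two hashing passes described above.

    Assumed model, not verified against the game:
      - pass 1 visits every character: n steps
      - pass 2 visits the first character, then one more per full
        group of 4 characters: 1 + n // 4 steps
    """
    n = len(name)
    full_pass = n
    strided_pass = 1 + n // 4
    return full_pass + strided_pass


# Compare names just below and exactly at a multiple of 4 characters.
for name in ("tes", "test", "counter", "counters"):
    print(name, len(name), hash_steps(name))
```

Under this model, going from 3 to 4 characters (or from 7 to 8) costs two extra steps instead of one, which is the sudden jump being described.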

Troll-Brain · Aug 24, 2011

So what does this actually mean? Every 4 characters an argument uses adds a linear performance impact that is applied suddenly, as a step. Keeping argument names under 4 characters will minimize the impact of this second hash process. The step penalty we are looking at is probably comparable to adding one extra character to the name.

Not sure i get what you are suggesting, but keeping names under 4 characters will just make your code unreadable, which is very lame.
Oh well, you are probably not suggesting such a thing, so could you plz explain your point better (at least for me)

Dr Super Good · Aug 24, 2011

This thread is about benchmarks of JASS? So I thought I would mention what I observed after an hour or so of following machine code execution in a disassembler.

Can this information be useful? Well, I guess it means that if you can keep a name's length just under a multiple of 4, it will save you from extra performance penalties. E.g. using a name 7 characters long instead of 8 (which might still be readable).

Ultimately, a precompiler should do all name optimization. Thus you have readable source code and unreadable, efficient JASS code as the output. We have Vexorian's optimizer to do this (although it does have problems and its optimization could be improved).

Within something like JassHelper (or other precompilers), the optimization could be built in. This would allow specific optimizer instructions as well (such as "do not shorten this name"), which would avoid the problems that Vexorian's optimizer has. Globals that are referenced very often (e.g. in a projectile system) could be given high priority for short variable names, while single-fire functions (only run once) could be given longer names.

Shortening a variable that is referenced 1000 times a second to only 1 letter would be a significant optimization. I doubt Vexorian's optimizer has the intelligence to identify this case, so in such situations manual optimization might be preferable (via a macro or a special precompiler reserved word).

Readability is good, but performance could be argued to be better. Whereas readability only helps the programmer to create and implement scripts, performance affects everyone who actually runs the script. One is a one-off, the other lasts until the map is no longer hosted. Yes, computers are stupidly fast and will keep getting even faster (for now), but that is not really an excuse to throw away processing time. Thus the only real solution that both lets programmers keep their sanity and keeps code economical would be improved precompilers (or even an improved editor).

The number of variables must affect performance in some way. It is quite obvious that some hashing of names is performed, and this is only really useful to hashtable-like systems. Hashtables that fill up become slower as there are more collisions. The bucket array might expand dynamically to compensate (so performance degradation is minor), but there could be numbers more or less suitable for performance. I agree, though, that this form of optimization can be completely ignored: handling even 10 collisions (highly unlikely, I am guessing) in a bucket probably takes less time than passing a single character of a name in JASS. In short, the performance impact of large numbers of names is drowned out completely by other, more costly processes.

Troll-Brain · Aug 24, 2011

Don't get me wrong, i appreciate your input, i just didn't understand what you meant. (English is not my native language, but yes it's not a good excuse).

I totally agree about the name optimization, but it clearly shouldn't be done directly by the programmer because it's error prone due to the poor readability of the code.

About the script optimizer tips, i've also thought about it, and one day or another i will make such a script optimizer (i just hope it will be before the end of the world), but without a GUI (of course i mean a graphical user interface, it will support the GUI wc3 stuff). It could be used with the JNGP though, by editing the wehack.lua

Nestharus · Aug 24, 2011

I totally agree about the name optimization, but it clearly shouldn't be done directly by the programmer because it's error prone due to the poor readability of the code.

That's not true for my scripts >: o

Does BigInt have any errors? I think not ^)^. Encoder? Nop ;D.

Troll-Brain · Aug 24, 2011

Nestharus said:
That's not true for my scripts >: o

Does BigInt have any errors? I think not ^)^. Encoder? Nop ;D.

Nestharus you are not objective about yourself.

Bribe · Aug 25, 2011

Nestharus, even you have said you cannot read BigInt.

Readability is not just for readability, it is also for helping you
to backtrack on your own work and debug something 6 months
down the line, for example.

Think about JassHelper's source code, for example. No one is
prepared to modify it to a reasonable extent, and the same
can be said for your libraries. For all we know the thing could
be sped up 2x.

Nestharus · Aug 25, 2011

I'll go to normal naming conventions when we get a 100% working optimizer for var names.

adic is working on such an optimizer for cjass, so that time is coming ; )

http://code.google.com/p/cjass/issues/detail?id=15

Bribe · Aug 25, 2011

I recommend putting your variable names long and if the user (like
yourself) does not want to fuss with the optimizer then you can
run a few find/replaces in the variable name >in your own map<
because then you can also shorten your library names to something
unreadable as well.

Because as it stands now it's just not worth it, most users are not
using natives nor ExecuteFunc/TriggerRegisterVariableEvent with
concatenation.

Nestharus · Aug 25, 2011

One thing I have noticed is that when reading from a variable many times, the reads become faster. This makes me think that rather than a hashtable, wc3 is running globals in something like a treap (variables accessed more often become faster).

A good way to check for this would be to compare 1 variable set 10,000 times or w/e compared to 10,000 variables set 1 time each.

I'm not sure if it's true or not at all (prob not), but it'd be good to check. One thing I noticed was that arrays appear to be identical in speed to hashtables, but because they seem to get faster as they are read, they eventually have instant access.

Another theory could be that JASS caches a certain number of variables for fast finds to make access temporarily faster for them, but it would still be good to check what's going on here =).

Jazztastic · Aug 25, 2011

So, with integers being marginally faster than reals, it prompts the GUI user to ask if saving something as an integer and then converting it once in a trigger loop would be faster than saving it as a real (converted from an integer) and just using it as is.

Example:

I save the level of my ability into an integer. Then later, I convert this integer to a real for use in damage (level x 25, or w/e number). Should I instead save the level of my ability as a real, that uses conversion when being saved, and then just use it as a real?

This situation has a few different factors though. Will the method that I use change depending on the amount of times it is used? Like if I use an integer converted to real 50 times, should I instead just use a converted real, even though integers are faster? Or if it is used just a single time, should I stick with my integer.

This conversation about efficiency is wonderful, but how does it translate to practicality? I am actually concerned with speed, even though I'm using GUI.

Troll-Brain · Aug 26, 2011

First, we are talking about a really irrelevant speed matter in pretty much any real case.
Secondly, let's face it, the GUI by itself has tons of much more relevant overheads (but they still don't matter in most cases).

Really, use integers or reals depending on what you need, there is no way that you will improve your map performance with them.

Plus the real <-> integer conversion is probably even more costly, especially in GUI.

I'm wondering if i haven't opened the Pandora's box with this thread, maybe i should put a better warning in the main thread ...

Maker · Aug 26, 2011

Jazztastic, convert to real immediately.

baassee · Aug 26, 2011

Maker said:
Jazztastic, convert to real immediately.

Or use custom script and you will not have to face this I2R GUI issue, as it will convert automatically.

Integers are faster than reals, I think DSG explained it to me once but I cannot remember the whole story so I will not post it.

Bribe · Aug 26, 2011

The only time you need to use I2R is to avoid truncating division (integer / integer).

Even then, you could instead use (integer + .0) which is faster and does the same.
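A small Python illustration of the difference being discussed. It is only a model: `jass_int_div` is a hypothetical helper that truncates toward zero, which is assumed here to match how JASS divides two integers, and the level value is made up.

```python
def jass_int_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (assumed JASS behaviour).

    Python's own // operator floors instead, so it is not used here.
    """
    return int(a / b)


level = 3  # hypothetical ability level

truncated = jass_int_div(level, 2)  # integer / integer drops the fraction
promoted = (level + 0.0) / 2        # adding .0 first makes it a real division

print(truncated)  # 1
print(promoted)   # 1.5
```

Whether the `+ .0` form is actually faster than I2R in JASS is the claim made above; this sketch only shows that both approaches avoid the truncation.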

baassee · Aug 26, 2011

Bribe said:
Even then, you could instead use (integer + .0) which is faster and does the same.

How can you be that smart in the morning for god's sake?

Bribe · Aug 26, 2011

I get dumber as the day progresses.

Troll-Brain · Aug 26, 2011

Are you all seriously considering using integers when logically you should use reals ?!

Dr Super Good · Aug 26, 2011

The reason I gave for reals being slower than integers was for a compiled language. The interpreter mechanics of JASS are so slow that such a reason means nothing (it is about the same time as a namespace hash collision).

I do, however, believe they might be slower for a different reason. They are not true floats. I remember people saying that the whole-number part behaves like an integer in certain circumstances (even though it does not get typecast). The fractional component is either BCD or a domain-restricted float. Synchronizing these mechanics might be noticeably slower than just using floats instead of ints in a compiled language.

Nestharus · Aug 26, 2011

I was the one who mentioned how reals are first interpreted as integers

4294967296 will overflow even though it can be stored into a float.
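For reference, the arithmetic behind that number can be checked outside the game. This Python sketch only shows generic 32-bit behaviour (IEEE 754 single precision and two's-complement wrapping); how JASS itself handles the value internally is not confirmed here.

```python
import struct

value = 4294967296  # 2**32, one past the largest unsigned 32-bit value

# Round-trip through a 32-bit IEEE 754 float: powers of two are stored exactly.
as_float32 = struct.unpack("<f", struct.pack("<f", float(value)))[0]

# Wrap into a signed 32-bit integer the way two's-complement overflow would.
as_int32 = (value + 2**31) % 2**32 - 2**31

print(as_float32 == value)  # True
print(as_int32)             # 0
```

So the value fits in a single-precision float without loss, but it does not fit in a 32-bit integer.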

Troll-Brain · Sep 2, 2011

Hashtable vs global array (read) added.

http://www.hiveworkshop.com/forums/1986990-post20.html

Nestharus · Sep 3, 2011

You did the benchmark incorrectly.

You skipped the part where I said 1 hashtable read vs 1 global array read in a timer had identical speeds and many array reads of the same slot were the same speed as 1 array read.

However, this could be because the timer expiration overhead is much greater than the hashtable/array overheads, making the results appear identical ; P.

Troll-Brain · Sep 3, 2011

Nestharus said:
You did the benchmark incorrectly.

You skipped the part where I said 1 hashtable read vs 1 global array read in a timer had identical speeds and many array reads of the same slot were the same speed as 1 array read.

However, this could be because the timer expiration overhead is much greater than the hashtable/array overheads, making the results appear identical ; P.

I didn't skip anything, i just don't have the appropriate tool right now. Indeed i would need to create "long" variable names instead of 2-character ones in the current version of cJass (yes i could make my own script for that, but that rather defeats the purpose of submitting cJass code).

Also, as usual it's only a theory on the fly, i don't believe in it that much, but i will test it when i can.

For the timer expiration i think it's negligible, as long as you use only one timer with a huge amount of code inside it.
Hell, even with a "0" empty periodic timer (which is equal to 0.0001) i don't get any fps drop.

Nestharus · Sep 3, 2011

Er, multiple timers >.>

Troll-Brain · Sep 3, 2011

Nestharus said:
Er, multiple timers >.>

Then you did your benchmark incorrectly.

Nestharus · Sep 4, 2011

Multiple timers is the only way to bench 1v1 reads.

But that is why I'm thinking that the overhead of the expiring timers far outweighed the operation of the read, hence why they were equal.
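The masking effect described here can be sketched with a simple cost model in Python. All numbers below are made-up units chosen for illustration, not measurements, and `cost_per_read` is a hypothetical helper.

```python
def cost_per_read(timer_overhead: float, read_cost: float, reads: int) -> float:
    """Average measured cost per read when `reads` reads share one timer expiration."""
    return (timer_overhead + reads * read_cost) / reads


# Illustrative units only: the timer expiration costs 1000,
# one kind of read costs 1 and the other costs 2.
overhead = 1000.0

# One read per expiration: the overhead dominates, both look the same.
print(cost_per_read(overhead, 1.0, 1), cost_per_read(overhead, 2.0, 1))

# Many reads per expiration: the overhead is spread out, the gap shows.
print(cost_per_read(overhead, 1.0, 4000), cost_per_read(overhead, 2.0, 4000))
```

With one read per expiration the two cases come out as 1001 and 1002, which no benchmark would separate. With thousands of reads per expiration they come out as 1.25 and 2.25.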

Dr Super Good · Sep 4, 2011

Multiple timers is the only way to bench 1v1 reads.

Ultimately you are after duplicating the operation as much as possible for every timer expiration.

A benchmark would be to do 4000 variable sets per timer expiration. This way the cost of the timer expiration can be mostly ignored. This is usually impractical to do without iteration, so a macro generating a function with thousands of lines of the same script would be needed. As JASS is interpreted rather badly, we can ignore any cache degradation that would be associated with such a substitute in a real language (as smaller code means more efficient code in real languages).
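If no macro facility is at hand, the unrolled function could also be generated by an external script. Here is a minimal Python sketch, assuming a global named `udg_x` is declared elsewhere in the map; the function name `BenchSet` and the helper `make_unrolled_test` are hypothetical.

```python
def make_unrolled_test(func_name: str, statement: str, repeats: int) -> str:
    """Build JASS source for a function that repeats one statement with no loop."""
    lines = [f"function {func_name} takes nothing returns nothing"]
    lines.extend(f"    {statement}" for _ in range(repeats))
    lines.append("endfunction")
    return "\n".join(lines)


# 4000 identical variable sets, to be called once per timer expiration.
source = make_unrolled_test("BenchSet", "set udg_x = 1", 4000)

print(source.count("\n") + 1)  # 4002 lines: header, 4000 sets, endfunction
```

The generated text can then be pasted into the map script and the function used as the timer callback.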

Troll-Brain · Sep 4, 2011

Dr Super Good, that's why i'm using cJass.

And Nestharus, if i understand correctly what you mean, i could instead use X globals with only one read for each one.
But for now the cJass optimizer is not enough. Sure, i could use wc3mapoptimizer (if it correctly shortens the variable names), but it would no longer be a standalone solution.

Switch33 · Sep 21, 2011

Can you do a test for images versus destructables?
(How much faster is it to create/destroy one versus the other)
(Is there a limit for images max, or per player)

How about a test for text splats versus texttags?
(How much faster is it to create/destroy one versus the other)

Troll-Brain · Sep 21, 2011

image vs destructable ?!
What's the point ?
Not something directly related but i've experienced that creating/destroying a destructable makes a leak (i have never used an image), but maybe there is a trick to avoid it like killing it/setting it to a certain life before removing it, or whatever else ...

textsplat vs texttag :
I would say texttag would be more efficient since it's hardcoded, but anyway it depends on what you want, choose one or the other.

Switch33 · Sep 21, 2011

image vs destructable ?!
What's the point ?

--> Well what i'm thinking about doing is converting anarchon's inventory system to use only images. It'd cut out a bunch of object editor data needed which would mean faster load times.

Not something directly related but i've experienced that creating/destroying a destructable makes a leak (i have never used an image)

-->Hmm, I've never heard destructables leak really as long as you destroy old references and I even heard that destructables are faster than images. Can someone else confirm this or can you elaborate?

textsplat versus texttag;

-->Yeah I agree; I mean texttags must for sure be faster; but I kinda wanted to know whether it's really significantly faster or not. Eh, for now I at least decided I'll stick to texttags though. A 99 limit for texttags or whatever is definitely doable.

Troll-Brain · Sep 21, 2011

Switch33 said:
--> Well what i'm thinking about doing is converting anarchon's inventory system to use only images. It'd cut out a bunch of object editor data needed which would mean faster load times.

I'm not sure at all that imported images would decrease the loading time.

-->Hmm, I've never heard destructables leak really as long as you destroy old references and I even heard that destructables are faster than images. Can someone else confirm this or can you elaborate?

You can test it yourself: wait for the wc3 process memory usage to stabilize, then start a test with a periodic timer (started by a chat message or whatever) :
RemoveDestructable(CreateDestructable...).
Wait several seconds and pause the timer, then wait again for the memory usage to stabilize and check if the memory used by wc3 has increased in a significant way.
Wc3 should not be paused during the process, an easy way is to launch a LAN game.

-->Yeah I agree; I mean texttags must for sure be faster; but I kinda wanted to know whether it's really significantly faster or not. Eh, for now I at least decided I'll stick to texttags though. A 99 limit for texttags or whatever is definitely doable.

It's even 100 (the first handle id starts at 99, and handle id 0 is both the last created texttag and an invalid texttag, blame blizzard for this lame convention)

The good news is that it's 100 per player's computer, so you can safely create/destroy/move/whatever texttags in local blocks (GetLocalPlayer).
Don't get me wrong, that doesn't mean you can create 1200 texttags if you have 12 human players. It just means the hardcoded texttag limit applies to each human's pc, so if a texttag should be visible only for player X, then create and display it only for him with GetLocalPlayer, instead of creating it for each player and showing it only for player X.
This way you will reach the limit later or even never.

Magtheridon96 · Sep 21, 2011

Images are obviously faster than destructables.
Destructables need more data (HP, Facing angle, pathing, etc..)
Images are like destructables (Need to be rendered), but they don't have HP, they don't have a pathing map, they don't have a facing angle, etc...

Troll-Brain · Sep 21, 2011

Magtheridon96 said:
Images are obviously faster than destructables.
Destructables need more data (HP, Facing angle, pathing, etc..)
Images are like destructables (Need to be rendered), but they don't have HP, they don't have a pathing map, they don't have a facing angle, etc...

What about doodads (non-targetable destructables without any pathing rules and such) ?
Virtually they are just "images" without any gameplay interaction, and they would be more natural.

Magtheridon96 · Sep 21, 2011

Doodads should be faster than destructables and slower than images

Troll-Brain · Sep 21, 2011

That could be true but who knows how images are handled by the wc3 engine, i mean they are not supposed to be used heavily, pretty much almost never in fact, while it's quite the opposite for doodads.
I wouldn't bet that much on it.

Switch33 · Sep 21, 2011

I'm not sure at all that imported images would decrease the loading time.

Well, I think this is a non-issue because you can use regular in-game icons.

Also doodads being better than destructables doesn't mean much either because doodads cannot be created by triggers; only destructables can be created with triggers.

From what i've heard so far I am definitely sorta leaning towards switching to only images and texttags for the inventory.

Magtheridon96 · Sep 22, 2011

I've been doing some research for a while and found some benchmarks in C++ that show the following results:

- 1000000 calls: sqrt: ~240 seconds
- 1000000 calls: pow: ~970 seconds

Since SquareRoot and Pow are just wrappers, we can say that using SquareRoot(x) is much faster than using Pow(x,0.5)

I can't link you to the benchmarks cause I closed the window and don't feel like looking for the link xD

Troll-Brain · Sep 22, 2011

Well, we are talking about jass not C++ (even if the jass virtual machine is probably made with C++).

I won't take it as the truth until there is a valid jass benchmark.

Magtheridon96 · Sep 23, 2011

Well, if SquareRoot is much faster than Pow(x,0.5) in C++, then we should have similar results here since SquareRoot is obviously a wrapper for the function sqrt(), and pow is obviously a wrapper for the function pow()

But, if you want to be 100% sure, go for it

I can't give you the benchmark code cause I'm kinda tired right now

Troll-Brain · Sep 23, 2011

[troll] jass != logic [/troll]

Nestharus · Oct 10, 2011

Compare first of group loop to linked list loop to group enum with a filter please ; ).

Bribe · Oct 10, 2011

http://www.thehelper.net/forums/showthread.php/167900-FarSightBJ?p=1377595#post1377595

Nestharus · Oct 10, 2011

What about first of group vs linked list?

edit
group -> 30-36 at x2
linked list -> 59.5 at x4