• 🏆 Texturing Contest #33 is OPEN! Contestants must re-texture a SD unit model found in-game (Warcraft 3 Classic), recreating the unit into a peaceful NPC version. 🔗Click here to enter!
  • It's time for the first HD Modeling Contest of 2024. Join the theme discussion for Hive's HD Modeling Contest #6! Click here to post your idea!

Jass' hashtable == linked list?

Status
Not open for further replies.
Level 13
Joined
Nov 7, 2014
Messages
571
Hashtables/dictionaries/associative arrays are used to map keys to values. They are also supposed to be fast, i.e regardless of how many keys are inserted/added to the hashtable, the amount of time it takes to do lookups stays "constant". This does not appear to be the case for Jass' hashtable type.
The lookup times for the hashtables in Jass seem to increase the more keys get added. While it's possible that the implementation uses an array of linked lists, and these lists get longer and longer, at some point the hashtable should grow and rehash (as far as I know), thus reducing the length of these linked lists, reducing the lookup times.
But it seems to me that lookups take too much time, some of it is because Jass is bytecode interpreted and the rest... taken by iterating linked lists?


JASS:
library slowinghashtable initializer init

function writeln takes string s returns nothing
    call BJDebugMsg(s)
endfunction

globals
    force g_fn_call_force
    timer g_fn_call_timer
endglobals

function fn_call_init takes nothing returns nothing
    set g_fn_call_force = CreateForce()
    call ForceAddPlayer(g_fn_call_force, Player(0))
    set g_fn_call_timer = CreateTimer()
endfunction

function fn_call takes code f returns nothing
    call ForForce(g_fn_call_force, f)
endfunction


globals
    // the more keys we add, the more time it takes for each 'LoadInteger'

    constant integer Max_Keys = 1000 * 1000
    integer g_key = 0
    hashtable g_ht = InitHashtable()
endglobals

function hashtable_init_helper takes nothing returns nothing
    local integer n = 5000
    loop
        exitwhen n == 0
        call SaveInteger(g_ht, 0, g_key, 1)
        set g_key = g_key + 1
        set n = n - 1
    endloop
endfunction

function hashtable_init takes nothing returns nothing
    loop
        exitwhen g_key >= Max_Keys
        call fn_call(function hashtable_init_helper)
    endloop
endfunction

function do_nothing takes hashtable ht, integer k1, integer k2 returns integer
    return 0
endfunction

function do_hashtable_lookups_helper takes nothing returns nothing
    local integer a
    local integer x

static if false then
    // baseline, i.e how much time our setup takes, although calling natives should be slightly faster
    set a = 0
    loop
        exitwhen a == 5000
        set x = do_nothing(g_ht, 0, 0)
        set a = a + 1
    endloop
endif

static if true then
    // lookup the last added key
    set a = 0
    loop
        exitwhen a == 5000
        set x = LoadInteger(g_ht, 0, 999999) // Max_Keys - 1
        set a = a + 1
    endloop
endif

static if false then
    // it seems that looking up the first added key is slower than looking up the last added key,
    // perhaps because the last added key becomes the head of the linked list from which the lookups start?
    //
    // lookup the first added key
    set a = 0
    loop
        exitwhen a == 5000
        set x = LoadInteger(g_ht, 0, 0)
        set a = a + 1
    endloop
endif

static if false then
    // lookup the middle key
    set a = 0
    loop
        exitwhen a == 5000
        set x = LoadInteger(g_ht, 0, 499999)
        set a = a + 1
    endloop
endif
endfunction

function do_hashtable_lookups takes nothing returns nothing
    local integer a = 0
    loop
        exitwhen a == 1000
        call fn_call(function do_hashtable_lookups_helper)
        set a = a + 1
    endloop
endfunction


globals
    integer g_countdown
    timer g_countdown_timr = CreateTimer()
    code g_countdown_fn
endglobals

function countdown takes nothing returns nothing
    set g_countdown = g_countdown - 1

    if g_countdown != 0 then
        call writeln(I2S(g_countdown))
        call TimerStart(g_countdown_timr, 1.0, false, function countdown)
    else
        call fn_call(g_countdown_fn)
        call writeln("DONE")
    endif
endfunction

function begin_countdown takes code f returns nothing
    set g_countdown_fn = f
    set g_countdown = 5
    call writeln(I2S(g_countdown))
    call TimerStart(g_countdown_timr, 1.0, false, function countdown)
endfunction


function on_start takes nothing returns nothing
    call writeln("hashtable initialized with " + I2S(g_key) + " keys")
    call begin_countdown(function do_hashtable_lookups)
endfunction

function init takes nothing returns nothing
    call fn_call_init()
    call hashtable_init()
    call TimerStart(CreateTimer(), 0.0, false, function on_start)
endfunction

endlibrary // slowinghashtable
 

~El

Level 17
Joined
Jun 13, 2016
Messages
556
Could be just that the hashtables never rehash. I wouldn't be surprised considering the state of JASS.
Another possibility is that hashing only occurs at the parent key, and a linked list is used at the child key.
Yet another possibility entirely is that hashtable is actually a binary tree map. That'd explain the increase in lookup times.
 
Level 6
Joined
Jan 9, 2019
Messages
102
It's due to the fact that one hashtable is quite an 'infinite' storage already that made some believe what I also believe, by using only one to its fullest.

This thread just suggested the other way around. So, to speak what's in my head, I'd love to make a new hashtable-handler that handles multiple hashtables and interpret them as only one. Though this depends on how much difference will this make.

I think I'll start experimenting with this.
 
Level 15
Joined
Nov 30, 2007
Messages
1,202
It's due to the fact that one hashtable is quite an 'infinite' storage already that made some believe what I also believe, by using only one to its fullest.

This thread just suggested the other way around. So, to speak what's in my head, I'd love to make a new hashtable-handler that handles multiple hashtables and interpret them as only one. Though this depends on how much difference will this make.

I think I'll start experimenting with this.

You would incur overhead when going through the interface eating a way at the potential benefits, is gonna be my guess.

If you are doing this a start would be to first determine when a hashtable size starts to become problematic: 10.000, 1.000.000, more?
 
Level 19
Joined
Dec 12, 2010
Messages
2,069
It's due to the fact that one hashtable is quite an 'infinite' storage already that made some believe what I also believe, by using only one to its fullest.
Obviously there are nothing infinite in the real world, so kind of lame mistake. Tho chace that you've been saving millions of keys and never cleared them with FlushChild is so low, that you can count on a single one w/o big downsides
 
Level 9
Joined
Jul 30, 2012
Messages
156
As I said before, JASS' hashtables don't actually hash anything, at least it was the case the last time I checked (on patch 1.26)

Quoting a post from 2 years ago:

So... the two arguments (parentKey, childKey) from SaveInteger are converted to a hash with the "IntegerHash function", and then only the lower 3 or 4 bits are used for the index of the bucket

This is the case for object data tables. Jass hashtables don't even do that, they just take the key arguments, strip some bits and use that directly as the index.

Apparently there was also some code to "increase" the storage if the hashtable grows too much, I mean, if there are many many items stored, I suppose it starts using 5 or 6 bits of the keys and so on, as more storage is needed.

But the point is, there's actually no hashing algorithm taking place, it's just a plain array of linked lists. A good hashtable implementation is supposed to have a good hashing function, in order to reduce collisions, but Jass hashtables don't have any.
 
Status
Not open for further replies.
Top