
FlushChildHashtable

Level 26
Joined
Aug 18, 2009
Messages
4,097
This function seems to be very fast: 0.2 seconds for 1 million entries, and hardly slower than a single RemoveSavedX call for a single entry, so the overhead is small. The memory gets released, so it should leave no trail nor slow down further actions.

You could use it as a general cleanup when an object is destroyed/allocated, or for data structures; the only requirement is that the parent key is reserved for the object id.
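For illustration, a minimal sketch of that cleanup pattern; the hashtable name, function name, and id source are placeholders:

JASS:
globals
    hashtable DATA = InitHashtable()
endglobals

function DeallocateObject takes integer objectId returns nothing
    // one call wipes every child key stored under this object's parent key
    call FlushChildHashtable(DATA, objectId)
endfunction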
 
Level 12
Joined
Feb 22, 2010
Messages
1,115
Are you sure the amount of data stored under the parent key doesn't affect the function's speed?

Btw, I've never seen someone use RemoveSavedX for multiple entries :O
 
Hmm... I almost never used RemoveSavedX more than once, because in most cases, when you have two values for a child key, you'd either just use a single struct in that child or have so many values that flushing is obviously the better choice.

So basically this discovery is nothing special. But it's good to know that the programmers did a good job on the flush function.
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
The secondary keys can belong to different scopes; I do not have a private hashtable for every scope. The scopes do not have to know each other, and how many child keys a parent has is rather variable. Also, removing entries individually may malfunction, which would cause new objects to inherit old data. So a general flush onDestroy would be more secure and, as this suggests, be cheap even when there is little data, and not be spiky on large data amounts either.

You would still remove specific data in the specific scope to stay flexible; otherwise you would require extra routines, or maybe it's worth branch-selecting on whether the object has been destroyed.

The main use for hashtables, besides the initial native connection, is data structures though.
 
That's why I don't recommend that people merge everything into a single hashtable.

Memory is cheap nowadays, and since hashtables get slower the more entries they have, it's actually better to have a dedicated hashtable for every scope. That assumes each scope only uses one constant hashtable and does not create/destroy them dynamically, of course.

Plus it comes with the benefit of not having to fear collisions if a scope somehow exceeds its ID range.
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
I certainly have more than 256 scopes currently, and a lot of them make use of a hashtable, because when I want to assign data of a scope to a foreign object, the policy in place is to connect it via hashtable.

JASS:
class Fireball
    constructor(Unit targetUnit)
        targetUnit.saveData(SCOPE_KEY, this)  // <- hashtable write

I have done it this way because there was no language support for extending the definition of an object in another scope. Of course, reading/writing would be faster if you used arrays like:

JASS:
class Fireball
    thistype array TARGET_UNIT_DATA

    constructor(Unit targetUnit)
        // loose binding: the unit never references this object directly
        TARGET_UNIT_DATA[targetUnit] = this

but this would only be a loose binding, an indirect/passive assignment to the unit. Also, it would not really account for the object's id pool and would just trust that it fits into a vanilla array.

I would like to know how you tag foreign data to your objects.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,198
You define a collection of general purpose hashtables. Each system that is compatible (uses them in an agreed way) requests one of the collection to use permanently. The request process happens at map initialization and uses an integer "weight" that determines the approximate load a system will put on a hashtable. A balancing algorithm allocates the hashtables to the systems so that each hashtable gets an approximate weight.
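A rough sketch of such a pool with greedy weight balancing; the pool size, names, and init-time-only allocation are illustrative assumptions, not taken from any particular map:

JASS:
globals
    constant integer POOL_SIZE = 8
    hashtable array POOL
    integer array POOL_WEIGHT
endglobals

function InitPool takes nothing returns nothing
    local integer i = 0
    loop
        exitwhen i >= POOL_SIZE
        set POOL[i] = InitHashtable()
        set i = i + 1
    endloop
endfunction

// each system calls this once at map initialization, declaring its approximate load
function RequestHashtable takes integer weight returns hashtable
    local integer best = 0
    local integer i = 1
    loop
        exitwhen i >= POOL_SIZE
        // greedy balancing: hand out the least-loaded table
        if POOL_WEIGHT[i] < POOL_WEIGHT[best] then
            set best = i
        endif
        set i = i + 1
    endloop
    set POOL_WEIGHT[best] = POOL_WEIGHT[best] + weight
    return POOL[best]
endfunction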

The end result is that you have a single location that is responsible for allocating hashtables that you can easily control. Any number of compatible systems can then use one of these hashtables, decided at runtime by the system. This lazy allocation means that systems that are not used do not need to be allocated a hashtable and so balancing is performed better.

Any number of such hashtable pools could exist, each with a specific usage style. You can even deallocate hashtables during a system shutdown to allow running systems to be changed dynamically while hashtable loading stays balanced (to some degree; it might not actively re-balance as a result of a system's deallocation, but it will try to re-balance when another system is allocated).

If a system defines a hashtable migration method (allowing lossless movement of hashtable entries from one table to another) and a more dynamic measurement of load (such as how many items the system has allocated in the hashtable), then you can even perform periodic balancing.

Why must a system only use one hashtable in the same way? As long as it can keep track of which hashtable to use, it could use any number of compatible hashtables. This could result in a hashtable allocation method and even better balancing, since you could notify the hashtable pool of changes in load. However, such a method would probably introduce so much overhead for tracking hashtables that it may not be worth it.
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
@Water Knight:
That's an extremely special case that one out of a million users would want to go for. So I'd still recommend using separate hashtables for every system to anyone.


So other people would go the local collection way, but with a hashtable?

JASS:
class Fireball
    hashtable LOCAL_DATA

    constructor(Unit targetUnit)
        LOCAL_DATA.save(targetUnit.globalId, this)


I do not know how it depends on the distribution, but in a simple test, the 2nd million saves into the same hashtable took twice the time, and the 3rd million almost 4x. In contrast, a fresh hashtable was unaffected.

So yeah, you should not put too much in a single table. I agree with Dr Super Good that you could make a central system hand out hashtables for non-special tasks. Just pass them a wrapper type which can organize multiple tables. If you attach data to objects like I posted above, each object could be dealt its own table (multiple objects can use the same one); then it's easy to get the right table, a single FlushChildHashtable per object still suffices, and it's reasonably balanced.
 
If you attach data to objects like I posted above, each object could be dealt its own table (multiple objects can use the same one); then it's easy to get the right table, a single FlushChildHashtable per object still suffices, and it's reasonably balanced.
That's basically how I'm handling it.

Every system has got its own hashtable. And then there are general-purpose hashtables that are used for attachment of any (useful) kind:

a TimerHash, a UnitHash, an ItemHash.

The first stores any amount of 2-dimensional data keyed by the GetHandleId() parent key. This, however, requires each user to use local timers so that each handleId stays unique, instead of attaching to general-purpose 32fps timers. Since local timers were benchmarked to be faster in a lot of cases anyway, there's no reason not to do that.
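For illustration, that pattern could look roughly like this; the child key 0 and the function names are placeholders:

JASS:
globals
    hashtable TimerHash = InitHashtable()
endglobals

function SpellPeriodic takes nothing returns nothing
    local timer t = GetExpiredTimer()
    local unit caster = LoadUnitHandle(TimerHash, GetHandleId(t), 0)
    // ... periodic spell logic using caster ...
    // on the final tick, clean up with a single flush:
    call FlushChildHashtable(TimerHash, GetHandleId(t))
    call DestroyTimer(t)
    set caster = null
endfunction

function StartSpell takes unit caster returns nothing
    local timer t = CreateTimer()
    // the local timer's handle id is the parent key, child keys hold the spell data
    call SaveUnitHandle(TimerHash, GetHandleId(t), 0, caster)
    call TimerStart(t, 0.03, true, function SpellPeriodic)
    set t = null
endfunction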

The UnitHash is only for one-dimensional data, which means I can only put a single struct in per unit, as the parent key is reserved for a spell/case/whatever ID to avoid collisions.

ItemHash works the same as UnitHash and is also 1-dimensional.
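The UnitHash attachment might look roughly like this; the reserved key value and function names are made up for the example:

JASS:
globals
    hashtable UnitHash = InitHashtable()
    constant integer MY_SPELL_KEY = 1 // parent key reserved by this system
endglobals

// one-dimensional: a single value per (system, unit) pair
function AttachToUnit takes unit u, integer data returns nothing
    call SaveInteger(UnitHash, MY_SPELL_KEY, GetHandleId(u), data)
endfunction

function LoadFromUnit takes unit u returns integer
    return LoadInteger(UnitHash, MY_SPELL_KEY, GetHandleId(u))
endfunction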


Needless to say, I almost only use the TimerHash (for spells and scripted events). Attaching to units or items is an extremely rare case imho, as most data you might want to store on units or items is usually system-exclusive (and those systems have their own tables).
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
Applied it now. Every object picks a randomized hashtable, or in my case a set of hashtables, at allocation. I figured GetRandomInt gives a fair distribution and is fast to invoke. The only downside is that you have a couple more array reads throughout the data management functions.
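A rough sketch of that scheme; the pool size and names are illustrative:

JASS:
globals
    constant integer POOL_SIZE = 8
    hashtable array POOL // filled with InitHashtable() at map init
    integer array OBJECT_TABLE // pool index each object was dealt
endglobals

function AllocateObjectData takes integer objectId returns nothing
    // a random pick spreads objects roughly evenly across the pool
    set OBJECT_TABLE[objectId] = GetRandomInt(0, POOL_SIZE - 1)
endfunction

function SaveObjectInteger takes integer objectId, integer childKey, integer value returns nothing
    // one extra array read to find the object's table
    call SaveInteger(POOL[OBJECT_TABLE[objectId]], objectId, childKey, value)
endfunction

function DeallocateObjectData takes integer objectId returns nothing
    // a single flush still wipes all of the object's data
    call FlushChildHashtable(POOL[OBJECT_TABLE[objectId]], objectId)
endfunction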
 