Bits of Interest

LeP · Mar 16, 2012

SStrHash2

Hey, i just wanted to write some things down i've come across lately. According to this post jass-variables are looked up in a simple hashtable. PipeDream only mentions the name of the hash algorithm though. A few days ago a friend of mine and i reversed the code.

Code:

#include <stdint.h>

typedef  uint32_t  ub4;   /* unsigned 4-byte quantities */
typedef  uint8_t ub1;   /* unsigned 1-byte quantities */

#define mix(a,b,c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

ub4 hash( k, length, initval)
register ub1 *k;        /* the key */
register ub4  length;   /* the length of the key */
register ub4  initval;  /* the previous hash, or an arbitrary value */
{
   register ub4 a,b,c,len;

   /* Set up the internal state */
   len = length;
   a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
   c = initval;         /* the previous hash value */

   /*---------------------------------------- handle most of the key */
   while (len >= 12)
   {
      a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
      b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
      c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
      mix(a,b,c);
      k += 12; len -= 12;
   }

   /*------------------------------------- handle the last 11 bytes */
   c += length;
   switch(len)              /* all the case statements fall through */
   {
   case 11: c+=((ub4)k[10]<<24);
   case 10: c+=((ub4)k[9]<<16);
   case 9 : c+=((ub4)k[8]<<8);
      /* the first byte of c is reserved for the length */
   case 8 : b+=((ub4)k[7]<<24);
   case 7 : b+=((ub4)k[6]<<16);
   case 6 : b+=((ub4)k[5]<<8);
   case 5 : b+=k[4];
   case 4 : a+=((ub4)k[3]<<24);
   case 3 : a+=((ub4)k[2]<<16);
   case 2 : a+=((ub4)k[1]<<8);
   case 1 : a+=k[0];
     /* case 0: nothing left to add */
   }
   mix(a,b,c);
   /*-------------------------------------------- report the result */
   return c;
}

static ub4 SStrHash2(ub1 *key){
    ub1 buff[0x400];
    ub4 len=0;
    while(*key){
        if(*key <'a' || *key>'z'){
            if(*key == '/'){
                buff[len] = '\\';
            }else{
                buff[len] = *key;
            }
        }else{
            buff[len] = *key - 0x20;
        }
        key++;
        len++;
    }
    return hash(buff, len, 0);
}

You can read about the actual hashing part here.

I don't know if this has any cool use-cases except the one PipeDream mentioned allready.
But that's the reason i'm writing this: i hope you'll have some cool ideas how to use this.

For convenience i built a ruby interfaces for the hash functions.
ext/blizzard.c (you need to include the above code in one way or another)

Code:

#include <ruby.h>
#include <stdint.h>
    
static uint32_t hashRubyString(VALUE str){
    return SStrHash2(StringValueCStr(str));
}

static VALUE rb_sstrhash2(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str));
}

static VALUE rb_globalHash(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str) & 511);
}

static VALUE rb_localHash(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str) & 3);
}

void Init_blizzard(){
        VALUE rb_blizzard_module = rb_define_module("Blizzard");
        rb_define_module_function(rb_blizzard_module, "strHash2", rb_sstrhash2, 1);
        rb_define_module_function(rb_blizzard_module, "globalHash", rb_globalHash, 1);
        rb_define_module_function(rb_blizzard_module, "localHash", rb_localHash, 1);
}

extconfig.rb

Code:

require 'mkmf'
create_makefile('blizzard', 'ext')

Compile

Code:

$ ruby extconfig.rb
$ make

And use it

Code:

require 'blizzard'
include Blizzard

puts strHash2 "foobar"
puts localHash "foobar"
puts globalHash "foobar"

As you can see in the above code, the string gets "uppercased" and slashes converted to backslashes. Does something spring to your mind?

It reminded me of StringHash. And yes, StringHash and SStrHash2 produce the same values.

Again, except for the obvious use cases like inlining it in an map-optimizer, checking for collisions or using it for your savecode on your website it would be very cool to hear from anyone if you have cool ideas what we could do with this piece of information.

Troll-Brain · Mar 16, 2012

Just a random comment, it will obviously work with ASCII character encoding, but what about the exotic ones, like with Chinese people.
I mean are you sure that the algorithm is as "simple" as that ?

Nestharus · Mar 16, 2012

Yea, you should try it with the extended Ascii thing to see if it still works ; ). Extended ascii includes all characters from all language, Kanji and Kana ofc included ;D.

LeP · Mar 17, 2012

I actually tested StringHash with my locale and it worked. And
tbh. i don't care if it does not work with all characters but i'm sure it will (as long as you encode your strings the same way wc3 does).
But go ahead and prove me wrong. The more infos the merrier.

Troll-Brain · Mar 17, 2012

I don't know character encoding that much but i'm not sure for example that '/' is always < 'a' and > 'z'.

Which one is your locale ?

It's always good to know how a native jass function work behind the scene, but i don't find an other idea which was not already mentionned.

To valid your function i suppose a russian, a chinese, every people which is not using ASCII should test it.

LeP · Mar 17, 2012

Yes, the ifs can be written different.
I tried to test it with chinese chars, but i can't paste them into my worldedit; they just turn to questionmarks. But chars like 'ä' are working fine. And you can't use these chars for variable names anyway and they are really uncommon for StringHash use.

Troll-Brain · Mar 18, 2012

I think you could use external strings.
But anyway i'm just trying to say that 'x' where 'x' is any valid character, doesn't always give the same integer, it depends what character encoding is used with the OS.

LeP · Mar 19, 2012

Yes...
But the fix is easy if you get strange results. Replace the character literals with actual ascii values. But i'll leave it this way. More clearer imo. And it works on my machine^(tm) :clol:

Magtheridon96 · Apr 7, 2013

Opened thread~

muzzel · Apr 7, 2013

If you are getting incorrect values try using int32_t and int8_t instead of the uns. long int / uns. chars:

Replace:

Code:

typedef  unsigned long  int  ub4;   /* unsigned 4-byte quantities */
typedef  unsigned       char ub1;   /* unsigned 1-byte quantities */

with

Code:

typedef uint32_t ub4;
typedef uint8_t ub1;

LeP · Apr 7, 2013

muzzel said:
If you are getting incorrect values try using int32_t and int8_t instead of the uns. long int / uns. chars:

Replace:

Code:

typedef unsigned long int ub4; /* unsigned 4-byte quantities */ typedef unsigned char ub1; /* unsigned 1-byte quantities */

with

Code:

typedef uint32_t ub4; typedef uint8_t ub1;

Yes. In fact in my local version i was already using stdint.h.
I left the old typedefs up because a) they made no difference on my machine b) they were the original code c) i hoped they would have the right size on all machines.

So c) was fixed and since b) doesn't matter anymore, one could do a nice rewrite of the code. But, you know, just for aesthetics. (i hope)

Bits of Interest

LeP

LeP

SStrHash2

Troll-Brain

Troll-Brain

Resources

Nestharus

Nestharus

Resources

LeP

LeP

Troll-Brain

Troll-Brain

Resources

LeP

LeP

Troll-Brain

Troll-Brain

Resources

LeP

LeP

Magtheridon96

Magtheridon96

Resources

muzzel

muzzel

Resources

LeP

LeP

Similar threads