Bits of Interest

LeP

LeP

Level 11
Joined
Feb 13, 2008
Messages
516

SStrHash2


Hey, i just wanted to write some things down i've come across lately. According to this post jass-variables are looked up in a simple hashtable. PipeDream only mentions the name of the hash algorithm though. A few days ago a friend of mine and i reversed the code.

Code:
#include <stdint.h>

typedef  uint32_t  ub4;   /* unsigned 4-byte quantities */
typedef  uint8_t ub1;   /* unsigned 1-byte quantities */

#define mix(a,b,c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

ub4 hash( k, length, initval)
register ub1 *k;        /* the key */
register ub4  length;   /* the length of the key */
register ub4  initval;  /* the previous hash, or an arbitrary value */
{
   register ub4 a,b,c,len;

   /* Set up the internal state */
   len = length;
   a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
   c = initval;         /* the previous hash value */

   /*---------------------------------------- handle most of the key */
   while (len >= 12)
   {
      a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
      b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
      c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
      mix(a,b,c);
      k += 12; len -= 12;
   }

   /*------------------------------------- handle the last 11 bytes */
   c += length;
   switch(len)              /* all the case statements fall through */
   {
   case 11: c+=((ub4)k[10]<<24);
   case 10: c+=((ub4)k[9]<<16);
   case 9 : c+=((ub4)k[8]<<8);
      /* the first byte of c is reserved for the length */
   case 8 : b+=((ub4)k[7]<<24);
   case 7 : b+=((ub4)k[6]<<16);
   case 6 : b+=((ub4)k[5]<<8);
   case 5 : b+=k[4];
   case 4 : a+=((ub4)k[3]<<24);
   case 3 : a+=((ub4)k[2]<<16);
   case 2 : a+=((ub4)k[1]<<8);
   case 1 : a+=k[0];
     /* case 0: nothing left to add */
   }
   mix(a,b,c);
   /*-------------------------------------------- report the result */
   return c;
}

static ub4 SStrHash2(ub1 *key){
    ub1 buff[0x400];
    ub4 len=0;
    while(*key){
        if(*key <'a' || *key>'z'){
            if(*key == '/'){
                buff[len] = '\\';
            }else{
                buff[len] = *key;
            }
        }else{
            buff[len] = *key - 0x20;
        }
        key++;
        len++;
    }
    return hash(buff, len, 0);
}
You can read about the actual hashing part here.

I don't know if this has any cool use-cases except the one PipeDream mentioned allready.
But that's the reason i'm writing this: i hope you'll have some cool ideas how to use this.

For convenience i built a ruby interfaces for the hash functions.
ext/blizzard.c (you need to include the above code in one way or another)
Code:
#include <ruby.h>
#include <stdint.h>
    
static uint32_t hashRubyString(VALUE str){
    return SStrHash2(StringValueCStr(str));
}

static VALUE rb_sstrhash2(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str));
}

static VALUE rb_globalHash(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str) & 511);
}

static VALUE rb_localHash(VALUE self, VALUE str){
    if(RSTRING_LEN(str) > 0x400){
        rb_raise(rb_eRuntimeError, "Key can be at max 1024 chars long.");
        return Qnil;
    }
    return UINT2NUM(hashRubyString(str) & 3);
}

void Init_blizzard(){
        VALUE rb_blizzard_module = rb_define_module("Blizzard");
        rb_define_module_function(rb_blizzard_module, "strHash2", rb_sstrhash2, 1);
        rb_define_module_function(rb_blizzard_module, "globalHash", rb_globalHash, 1);
        rb_define_module_function(rb_blizzard_module, "localHash", rb_localHash, 1);
}
extconfig.rb
Code:
require 'mkmf'
create_makefile('blizzard', 'ext')
Compile
Code:
$ ruby extconfig.rb
$ make
And use it
Code:
require 'blizzard'
include Blizzard

puts strHash2 "foobar"
puts localHash "foobar"
puts globalHash "foobar"



As you can see in the above code, the string gets "uppercased" and slashes converted to backslashes. Does something spring to your mind?

It reminded me of StringHash. And yes, StringHash and SStrHash2 produce the same values.

Again, except for the obvious use cases like inlining it in an map-optimizer, checking for collisions or using it for your savecode on your website it would be very cool to hear from anyone if you have cool ideas what we could do with this piece of information.
 
Last edited:

LeP

LeP

Level 11
Joined
Feb 13, 2008
Messages
516
I actually tested StringHash with my locale and it worked. And
tbh. i don't care if it does not work with all characters but i'm sure it will (as long as you encode your strings the same way wc3 does).
But go ahead and prove me wrong. The more infos the merrier.
 
Level 17
Joined
Apr 27, 2008
Messages
2,455
I don't know character encoding that much but i'm not sure for example that '/' is always < 'a' and > 'z'.

Which one is your locale ?

It's always good to know how a native jass function work behind the scene, but i don't find an other idea which was not already mentionned.

To valid your function i suppose a russian, a chinese, every people which is not using ASCII should test it.
 

LeP

LeP

Level 11
Joined
Feb 13, 2008
Messages
516
Yes, the ifs can be written different.
I tried to test it with chinese chars, but i can't paste them into my worldedit; they just turn to questionmarks. But chars like 'ä' are working fine. And you can't use these chars for variable names anyway and they are really uncommon for StringHash use.
 

LeP

LeP

Level 11
Joined
Feb 13, 2008
Messages
516
Yes...
But the fix is easy if you get strange results. Replace the character literals with actual ascii values. But i'll leave it this way. More clearer imo. And it works on my machine(tm) :clol:
 
Level 14
Joined
Jun 27, 2008
Messages
1,325
If you are getting incorrect values try using int32_t and int8_t instead of the uns. long int / uns. chars:

Replace:
Code:
typedef  unsigned long  int  ub4;   /* unsigned 4-byte quantities */
typedef  unsigned       char ub1;   /* unsigned 1-byte quantities */
with
Code:
typedef uint32_t ub4;
typedef uint8_t ub1;
 

LeP

LeP

Level 11
Joined
Feb 13, 2008
Messages
516
If you are getting incorrect values try using int32_t and int8_t instead of the uns. long int / uns. chars:

Replace:
Code:
typedef  unsigned long  int  ub4;   /* unsigned 4-byte quantities */
typedef  unsigned       char ub1;   /* unsigned 1-byte quantities */
with
Code:
typedef uint32_t ub4;
typedef uint8_t ub1;

Yes. In fact in my local version i was already using stdint.h.
I left the old typedefs up because a) they made no difference on my machine b) they were the original code c) i hoped they would have the right size on all machines.

So c) was fixed and since b) doesn't matter anymore, one could do a nice rewrite of the code. But, you know, just for aesthetics. (i hope)
 
Top