[Documentation] String Type


Introduction

This thread intends to formally document the string type in JASS.

General Information

The string type in JASS is similar to the string data type in C (a C string):
a string is a sequence of characters followed by a null terminator.[1] As in other
languages, you can perform standard string operations such as concatenation
(ex: "a" + "b" -> "ab"), substring (SubString), and the like. In Warcraft 3,
strings are encoded as UTF-8.[5]

Each ASCII character in a string takes 1 byte (as does the null terminator).
Non-ASCII characters take 2 to 4 bytes in current UTF-8; source [2] lists up to 6,
which the pre-November-2003 definition of UTF-8 allowed.[2] ex:
"abc∑"
Theoretically holds 7 bytes: a:1, b:1, c:1, ∑ (U+2211):3, null terminator:1.
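Byte counts like these can be checked outside the game by encoding the same text as UTF-8. The sketch below is an offline Python model, not WC3 code; it assumes, as this section describes, that a string is stored as its UTF-8 bytes plus one null terminator. The helper names are hypothetical.

```python
def jass_storage_bytes(text: str) -> int:
    """Bytes a string would occupy, assuming UTF-8 payload + 1 null terminator."""
    return len(text.encode("utf-8")) + 1


def byte_breakdown(text: str) -> list[tuple[str, int]]:
    """UTF-8 byte count of each character in text."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    print(byte_breakdown("abc∑"))      # [('a', 1), ('b', 1), ('c', 1), ('∑', 3)]
    print(jass_storage_bytes("abc∑"))  # 7
```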

The max length of an individual, unconcatenated string is 1024 bytes. Concatenation
will allow you to bypass this limit, but is limited to ~4099 bytes.[4]

One notable feature of JASS strings is that they serve as pointers to an internal
string table. When Warcraft 3 encounters a new string, it will store it in the
string table under a unique integer id (the table starts at 0 and increments).[3] As
such, all subsequent uses of that string are loaded from the string table, rather
than reallocated. The downside is that the management of the string table is
out of our hands (we cannot directly access/modify it).
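The interning behavior described above can be pictured with a toy table: a new string gets the next integer id, and a repeated string gets its existing id back. This is a hypothetical Python model for illustration only; the real table is internal to Warcraft 3, and the actual ids a map sees depend on everything entered before, so only the relative behavior is meaningful here.

```python
class StringTable:
    """Toy model of an interning table: each new string gets the next integer id."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self.entries: list[str] = []

    def intern(self, s: str) -> int:
        """Return the id of s, adding it to the table first if it is new."""
        if s not in self._ids:
            self._ids[s] = len(self.entries)
            self.entries.append(s)
        return self._ids[s]


table = StringTable()
first = table.intern("Hello World!")
again = table.intern("Hello World!")  # reused: no new entry
other = table.intern("Hello")
print(first, again, other)  # 0 0 1
print(table.entries)        # ['Hello World!', 'Hello']
```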

Like many programming languages, JASS features certain escape characters,
as well as color coding. [6]

String Table Behavior

The string table works by having key-string pairs. Each unique string
has a unique index pairing. The table starts at 0 and increments by 1 each
time a new string is introduced. For info on the basis of my tests, see this.

This is where the practical use comes in. What creates entries? What doesn't?
Hopefully these examples will answer those questions. Keep in mind that the
table only enters new strings.

  • Any string defined in a line will receive an entry:
    JASS:
    local string s = "Hello World!"
    call StringLength("Hello")
    // Generates: 
    //   Hello World!,
    //   Hello
  • Any string that is reused will use a previous entry:
    JASS:
    local string s = "Hello World!"
    call BJDebugMsg("Hello World!") // same internal index as 's'
    // Generates:
    //   Hello World!
  • Case, spelling, punctuation, color, escape sequences, etc. all matter:
    JASS:
    local string array s
    set s[0] = "A"
    set s[1] = "a"
    set s[2] = "A\n"
    set s[3] = "|cffffcc00A|r"
    set s[4] = "A."
    set s[5] = "Á"
    set s[6] = "A" // same as s[0]
    // Generates:
    //   A,
    //   a,
    //   A\n,
    //   |cffffcc00A|r,
    //   A.,
    //   Á
  • Strings that are unused, but nonetheless generated, will be entered:
    JASS:
    call GetObjectName('hsor')
    // Generates:
    //   Sorceress
  • Substrings generate strings:
    JASS:
    call SubString("Hello World", 0, 3)
    // Generates:
    //   "Hello World", 
    //   "Hel"
  • Concatenated strings generate 1 string per addition:
    JASS:
    set s = "a" + "b" + "c" + "d" + "e" + "f"
    // Generates
    //   "a",
    //   "b",
    //   "ab",
    //   "c",
    //   "abc",
    //   "d",
    //   "abcd",
    //   "e",
    //   "abcde",
    //   "f",
    //   "abcdef"
  • As a general rule, strings entered in the table must be unique:
    JASS:
    local string array s
    set s[0] = R2S(5.00)    // new string 
    set s[1] = I2S(5)    // new string
    set s[2] = "5"        // same as s[1]
    set s[3] = "5.00000"    // new string
    // Generates:
    //   5.000,
    //   5,
    //   5.00000
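The concatenation and uniqueness rules above can be reproduced with a small offline model. This Python sketch assumes, as the tests above suggest, that a chain like "a" + "b" + ... is evaluated left to right and that every literal and every intermediate result is entered once. The function names are hypothetical; "5.000" is written in directly as the value R2S(5.00) produced in the test above, and Python's str(5) stands in for I2S(5).

```python
def intern_all(table: list[str], strings: list[str]) -> None:
    """Append each string to table unless it is already present."""
    for s in strings:
        if s not in table:
            table.append(s)


def concat_chain(table: list[str], parts: list[str]) -> str:
    """Model of parts[0] + parts[1] + ... evaluated left to right.

    Each literal is entered when it is reached, then the intermediate
    result of each addition is entered.
    """
    result = parts[0]
    intern_all(table, [result])
    for part in parts[1:]:
        intern_all(table, [part])
        result = result + part
        intern_all(table, [result])
    return result


table: list[str] = []
s = concat_chain(table, ["a", "b", "c", "d", "e", "f"])
print(s)      # abcdef
print(table)  # ['a', 'b', 'ab', 'c', 'abc', 'd', 'abcd', 'e', 'abcde', 'f', 'abcdef']

# Repeats add nothing: str(5) and the literal "5" share one entry.
nums: list[str] = []
intern_all(nums, ["5.000", str(5), "5", "5.00000"])
print(nums)   # ['5.000', '5', '5.00000']
```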

Assumptions/Tests/Reasoning
Null Terminator
The limit for an individual string (without concatenation) is 1024 bytes.
However, a string of 1024 characters will exceed the limit (WE will crash when
saving), because 1024 characters + 1 null terminator = 1025 bytes. A string of
1023 characters will not crash.
Character Bytes
Assuming the string limit is 1024 bytes, you can perform a simple
test to see how much space is allocated for particular characters. Start
off with a 1023 character string:
JASS:
"abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmno"
If you replace the final "o" with a non-ASCII character such as ø (2 bytes), it will
crash WE, indicating that the string now needs more than 1024 bytes (1022 + 2 + 1 = 1025).
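The arithmetic behind this test can be written out as a quick check. This Python sketch is an offline model under the assumption drawn from the tests in this section: a single, unconcatenated literal may use at most 1024 bytes, counting its UTF-8 bytes plus the null terminator. LITERAL_LIMIT and literal_fits are hypothetical names.

```python
LITERAL_LIMIT = 1024  # bytes, including the null terminator (assumed from the tests above)


def literal_fits(text: str) -> bool:
    """True if a single, unconcatenated literal stays within the assumed limit."""
    return len(text.encode("utf-8")) + 1 <= LITERAL_LIMIT


base = "a" * 1022
print(literal_fits(base + "o"))  # True:  1023 + 1 = 1024 bytes
print(literal_fits(base + "ø"))  # False: 1022 + 2 + 1 = 1025 bytes
print(literal_fits("a" * 1024))  # False: 1024 + 1 = 1025 bytes
```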

Another proof could be through the use of SubString:
JASS:
call SubString("Hellø World", 4, 5) // empty return
call SubString("Hellø World", 4, 6) // returns ø
ø takes up two bytes. A third proof is StringLength(), which counts bytes rather
than characters, so each non-ASCII character counts for 2-4 in terms of length.

For example:
JASS:
call StringLength("€")
Will return 3.
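The SubString and StringLength results above are consistent with offsets and lengths measured in UTF-8 bytes. The Python sketch below models that by slicing the encoded bytes; byte_substring is a hypothetical helper, and how WC3 displays the lone half of ø (the empty output above) is not modelled, only the fact that 0xC3 alone is not a complete UTF-8 character.

```python
def byte_substring(text: str, start: int, end: int) -> bytes:
    """Slice by UTF-8 byte offsets, the way SubString appears to behave."""
    return text.encode("utf-8")[start:end]


s = "Hellø World"
print(byte_substring(s, 4, 6).decode("utf-8"))  # ø
print(byte_substring(s, 4, 5))                  # b'\xc3' (only the first byte of ø)
print(len("€".encode("utf-8")))                 # 3, like StringLength("€")
```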

String Table
The existence of the string table was proved a long time ago
using the return bug. Post patch 1.23b, you have to use a different method.
Here is some test code (tested on 1.20e):
JASS:
function Id2String takes integer i returns string
    return i
    return ""
endfunction
// Returns the string at index i in the string table

function GetStringId takes string s returns integer
    return s
    return 0
endfunction
// Takes a string and returns its index in the string table

function ReadStringTable takes nothing returns nothing
    local integer i = 0
    loop
        exitwhen i == 50 // increase to read more
        call BJDebugMsg(Id2String(i))
        set i = i + 1
    endloop
endfunction
// Reads the string table
In patch 1.24 through 1.27a, you can read the string table using the script found here.
String Limit
A bit frustrating to test. When you put 1024 characters
in a string in WE and save, it will crash. The most characters you can have is
1023. Non-ASCII characters take up 2-4 bytes each, each escape sequence takes up 1,
and ASCII characters take up 1 byte each. Found by brute force testing.

You can avoid this limit through the use of concatenation:
"ab" + "bc"
With this, you can achieve up to 4099 characters. (Aniki | edo)
Special Characters

  • \n will break to the next line.
    JASS:
    call BJDebugMsg("Hello\nWorld")
    // Outputs:
    //     Hello
    //     World
  • \t is accepted, but has no effect.
  • \\ will place a backslash into a string. For example:
    JASS:
    call BJDebugMsg("C:\\Program Files\\Warcraft III")
    // Outputs:
    //    C:\Program Files\Warcraft III
  • \" will place a doublequote character into the string. Example:
    JASS:
    call BJDebugMsg("The cat says: \"Hello World!\"")
    // Outputs:
    //    The cat says: "Hello World!"
  • |cAARRGGBB denotes a color code, which will apply a color to text.
    Each pair is a hexadecimal value from 00 to FF (0-255). AA refers to alpha (no effect),
    RR refers to red, GG refers to green, and BB refers to blue.
    For example, |cffffcc00 (used in tooltips) makes a yellow hue (255 red, 204 green, 0 blue).
    The coloring will stop once the string encounters |r. For more info, see:
    Warcraft III Color Tags And Linebreaks
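As an illustration of the |cAARRGGBB layout, this offline Python sketch splits a tag into its four hex pairs and strips color codes to recover the visible text. The regex and both helper names are hypothetical; it only handles the |c...|r form described above, and per the text the alpha pair has no visible effect in game.

```python
import re

# |c followed by four two-digit hex pairs: AA RR GG BB
COLOR_TAG = re.compile(r"\|c([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})")


def parse_color_tag(tag: str) -> tuple[int, ...]:
    """Split |cAARRGGBB into (alpha, red, green, blue) as 0-255 integers."""
    m = COLOR_TAG.match(tag)
    if m is None:
        raise ValueError(f"not a color tag: {tag!r}")
    return tuple(int(pair, 16) for pair in m.groups())


def strip_color_codes(text: str) -> str:
    """Remove |cAARRGGBB openers and |r closers, leaving the visible text."""
    return COLOR_TAG.sub("", text).replace("|r", "")


print(parse_color_tag("|cffffcc00"))       # (255, 255, 204, 0)
print(strip_color_codes("|cffffcc00A|r"))  # A
```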

Credits

  • phyrex1an - Documented the string type pretty well, in addition to the other primitives:
    Jass Primitive Types
  • LeP - Some extra info on string bytes.
  • Zepir and others at wc3c.net - Documenting w3m and w3x file formats.

If you have any suggestions, feel free to post them before I switch back to 1.26. If you want to run tests yourself, you can use a version switcher (google it). It'll throw errors when you save, but just run the test map after closing the errors and it should run fine as long as you don't have *actual* errors.
 
Level 22
Joined
Sep 24, 2005
Messages
4,821
OT: Is it ok to post comments here?

Anyway, I used to use string pointers as array indices; I even tried to integrate that into CS's gamecache usage, but then the patch came.
 

LeP

Level 13
Joined
Feb 13, 2008
Messages
542
Assuming wc3 strings are UTF-8 encoded, a single character can take up to 4 bytes.
StringLength("€") == 3 holds true.
You should also write that wc3 strings are Unicode strings in UTF-8 encoding. You have to search for sources yourself though, so don't just take my word for it.

@chobibo: yeah, I'd really like to have the old return bug back sometimes…
 

LeP

Level 13
Joined
Feb 13, 2008
Messages
542
Have a look here (it's the up-to-date version mentioned here).

I guess you can use this as a valid source (but maybe make a copy and host it somewhere else).
 

Wow, nice! You have a backup of that (I attached it to my post in the wc3c thread, just as an extra backup).

Although, it is a bit odd that they specify 1-6 bytes. Do you know of any examples that yield 6 bytes? I thought UTF-8 used 1-4 bytes, but I could be wrong.

edit: Apparently UTF-8 had 5/6-byte sequences prior to November, 2003. Perhaps that is the case. I'll try some CJK characters.
edit2: I forgot that I need to install a foreign version of wc3 to make those tests. Anyone here have a chinese version of wc3? xD Maybe I'll switch mine later. It is just tedious.
 
Level 23
Joined
Apr 16, 2012
Messages
4,041
Sorry for semi-necroing, but this is the test I've done:

JASS:
library lib initializer init
    
    private function init takes nothing returns nothing
        local string s1 = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghi"
        local string s3 = s1 + s1
        set s3 = s3 + s1
        set s3 = s3 + s1
        set s3 = s3 + "12345678"
        
        /*
            This will make the size of string 4101 counting the null terminated character, which
            will crash the game
        */
        //set s3 = s3 + "9"
        call BJDebugMsg("Length: " + I2S(StringLength(s3)))
        call BJDebugMsg(SubString(s3, StringLength(s3)-1, StringLength(s3)))
    endfunction
    
endlibrary

I've made the string 4099 characters + null terminator long without crashing, but after appending the 4100th character, it crashed when loading.
 
Level 19
Joined
Aug 8, 2007
Messages
2,765
Question on this

JASS:
call GetObjectName('hsor')
// Generates:
//   Sorceress

Isn't Sorceress already in the string table? Or does the UI use a separate string table from the one used in JASS?
 
"Sorceress" is declared within a text file within the mpq. That is where wc3 reads it from (I assume). However, if you use the actual text "Sorceress" in a string, wc3 doesn't check if the string exists in the mpq (that'd probably lead to extra overhead). Instead, it just checks if it exists in the string table for that map session--if it doesn't, it adds it.

As for object editor strings, those are stored within the map's mpq (e.g. in the wts file, or the object editor files).

----

For the rest of the posts, I'll probably update this soon.
 
  1. The 1024th character doesn't crash WE, it crashes Warcraft (typo?).
  2. StringLength doesn't actually return the length; it returns how many bytes the string is. I know you went over this in the documentation, but maybe you should explicitly state it?
  3. According to my test, the max string size is 4096. Maybe it's different for Mac?


    string_results.jpg




    JASS:
    // native GetStringByteSize takes string str returns integer
    
    struct Test
        
        readonly static boolean running = false
        static boolean stop    = false
        
        static method onGameStart takes nothing returns nothing
            local integer i = 0
            local integer a = 0
            local string s  = ""
            local boolean b = false
            
            set thistype.running = not thistype.running
            
            if (not thistype.running) then
                set thistype.stop = true
                call BJDebugMsg("Disabling test...")
                return
            else
                call BJDebugMsg("Test started\n")
            endif
            
            loop
                exitwhen false
    
                if (thistype.stop) then
                    set s = s + "b"
                    set a = StringLength(s)
                    call BJDebugMsg("\nForce Stopped.\n\nLength: " + I2S(a))
                    //call BJDebugMsg("Bytes: " + I2S(GetStringByteSize(s)))
                    call BJDebugMsg("Last 4 Chars: " + SubString(s, a - 4 , a + 99))
                    set thistype.stop    = false
                    set thistype.running = false
                    return
                endif
                
                set s = s + "."
                set i = i + 1
                
                if (StringLength(s) != i) then
                    set a = StringLength(s)
                    call BJDebugMsg("Found max length (" + I2S(a) + " != " + I2S(i) + ")")
                    call BJDebugMsg("\nLength: " + I2S(i-1))
                    //call BJDebugMsg("Bytes: " + I2S(GetStringByteSize(s)))
                    call BJDebugMsg("Last 4 Chars: " + SubString(s, a - 4 , a + 99))
                    return
                endif
                
                set a = a + 1
                
                if (a == 500) then
                    set a = 0
                    call BJDebugMsg("Pausing Thread at " + I2S(i) /*+ " (" + I2S(GetStringByteSize(s)) + ")"*/)
                    call TriggerSleepAction(3)
                endif
                
                
            endloop
            
        endmethod
        
        static method onInit takes nothing returns nothing
            local trigger t = CreateTrigger()
            call TriggerRegisterPlayerEvent(t, Player(0), EVENT_PLAYER_END_CINEMATIC)
            call TriggerAddAction(t, function thistype.onGameStart)
        endmethod
    endstruct
 
  1. The 1024th character doesn't crash WE, it crashes Warcraft (typo?).
  2. StringLength doesn't actually return the length; it returns how many bytes the string is. I know you went over this in the documentation, but maybe you should explicitly state it?
  3. According to my test, the max string size is 4096. Maybe it's different for Mac?


  1. Sorry, I meant it crashes when saving. But I tested that with the standard editor. Does it crash JNGP when you save a map with a string with > 1024 characters? I'm guessing it doesn't. It is probably because of PJASS. The regular syntax checker will fart.

  2. Yeah, I should.

  3. I tested it on Windows. Maybe I need to re-test. But anyway, edo got a similar result here:
    http://www.hiveworkshop.com/forums/2441361-post8.html
    Not sure why it ended up on 4099 rather than 4096. 4096 seems to make sense. Maybe it is some overflow or something.
 
Level 23
Joined
Apr 16, 2012
Messages
4,041
Don't know, but I could write a string incrementor, let it run, and see what happens. The problem is that the table may be able to chew through quite a few entries, maybe even millions, so a test like this in this language could take an unimaginable amount of time. I don't think you have to worry about it.
 
What happens when the string table has too much data?

I'm not sure. Someone should test it. :)

Two guesses:
(1) Wc3 has a hardcoded limit for how large the string table can be, and it clears it once it takes up that much space (or it overwrites existing entries)
(2) Or wc3 just slaps an out of memory error in your face
 
Level 23
Joined
Feb 6, 2014
Messages
2,466
Fill Table String Test

So I tested how much the string table can handle.
According to the test result,
the string table entry count reached 63488 and nothing happened.


JASS:
scope Test initializer OnInit
    
    globals
        private string array char
        private integer charIndex = 1
        private integer tableEntries
    endglobals
    
    private function Expire takes nothing returns nothing
        local string s = char[charIndex]
        local integer i = 1
        loop
            exitwhen i > 1023
            set s = s + char[charIndex]
            set tableEntries = tableEntries + 1
            set i = i + 1
        endloop
        call BJDebugMsg("Table Entries = " + I2S(tableEntries))
        set charIndex = charIndex + 1
        if charIndex > 62 then
            call DestroyTimer(GetExpiredTimer())
            call BJDebugMsg("Test Finished")
        endif
    endfunction
 
    private function Main takes nothing returns nothing
        call TimerStart(CreateTimer(), 0.5, true, function Expire)
    endfunction

    //===========================================================================
    private function OnInit takes nothing returns nothing
        local trigger t = CreateTrigger(  )
        call TriggerRegisterTimerEvent(t, 0.01, false )
        call TriggerAddAction(t, function Main )
        
        set char[1] = "a"
        set char[2] = "b"
        set char[3] = "c"
        set char[4] = "d"
        set char[5] = "e"
        set char[6] = "f"
        set char[7] = "g"
        set char[8] = "h"
        set char[9] = "i"
        set char[10] = "j"
        set char[11] = "k"
        set char[12] = "l"
        set char[13] = "m"
        set char[14] = "n"
        set char[15] = "o"
        set char[16] = "p"
        set char[17] = "q"
        set char[18] = "r"
        set char[19] = "s"
        set char[20] = "t"
        set char[21] = "u"
        set char[22] = "v"
        set char[23] = "w"
        set char[24] = "x"
        set char[25] = "y"
        set char[26] = "z"
        set char[27] = "A"
        set char[28] = "B"
        set char[29] = "C"
        set char[30] = "D"
        set char[31] = "E"
        set char[32] = "F"
        set char[33] = "G"
        set char[34] = "H"
        set char[35] = "I"
        set char[36] = "J"
        set char[37] = "K"
        set char[38] = "L"
        set char[39] = "M"
        set char[40] = "N"
        set char[41] = "O"
        set char[42] = "P"
        set char[43] = "Q"
        set char[44] = "R"
        set char[45] = "S"
        set char[46] = "T"
        set char[47] = "U"
        set char[48] = "V"
        set char[49] = "W"
        set char[50] = "X"
        set char[51] = "Y"
        set char[52] = "Z"
        set char[53] = "0"
        set char[54] = "1"
        set char[55] = "2"
        set char[56] = "3"
        set char[57] = "4"
        set char[58] = "5"
        set char[59] = "6"
        set char[60] = "7"
        set char[61] = "8"
        set char[62] = "9"
        set tableEntries = 62
    endfunction

endscope
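The reported count of 63488 follows directly from the loop bounds in this test, as the quick check below shows. Note that tableEntries is a counter the script computes, not a value read back from the string table. The check assumes every concatenation in Expire produces a brand-new string, which holds here because each result is a run of one repeated character with a new length.

```python
SEED_ENTRIES = 62        # char[1..62] literals: a-z, A-Z, 0-9
APPENDS_PER_CHAR = 1023  # Expire's loop runs i = 1 .. 1023
CHARS = 62               # Expire runs once per char before destroying its timer

total = SEED_ENTRIES + CHARS * APPENDS_PER_CHAR
print(total)                 # 63488

# Longest string built per character: 1 starting char + 1023 appends.
print(1 + APPENDS_PER_CHAR)  # 1024
```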
 

Attachments

  • Test String Table Limit.w3m
It is difficult to test on this patch. When I made the tests in the original post, I switched back to patch 1.20 and used the return bug to read the string table.

If you want to test whether it eventually crashes, you can probably just use an integer and keep doing I2S. If it goes up beyond the integer limit without crashing, then it probably isn't something to worry about. If memory consistently goes up, then that means it is expanding the table. If memory stops at a certain point, then they probably have some sort of recycling or garbage collection implementation.
 
Level 13
Joined
Nov 7, 2014
Messages
571
Maybe someone can figure out a "nice hack" around Blizzard's return bug fix that would allow reading from the internal string table again? I tried the loop return bug from (here), but there it converts from real to integer (and that works); here it doesn't... :/

JASS:
function attempt_to_read_from_internal_string_table_patch_1_26 takes integer i returns string
    // passes type checking (when saving in WE), but map doesn't load (Blizzard's return bug fix)
    /*
    return i
    return ""
    */
    
    // map doesn't load
    /*
    if false then
        return ""
    else
        return i
    endif
    */

    // FATAL ERROR on map load...
    loop
        if false then
            return ""
        else
            return i
        endif
    endloop

    // any ideas how to resurrect this nice but lost feature of Jass2? =)
    
    return ""
endfunction
 
Level 13
Joined
Nov 7, 2014
Messages
571
The existence of the string table was proved a long time ago
using the return bug. Post patch 1.23b, it is impossible to perform this test.

Maybe someone can figure out a "nice hack" around Blizzard's return bug fix that would allow reading again from the internal string table?

Someone (@leandrotp) did 'figure out a "nice hack"', so it's again possible to read from the string table even in patch 1.27


Also about the max string length, this test:
JASS:
library foo initializer bar

globals
    string array sub_strings
endglobals
function init_sub_strings takes nothing returns nothing
    set sub_strings[1] = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    set sub_strings[2] = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
    set sub_strings[3] = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"
    set sub_strings[4] = "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD"
    set sub_strings[5] = "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
    set sub_strings[6] = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
    set sub_strings[7] = "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"
    set sub_strings[8] = "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH"
    set sub_strings[9] = "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"
    set sub_strings[10] = "JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ"
    set sub_strings[11] = "KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK"
    set sub_strings[12] = "LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL"
    set sub_strings[13] = "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM"
    set sub_strings[14] = "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"
    set sub_strings[15] = "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO"
    set sub_strings[16] = "PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP"
    set sub_strings[17] = "QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ"
    set sub_strings[18] = "RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR"
    set sub_strings[19] = "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS"
    set sub_strings[20] = "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"
    set sub_strings[21] = "UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU"
    set sub_strings[22] = "VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV"
    set sub_strings[23] = "WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW"
    set sub_strings[24] = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    set sub_strings[25] = "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
    set sub_strings[26] = "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
    set sub_strings[27] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    set sub_strings[28] = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
    set sub_strings[29] = "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
    set sub_strings[30] = "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    set sub_strings[31] = "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
    set sub_strings[32] = "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"
    set sub_strings[33] = "gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg"
    set sub_strings[34] = "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
    set sub_strings[35] = "iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"
    set sub_strings[36] = "jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"
    set sub_strings[37] = "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk"
    set sub_strings[38] = "llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll"
    set sub_strings[39] = "mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm"
    set sub_strings[40] = "nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn"
    set sub_strings[41] = "oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"
    set sub_strings[42] = "pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp"
    set sub_strings[43] = "qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq"
    set sub_strings[44] = "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr"
    set sub_strings[45] = "ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss"
    set sub_strings[46] = "tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt"
    set sub_strings[47] = "uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu"
    set sub_strings[48] = "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv"
    set sub_strings[49] = "wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww"
    set sub_strings[50] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    set sub_strings[51] = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
    set sub_strings[52] = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
    set sub_strings[53] = "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    set sub_strings[54] = "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"
    set sub_strings[55] = "2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222"
    set sub_strings[56] = "3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333"
    set sub_strings[57] = "4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444"
    set sub_strings[58] = "5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555"
    set sub_strings[59] = "6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666"
    set sub_strings[60] = "7777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777"
    set sub_strings[61] = "8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888"
    set sub_strings[62] = "9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999"
endfunction

globals
    string S = ""
endglobals

function lazy_init takes nothing returns nothing
    local integer i

    set i = 1
    loop
        // exitwhen i > 41 // crash
        exitwhen i > 40

        set S = S + sub_strings[i]

        set i = i + 1
    endloop
    // After the loop StringLength(S) == 4000

    // set S = S + SubString(sub_strings[1], 0, 95)
    // set S = S + SubString(sub_strings[1], 0, 96)
    // set S = S + SubString(sub_strings[1], 0, 97)
    // set S = S + SubString(sub_strings[1], 0, 98)
    // set S = S + SubString(sub_strings[1], 0, 99)

    // set S = S + SubString(sub_strings[1], 0, 100) // crash

    // crash
    // set S = S + SubString(sub_strings[1], 0, 50)
    // set S = S + SubString(sub_strings[1], 0, 50)

    // crash
    // set S = S + SubString(sub_strings[1], 0, 50)
    // set S = S + SubString(sub_strings[1], 0, 46)
    // set S = S + "WXYZ"

    // set S = S + SubString(sub_strings[1], 0, 50)
    // set S = S + SubString(sub_strings[1], 0, 46)
    // set S = S + "X"
    // set S = S + "Y"
    // set S = S + "Z"
    // call BJDebugMsg(SubString(S, 4098, 4099)) // => (null)
    // call BJDebugMsg(I2S(StringLength(S))) // 4097

    set S = S + SubString(sub_strings[1], 0, 50)
    set S = S + SubString(sub_strings[1], 0, 46)
    set S = S + "XYZ"
    call BJDebugMsg(SubString(S, 4098, 4099)) // => Z
    call BJDebugMsg(I2S(StringLength(S)))

    // it seems that:
    //     set S = S + "<char>"
    // is different than:
    //     set S = S + "<more-than-1-char>"

    // it seems the max-string-length is 4099;
    // internally it could be 4100, but index 4099 reserved for the "\0"?

    // call BJDebugMsg(SubString(S, 4000, 4099))
endfunction

function bar takes nothing returns nothing
    call ExecuteFunc(init_sub_strings.name)
    call TimerStart(CreateTimer(), 0.0, false, function lazy_init)
endfunction

endlibrary

suggests that the maximum string length is 4099, and also that concatenating single-character strings
JASS:
    set S = S + "A"
    set S = S + "B"

is different from concatenating strings of more than one character
JASS:
    set S = S + "AB"

The difference is noticeable when the string's length is between 4097 and 4099.
 
Judging by our code, you cannot simply convert a string to its handle; it seems to return the string's address or some garbage. We have no implementation for reading from the string hashtable. But why would anyone need it anyway?

There isn't much use to it beyond debugging/testing how wc3 handles strings, especially now that we have StringHash.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,241
There isn't much use to it beyond debugging/testing how wc3 handles strings, especially now that we have StringHash.
It is worth noting that StringHash is only a very loose hash. It is case insensitive (StringHash("A") and StringHash("a") return the same value) and is prone to hash collisions (with 1000 strings there starts to be quite a fair chance of collision). This is due to how weak the hash is compared to industrial-strength cryptographic hash algorithms, whose outputs can be 160 bits or more.
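To illustrate why a case-insensitive hash maps "A" and "a" to the same value, here is a minimal C++ sketch that folds every character to upper case before feeding it into a 32-bit FNV-1a hash. The function name `CaseInsensitiveHash` and the choice of FNV-1a are assumptions for illustration only; this is not Blizzard's actual StringHash algorithm, just a model of the behaviour described above.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical stand-in for StringHash: 32-bit FNV-1a over the
// upper-cased bytes. This is NOT the real Warcraft 3 algorithm.
std::uint32_t CaseInsensitiveHash(const std::string& s) {
    std::uint32_t h = 2166136261u;  // FNV-1a offset basis
    for (unsigned char c : s) {
        // Fold case before hashing, so "a" and "A" contribute the same byte.
        h ^= static_cast<std::uint32_t>(std::toupper(c));
        h *= 16777619u;             // FNV-1a prime (unsigned wrap-around is intended)
    }
    return h;
}
```

Because the case is folded before hashing, `CaseInsensitiveHash("Footman")` and `CaseInsensitiveHash("FOOTMAN")` are identical by construction. And since any 32-bit output can only take 2^32 values, distinct strings colliding is unavoidable in principle, whatever the exact algorithm.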
 

Dr Super Good

but it's fast, and as we always knew, Blizzard didn't do anything they didn't need themselves. They needed fast "almost unique" hashing, so they made it.
It was done for us? It was intended to allow hashtables to operate in a way similar to a gamecache where both parent and child indices were strings instead of integers. This was to allow the hashtables to be a 1:1 replacement of gamecache based systems like the famous HandleVars after the type casting exploit such systems used was fixed.

However, due to the nature of the hashtable API this was not really the case. Whereas gamecache might internally use a hashtable structure, it probably has its own hash collision management built in, making hash collisions a non-concern. Hashtables with keys sourced using StringHash have no such collision avoidance and so can suffer from collisions, which ultimately can lead to some pretty serious or hard-to-track-down bugs.

This is why features such as vJASS "key" were created by users rather than hashing a string constant. Such keys are guaranteed unique and can serve the same purpose as string constants used to.

StringHash has its uses, e.g. for fast character lookup in save/load systems, but if you use it to hash general-purpose strings (e.g. chat messages, or lots of string constants) you must be prepared to cope with collisions.
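The collision problem with hashtables keyed by StringHash can be modelled in C++. A store keyed only by a 32-bit hash silently overwrites an entry when two different strings hash to the same key, whereas a map keyed by the full string keeps them apart, roughly as gamecache is presumed to do. This is a sketch under assumptions: `WeakHash`, `HashKeyedStore` and `StringKeyedStore` are hypothetical names, and a case-folding FNV-1a hash stands in for StringHash so that "Footman" and "footman" collide deterministically.

```cpp
#include <cctype>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for StringHash (case-folding FNV-1a, not Blizzard's code).
std::uint32_t WeakHash(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= static_cast<std::uint32_t>(std::toupper(c));
        h *= 16777619u;
    }
    return h;
}

// Models a hashtable keyed by StringHash(key): only the hash is kept,
// so two keys with equal hashes share one slot and overwrite each other.
struct HashKeyedStore {
    std::unordered_map<std::uint32_t, int> slots;
    void Save(const std::string& key, int value) { slots[WeakHash(key)] = value; }
    int Load(const std::string& key) const {
        auto it = slots.find(WeakHash(key));
        return it == slots.end() ? 0 : it->second;  // 0 when missing
    }
};

// Models string-keyed storage: the full key is compared, so no collisions.
struct StringKeyedStore {
    std::map<std::string, int> slots;
    void Save(const std::string& key, int value) { slots[key] = value; }
    int Load(const std::string& key) const {
        auto it = slots.find(key);
        return it == slots.end() ? 0 : it->second;  // 0 when missing
    }
};
```

Saving 1 under "Footman" and then 2 under "footman" leaves `HashKeyedStore` returning 2 for both keys, while `StringKeyedStore` still returns 1 and 2 respectively. That silent overwrite is the kind of hard-to-track-down bug described above.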

In many ways a function to return the string table entry number of a string would be more useful. That number is always guaranteed unique with respect to other strings so could be used without the risk of any collisions.
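The string table described in this thread can be modelled as an append-only intern table. Each new string receives the next integer id starting at 0, and a string seen before gets its existing id back. The C++ sketch below uses a hypothetical `StringTable` class, not the engine's real structure. It shows why such an id is collision-free: ids are handed out one per distinct string, so two different strings can never share one, unlike a 32-bit hash.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of the string table: ids start at 0 and increment
// for every distinct string; entries are never removed.
class StringTable {
public:
    // Returns the id of s, inserting s first if it has not been seen yet.
    int Intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;  // already present: reuse id
        int id = static_cast<int>(strings_.size());
        strings_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }
    const std::string& Lookup(int id) const { return strings_.at(id); }
    std::size_t Size() const { return strings_.size(); }

private:
    std::unordered_map<std::string, int> ids_;  // string -> id
    std::vector<std::string> strings_;          // id -> string
};
```

Interning "Hello", "World" and then "Hello" again yields ids 0, 1 and 0, and the table holds only two entries.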
 
Level 19
Joined
Dec 12, 2010
Messages
2,070
In many ways a function to return the string table entry number of a string would be more useful. That number is always guaranteed unique with respect to other strings so could be used without the risk of any collisions.
But it could not be shared with other clients at all, in case you need that, so that's another issue. Though I can't think of a situation where anyone would need that.
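The objection that table ids cannot be shared between clients follows from how incrementing ids are assigned: they depend on the order in which strings were first seen, not on their content. A minimal C++ sketch, using a hypothetical `AssignIds` helper that numbers distinct strings in first-seen order, shows the same string getting different ids on two "clients" when one of them has interned an extra string first. Whether Warcraft 3's table actually diverges between clients this way is an assumption, not something this thread establishes.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: numbers each distinct string in first-seen order,
// the way an incrementing string table would.
std::unordered_map<std::string, int> AssignIds(const std::vector<std::string>& seen) {
    std::unordered_map<std::string, int> ids;
    for (const std::string& s : seen) {
        // size() is evaluated before the call; emplace does nothing if s already has an id.
        ids.emplace(s, static_cast<int>(ids.size()));
    }
    return ids;
}
```

A client that sees `{"footman", "knight"}` assigns "footman" the id 0, while one that sees `{"Fusssoldat", "footman", "knight"}` assigns it 1. A content-based hash, by contrast, would be identical for identical strings on both clients.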
 

Dr Super Good

But it could not be shared with other clients at all, in case you need that, so that's another issue. Though I can't think of a situation where anyone would need that.
Is the string table not synchronous or is it just the strings in the table?

For example, would "footman" on a German client end up at a different index from "footman" on an English client? Or would they end up at the same index but with a different string value?

Also surely this would also apply to hashing localized strings with StringHash?
 
Thanks.

Some questions regarding the test scenario for the string table (assuming Wc3 is written in C++):
  • If all used strings permanently exist in the string table, in which part of memory are dynamically created entries (not their references) located? It cannot be static, read-only memory, which in C++ usually only holds string literals known at compile time. It must be their own string library implementation, which also allocates on the heap, but... never frees it? ;o
  • Might there be some periodic collector going through the string table, freeing entries that match a condition?
  • If not, is small string optimization even a thing? If the string table was tested for wc3, which strings was it tested with? Short ones, long ones, or both? This optimization would be very good for damage amount texts, string iterations, and the like. Or are small strings also always required to allocate memory on the heap?
Does anyone who experimented with the string table have a more detailed description of the test scenario?
 
Level 19
Joined
Dec 12, 2010
Messages
2,070
AFAIK there is nothing like a GC, for the same reason big databases never clear out their data but just mark it as "removed" on the backend: to avoid fragmentation and improve search speed/memory consumption.
I don't believe there are any optimizations like the ones you've mentioned either, because each string is stored as a raw, 0-byte-terminated string inside common memory regions. The string table only provides shortcuts and improves comparison speed. Like a simple linked list, it doesn't contain any extra logic. If there were any, it would at least be noticeable when working with memhack, but so far I've never seen a string being re-allocated, which means it's rather a big heap with alloc() called when needed.
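The observation that strings are never re-allocated matches an append-only store. Each string's bytes are written once, the address handed out stays valid as the store grows, and nothing is freed until the whole store is destroyed (for instance when the map is closed). Here is a minimal C++ sketch, using a hypothetical `StringArena` built on `std::deque`, whose `push_back` does not invalidate references to existing elements. It models the described behaviour and is not the engine's actual allocator.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical append-only string storage. std::deque::push_back never
// moves existing elements, so pointers returned by Add stay valid.
// Nothing is freed until the arena itself is destroyed.
class StringArena {
public:
    const char* Add(const std::string& s) {
        storage_.push_back(s);
        return storage_.back().c_str();  // null-terminated, like a C string
    }
    std::size_t Count() const { return storage_.size(); }

private:
    std::deque<std::string> storage_;
};
```

After adding "abc" and then ten thousand more strings, the pointer returned for "abc" still reads "abc". The trade-off is that memory only ever grows, which is the "leak" discussed later in the thread.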

It must be their own string library implementation, which also allocates on the heap, but... never frees it? ;o
When you close the game (map/campaign), it clears everything out of memory, including the string table. Corrupting the string table with fake links (i.e. replacing a string's address directly inside the table) will cause memory errors later; I've had this shit before.
 
AFAIK there is nothing like a GC, for the same reason big databases never clear out their data but just mark it as "removed" on the backend: to avoid fragmentation and improve search speed/memory consumption.
String libraries should take care of cleaning up when the pointer on the stack goes out of scope. That's the way you normally use strings in C++: memory gets allocated on the heap (for longer strings), and after the function ends you don't call anything like a string destructor yourself. But in Warcraft, if strings exist permanently, they must use their own permanent allocation and hence never care whether string references still exist. (That would explain why nulling strings makes no sense.)

I'm not sure how this relates to, for example, how customers get data from big databases, where entries are internally marked as removed without technically being removed. That usually serves a usability purpose, for example so that customers can't see entries that are no longer meant to exist, because of privacy or whatever. But what would be the relation of the string table to wc3 coders? JASSers can always use any string they want, no masking is needed, and the question is about allocation or new referencing.

When the program ends, I guess it's normal that the memory for constants and literals is also cleared, just like everything else the program uses should be cleared. But at runtime it is interesting how dynamically created strings behave.

But if the table doesn't include any extra string logic, then that confuses me a bit. A string table seems like a weird concept to me. There needs to be some mapping to the string after all; how else would I find the correct entry from a reference dynamically, without looping through the table?

If you never saw any new allocation with memhack, then maybe they really just never free the memory again whenever a new string is allocated on the heap.
 
Level 19
Joined
Dec 12, 2010
Messages
2,070
They don't care about cleaning because there is no real case where you'll have issues with too many unique strings, unless that's your intention. I managed to slow down the game by generating ~200k unique strings. That's way too many for any normal map. Nulling strings, like nulling "code" references, doesn't make any sense since the object is never destroyed.
The string table provides faster comparison, and wc3 is all about comparing. A variable ID or function ID from JASS bytecode morphs into a string, which is then searched for in the table to find the string with the same hash. There are no operations on raw strings; it's always StringHash, and the string table stores that hash next to the string's address. I found out about the purpose of string tables on Stack Overflow back then and haven't cared since. It's just better for the engine.
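Storing each string's hash next to its address suggests a lookup that compares cheap integer hashes first and only touches the characters when the hashes match. The C++ sketch below assumes that layout. The `Entry` struct, the bucket scheme, and FNV-1a (standing in for whatever hash the engine really uses) are hypothetical choices for illustration, not the engine's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <string>
#include <vector>

// Hypothetical table entry: the hash is stored next to the string's address.
struct Entry {
    std::uint32_t hash;
    const char* text;
};

// FNV-1a over a null-terminated string (stand-in hash, not Blizzard's).
std::uint32_t Fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 16777619u;
    }
    return h;
}

class HashedStringTable {
public:
    explicit HashedStringTable(std::size_t buckets = 256) : buckets_(buckets) {}

    // Returns the stored pointer for s, adding s if it is new. Equal strings
    // always yield the same pointer, so later equality checks are pointer compares.
    const char* Find(const char* s) {
        std::uint32_t h = Fnv1a(s);
        std::vector<Entry>& bucket = buckets_[h % buckets_.size()];
        for (const Entry& e : bucket) {
            // Cheap integer check first; strcmp only runs on a hash match.
            if (e.hash == h && std::strcmp(e.text, s) == 0) return e.text;
        }
        storage_.emplace_back(s);  // deque never moves elements, so pointers stay valid
        bucket.push_back({h, storage_.back().c_str()});
        return storage_.back().c_str();
    }

private:
    std::vector<std::vector<Entry>> buckets_;
    std::deque<std::string> storage_;
};
```

Looking up "footman" twice, even when the second copy is built at runtime by concatenating "foot" and "man", returns the same pointer. Two long strings that differ can therefore usually be told apart by comparing hashes or pointers, without reading their characters.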
 
200k entries sounds like a lot before the engine slows down, but meh, a few thousand sounds plausible, and I don't understand the argument of doing it on purpose just because it feels like it's maybe acceptable.

Mind elaborating on what exactly is compared faster, and instead of what? In this context I mean: why is it required to literally never free a string's memory until the application exits and forces a free? It would make sense to remove entries from the table from time to time that are not being used and/or match some other condition (e.g. being dynamically created from stack references); as argued above, keeping all data in memory forever does not seem reasonable.

Because of this, people advise keeping unique strings to a minimum, and honestly that's bullshit to me. No one should have to care about such low-level management when doing string iterations and comparisons, trying not to create unique strings, or doing string-to-real conversions. But technically those people would have a point.

I'll try to find it, but you could also share the exact article about string tables in C++ if you remember it.
 
Level 19
Joined
Dec 12, 2010
Messages
2,070
Nobody cares about strings, I don't know where you got that. Thank god they know about leaks in the first place.
Optimized C++
A table also reduces load by deduplicating strings, for instance. You can google "c++ string table management" for more articles; most of the time, inventing yet another wheel is recommended, and people advise against working with raw strings in any case. So I assume Blizzard, not being newbies, went with their own implementation as well.
Maybe a GC was planned but never finished, because, once again, you don't do work you don't need. No map has ever suffered from string overflow.

Comparing strings' hashes is faster than comparing the strings themselves. "a"=="A" is fairly simple, but the table allows comparing strings of 1024+ characters faster, simply because it won't pull the string into the CPU's low-level cache (risking evicting data that is probably needed much more). There are a lot of benefits to keeping the cache intact for future operations. Basically, low-level optimization. Plus, don't forget the year in which the engine was programmed.
 
Strings and leaks - string concatenations
[code=jass] - [possible leak] strings used in functions - damage texttags
What and How do Strings Leak? - gametime
[General] - String contains - substrings

^ Sure, some are older, but similar posts show up from time to time. People spread it, and it's something people care about.

edit:

But thanks for your thoughts. : ) What I take with me is
  • fast hash comparisons vs long string comparisons
  • might reduce string copies, re-allocations, for better performance
  • perma allocation on the heap is an accepted "leak risk", or simply no GC was created back then to remove potentially removable entries
 