
Reforged's luahelper.lua and broken FourCC

Status
Not open for further replies.
Level 19
Joined
Jan 3, 2022
Messages
320
I won't just feed you, I will teach you how to fish

1. How to extract luahelper.lua​

You need at least a strings tool (Sysinternals or POSIX/GNU) to extract readable text from the .exe file. The following example uses GNU strings in a shell (install bash, binutils for strings, and grep for the "Linux console" of your choice):
Bash:
user$ strings -8 -t d -d "Warcraft III.exe"|grep -C2 --color -i __jarray
#Output:
37392056 Af%X!Sdy
37392449 -- Jass2 array that returns an empty default value when the key does not exist
37392528 function __jarray(default)
37392555     return setmetatable({}, {
37392585         __index = function()
--
37595672  returns %s
37595688 function %s(
37595752 = __jarray(
37595784 ) then break end
37595808 while (true) do
This searches all text strings of at least 8 characters for "__jarray" to find luahelper.lua; the number in front of each line is its decimal byte offset in the file. You can of course redirect the output (STDOUT) of strings to a file and search it with Notepad instead. There are two hits: one starting around offset 37392449 (that's where the real luahelper.lua starts) and apparently another Lua script at 37595752.
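If no strings binary is at hand, the same idea can be roughly sketched in Lua itself. This is only an illustration of what strings -8 -t d does, not a replacement for it; the helper name printableRuns and the sample buffer are made up for this example.

```lua
-- Hypothetical helper: list runs of printable ASCII (space..~, plus tab)
-- that are at least minLen bytes long, together with their 0-based
-- decimal offsets, similar in spirit to `strings -n minLen -t d`.
local function printableRuns(data, minLen)
    local runs = {}
    local pos = 1
    while true do
        local s, e = data:find("[\t\x20-\x7E]+", pos)
        if not s then break end
        if e - s + 1 >= minLen then
            runs[#runs + 1] = { offset = s - 1, text = data:sub(s, e) }
        end
        pos = e + 1
    end
    return runs
end

-- Made-up sample standing in for the contents of the .exe
local sample = "\0\1\2function __jarray(default)\0\255ab\0return default\0"
for _, run in ipairs(printableRuns(sample, 8)) do
    print(run.offset, run.text)
end
```

For the real file you would read it in binary mode (io.open(path, "rb")) and pass the contents as data; the short run "ab" in the sample is skipped because it is below the minimum length.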

2. luahelper.lua​

...Then let's print the file starting at these positions (start at byte 37392400 | only readable text | max 80 lines):
Bash:
user$ tail --bytes +37392400 "Warcraft III.exe" | strings -n 1 | head -n 80
Lua:
-- Jass2 array that returns an empty default value when the key does not exist
function __jarray(default)
    return setmetatable({}, {
        __index = function()
            return default
        end
    })
end
-- Math random functions should come from the game engine
function math.randomseed(seed)
    return SetRandomSeed(seed // 1)
end
function math.random(m, n)
    if m and n then
        return GetRandomInt(m // 1, n // 1)
    elseif m then
        return GetRandomInt(1, m // 1)
    else
        return GetRandomReal(0.0, 1.0)
    end
end
if DisplayTextToPlayer then
    function print(...)
        local sb = {}
        for i = 1, select('#', ...) do
            sb[i] = tostring(select(i, ...))
        end
        DisplayTextToPlayer(GetLocalPlayer(), 0, 0, table.concat(sb, '    '))
    end
end
-- Helper function that enables the following syntax: ARCHMAGE = FourCC('Hamg')
function FourCC(id)
    return  0x1000000 * string.byte(id:sub(1,1)) +
              0x10000 * string.byte(id:sub(2,2)) +
                0x100 * string.byte(id:sub(3,3)) +
                        string.byte(id:sub(4,4))
end
-- Debug build?
_DEBUG = $debug$
$debug$
luahelper.lua
Af%X!Sdy
$BQu
RaFC
xbgCRHI
internal error in the LUA transpiler
Everything from the first comment down to the _DEBUG = $debug$ line is obviously the luahelper.lua file. $debug$ there is apparently substituted by the game itself before the file gets loaded as Lua code. The last lines are garbage, not part of the file.
Hm, looks like the jass2lua transpiler also has vJass support. Whatever, this is a Lua thread.
Bash:
user$ tail --bytes +37595700 "Warcraft III.exe" | strings | head -n 40
 then
elseif
else
local
 = {}
 = __jarray(
 = nil
if (
) then break end
while (true) do
%sreturn
 end
TypeDefine('%s'
, '%s'
globals.
globals(function(_ENV)
end)
TriggerRegisterVariableEvent
null_node_ptr
program
library
interface
module
scope
textmacro
runtextmacro
return_
exitwhen
keyword
delegate
implement
literal_value
variable_decl
addr_of
logical_not
logical_and
logical_or
logical_less
logical_less_eq
logical_greater
Ok what do we see here?
  1. __jarray for internal use
  2. math.random/math.randomseed delegated to Jass' random functions, so that the two scripting languages interoperate and share a common random state.
  3. Inefficient replacement for print()
  4. Broken implementation of FourCC() - þou shalt not use þis
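A quick way to see what point 1 means in practice is to run __jarray in a desktop Lua 5.3 interpreter. The sketch below copies the function from the listing above; the variable name udg_Count is made up for this example.

```lua
-- Copied from luahelper.lua as shown above
function __jarray(default)
    return setmetatable({}, {
        __index = function()
            return default
        end
    })
end

local udg_Count = __jarray(0)     -- behaves like an "integer array" in Jass
print(udg_Count[5])               -- 0: missing key falls back to the default
udg_Count[5] = udg_Count[5] + 1   -- reads the default, then stores 1
print(udg_Count[5])               -- 1
print(rawget(udg_Count, 6))       -- nil: the default is never stored in the table
```

The default only exists in the metatable's __index, so iterating the table with pairs() will not show untouched indices.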

Blizzard's FourCC() function is broken​

Why? Refer to Jeff Pang's JASS Manual:
Integers can also be expressed as four character strings enclosed in single quotes, in which case the value is equal to the bit-string formed by the ASCII bytes (4 bytes = 32 bit value). These values are usually used to reference unit/upgrade/etc. identifiers which are enumerated in the *.slk files. For example, 'abcd', 'AhGn', 'EEEE'.
Hard to understand and harder to find still. What this means is that the four ASCII characters are the literal 4-byte representation of the uint32. However, 3/2/1-char long values are legal and perfectly understood by Jass. For example, 'd' will turn into 100 (decimal) because the byte value for ASCII 'd' is 100. This is the algorithm behind Blizzard's object type IDs. Extra: due to the standard A-Za-z0-9 limitation, these can represent <2^24 values, so they could probably be serialized as 24-bit values for network transfer.
This means:
(FourCC) 'dddd' = (100 << 24) + (100 << 16) + (100 << 8) + 100 = 1677721600 + 6553600 + 25600 + 100 = 1684300900
(not allowed) 'ddd' = (0 << 24) + (100 << 16) + (100 << 8) + 100 = 0 + 6553600 + 25600 + 100 = 6579300
(not allowed) 'dd' = (0 << 24) + (0 << 16) + (100 << 8) + 100 = 0 + 0 + 25600 + 100 = 25700
(understood by Jass) 'd' = (0 << 24) + (0 << 16) + (0 << 8) + 100 = 100
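
The sums above can be double-checked with Lua 5.3 integer shifts. This is only a sketch of the arithmetic; the helper name pack4 is made up here and is not part of any Blizzard API.

```lua
-- Hypothetical helper: combine four byte values, most significant first
local function pack4(b1, b2, b3, b4)
    return (b1 << 24) + (b2 << 16) + (b3 << 8) + b4
end

local d = string.byte("d")   -- ASCII 'd' is 100
print(pack4(d, d, d, d))     -- 1684300900  ('dddd')
print(pack4(0, d, d, d))     -- 6579300     (the 'ddd' row)
print(pack4(0, 0, d, d))     -- 25700       (the 'dd' row)
print(pack4(0, 0, 0, d))     -- 100         ('d')
```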

Ok, then this is what happens with their broken code:
Lua:
> print(FourCC("dddd"))
1684300900
> print(FourCC("d"))
stdin:4: attempt to perform arithmetic on a nil value
stack traceback:
        stdin:4: in function 'FourCC'
        stdin:1: in main chunk
        [C]: in ?
To be honest, you shouldn't encounter single-letter character codes while in Lua: the Jass2Lua transpiler converts them to integers in the code.
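The error comes from string.byte returning no value for an empty string: id:sub(2,2) is "" when id is only one character long, and the multiplication then sees nil. A small sketch that reproduces this outside the game, using pcall so the script keeps running (FourCC is copied from the listing above):

```lua
-- Blizzard's FourCC as extracted from luahelper.lua
function FourCC(id)
    return  0x1000000 * string.byte(id:sub(1,1)) +
              0x10000 * string.byte(id:sub(2,2)) +
                0x100 * string.byte(id:sub(3,3)) +
                        string.byte(id:sub(4,4))
end

print(string.byte(""))              -- prints an empty line: no value is returned
print(FourCC("dddd"))               -- 1684300900: four bytes work
local ok, err = pcall(FourCC, "d")
print(ok, err)                      -- false, plus the arithmetic-on-nil message
print(FourCC("dddd_ignored"))       -- 1684300900: extra characters are silently dropped
```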

A Working FourCC() implementation. Use this:​

Here's a proper, working substitute. It is compatible with Lua 5.3+ (it relies on the integer shift operator):
Lua:
function unreforgedFourCC(str)
    local n = 0
    local len = #str
    for i = len, 1, -1 do
        n = n + (str:byte(i,i) << 8*(len-i)) -- shift by 0,8,16,24
    end
    return n
end
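
A few sanity checks for the function above, as a sketch to run in a standalone Lua 5.3 interpreter. The expected values are the ones worked out by hand earlier in this post; the function is repeated so the snippet runs on its own.

```lua
function unreforgedFourCC(str)
    local n = 0
    local len = #str
    for i = len, 1, -1 do
        n = n + (str:byte(i,i) << 8*(len-i)) -- shift by 0,8,16,24
    end
    return n
end

assert(unreforgedFourCC("dddd") == 1684300900)
assert(unreforgedFourCC("d") == 100)   -- short codes no longer raise an error
assert(unreforgedFourCC("") == 0)      -- the loop body never runs for ""
print(unreforgedFourCC("Hamg"))        -- the ARCHMAGE id from Blizzard's comment
```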

PS: The thread "FourCC breaks in V1.31" shows Blizzard's broken function without attribution, and it's taken for granted there that it works. No, it doesn't.
 

Wrda

Spell Reviewer
Level 26
Joined
Nov 18, 2012
Messages
1,888
Might as well provide the reverse function.
Lua:
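function FourCCId2S(id)
    local s = ""
    local i = id
    while i > 0 do
        s = s..string.char(i & 0xFF) -- take the lowest byte first
        i = i >> 8                   -- integer shift instead of a float division
    end
    return string.reverse(s)         -- bytes were collected low-to-high
end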
FourCC("hfoo") = 1751543663
FourCCId2S(1751543663) = "hfoo"
The function name is questionable...I don't have such brilliant ideas for such names.
 
Level 18
Joined
Jan 1, 2018
Messages
728
What this means is that the four ASCII characters are the literal 4-byte representation of the uint32. However, 3/2/1-char long values are legal and perfectly understood by Jass. For example 'd' will turn into 100 (decimal). Because the byte value for ASCII 'd' is 100. This is the algorithm behind Blizzard's object type IDs.
This means:
'dddd' = (100 << 24) + (100 << 16) + (100 << 8 ) + 100 = 1677721600 + 6553600 + 25600 + 100 = 1684300900
'ddd' = (0 << 24 ) + (100 << 16) + (100 << 8) + 100 = 0 + 6553600 + 25600 + 100 = 6579300
'dd' = (0 << 24) + (0 << 16) + (100 << 8) + 100 = 0 + 0 + 25600 + 100 = 25700
'd' = (0 << 24) + (0 << 16) + (0 << 8) + 100 = 100
If by 3/2/1-char long bytes you meant characters that add up to exactly 4 bytes you would be correct, but JASS actually does not support 'dd' or 'ddd'.
An example of a 2-char long FourCC is 'd™'; this is valid syntax because ™ is 3 bytes in UTF-8.

I don't know about Blizzard, but I have made my own JASS syntax/parser library, and it has different syntax classes for character literals (for example 'd') and fourCC integer literals. Since these are different things, and because of the name, it makes sense that the Lua FourCC function only supports fourCC literals and not character literals.

EDIT: Did a quick test in world editor for science:
call BJDebugMsg(I2S('d™')) => 1675723682
BJDebugMsg(I2S(FourCC("d™"))) => 1692566690
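The two numbers differ by exactly 0x01010100 (16843008), which is what one would expect if JASS reads each byte of the literal as a signed char, so that the three bytes of ™ (0xE2 0x84 0xA2) each count as negative. That is an inference from these two values, not something confirmed elsewhere in this thread. A sketch that reproduces both results in standalone Lua 5.3 (the function names are made up for this example):

```lua
-- Unsigned bytes, like the Lua FourCC helpers in this thread
local function fourccUnsigned(str)
    local n = 0
    for i = 1, #str do
        n = n * 256 + str:byte(i)
    end
    return n
end

-- Assumption: JASS treats bytes >= 0x80 as negative (signed char)
local function fourccSigned(str)
    local n = 0
    for i = 1, #str do
        local b = str:byte(i)
        if b >= 0x80 then b = b - 0x100 end
        n = n * 256 + b
    end
    return n
end

local id = "d\u{2122}"        -- 'd™' as UTF-8: 64 E2 84 A2
print(fourccUnsigned(id))     -- 1692566690, the Lua FourCC result above
print(fourccSigned(id))       -- 1675723682, the JASS result above
```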
 
Level 19
Joined
Jan 3, 2022
Messages
320
@Drake53 thanks, I removed that part. It was my speculation: I was just getting into Jass and the Warcraft 3 API by deobfuscating a map, and that was before I analyzed the Jass2Lua transpiler :peasant-work-work:


@Wrda I forgot to update the post. I've run benchmarks since, on different versions including string.pack. Although non-printable ASCII is technically possible, I think it should be filtered from the output, albeit this slows the function to 0.5x speed. But that's still 2x as fast as the original FourCC.
Lua:
-- 100k runs over 2Ki data: 128.1s
function FourCC(id)
    return  0x1000000 * string.byte(id:sub(1,1)) +
              0x10000 * string.byte(id:sub(2,2)) +
                0x100 * string.byte(id:sub(3,3)) +
                        string.byte(id:sub(4,4))
end

-- 10k runs on 1k data: 4.914s
-- 100k runs on 2Ki data: 86.86s
function FourCC2Int(str)
    local n = 0
    local len = #str
    for i = len, 1, -1 do
        n = n + (str:byte(i,i) << 8*(len-i))
    end
    return n
end

-- 10k runs on 1k data: 3.4s
-- 100k runs on 2Ki data: 64.25s
function Int2FourCCMatch(int)
    return string.char((int & 0xff000000)>>24, (int & 0x00ff0000)>>16, (int & 0x0000ff00)>>8, int & 0x000000ff):match("[^\0]+")
end

-- 10k runs on 1k data: 1.84s
-- 100k runs on 2Ki data: 34.46s
function Int2FourCCWithoutMatch(int)
    return string.char((int & 0xff000000)>>24, (int & 0x00ff0000)>>16, (int & 0x0000ff00)>>8, int & 0x000000ff)
end

-- 100k runs on 2Ki data: 34.4s
function i2fcwoms(i)
    return string.char((i&0xff000000)>>24,(i&0xff0000)>>16,(i&0xff00)>>8,i&0xff)
end

-- 100k runs on 2Ki data: 31.79s
local c = string.char
function i2fcwomss(i)
    return c((i&0xff000000)>>24,(i&0xff0000)>>16,(i&0xff00)>>8,i&0xff)
end

-- 10k runs on 1k data: 1.92s
-- 100k runs on 2Ki data: 34.20s
function Int2FourCCPack(int)
    return string.pack(">I4", int)
end

-- 10k runs on 1k data: 3.07s
-- 100k runs on 2Ki data: 57.36s
function Int2FourCCPackMatch(int)
    return string.pack(">I4", int):match("[^\0]+")
end

-- 10k runs on 1k data: 1.56s
-- 100k runs on 2Ki data: 30.23s
function FourCC2IntPack(str)
    return (string.unpack(">I4", str))
end
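
Timings like the ones in the comments above can be taken with a loop around os.clock. The harness below is only a sketch of that idea: the name bench, the run count and the sample IDs are assumptions for this example, not the setup that produced the numbers above.

```lua
-- Hypothetical micro-benchmark helper: call fn on every input, `runs` times,
-- and return the elapsed CPU time in seconds.
local function bench(fn, inputs, runs)
    local t0 = os.clock()
    for _ = 1, runs do
        for i = 1, #inputs do
            fn(inputs[i])
        end
    end
    return os.clock() - t0
end

local function FourCC2IntPack(str)
    return (string.unpack(">I4", str))
end

local ids = { "hfoo", "Hamg", "AhGn", "EEEE" }   -- made-up sample data
print(("FourCC2IntPack: %.3fs"):format(bench(FourCC2IntPack, ids, 100000)))
```

Absolute numbers depend on the machine and the Lua build, so only ratios between functions measured in the same run are meaningful.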

Lua:
do
    FCC = {}
    local p, u, m, b = string.pack, string.unpack, string.match, string.byte
    function FCC.to(int)
        return m(p(">I4", int), "[^\0]+")
    end
    -- very fast
    function FCC.from4(str)
        return (u(">I4", str))
    end
    function FCC.from(str)
        local n = 0
        local len = #str
        for i = len, 1, -1 do
            n = n + (b(str, i, i) << 8*(len-i))
        end
        return n
    end
end
Though with what @Drake53 said, this could be simplified and optimized to only support codes that are 4-char or 1-char. I don't know and the names are hard too... I tell you, Blizzard started it! They gave the first bad name!
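A short round-trip check of the FCC table above, as a sketch for a standalone Lua 5.3 interpreter (the table is repeated so the snippet runs on its own):

```lua
do
    FCC = {}
    local p, u, m, b = string.pack, string.unpack, string.match, string.byte
    function FCC.to(int)
        return m(p(">I4", int), "[^\0]+")
    end
    function FCC.from4(str)
        return (u(">I4", str))
    end
    function FCC.from(str)
        local n = 0
        local len = #str
        for i = len, 1, -1 do
            n = n + (b(str, i, i) << 8*(len-i))
        end
        return n
    end
end

local id = FCC.from4("hfoo")
print(id, FCC.to(id))                 -- 1751543663   hfoo
print(FCC.from("d"), FCC.to(100))     -- 100   d  (leading zero bytes are stripped)
print(pcall(FCC.from4, "d"))          -- false plus an error: fewer than 4 bytes
```

So from4 is the fast path for well-formed 4-byte codes, while from also accepts shorter strings.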
 
Level 18
Joined
Jan 1, 2018
Messages
728
Though with what @Drake53 said, this could be simplified and optimized to only support codes that are 4-char or 1-char. I don't know and the names are hard too... I tell you, Blizzard started it! They gave the first bad name!
If you want the exact same behavior as JASS you'd actually have to support 1, 2, 3, and 4 character codes. As long as they all add up to 4 bytes it's a valid fourCC literal.

I did another test in the world editor to see how Blizzard's transpiler handles this:
JASS:
//! endusercode
function test takes nothing returns nothing
local integer a = 'd'
local integer b = 'abcd'
local integer c = 'd™'
local integer d = '¡¡'
local integer e = '¡aA'
local integer f = '𒀀'
endfunction
//! beginusercode

The result:
Lua:
function test()
    local a = 100
    local b = FourCC("abcd")
    local c = FourCC("cტ")
    local d = FourCC("Á Á¡")
    local e = FourCC("Á¡aA")
    local f = FourCC("ï‘€")
end

Here you can clearly see the difference between a character literal (1 character that must be 1 byte), and a fourCC literal (up to 4 characters that must be 4 bytes).
The character literal is transpiled to a number, while the fourCC literals are transpiled using FourCC.

I did not test whether the values are the same (I kind of doubt it), but you can see that when you use characters that are more than 1 byte long, the literal gets transpiled to some seemingly random string that is 4 characters long. Since some of those characters are themselves multiple bytes, the string is longer than 4 bytes, so if you convert it back to JASS (e.g. 'cტ') it will be invalid.
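Whether the values match can be checked outside the game, at least for the 'd™' case. Assuming the string shown for c is the UTF-8 text "c" followed by U+10E2 (bytes 63 E1 83 A2), Blizzard's FourCC applied to it gives the same number that JASS printed for 'd™' in the earlier world editor test. A sketch, with FourCC copied from luahelper.lua:

```lua
-- Blizzard's FourCC as extracted from luahelper.lua
function FourCC(id)
    return  0x1000000 * string.byte(id:sub(1,1)) +
              0x10000 * string.byte(id:sub(2,2)) +
                0x100 * string.byte(id:sub(3,3)) +
                        string.byte(id:sub(4,4))
end

-- Assumption: the transpiled literal is "c" + U+10E2 encoded as UTF-8
local transpiled = "c\u{10E2}"
print(#transpiled)            -- 4: under this assumption the string is 4 bytes
print(FourCC(transpiled))     -- 1675723682, same as I2S('d™') in JASS
```

The other literals were not checked here, and how the forum re-encoded the transpiler output is unknown, so treat this as a single data point.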
 
Level 19
Joined
Jan 3, 2022
Messages
320
@Drake53 I meant 1-char == 8-bit (extended) ASCII. Lua operates at a byte level in all string operations unless you go out of your way to work with utf8. As long as your Unicode characters add up to 32 bits, it'll be a valid FourCC and no changes are required in the code above. The exception is that \0 and non-printable characters in general shouldn't be worked with in text mode... Hint: just don't, unless you need a "private" namespace that Warcraft won't generate FourCC for (will Warcraft work fine with FourCC outside of the A-Z, a-z, 0-9 range?)
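The byte-level behaviour is easy to see in a standalone Lua 5.3 interpreter: the length operator and string.byte count bytes, while the utf8 library counts code points. A minimal sketch:

```lua
local id = "d\u{2122}"            -- 'd™': 1 ASCII byte + 3 UTF-8 bytes
print(#id)                        -- 4 (bytes)
print(utf8.len(id))               -- 2 (code points)
print(string.byte(id, 1, -1))     -- 100 226 132 162
print(id:sub(2, 2) == "\xE2")     -- true: sub() slices bytes, not characters
```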
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,199
Do 1 character long types exist? Surely Lua can correctly represent these as it does not use null terminated strings? Such as by adding 3 null characters to the 4CC string.

To me this looks like a solution to a problem that does not exist. Blizzard probably purposely chose to avoid validation before parsing to keep overhead low. If you are worried about validation, such as of user input, you can always write your own custom function to "sanitise" the string before passing it to the 4CC conversion function. Ultimately, for constant code you will want to use a hex integer primitive anyway, as it will be faster.
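Both suggestions can be tried against Blizzard's FourCC in a standalone Lua 5.3 interpreter. This is only a sketch; FourCC is copied from the luahelper.lua listing in the first post.

```lua
-- Blizzard's FourCC as extracted from luahelper.lua
function FourCC(id)
    return  0x1000000 * string.byte(id:sub(1,1)) +
              0x10000 * string.byte(id:sub(2,2)) +
                0x100 * string.byte(id:sub(3,3)) +
                        string.byte(id:sub(4,4))
end

-- Lua strings may contain zero bytes, so a 1-character code can be padded
print(FourCC("\0\0\0d"))                  -- 100
-- A hex integer literal skips the function call entirely
print(0x68666F6F == FourCC("hfoo"))       -- true
```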
 
Level 19
Joined
Jan 3, 2022
Messages
320
@Ricola3D described object ID types here:
Custom object ids are a subset of all ids:
- Fourth byte takes values in [0-9A-Za-z]
- First 3 bytes take values only in [A-Z0-9]
Ex:
- 'A01Z' is a possible custom id
- 'A01z' is not a possible custom id
\0\0\0_ (3 zero bytes followed by a char): 1-byte IDs don't exist afaik, but map protectors/minimizers use these for obfuscation; they're just read as integers. So someone might legitimately want to encode/decode not just 4-byte values but also the single-byte value (even though you should use string.char/string.byte in that case, if you know about it).
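The rules quoted above can be turned into a small check with a Lua pattern. The sketch below reads "fourth byte" as the most significant byte, i.e. the first character as written, because that is the reading under which 'A01Z' passes and 'A01z' fails; that interpretation and the function name isCustomId are assumptions of this example.

```lua
-- Hypothetical validator for custom object ids as described in the quote
local function isCustomId(id)
    return type(id) == "string"
        and id:match("^[0-9A-Za-z][A-Z0-9][A-Z0-9][A-Z0-9]$") ~= nil
end

print(isCustomId("A01Z"))   -- true
print(isCustomId("A01z"))   -- false: lowercase in the last three characters
print(isCustomId("h000"))   -- true under this reading
print(isCustomId("d"))      -- false: not four characters
```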
 