
JASM - Let's dive into bytecode

Now that we have unlimited possibilities thanks to memhack, it's time to delve a bit deeper.
You may have seen this thread by @leandrotp about compiling directly to bytecode.
I'm really into that idea, because it allows for greatly increased performance and unlocks new features (like allocating memory).

The Warcraft bytecode (I call it jasm, jass+asm) is not exactly complex or difficult; there is just pretty much NO documentation whatsoever. That makes sense, since it's not an official feature of Jass.

To get some insight I have written an assembler/disassembler for jasm. The tool can convert between readable jasm and executable bytecode. I also ported the disassembler code to jass itself. Now you can look at the bytecode of your functions without getting completely mindfucked.

The code to call bytecode from jass and a documented example are in the attached map.
You can find an overview of the jasm instructions further down in this post.

JASS:
scope CustomBytecode initializer init
    /*
   
    This example shows how to create a footman in bytecode
   
     To run custom bytecode two textmacros can be used. Both are defined in JasmExecuteUtils.
   
     The first textmacro sets up all necessary functions and globals to run custom bytecode.
     It creates 2 functions for us. The names of these functions depend on the parameter passed to the textmacro.
   
        function ExecuteBytecodeExample takes nothing returns nothing
        function GetBytecodeExampleAsCode takes nothing returns code
   
    ExecuteBytecodeExample() runs the bytecode.
    GetBytecodeExampleAsCode() returns the bytecode as a code variable that can be used with the native functions like ForGroup
   
    This textmacro also defines an array l__BytecodeExample. That array contains the bytecode that should be executed.
    This array can only be used between the call to JasmSetupGlobals and JasmSetupExec
   
    */
    //! runtextmacro JasmSetupGlobals("BytecodeExample")
   
    globals
        private integer ID = 'hfoo'
        unit created_footman = null
    endglobals
   
    // DO NOT TOUCH THIS FUNCTION!
    // its generated bytecode is used to get the internal ids of
    // the global variables "ID" and "created_footman"
    private function GlobalTable takes nothing returns nothing
        set ID = 0
        set created_footman = null
    endfunction
   
    //! novjass
        We want to create a unit and save it to created_footman in bytecode. It should be the equivalent of
       
        set created_footman = CreateUnit(Player(0), ID, 0.0, 0.0, 270.0)
   
        First we need to get a handle to player 0, i.e. the call to Player(0)
       
        literal R0 int 0        // Put literal 0 into register R0
        push R0                 // Push that register on the stack
        callnative &Player      // Call the native function Player()
       
        // The result of the call to Player is in R0
       
        // Because the first parameter of CreateUnit is the player, we directly push R0 on the stack.
        // the stack now contains [player0]
        push R0                 
       
        getvar R0 int &ID       // get the global variable ID and put its value into R0. (The variable id of ID is set in l__BytecodeExample[9])
        push R0                 // push the id on the stack. the stack is now [player0, 'hfoo']
       
       
        literal R0 real 0.0     // put the real 0.0 into R0
        push R0                 // push it twice on the stack for the x and y parameter of CreateUnit
        push R0                 // stack is now [player0, 'hfoo', 0.0, 0.0]
       
        literal R0 real 270.0   // put the real 270.0 into R0 and push it onto the stack
        push R0                 // stack is now [player0, 'hfoo', 0.0, 0.0, 270.0]
       
       
        callnative &CreateUnit  // Call CreateUnit. This native takes its 5 parameters from the stack and puts its return value into R0
       
        // stack is now []
       
        setvar R0 &created_footman  // Write R0 to created_footman
       
        ret                     // Return from this function. Will crash Warcraft if this is missing.
    //! endnovjass
   
    private function InitJasmArray_l__BytecodeExample takes nothing returns nothing
        set l__BytecodeExample[   0] = 0x0c000400 // literal      R0     int    -      0     
        set l__BytecodeExample[   1] = 0x00000000
        set l__BytecodeExample[   2] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[   3] = 0x00000000
        set l__BytecodeExample[   4] = 0x15000000 // callnative   -      -      -      &Player
        set l__BytecodeExample[   5] = 0x00000538
        set l__BytecodeExample[   6] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[   7] = 0x00000000
        set l__BytecodeExample[   8] = 0x0e000400 // getvar       R0     int    -      &ID
        set l__BytecodeExample[   9] = GetGlobalIdCode(function GlobalTable, 0) // get the first variable that is used in the function GlobalTable ("ID")
        set l__BytecodeExample[  10] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[  11] = 0x00000000
        set l__BytecodeExample[  12] = 0x0c000500 // literal      R0     real   -      0     
        set l__BytecodeExample[  13] = 0x00000000
        set l__BytecodeExample[  14] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[  15] = 0x00000000
        set l__BytecodeExample[  16] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[  17] = 0x00000000
        set l__BytecodeExample[  18] = 0x0c000500 // literal      R0     real   -      270   
        set l__BytecodeExample[  19] = 0x43870000
        set l__BytecodeExample[  20] = 0x13000000 // push         R0     -      -      -     
        set l__BytecodeExample[  21] = 0x00000000
        set l__BytecodeExample[  22] = 0x15000000 // callnative   -      -      -      &CreateUnit
        set l__BytecodeExample[  23] = 0x00000415
        set l__BytecodeExample[  24] = 0x11000000 // setvar       R0     -      -      &created_footman
        set l__BytecodeExample[  25] = GetGlobalIdCode(function GlobalTable, 1)
        set l__BytecodeExample[  26] = 0x27000000 // ret          -      -      -      -     
        set l__BytecodeExample[  27] = 0x00000000
    endfunction
   
    /*
        The second textmacro finalizes the generation of the bytecode.
        It also creates one function:
       
            function JasmInitBytecodeExample takes nothing returns nothing
           
        This function must be called before ExecuteBytecodeExample() or GetBytecodeExampleAsCode() can be used
    */
    //! runtextmacro JasmSetupExec("BytecodeExample")
   
    private function init takes nothing returns nothing
        local trigger trg = CreateTrigger()
       
        // fill array with bytecode
        call InitJasmArray_l__BytecodeExample()
       
        // Setup necessary variables to call bytecode
        call JasmInitBytecodeExample()
       
        // when player(0) presses escape
        call TriggerRegisterPlayerEventEndCinematic(trg, Player(0))
       
        // then execute the bytecode
        call TriggerAddAction(trg, GetBytecodeExampleAsCode())
    endfunction
   
endscope
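
The array contents above follow a regular pattern: every instruction occupies two 32-bit words. The first word packs the opcode and up to three one-byte arguments, and the second word holds the data (a literal, a variable id or a native id). Here is a minimal Python sketch of that packing that reproduces a few words from the example. The helper names `encode` and `real_bits` are made up for illustration, and the type ids 4 (int) and 5 (real) are simply read off the example words, so treat them as assumptions.

```python
import struct

# Opcode ids as listed in the instruction table of this post.
OP_LITERAL, OP_GETVAR, OP_PUSH, OP_CALLNATIVE, OP_RET = 0x0C, 0x0E, 0x13, 0x15, 0x27

# Type ids inferred from the example words 0x0c000400 / 0x0c000500 (assumption).
TYPE_INT, TYPE_REAL = 0x04, 0x05


def encode(op, arg1=0, arg2=0, arg3=0, data=0):
    """Return the two 32-bit words of one instruction as (header, data)."""
    header = (op << 24) | (arg1 << 16) | (arg2 << 8) | arg3
    return header, data & 0xFFFFFFFF


def real_bits(value):
    """Bit pattern of a 32-bit IEEE 754 float, as stored in the data word."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


program = [
    encode(OP_LITERAL, arg1=0, arg2=TYPE_INT, data=0),                  # literal R0 int 0
    encode(OP_PUSH, arg1=0),                                            # push R0
    encode(OP_CALLNATIVE, data=0x538),                                  # callnative &Player
    encode(OP_LITERAL, arg1=0, arg2=TYPE_REAL, data=real_bits(270.0)),  # literal R0 real 270.0
    encode(OP_RET),                                                     # ret
]

for header, data in program:
    print(f"0x{header:08x} 0x{data:08x}")
```

The data word of `literal R0 real 270.0` comes out as 0x43870000, which matches l__BytecodeExample[19].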

To use my tool, either call jasm.exe -h on the command line or drag and drop a file onto one of the included *.bat files. My map dumps the bytecode of the last printed function into the file "CustomMapData/jasm_dump.txt". This file can be converted to jasm by dragging it onto "preload_to_jasm.bat" (there is also "preload dump example.txt" that you can use to try it out). Dragging a *.jasm file onto "jasm_to_array.bat" converts the jasm to a jass function that fills an integer array with the bytecode. You can then simply copy and paste that code into your map.

The source code of jasm.exe is on GitHub.

It looks like this:
disassemble.jpg

Displaying the code of your function works with a simple one-liner:
vJASS:
    private function WithIf takes nothing returns integer
       if true then
           return 1
       else
           return 2
       endif
   endfunction

<snip>

//! runtextmacro DumpFunction("WithIf")


Arguments that contain only " - " are ignored by the instruction.
Possible types are void, code, int, real, string, handle, bool, int[], real[], string[], hdl[], bool[].
The assembler has a list of all natives and does name lookup. That means &Player is the same as writing F_538.

This table is not complete and may contain errors!

| Name | Id | Arg 1 | Arg 2 | Arg 3 | Arg 4 | Description | Example |
|---|---|---|---|---|---|---|---|
| endprogram | 0x01 | - | - | - | - | Used to signal end of parsing. Ignored by the VM | |
| jmp_deprecated | 0x02 | - | - | - | label | Behaves exactly like jmp | |
| func | 0x03 | type | - | - | function | Used by parser to signal start of a function. Ignored by the VM | func void F_109f |
| endf | 0x04 | - | - | - | - | Like func, just for the end | endf |
| local | 0x05 | type | - | - | variable | Declares a new local variable | local int V_10a1 |
| global | 0x06 | type | - | - | variable | Declares a new global variable | global real V_10a2 |
| const | 0x07 | type | - | - | variable | Like global | const handle V_10a3 |
| poparg | 0x08 | type | integer | - | variable | Get a function parameter. Arg2 is the number of the parameter, with the rightmost parameter as number 1 | poparg boolean 2 V_10a4 |
| cleanstack | 0x0b | integer | - | - | - | Remove Arg1 many parameters from the stack | cleanstack 3 |
| literal | 0x0c | register | type | - | integer | Set Arg1 to the value of Arg4 | literal R5 real 3.1415 |
| mov | 0x0d | register | register | - | - | Arg1 = Arg2 | mov R0 R5 |
| getvar | 0x0e | register | type | - | variable | Copy variable Arg4 to register Arg1 | getvar R0 int &ID |
| code | 0x0f | register | type | - | function | code | code |
| getvar[] | 0x10 | register | register | type | variable | getvar[] | getvar[] |
| setvar | 0x11 | register | - | - | variable | Copy register Arg1 to variable Arg4 | setvar R83 V_10a0 |
| setvar[] | 0x12 | register | register | - | variable | setvar[] | setvar[] |
| push | 0x13 | register | - | - | - | Push Arg1 onto the stack | push R87 |
| pop | 0x14 | register | - | - | - | Remove the topmost value of the stack and put it into Arg1 | pop R87 |
| callnative | 0x15 | - | - | - | function | Calls a native function. Parameters of that function first need to be pushed | push R1; callnative &Player |
| calljass | 0x16 | - | - | - | function | Like callnative but with jass functions. Stack must be cleared afterwards. | push R1; push R2; calljass F_109f; cleanstack 2 |
| i2r | 0x17 | register | - | - | - | Convert the value in the register to a float | i2r R85 |
| and | 0x18 | register | register | register | - | Arg1 = Arg2 and Arg3. Boolean operation, not bitwise | and R1 R1 R2 |
| or | 0x19 | register | register | register | - | Arg1 = Arg2 or Arg3. Boolean operation, not bitwise | or R1 R1 R2 |
| eq | 0x1a | register | register | register | - | If Arg2 and Arg3 are equal then set Arg1 to 1 else to 0 | eq R1 R1 R2 |
| ne | 0x1b | register | register | register | - | Not equal, compare eq | ne R1 R1 R2 |
| le | 0x1c | register | register | register | - | Less equal, compare eq | le R1 R1 R2 |
| ge | 0x1d | register | register | register | - | Greater equal, compare eq | ge R1 R1 R2 |
| lt | 0x1e | register | register | register | - | Less than, compare eq | lt R1 R1 R2 |
| gt | 0x1f | register | register | register | - | Greater than, compare eq | gt R1 R1 R2 |
| add | 0x20 | register | register | register | - | Arg1 = Arg2 + Arg3 | add R89 R89 R88 |
| sub | 0x21 | register | register | register | - | Arg1 = Arg2 - Arg3 | sub R89 R89 R88 |
| mul | 0x22 | register | register | register | - | Arg1 = Arg2 * Arg3 | mul R89 R89 R88 |
| div | 0x23 | register | register | register | - | Arg1 = Arg2 / Arg3 | div R89 R89 R88 |
| mod | 0x24 | register | register | register | - | Arg1 = Arg2 modulo Arg3 | mod R89 R89 R88 |
| neg | 0x25 | register | - | - | - | Arg1 = -Arg1 | neg R39 |
| not | 0x26 | register | - | - | - | if Arg1 == 0: Arg1 = 1, else Arg1 = 0 | not R22 |
| ret | 0x27 | - | - | - | - | Return from the function. Return value is in R0 | ret |
| label | 0x28 | - | - | - | label | Used as a marker for a jump instruction. Ignored by VM | label L_588 |
| jmpt | 0x29 | register | - | - | label | Jump if true. Continue evaluation at label Arg4 if Arg1 is true | jmpt L_588 |
| jmpf | 0x2a | register | - | - | label | Jump if false, compare jmpt | jmpf L_588 |
| jmp | 0x2b | - | - | - | label | Unconditional jump. Continue execution at label Arg4 | jmp L_588 |
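
To make the table concrete, here is a rough Python sketch of the decoding direction: split each pair of words back into mnemonic, arguments and data. It only covers a handful of mnemonics from the table. `decode` and `disassemble` are illustrative names, not functions of jasm.exe, and the bit layout is the one visible in the example array from this post.

```python
# Subset of the mnemonics from the table, keyed by opcode id.
MNEMONICS = {
    0x0C: "literal", 0x0E: "getvar", 0x11: "setvar", 0x13: "push",
    0x14: "pop", 0x15: "callnative", 0x16: "calljass", 0x27: "ret",
}


def decode(header, data):
    """Split one instruction (two 32-bit words) into its fields."""
    op = (header >> 24) & 0xFF
    arg1 = (header >> 16) & 0xFF
    arg2 = (header >> 8) & 0xFF
    arg3 = header & 0xFF
    # Unknown opcodes get a generic name instead of raising.
    return MNEMONICS.get(op, f"op_{op:02x}"), arg1, arg2, arg3, data


def disassemble(words):
    """Decode a flat list of words, two words per instruction."""
    return [decode(words[i], words[i + 1]) for i in range(0, len(words) - 1, 2)]


# First three instructions of the footman example.
listing = disassemble([0x0C000400, 0x0, 0x13000000, 0x0, 0x15000000, 0x538])
for name, arg1, arg2, arg3, data in listing:
    print(f"{name:<12} R{arg1} type={arg2} arg3={arg3} data=0x{data:x}")
```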



And as a little bonus, dynamic dispatch:
vJASS:
    function PoorMansJumptable takes nothing returns nothing
       call FunA() // 0
       return
       call FunB() // 1
       return
       call FunC() // 2
       return
       call FunD() // 3
       return
       call FunE() // 4
       return
   endfunction
 
   function EvalJumptable takes code table, integer num returns nothing
       call ForForce(bj_FORCE_PLAYER[0], I2C(C2I(table) + 2 * 8 * num))
   endfunction

   // 5 functions so i is between 0 and 4 inclusive
   function Test takes integer i returns nothing
       call EvalJumptable(function PoorMansJumptable, i)
   endfunction
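
The offset arithmetic in EvalJumptable can be checked on its own. The sketch below assumes what the expression `2 * 8 * num` implies: every table entry compiles to two instructions (one call, one return) of 8 bytes each. The function names and the address 0x1000 are made up for illustration.

```python
INSTRUCTION_SIZE = 8        # bytes: one header word plus one data word
INSTRUCTIONS_PER_ENTRY = 2  # assumed: one call plus one return per entry


def entry_offset(num):
    """Byte offset of table entry `num` from the start of the function."""
    return INSTRUCTIONS_PER_ENTRY * INSTRUCTION_SIZE * num


def entry_address(table_address, num, entries=5):
    """Address to jump to for entry `num`; rejects out-of-range indices."""
    if not 0 <= num < entries:
        raise ValueError(f"entry {num} is outside the table (0..{entries - 1})")
    return table_address + entry_offset(num)


for i in range(5):
    print(i, hex(entry_address(0x1000, i)))
```

The range check is the important part: an index outside 0..4 would jump into whatever bytecode happens to follow the table.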
 

Attachments

  • jasm.zip (78.2 KB)
  • jasm_disassemble.w3x (81.6 KB)
Please let me be the first here to thank you!

I wanted so much to make this (and much more) by myself, but you did everyone a favor and brought my ideas to the real world.

I've been promising to write the JASS VM documentation for a long time. It's all in my head, but it's so much text that I don't really know where to begin.

So I must thank you for not waiting for me to share it. You figured it out just by yourself, and you probably did much better than I ever would.

And until I get the documentation fully written, I guess I can help you with some things.

First, I need to write something about the string table. The information found here is not complete. What most people don't know is that string variables are actually HANDLES! There are basically 2 tables: the string table itself, and the string handle table.

Also, the string table holds much more than you think. Every function and variable name used in the map script is also present in the string table. But normally those strings won't get a handle assigned, so we can't access them using this.

Finally, the string table plays a key role in jass bytecode. You will see that every JASS opcode that deals with functions or variables takes an id. And this id is actually the string table id of that name! The string table is static and created at compilation time (when the map script is translated to bytecode). All strings used in the map (all function and variable names, as well as string literals) are placed into the string table at this time. They then get an id assigned, and this id is used in JASS bytecode whenever that name is referenced.

Obviously the string table can grow at runtime, when new strings are generated (with things like Substring or GetObjectName). But as I said, all strings used directly in the map script are inserted into the string table at compilation stage. Then, when the function referencing those strings is actually executed, only a string handle is generated.

If you typecast a string into an integer, what you get is a handle, not the actual string id. There's no easy way to obtain the id for a generic string, but if you want to know the id of a specific name (to run bytecode from array, for example), you can read it directly from a bytecode instruction.
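
As a sketch of what "reading the id from a bytecode instruction" could look like, the Python below scans a function's bytecode for the n-th instruction that references a variable and returns its data word. This is only a guess at the kind of lookup a helper such as GetGlobalIdCode might perform. The name `nth_variable_id`, the sample words and the ids 0x10a0 / 0x10a1 are all made up for illustration.

```python
VARIABLE_OPS = {0x0E, 0x10, 0x11, 0x12}  # getvar, getvar[], setvar, setvar[]
OP_RET = 0x27


def nth_variable_id(words, n):
    """Return the data word (name id) of the n-th variable access, counting from 0."""
    seen = 0
    for i in range(0, len(words) - 1, 2):
        op = (words[i] >> 24) & 0xFF
        if op in VARIABLE_OPS:
            if seen == n:
                return words[i + 1]
            seen += 1
        elif op == OP_RET:
            break  # stop at the end of the function
    raise LookupError(f"function references fewer than {n + 1} variables")


# Made-up bytecode for a function that writes two globals and returns.
sample = [
    0x0C010400, 0x00000000,  # literal R1 int 0
    0x11010000, 0x000010A0,  # setvar  R1 V_10a0
    0x0C010400, 0x00000000,  # literal R1 int 0
    0x11010000, 0x000010A1,  # setvar  R1 V_10a1
    0x27000000, 0x00000000,  # ret
]

print(hex(nth_variable_id(sample, 0)), hex(nth_variable_id(sample, 1)))
```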

Second, the information in Grimoire's source code is not fully accurate. Here is the true list of all JASS opcodes:
JASS:
struct opcode {
   char arg3;
   char arg2;
   char arg1;
   char OP;
   int data;
};

enum OPCODES {
   OP_ENDPROGRAM=0x1,       //Signal to the parser that code has ended. Ignored by the VM
 
   OP_JUMP_DEPRECATED=0x2,  //Might have been used in past, now has the same behaviour as OP_JUMP
 
   OP_FUNCTION=0x3,         //Function declaration. Arg1 = return type, Data = function name. Used only
                            //at parsing stage.
 
   OP_ENDFUNCTION=0x4,      //Denotes end of function. Used only at parsing stage, ignored by VM
 
   OP_LOCAL=0x5,            //Create local variable. Arg1 = variable type, Data = Variable name
 
   OP_GLOBAL=0x6,OP_CONSTANT=0x7,   //These two are actually the same! There's no difference between
                                    //globals and constants from VM's point of view. Parameters are
                                    //the same as above.
 
   OP_POPFUNCARG=0x8,       //Create a local var and assign a value to it directly from the caller's
                            //stack. Arg1 = Type, Arg2 = #FuncArg, Data = Variable Name
 
   OP_TYPE=0x9,OP_EXTENDS=0xA,    //Used only by parser, ignored by VM
 
   OP_CLEANSTACK=0xB,       //Pops <Arg1> values from the stack. Used after calling Jass functions,
                            //not needed when calling natives.
 
   OP_LITERAL=0xC,          //Set value of register to a literal. Arg1 = DestReg, Arg2 = DestType,
                            //Data = Literal. If type is J_STRING, a string handle is immediately
                            //created, and that's what the register actually gets.
                         
   OP_MOV=0xD,              //Moves 1 register to another. Curiously it's only used when returning
                            //a value (by moving to R0). Arg1 = DestReg, Arg2 = SrcReg
 
   OP_GETVAR=0xE,           //Read variable. Arg1 = DestReg, Arg2 = DestType, Data = Variable name.
                            //DestType doesn't need to match the variable type.
 
   OP_CODE=0xF,             //Get function address. Arg1 = DestReg, Data = Function name. Value is
                            //stored with type J_CODE.
 
   OP_GETARRAY=0x10,        //Read from array. Arg1 = DestReg, Arg2 = IndexReg, Arg3 = DestType,
                            //Data = Array name. DestType doesn't need to match array type.
 
   OP_SETVAR=0x11,          //Write variable. Arg1 = SrcReg, Data = Variable name. Type of SrcReg MUST
                            //MATCH the type of variable, unless it's J_NULL.
 
   OP_SETARRAY=0x12,        //Write to array. Arg1 = IndexReg, Arg2 = SrcReg, Data = Array name. Type
                            //of SrcReg must match the array type.
 
   OP_PUSH=0x13,            //Pushes register <Arg1> onto the stack
 
   OP_POP=0x14,             //Pops the value from the top of the stack to register <Arg1>. God knows
                            //why Blizzard uses PUSH/POP on math operations.
 
   OP_NATIVE=0x15,          //Call a native. Data = Native name. No arguments.
 
   OP_JASSCALL=0x16,        //Call a JASS function. Same as above.
 
   OP_I2R=0x17,             //Read register <Arg1>, convert it to a float, and store the result in the
                            //same register, with type J_REAL. This is faster than native I2R

   //Boolean operations, Arg1 = DestReg, Arg2 and 3 are source operands.
 
   //These are not bitwise operations, but just boolean, they return either 0 or 1.
   OP_AND = 0x18,
   OP_OR = 0x19,
 
   //Comparison operations, return either 0 or 1
   OP_EQUAL=0x1A,
   OP_NOTEQUAL=0x1B,       // check
   OP_LESSEREQUAL=0x1C,OP_GREATEREQUAL=0x1D,
   OP_LESSER=0x1E,OP_GREATER=0x1F,
 
   //End boolean operations

   OP_ADD=0x20,OP_SUB,OP_MUL,OP_DIV, //Math operations, Arg1 = DestReg, Arg2 and 3 are source operands.
 
   OP_MODULO = 0x24, //Same as above, but this one is deprecated and never emitted by compiler (WHY?)
 
   OP_NEGATE=0x25,   //Negate the value of register <Arg1> and store in the same register. Notice that
                     //writing negative literals in the script will always produce this operation, when
                     //it could just emit a literal already negated.
                         
   OP_NOT = 0x26,    //Boolean NOT. Returns 1 if register <Arg1> is 0, and 0 in all other cases. Result
                     //is stored in the same register.
 
   OP_RETURN=0x27,   //No arguments
 
   OP_LABEL=0x28,    //Defines a jump label. Data = Label Id. Used at parsing stage, ignored by VM
   
   OP_JUMPIFTRUE=0x29,OP_JUMPIFFALSE=0x2A, //Jump to label <Data> depending on contents of register <Arg1>
 
   OP_JUMP=0x2B       //Unconditional jump. Data = Label Id
};
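
The field order in the struct (arg3 first, OP last) can look backwards next to hex words such as 0x0c000400. The Python sketch below packs the fields in struct order and reads them back as a little-endian 32-bit integer, assuming a little-endian x86 process. `pack_instruction` is an illustrative name.

```python
import struct


def pack_instruction(op, arg1=0, arg2=0, arg3=0, data=0):
    """Lay out one instruction in memory, in the field order of the struct above."""
    # Four unsigned bytes (arg3, arg2, arg1, OP) followed by a 32-bit data word.
    return struct.pack("<BBBBI", arg3, arg2, arg1, op, data)


raw = pack_instruction(0x0C, arg1=0, arg2=4, data=0)  # literal R0 int 0
header, data = struct.unpack("<II", raw)

print(raw.hex())         # bytes as they sit in memory
print(f"0x{header:08x}")  # the same four bytes read as one integer
```

Because OP is the last of the four bytes, it ends up as the most significant byte when the word is read as an integer. That is why the opcode shows up in the leading hex digits.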
 
There is also a list of op codes here: YDWE/opcode.h at e4f3b8390acc6779b984d59878b0473d5c0489f3 · actboy168/YDWE · GitHub

There are many things related to the JASS VM there as well.

C++:
    void jass_get_global_variable(lua_State* L, jass::OPCODE_VARIABLE_TYPE opt, uint32_t value)
    {
        switch (opt)
        {
        case jass::OPCODE_VARIABLE_NOTHING:
        case jass::OPCODE_VARIABLE_UNKNOWN:
        case jass::OPCODE_VARIABLE_NULL:
            lua_pushnil(L);
            break;
        case jass::OPCODE_VARIABLE_CODE:
            jassbind::push_code(L, value);
            break;
        case jass::OPCODE_VARIABLE_INTEGER:
            jassbind::push_integer(L, value);
            break;
        case jass::OPCODE_VARIABLE_REAL:
            jassbind::push_real(L, value);
            break;
        case jass::OPCODE_VARIABLE_STRING:
            jassbind::push_string(L, get_jass_vm()->string_table->get(value));
            break;
        case jass::OPCODE_VARIABLE_HANDLE:
            jassbind::push_handle(L, value);
            break;
        case jass::OPCODE_VARIABLE_BOOLEAN:
            jassbind::push_boolean(L, value);
            break;
        default:
            lua_pushnil(L);
            break;
        }
    }
}
 
OP_POP=0x14, //Pops the value from the top of the stack to register <Arg1>. God knows
//why Blizzard uses PUSH/POP on math operations.

They probably could deal with it differently, or optimize out a lot of those types of pointless push/pop operations if they did an optimization pass, but I think those are to deal with cases like this:
JASS:
function bigmath takes nothing returns integer
    local integer big = 1
    set big = big * big + big / big - big + big + big + big + big - big - big - big - big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big + big
    set big = 1
    return big
endfunction

function watchout takes nothing returns nothing
    local integer whoa = 55 + bigmath()
    call BJDebugMsg("bigmath was: " + I2S(whoa))
endfunction

Though not quite as contrived as this. Really all it takes is some random function using the same register. If everything else was left the same, but you took out the push/pops, then the 55 would be lost in this assignment.
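
The effect can be reproduced with a toy register machine in Python. This is not the real VM and the instruction sequences are hypothetical. They only illustrate that a callee reusing R1 destroys the caller's 55 unless the value is saved on the stack around the call.

```python
def bigmath(regs):
    """Stand-in for the callee: uses R1 as scratch space and returns 1 in R0."""
    regs["R1"] = 1
    regs["R0"] = 1


def run(program):
    """Execute a list of toy instructions and return the final value of R0."""
    regs, stack = {}, []
    for name, *args in program:
        if name == "literal":
            regs[args[0]] = args[1]
        elif name == "push":
            stack.append(regs[args[0]])
        elif name == "pop":
            regs[args[0]] = stack.pop()
        elif name == "call":
            args[0](regs)
        elif name == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown instruction {name}")
    return regs["R0"]


with_save = [
    ("literal", "R1", 55),
    ("push", "R1"),            # save 55 before the call
    ("call", bigmath),
    ("pop", "R1"),             # restore 55 afterwards
    ("add", "R0", "R1", "R0"),
]
without_save = [
    ("literal", "R1", 55),
    ("call", bigmath),         # R1 is overwritten here
    ("add", "R0", "R1", "R0"),
]

print(run(with_save), run(without_save))
```

With the push/pop pair the sum is 56. Without it the 55 is gone and the result is 2.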
 
I attached my jasm (dis)assembler tool to the first post and also added a bit of documentation about the bytecode.

Also @leandrotp can you explain in a bit more detail how you got access to the memory? And especially how you managed to execute the bytecode you wrote in your Memory lib. I already read your post here but it's still arcane magic for me.

Also, is it possible to add new labels or functions to the string table? Because if we can't, we are at a dead end with the compiling plans.

And do you know if it's possible to change the bytecode of an existing function? That would allow for a very simple interface between bytecode and (v)jass, because we could just create a stub function and replace its code with our bytecode.
 
Also @leandrotp can you explain in a bit more detail how you got access to the memory? And especially how you managed to execute the bytecode you wrote in your Memory lib. I already read your post here but it's still arcane magic for me.

Of course! Well, I still have plans to write a vJass library to ease the process of executing bytecode from arrays, but until I do, I'll explain how to do it manually.

Step 1 - Get address of Array Struct

Basically, JASS arrays are regular global variables that hold a pointer to a special struct:
JASS:
template <typename T>
struct JassArray
{
    void * vTable;
    unsigned int maxSize;
    unsigned int currentSize;
    T * pData;
};
The memory address of the array data is stored in the pData member. But to read it, we first need to get the address of the JassArray struct from the array variable. To do that, we just need to typecast the array variable into an integer!

There are 2 ways to do that: the permanent and the temporary typecast.

JASS:
globals
    integer Bytecode //Used just to fool Jasshelper
    integer array l__Bytecode
    integer StructAddress
endglobals

function InitBytecode takes nothing returns nothing
    set l__Bytecode[0] = <my>
    set l__Bytecode[1] = <bytecode>
    set l__Bytecode[2] = <here>
    ...
endfunction

function Typecast takes nothing returns nothing
   local integer Bytecode //Jasshelper will implicitly rename this to l__Bytecode
endfunction

//# +nosemanticerror
function GetStructAddress takes nothing returns nothing
   set StructAddress = l__Bytecode
   return
endfunction

function init takes nothing returns nothing
   call InitBytecode()
   call GetStructAddress()
endfunction
As I have previously explained, typecasting is based on the fact that you can declare a local variable with the same name as a global, and it will change the type of the global. So we can declare a local integer named l__Bytecode, and it will cause the global array l__Bytecode to become an integer. Reading that integer will then return the address of the JassArray struct for that array.

This method is called permanent because all code that comes after the Typecast function will treat the array as an integer! Which means that it can no longer be used as an array, and because of that, you must initialize the array in a function that comes before the Typecast function.

This is fine for most cases, but if for some reason you want to access the array from other places in your code, there's an alternative below.
JASS:
globals
    integer Bytecode //Used to fool Jasshelper
    integer array l__Bytecode
    integer StructAddress
endglobals

function InitBytecode takes nothing returns nothing
    set l__Bytecode[0] = <my>
    set l__Bytecode[1] = <bytecode>
    set l__Bytecode[2] = <here>
    ...
endfunction

//Jasshelper will rename the function argument to l__Bytecode
function GetStructAddress takes integer Bytecode returns nothing
   set StructAddress = Bytecode //l__Bytecode
endfunction

//# +nosemanticerror
function init takes nothing returns nothing
   call InitBytecode()
   call ForForce(bj_FORCE_PLAYER[0], I2C(8+C2I(function GetStructAddress)))
endfunction
This method makes use of a function argument instead of a local variable to make the typecast. Because of that, the typecast only works within the scope of that function - all other code will still treat variable l__Bytecode as an integer array.

However, we normally can't use functions that take arguments as code. This is because the first instruction of those functions is always a poparg instruction, which crashes the game if there are no arguments. So we need to use I2C and C2I to jump over that instruction, by skipping 8 bytes from the beginning of the function.

Doing that, the function will store the address of the JassArray struct into variable StructAddress, and the array will still be usable from the rest of your code.

Step 2 - Getting the address of data and execute

Once we have the address of the struct we just need to read the pData member:

set ArrayAddress = ReadMemory(StructAddress + 12) or set ArrayAddress = Memory[StructAddress/4 + 3]

Notice that when you use the Memory array, all addresses must be divided by 4. So the function ReadMemory is provided for convenience, if you don't want to worry about that. I like to use the array because it's faster, but if the script optimization of Jasshelper is turned on, the function call will be inlined, so there's no difference.
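To make the divide-by-4 rule concrete, here is a small Python model (not JASS, and not the real Memory library): memory is a plain list of 32-bit words, so list index i stands for byte address 4*i. The addresses 0x40 and 0x100 and the helper name read_memory are made up for illustration; only the "+12 bytes equals +3 words" relationship comes from the post.

```python
# Toy model: process memory as a flat list of 32-bit words.
# List index i corresponds to byte address 4*i, like the Memory[] array.

def read_memory(memory, byte_address):
    """Model of ReadMemory: takes a byte address, returns the word stored there."""
    if byte_address % 4 != 0:
        raise ValueError("address must be 4-byte aligned")
    return memory[byte_address // 4]

# Hypothetical layout: a JassArray struct at byte address 0x40 whose
# pData member (byte offset 12, i.e. word offset 3) holds 0x100.
memory = [0] * 128
struct_address = 0x40
memory[struct_address // 4 + 3] = 0x100

# Both forms from the post read the same word.
array_address_a = read_memory(memory, struct_address + 12)
array_address_b = memory[struct_address // 4 + 3]
```

The two forms only agree when the struct address is 4-byte aligned, which is why the integer division in Memory[StructAddress/4 + 3] is safe here.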

After getting the array address we can easily execute it:
JASS:
set BytecodeTrigger = CreateTrigger()
call TriggerAddCondition(BytecodeTrigger, Condition(I2C(ArrayAddress)))
call TriggerEvaluate(BytecodeTrigger)
Or you can just use ForForce(bj_FORCE_PLAYER[0], I2C(ArrayAddress)), which is the better choice if you're going to run the bytecode only once.

Step 3 - Dealing with saved games

When a saved game is loaded, pretty much everything is located in different places than before. This is not a problem for normal Jass programming, but when we use bytecode we are working directly with memory, and this means that our bytecode array will certainly be located in a different memory address now.

To deal with this we need to register a trigger for EVENT_GAME_LOADED to retrieve the location of Bytecode and update our trigger.

JASS:
function OnGameLoaded takes nothing returns boolean
   call GetStructAddress()
   set ArrayAddress = ReadMemory(StructAddress + 12)
   call TriggerClearConditions(BytecodeTrigger)
   call TriggerAddCondition(BytecodeTrigger, Condition(I2C(ArrayAddress)))
   return false
endfunction

function init takes nothing returns nothing
   local trigger t = CreateTrigger()
   call TriggerRegisterGameEvent(t, EVENT_GAME_LOADED)
   call TriggerAddCondition(t, Condition(function OnGameLoaded))
endfunction
Also is it possible to add new labels or functions to the stringtable? Because if we can't we are at a dead end with the compiling plans.

And do you know if it's possible to change the bytecode of an existing function? That would allow for a very simple interface between bytecode and (v)jass because we can just create a stub function and replace its code with our bytecode.

Since Memory hack was fixed, there's no way to modify those tables. Also it's not possible to modify the bytecode of an existing function after 1.27b, because now we have just read-only access to memory.

However it's possible to produce a fake Jump table in a jass array, and create new jump labels to use with the jump instructions. This could possibly allow us to emulate function calls using JMP instead of a regular call, and we could also pass parameters in registers for better performance.

But still there's no way for normal JASS code to call bytecode other than using natives. A good practice for using bytecode is to pass it directly to Trigger conditions and timers, since it can call other JASS code normally, but not the opposite way.
 
Level 3
Joined
May 19, 2010
Messages
35
How would a "Bytecode compiler" setup/create/modify the string table (not the string handle table) such that the instructions (OP_GETARRAY, OP_SETARRAY, etc.) that require a variable/function name (i.e an id from the string table) would have something to work with?
For local variables we can just use already existing variables in the string table. I haven't done extensive testing but I did not notice any problems. To create new entries for global variables it should be possible to create a new entry by calling a string function, e.g I2S(some_id_here). If it's used only in bytecode the name doesn't matter. For functions the plan is to use only jumps.

However it's possible to produce a fake Jump table in a jass array, and create new jump labels to use with the jump instructions.
Can you explain a bit how that is possible?

Also I found a way to read the string table:
JASS:
library StringFromId initializer init uses JasmCore
   globals
       integer jasm  // Not used, it's here just to fool Jasshelper
       integer array l__jasm
        integer jasm_address
        private integer offset = 1
        private string ret_string
        private code bytecode
   endglobals
      
    private function GetGlobalIds takes nothing returns nothing
        set ret_string = "" // start + 2 instr
    endfunction

   function InitJasmArray_l__jasm takes integer ret_id returns nothing
        set l__jasm[   0] = 0x0c010600 // literal      R1     string -    
        set l__jasm[   1] = ret_id
        set l__jasm[   2] = 0x11010000 // setvar       R1     -      -    
        set l__jasm[   3] = ret_id
        set l__jasm[   4] = 0x27000000 // ret          -      -      - 
        set l__jasm[   5] = 0x00000000
    endfunction
  
    private function GetBytecodeAddress takes nothing returns integer
        return Memory[jasm_address/4+3]
    endfunction
  
    function ReadStringFromId takes integer id returns string
        set l__jasm[offset] = id
        call ForForce(bj_FORCE_PLAYER[0], bytecode)
        return ret_string
    endfunction
  
    function GetFunctionName takes code c returns string
        return ReadStringFromId(GetJasmInstrB(GetJasmCodeAddr(c),0))
    endfunction
  
   private function Typecast takes nothing returns nothing
       local integer jasm
   endfunction

    //# +nosemanticerror
    private function GetJasmAddress takes nothing returns nothing
        set jasm_address = l__jasm
        return
    endfunction

   private function init takes nothing returns nothing
        local integer globidfun = GetJasmCodeAddr(function GetGlobalIds)
        local integer ret_str_id = GetJasmInstrB(globidfun, 2)
        call InitJasmArray_l__jasm(ret_str_id)
        call GetJasmAddress()
        set bytecode = I2C(GetBytecodeAddress())
   endfunction

endlibrary

And the JasmCore lib:
JASS:
library JasmCore uses Memory
   function GetJasmCodeAddr takes code c returns integer
       // function address is first bytecode instruction of function
       // we start at the previous instruction to get the function declaration
       return C2I(c) - 8
   endfunction

   function GetJasmInstrA takes integer addr, integer instruction returns integer
       return Memory[addr/4 + 2*instruction]
   endfunction

   function GetJasmInstrB takes integer addr, integer instruction returns integer
       return Memory[addr/4 + 2*instruction + 1]
   endfunction
endlibrary
 
Level 13
Joined
Nov 7, 2014
Messages
571
To create new entries for global variables it should be possible to create a new entry by calling a string function, e.g I2S(some_id_here).
I2S(<some-integer>) would add a new entry (unless the string was already in the table?) but how would you obtain the string's id? (Maybe it's obvious, I just don't understand much about bytecode.)

Also I found a way to read the string table:
"Now you're thinking with bytecode" =)...
 
Level 3
Joined
May 19, 2010
Messages
35
I only know a very hacky way to get the id. These ids are all sequential, so when we know the biggest id and make the call to I2S(<something>) (with <something> as a string that is not in the table) then the new id is just <biggest id> + 1. Maybe @leandrotp knows a better way, he is the memory expert:thumbs_up:
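A toy Python model of the "biggest id + 1" trick, just to spell out the assumption it rests on. This is not how the engine stores strings; the class name, the sample names, and the choice to start ids at 1 are all made up for illustration.

```python
class ToyStringTable:
    """Toy model: ids are handed out in insertion order, starting at 1 (an assumption)."""

    def __init__(self, parsed_names):
        self.ids = {}
        for name in parsed_names:
            self.intern(name)

    def biggest_id(self):
        return len(self.ids)

    def intern(self, s):
        # A string that is already present keeps its old id; only new strings
        # get the next sequential id.
        if s not in self.ids:
            self.ids[s] = len(self.ids) + 1
        return self.ids[s]

table = ToyStringTable(["main", "udg_counter", "hello"])
predicted = table.biggest_id() + 1   # what the trick predicts
actual = table.intern("12345")       # stands in for I2S(12345)
reused = table.intern("hello")       # already in the table: no new id
```

The prediction only holds when the string is really new, which is why <something> must produce a string that is not in the table yet.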

I also updated the attached map and my first post with a documented bytecode example.
 
Level 13
Joined
Nov 7, 2014
Messages
571
Not sure why but the "Temporary Typecast" crashes for me with patch 1.28.1.
It seems to crash while trying to execute the instructions from set StructAddress = Bytecode //l__Bytecode,
after jumping/skipping the popfuncarg instruction.

I only know a very hacky way to get the id. These ids are all sequential, so when we know the biggest id and make the call to I2S(<something>) (with <something> as a string that is not in the table) then the new id is just <biggest id> + 1.
Using your StringFromId I dumped the string table (attached .txt file) and it seems the entries in there are whatever names (function, parameter, global/local variables) and string literals the parser has encountered while parsing (in parsing order).
So I suppose a "Bytecode compiler" would have to use custom common.j and blizzard.j files (as leandrotp has described here) in order to know the id of a newly created string which could be used for allocating a new global array.


PS: My "hello-bytecode" script (which doesn't print "hello-world" =)...)
JASS:
library Foo initializer init uses Typecast, Memory

globals
    integer result_integer
    integer result_integer_id
endglobals
function result_integer_get_id takes nothing returns nothing
    set result_integer = 0
    set result_integer_id = Memory[C2I(function result_integer_get_id)/4 + 3]
endfunction

globals
    // it seems we can't declare these to be private nor public
    // because the variable names won't match with that of the
    // "local integer instructions"
    //
    // i.e we have to come up with different variable names each time
    // we want to execute different instructions =)?
    //
    integer instructions
    integer array l__instructions
endglobals

private function instructions_init takes nothing returns nothing
    call result_integer_get_id()
    set l__instructions[0] = 0x0C010400
    set l__instructions[1] = 1234
    set l__instructions[2] = 0x11010000
    set l__instructions[3] = result_integer_id
    set l__instructions[4] = 0x27000000
    set l__instructions[5] = 0x00000000
endfunction

private function typecast_instructions takes nothing returns nothing
    local integer instructions
endfunction

//# +nosemanticerror
private function instructions_execute takes nothing returns nothing
    call instructions_init()
    call ForForce(bj_FORCE_PLAYER[0], I2C(Memory[l__instructions/4 + 3]))
endfunction

private function init takes nothing returns nothing
    call instructions_execute()
    call BJDebugMsg("result_integer: " + I2S(result_integer))
endfunction

endlibrary
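For anyone reading the hex by eye, here is a Python sketch of how the instruction words in these snippets appear to be packed: the top byte is the opcode, followed by three one-byte fields, and every instruction is followed by one 32-bit argument word. The opcode names are the ones used in the comments in this thread; the field names a, b, c and the id 77 are my own placeholders, and the layout is inferred from the examples rather than from any official documentation.

```python
# Opcode names as they appear in the comments of the snippets in this thread.
OPCODE_NAMES = {
    0x0B: "cleanstack",
    0x0C: "literal",
    0x0E: "getvar",
    0x11: "setvar",
    0x13: "push",
    0x16: "calljass",
    0x27: "ret",
}

def encode(opcode, a=0, b=0, c=0):
    """Pack an opcode and three one-byte fields into one 32-bit word."""
    return (opcode << 24) | (a << 16) | (b << 8) | c

def decode(word):
    """Split a 32-bit instruction word into (opcode, a, b, c)."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)

def disassemble(words):
    """Walk the array two words at a time: instruction word, then argument word."""
    out = []
    for i in range(0, len(words) - 1, 2):
        opcode, a, b, c = decode(words[i])
        name = OPCODE_NAMES.get(opcode, "op_0x%02X" % opcode)
        out.append((name, a, b, c, words[i + 1]))
    return out

# Same shape as the "hello-bytecode" array; 77 stands in for result_integer_id.
hello = [0x0C010400, 1234, 0x11010000, 77, 0x27000000, 0x00000000]
listing = disassemble(hello)
```

Read this way, 0x0C010400 is "literal, register 1, type 0x04 (integer)" and 0x11010000 is "setvar from register 1", which matches the comments in the earlier posts.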
 

Attachments

  • example-string-table-dump.txt
    122.4 KB · Views: 123
Level 3
Joined
May 19, 2010
Messages
35
Not sure why but the "Temporary Typecast" crashes for me with patch 1.28.1.
Yeah I also didn't get it to work.

I have not tested it, but I think the names of new globals can be arbitrary, as long as we don't have any collisions with existing global/local variables. So I think we could get a "good enough" start index in the string table if we insert a function at the end of war3map.j which contains a unique identifier. Then the start in the stringtable would be the id of that string +X.

This is completely untested, but something like this may work:
JASS:
globals
    integer free_id
endglobals

// Wrapper (renamed so it doesn't clash with the function below).
// It uses ExecuteFunc because calculate_free_string_id is only declared at the end of war3map.j
function get_free_string_id takes nothing returns integer
    call ExecuteFunc("calculate_free_string_id")
    return free_id
endfunction

// at the end of war3map.j
function <some_unique_id> takes nothing returns nothing
endfunction

function calculate_free_string_id takes nothing returns nothing
    set free_id = Memory[C2I(function <some_unique_id>)/4 -1] // free_id now contains the string table id of <some_unique_id>
    set free_id = free_id + 2
endfunction

PS: My "hello-bytecode" script (which doesn't print "hello-world" =)...)
Nice. Now we just need to find some applications for bytecode that are actually useful :wink:

Edit: Found something more or less useful :grin:
It's just a proof of concept, but with bytecode we can emulate closures: functions that take a bit of state with them. It can be used to attach data to a timer, a ForGroup call, a TriggerCondition, etc.

JASS:
    //# +nosemanticerror
    private function DoSomething takes integer i, unit u returns nothing
        local code closure = create_closure(ModuloInteger(i*i+1, 255), u, function DoSomething)
        call SetUnitVertexColor(u, i, i, i, 255)
        call TimerStart(GetExpiredTimer(), 0.5, false, closure)
    endfunction
   
    //# +nosemanticerror
    private function Example takes nothing returns nothing
        local unit u = CreateUnit(Player(0), 'hfoo', 0, 0, 270)
        call TimerStart(CreateTimer(), 5, false, create_closure(42, u, function DoSomething))
    endfunction

JASS:
scope JumpTest initializer init

//! runtextmacro JasmSetupGlobals("JasmTimerAttach")

    globals
        private integer array free_stack
        private integer free_stack_ptr = 0
        private integer bytecode_free_ptr = 0
    endglobals

   
    private function add_handle_parameter takes integer offset, handle h returns integer
        set l__JasmTimerAttach[offset  ] = 0x0C010700 // literal R1 handle
        set l__JasmTimerAttach[offset+1] = GetHandleId(h)
        set l__JasmTimerAttach[offset+2] = 0x13010000 // push R1
        set l__JasmTimerAttach[offset+3] = 0x00000000
        return offset + 4
    endfunction
   
    private function add_integer_parameter takes integer offset, integer i returns integer
        set l__JasmTimerAttach[offset] = 0x0C010400 // literal R1 int
        set l__JasmTimerAttach[offset+1] = i
        set l__JasmTimerAttach[offset+2] = 0x13010000 // push R1
        set l__JasmTimerAttach[offset+3] = 0x00000000
        return offset + 4
    endfunction
   
    private function add_calljass takes integer offset, code fun returns integer
        set l__JasmTimerAttach[offset  ] = 0x16000000 // calljass
        set l__JasmTimerAttach[offset+1] = GetJasmFunctionId(fun)
        return offset + 2
    endfunction
   
    private function add_ret takes integer offset returns integer
        set l__JasmTimerAttach[offset  ] = 0x27000000 // ret
        set l__JasmTimerAttach[offset+1] = 0x00000000
        return offset + 2
    endfunction   
   
    private function add_cleanstack takes integer offset, integer size returns integer       
        set size = Bitwise.shiftl(size, 16)
       
        set l__JasmTimerAttach[offset  ] = Bitwise.OR32(0x0b000000, size) // cleanstack
        set l__JasmTimerAttach[offset+1] = 0x00000000
        return offset + 2
    endfunction
   
    private function cleanup takes integer offset returns nothing
        set free_stack[free_stack_ptr] = offset
        set free_stack_ptr = free_stack_ptr + 1
    endfunction
   
    //# +nosemanticerror
    private function create_closure takes integer i, handle h, code fun returns code
        local integer offset
        local integer start
        local boolean update_ptr = false
        local integer size
       
        if free_stack_ptr > 0 then
            set free_stack_ptr = free_stack_ptr - 1
            set offset = free_stack[free_stack_ptr]
        else
            set offset = bytecode_free_ptr
            set update_ptr = true
        endif
       
        set start = offset
       
        // register parameters
        set offset = add_integer_parameter(offset, i)
        set offset = add_handle_parameter(offset, h)
       
        // call original function
        set offset = add_calljass(offset,fun)
        set offset = add_cleanstack(offset, 2)
       
        // add cleanup code
        set offset = add_integer_parameter(offset, start)
        set offset = add_calljass(offset,function cleanup)
        set offset = add_cleanstack(offset, 1)
       
        set offset = add_ret(offset)
       
        if update_ptr then
            set bytecode_free_ptr = offset
        endif
       
        set size = offset - start
       
        return I2C(GetJasmTimerAttachBytecodeAddress() + start*4)
    endfunction
   
    private function InitJasmArray_l__JasmTimerAttach takes nothing returns nothing
        set l__JasmTimerAttach[  0] = 0x27000000 // ret          -      -      -      -     
        // Jass arrays are not allocated all at once
        // we access the last index to force it to allocate the full memory
        set l__JasmTimerAttach[8191] = 0x00000000
    endfunction

//! runtextmacro JasmSetupExec("JasmTimerAttach")

    //# +nosemanticerror
    private function DoSomething takes integer i, unit u returns nothing
        local code closure = create_closure(ModuloInteger(i*i+1, 255), u, function DoSomething)
        call SetUnitVertexColor(u, i, i, i, 255)
        call TimerStart(GetExpiredTimer(), 0.5, false, closure)
    endfunction
   
    //# +nosemanticerror
    private function Example takes nothing returns nothing
        local unit u = CreateUnit(Player(0), 'hfoo', 0, 0, 270)
        call TimerStart(CreateTimer(), 5, false, create_closure(42, u, function DoSomething))
    endfunction
   
    private function init takes nothing returns nothing
        call InitJasmArray_l__JasmTimerAttach()
        call JasmInitJasmTimerAttach()
        call Example()
    endfunction

endscope
 
Level 13
Joined
Nov 7, 2014
Messages
571
Found a silly way to reference strings in bytecode and wrote the bytecode version of "hello world" =):
JASS:
library Foo initializer bc_0001_execute uses Typecast, Memory, stringtableidfromhandle

globals
    integer bc_0001
    integer array l__bc_0001
    integer bc_0001_addr
    integer bc_0001_offset = -1
endglobals

private function X takes nothing returns integer
    set bc_0001_offset = bc_0001_offset + 1
    return bc_0001_offset
endfunction

globals
    integer func_print_stid
endglobals
private function print takes string s returns nothing
    call BJDebugMsg(s)
endfunction

private function bc_0001_init takes nothing returns nothing
    set l__bc_0001[X()] = 0x0C010600
    set l__bc_0001[X()] = stid_from_handle("hello world =)")
    set l__bc_0001[X()] = 0x13010000
    set l__bc_0001[X()] = 0x00000000
    set l__bc_0001[X()] = 0x16000000
    set l__bc_0001[X()] = func_print_stid
    set l__bc_0001[X()] = 0x0B010000
    set l__bc_0001[X()] = 0x00000000
    set l__bc_0001[X()] = 0x27000000
    set l__bc_0001[X()] = 0x00000000
endfunction

private function bc_0001_allocate takes nothing returns nothing
    set l__bc_0001[8190] = 0
endfunction
private function bc_0001_typecast takes nothing returns nothing
    local integer bc_0001
endfunction

//# +nosemanticerror
private function bc_0001_init_vars takes nothing returns nothing
    call bc_0001_allocate()
    set func_print_stid = Memory[C2I(function print)/4 - 1]
    set bc_0001_addr = Memory[l__bc_0001/4 + 3]
endfunction

private function bc_0001_execute takes nothing returns nothing
    call bc_0001_init_vars()
    call bc_0001_init()
    call ForForce(bj_FORCE_PLAYER[0], I2C(bc_0001_addr))
endfunction

endlibrary

JASS:
library stringtableidfromhandle initializer init uses Typecast, StringFromId

globals
    private hashtable cache = InitHashtable()
    private integer offset // offset in the string table from which we start the search
    private string ef_s
    private integer ef_result
endglobals
private function init takes nothing returns nothing
    local string s
    local integer i

    set s = I2SH(1)
    if s == "" then
        set s = I2SH(2)
    endif

    set i = 1
    loop
        exitwhen ReadStringFromId(i) == s
        set i = i + 1
    endloop

    set offset = i
endfunction

private function stid_from_handle_exec takes nothing returns nothing
    local string s = ef_s
    local integer i

    set i = offset
    loop
        if ReadStringFromId(i) == s then
            set ef_result = i
            return
        endif
        set i = i - 1
        if i == 0 then
            exitwhen true
        endif
    endloop

    set i = offset + 1
    loop
        if ReadStringFromId(i) == s then
            set ef_result = i
            return
        endif
        set i = i + 1
    endloop

    // unreachable
endfunction

function stid_from_handle takes string s returns integer
    local integer sh = StringHash(s)
    local integer i = LoadInteger(cache, 0, sh)
    if i != 0 then
        return i
    endif

    set ef_s = s
    call ExecuteFunc(SCOPE_PRIVATE + "stid_from_handle_exec")
    call SaveInteger(cache, 0, sh, ef_result)
    return ef_result
endfunction

endlibrary
 
Level 13
Joined
Nov 7, 2014
Messages
571
That looks like a linear search in the table, right?

What's up with this part?

Yes, it is a linear search, starting from the string-handle-table's first non-empty
entry's string-table-id, which should be ~= 4K (assuming default common.j and blizzard.j).
Starting from there it goes backwards in the string-table until it reaches 0, and then it searches
forward. When it finds the string's string-table-id, it caches it. It doesn't need to be fast, I use it
for testing stuff out. It's kind of the opposite of the ReadStringFromId function =).
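The search order described above, as a Python sketch over a plain list. It is only a model: the real code reads the table through ReadStringFromId and caches by StringHash in a hashtable, while here the table is a list, the cache is a dict keyed by the string itself, and the forward scan is bounded by the list length (the JASS version has no upper bound). The sample table contents are made up.

```python
def find_string_id(table, target, start, cache):
    """Search backwards from `start` down to id 1, then forwards from start + 1."""
    if target in cache:
        return cache[target]
    # Backward pass: start, start - 1, ..., 1 (id 0 is never checked).
    for i in range(start, 0, -1):
        if table[i] == target:
            cache[target] = i
            return i
    # Forward pass: start + 1 up to the end of the toy table.
    for i in range(start + 1, len(table)):
        if table[i] == target:
            cache[target] = i
            return i
    raise KeyError(target)

# Index 0 is unused; index 4 plays the role of the search offset.
table = [None, "main", "print", "hello world =)", "", "udg_x"]
cache = {}
found_backward = find_string_id(table, "print", 4, cache)
found_forward = find_string_id(table, "udg_x", 4, cache)
found_cached = find_string_id(table, "print", 4, cache)
```

Starting near the first runtime string and going backwards first favors names from the map script, which sit at lower ids than anything created at runtime.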

It would be nice to know how the warcraft engine handles the lookup. Then we could mimic that, it's faster than searching.
Yes, I guess.

After tinkering around, it seems that registers can be of type 0x02 ("any"):
JASS:
    set l__bc_0001[X()] = 0x0C010200 // R[1].any = 1
    set l__bc_0001[X()] = 0x00000001

    set l__bc_0001[X()] = 0x11010000 // var[result_integer] = R[1]
    set l__bc_0001[X()] = stid_from_handle("result_integer")

    set l__bc_0001[X()] = 0x11010000  // var[result_real] = R[1]
    set l__bc_0001[X()] = stid_from_handle("result_real")

    set l__bc_0001[X()] = 0x11010000  // var[result_string] = R[1]
    set l__bc_0001[X()] = stid_from_handle("result_string")

    set l__bc_0001[X()] = 0x11010000  // var[result_bool] = R[1]
    set l__bc_0001[X()] = stid_from_handle("result_bool")

    set l__bc_0001[X()] = 0x27000000
    set l__bc_0001[X()] = 0x00000000

...

call BJDebugMsg(I2S(result_integer)) // 1
call BJDebugMsg(R2S(result_real)) // 0.00000000... (0x00000001 is a very tiny real)
call BJDebugMsg(result_string) // "" == StringHandleTable[1]
call BJDebugMsg(bool2str(result_bool)) // "true"

It doesn't seem that useful, because one can typecast a register by assigning its value to
a global variable of the wanted type and then reading from it with the 0x0E (getvar) instruction,
which doesn't type check:
JASS:
    local integer ri = stid_from_handle("result_integer")
    set l__bc_0001[X()] = 0x0C010400 // R[1].integer = 0x3F800000
    set l__bc_0001[X()] = 0x3F800000
    set l__bc_0001[X()] = 0x11010000 // var[result_integer] = R[1]
    set l__bc_0001[X()] = ri
    set l__bc_0001[X()] = 0x0E010500 // R[1].real = var[result_integer]; R[1] changed its type from 0x04 (integer) to 0x05 (real)
    set l__bc_0001[X()] = ri
    set l__bc_0001[X()] = 0x11010000 // var[result_real] = R[1] = 1.0
    set l__bc_0001[X()] = stid_from_handle("result_real")
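To double check the two numbers used above, here is the same bit reinterpretation done in Python with the struct module. It assumes JASS reals are IEEE 754 single-precision floats, which is what the 0x3F800000 = 1.0 result suggests; the helper names are my own.

```python
import struct

def int_bits_to_real(bits):
    """Reinterpret a 32-bit integer pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def real_to_int_bits(value):
    """Reinterpret a single-precision float as its 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

one = int_bits_to_real(0x3F800000)   # the integer written in the getvar example
tiny = int_bits_to_real(0x00000001)  # the "very tiny real" from the 0x02 test
round_trip = real_to_int_bits(1.0)
```

The second value is a denormal, far too small for R2S to show any nonzero digits, which is consistent with the 0.000 output reported above.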


Also is it possible to add new labels or functions to the stringtable? Because if we can't we are at a dead end with the compiling plans.
I see what you mean now, and I agree.

The jump instructions (0x02, 0x29, 0x2A, 0x2B) work with label ids which seem to live in a table
but the table can't be modified (unless one has write access I suppose).
 
Level 9
Joined
Jul 30, 2012
Messages
156
That looks like a linear search in the table, right?

It would be nice to know how the warcraft engine handles the lookup. Then we could mimic that, it's faster than searching.
The game doesn't do any lookup. It doesn't need to do it. String Ids are only used by bytecode, and bytecode is only generated at the compilation stage. As the map script is read by the compiler, strings are inserted into the table, and the returned id goes to bytecode. There's no mechanism to retrieve the id of a specific string except by linear search.

But as you know, these ids are sequential, so in theory you can determine the ids of all strings in the script even before the game runs. You can also obtain the ids by reading the bytecode of a function, as you have all been doing now. But for strings generated at runtime, there's no way other than searching.

Maybe someday we could have a tool like Jasshelper with a macro that returns the string id of a name at compile time. This could help a lot to work with bytecode.

Not sure why but the "Temporary Typecast" crashes for me with patch 1.28.1.
It seems to crash while trying to execute the instructions from set StructAddress = Bytecode //l__Bytecode,
after jumping/skipping the popfuncarg instruction.
It's strange, I swear I remember having it working in the past, but now it's crashing for me too. I'll remove that part from the post for now.

So if you really need an alternative for "Permanent Typecast" the only way I can think of is to use bytecode itself to read an array as integer. Obviously this requires you to first have a working bytecode array using the permanent method, then you can use that array to run some code and obtain the struct addresses of other arrays, through the GETVAR instruction.

Also, just a tip on your "hello world" snippet: you don't need to make a wrapper for BJDebugMsg, you can call it directly from bytecode. It's even easier because you already know its id, you dumped the string table yourself, and the strings from common.j and blizzard.j will always have those ids, so it's easier to use them when possible.

Another tip: you can use bj_forLoopAIndex and bj_forLoopBIndex to transfer data between the JASS world and bytecode. You even have SetForLoopIndexA and SetForLoopIndexB as wrappers, which can also be very useful. Since function arguments are not typesafe, you can PUSH a value of any type into these functions.

I have not tested it, but I think the names of new globals can be arbitrary, as long as we don't have any collisions with existing global/local variables. So I think we could get a "good enough" start index in the string table if we insert a function at the end of war3map.j which contains a unique identifier. Then the start in the stringtable would be the id of that string +X.
Yes, the variable table, just like the function and native tables, is a hashtable internally. The names can be arbitrary, they can contain spaces and special characters, and can even conflict with already existing variables! But if you do that, the already existing variable will be lost forever. It will be like a leak, since it's still in memory, but there will be no way to access it.

My memory library takes advantage of this to unlock mem-reading: basically my bytecode declares a new global called "Memory", which overrides the already existing global array with this name. This way I can point it to any address, and when regular jass code tries to use the Memory[] array, it will actually read the new global and use that address as an Array Struct. In that struct the size of array is a very big number, and pData is 0 - this allows access to the entire address space of the process.

But you can't write to that array because the newly declared global is not a true array, but just an integer. So the type-checking fails when you try to write. Before patch 1.27b, I didn't need to declare a new global, as I could tamper with the already existing array directly. But this has been fixed for good.


Edit: Found something more or less useful :grin:
It's just a proof of concept, but with bytecode we can emulate closures: functions that take a bit of state with them. It can be used to attach data to a timer, a ForGroup call, a TriggerCondition, etc.

Yes, that's one of my ideas, though it would require a good system to allocate memory for the instructions and recycle it later when the closure is no longer needed. But it's definitely a possibility
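As a rough idea of what such an allocator could look like, here is a Python model of the fixed-size free stack that the create_closure proof of concept already uses. The class and method names are made up; the slot size of 22 words is just the sum of what create_closure emits (4 + 4 + 2 + 2 + 4 + 2 + 2 + 2), and the capacity of 8192 words is the usual JASS array size.

```python
CLOSURE_WORDS = 22      # words emitted by one create_closure call
ARRAY_CAPACITY = 8192   # words available in one JASS array

class SlotAllocator:
    """Bump allocator with a free stack for recycled fixed-size slots."""

    def __init__(self, slot_size=CLOSURE_WORDS, capacity=ARRAY_CAPACITY):
        self.slot_size = slot_size
        self.capacity = capacity
        self.next_free = 0     # plays the role of bytecode_free_ptr
        self.free_stack = []   # plays the role of free_stack / free_stack_ptr

    def allocate(self):
        # Prefer a recycled slot, otherwise carve a new one off the end.
        if self.free_stack:
            return self.free_stack.pop()
        if self.next_free + self.slot_size > self.capacity:
            raise MemoryError("bytecode array is full")
        start = self.next_free
        self.next_free += self.slot_size
        return start

    def release(self, start):
        # Called once the closure has run, like the cleanup function.
        self.free_stack.append(start)

alloc = SlotAllocator()
first = alloc.allocate()
second = alloc.allocate()
alloc.release(first)
recycled = alloc.allocate()
third = alloc.allocate()
```

This only works because every closure has the same size. Closures with different parameter lists would need size classes or a real free list, and a closure that never runs (for example a destroyed timer) would never release its slot.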

After tinkering around, it seems that registers can be of type 0x02 ("any"):

This type is only used when storing a null value into a handle-type variable. It's used to signal the VM that the value is not reference-counted. When you write a value of type 0x07 to a variable, the VM increases its ref count, and if the previously stored value also had type 0x07, its ref count is decreased too. But none of this happens for values of type 0x02.

The jump instructions (0x02, 0x29, 0x2A, 0x2B) work with label ids which seem to live in a table but the table can't be modified (unless one has write access I suppose).

You can't modify it but you can overflow it. This table is linear and the VM has no bounds checking when accessing it. So we can produce a table of our own, using a normal Jass array, then calculate the difference between the array address and the start of the original table, and use that offset as a label. I'm currently writing the API for this.
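The offset calculation can be pictured with a toy Python model. Everything concrete here is an assumption made for illustration: the 4-byte entry size, the addresses, and the function names. The only part taken from the post is the idea that an unchecked table read at "start + label * entry size" can be steered into an array we control.

```python
ENTRY_SIZE = 4  # ASSUMPTION: one 32-bit word per table entry

def lookup_label(memory, table_start, label):
    """Model of an unchecked table read: no test against the table's real length."""
    return memory[(table_start + ENTRY_SIZE * label) // 4]

def label_for_address(table_start, target_address):
    """Label value that makes the unchecked read land on target_address."""
    diff = target_address - table_start
    if diff % ENTRY_SIZE != 0:
        raise ValueError("target is not aligned to the entry size")
    return diff // ENTRY_SIZE

# Toy memory of 32-bit words; list index i stands for byte address 4*i.
memory = [0] * 64
table_start = 0x20        # hypothetical original jump table (2 real entries)
memory[table_start // 4] = 0x1111
memory[table_start // 4 + 1] = 0x2222
fake_table = 0x80         # hypothetical address of our JASS array's data
memory[fake_table // 4] = 0xBEEF

label = label_for_address(table_start, fake_table)
target = lookup_label(memory, table_start, label)
```

If the array happens to sit below the table in memory, the difference is negative, and whether that still works depends on how the VM treats the label value. That case is not modeled here.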
 
Level 13
Joined
Nov 7, 2014
Messages
571
So if you really need an alternative for "Permanent Typecast" the only way I can think of is to use bytecode itself to read an array as integer. Obviously this requires you to first have a working bytecode array using the permanent method, then you can use that array to run some code and obtain the struct addresses of other arrays, through the GETVAR instruction.

I guess it kind of saves typing (and the bc array can be private too) =)
JASS:
library Foo initializer my_bc_execute uses Memory, bc

globals
    private integer array my_bc
    private integer offset = -1
    private integer result_integer
endglobals

private function X takes nothing returns integer
    set offset = offset + 1
    return offset
endfunction

private function bc_init takes nothing returns nothing
    set my_bc[X()] = 0x0C010400
    set my_bc[X()] = 69105
    set my_bc[X()] = 0x11010000
    set my_bc[X()] = stid_from_handle(SCOPE_PRIVATE + "result_integer")
    set my_bc[X()] = 0x27000000
    set my_bc[X()] = 0x00000000
endfunction

private function my_bc_execute takes nothing returns nothing
    local integer array_struct
    call bc_init()
    set array_struct = get_array_struct_from_name(SCOPE_PRIVATE + "my_bc")
    call ForForce(bj_FORCE_PLAYER[0], I2C(Memory[array_struct/4 + 3]))
    call BJDebugMsg("result_integer: " + I2S(result_integer))
endfunction

endlibrary

JASS:
library bc initializer init uses Memory, stringtableidfromhandle

globals
    integer bc_0001
    integer array l__bc_0001
    private code bc_0001_addr

    private integer result_stid
    private integer result
endglobals

private function bc_0001_init takes integer array_name_stid returns nothing
    set l__bc_0001[0] = 0x0E010400
    set l__bc_0001[1] = array_name_stid

    set l__bc_0001[2] = 0x11010000
    set l__bc_0001[3] = result_stid

    set l__bc_0001[4] = 0x27000000
    set l__bc_0001[5] = 0x00000000
endfunction

private function bc_0001_allocate takes nothing returns nothing
    local integer no_vjass_inline
    set l__bc_0001[7] = 0
endfunction
private function bc_0001_typecast takes nothing returns nothing
    local integer bc_0001
endfunction

//# +nosemanticerror
private function init takes nothing returns nothing
    call bc_0001_allocate()
    set bc_0001_addr = I2C(Memory[l__bc_0001/4 + 3])
    set result_stid = stid_from_handle(SCOPE_PRIVATE + "result")
endfunction

function get_array_struct_from_name takes string array_name returns integer
    call bc_0001_init(stid_from_handle(array_name))
    call ForForce(bj_FORCE_PLAYER[0], bc_0001_addr)
    return result
endfunction

function get_array_struct_from_stid takes integer array_name_stid returns integer
    call bc_0001_init(array_name_stid)
    call ForForce(bj_FORCE_PLAYER[0], bc_0001_addr)
    return result
endfunction

endlibrary
 
Level 3
Joined
May 19, 2010
Messages
35
Looks like the parameters are the names of the types. I printed a few string IDs and got this:
Unbenannt.png
0x2 is the string agent. So the lines -4189/-4188 could be type hashtable extends agent. That's a bit backwards from what I said above, but I think that's a quirk of the parser. So the meaning of 0x09 and 0x0A is probably exactly the reverse of what I said.

Edit: I improved my pretty printer. Here is a part of common.j:
In contrast to JASS, in the bytecode the extends precedes the corresponding type definition.
Unbenannt.png
 
Level 9
Joined
Jul 30, 2012
Messages
156
I have started developing an automatic detection mechanism for version-specific addresses. The goal is to avoid relying on manually typed offsets and instead use leaked information and stack searching to find them. I have confirmed that it is possible to detect both the Jass Context (via stack searching) and the main game class (via information leaked from natives). I have posted the experimental version in my thread. Currently it only detects the Jass Context (which is already enough to solve the jump table problem), but the main game class will be detected in the future.

Here I'm attaching a test map to demonstrate this. I have tested it on versions 1.26, 1.27a, 1.27b and 1.28.0 under Windows, but it's expected to work on all versions of WC3 ever released, on both Windows and Mac. It will probably also work on future versions if Blizzard doesn't change too much. Please test it and tell me if it works. I'd especially like someone to test it on a Mac, as the detection is slightly different on that platform.
 

Attachments

  • AddressDetection.w3m
    25.4 KB
Level 9
Joined
Jul 30, 2012
Messages
156
So has this been patched? That'd be RIP but pretty lulsy.
No, it wasn't patched, that was just a small mistake on my part.

I have some bad news, this doesn't work with version 1.28.5. Warcraft crashes while loading the map.
I changed the test map, please test it again now. I tested it here under 1.28.5 and it worked. If it doesn't work for you, that means it's not fully stable yet.
 
Level 9
Joined
Jul 30, 2012
Messages
156
OK, I did a bit more testing. Starting the map from Warcraft directly or from the vanilla WE (via test map) works, starting the map via WEX always crashes.

Hmm, so it may not be my fault :p

I suppose that when you run the map from WEX it automatically injects the SharpCraft runtime into WC3, doesn't it? That could make things behave differently.

Did you also get a crash on the first test map I posted (last night)? On that map I forgot to remove a "return" statement, so the stack searching code was not being run at all. If it crashed, it's because of something else.

I suppose I need to make a better test map that outputs messages to an external file. After all, BJDebugMsg is useless if the game crashes on the loading screen. Either that, or delay the searching until after the loading screen.
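
Both ideas can be sketched in a few lines. This is only an outline under some assumptions: RunStackSearch is a placeholder for whatever function in the test map starts the search, and the library name and log file names are arbitrary. The log helper uses the Preload natives from common.j to write a text file. Where that file ends up, and whether writing is allowed at all, depends on the game version, so that part needs to be verified.

```jass
library DelayedSearchSketch initializer init

// Placeholder: stands in for the real stack-searching entry point.
private function RunStackSearch takes nothing returns nothing
endfunction

// Writes one message to a file via the Preload natives, so that it
// survives even if the game crashes right afterwards.
function LogToFile takes string fileName, string msg returns nothing
    call PreloadGenClear()
    call PreloadGenStart()
    call Preload(msg)
    call PreloadGenEnd(fileName)
endfunction

private function OnTimer takes nothing returns nothing
    call DestroyTimer(GetExpiredTimer())
    // Separate files, so the result does not depend on whether a second
    // write to the same file appends or overwrites.
    call LogToFile("StackSearch_1_start.txt", "search starting")
    call RunStackSearch()
    call LogToFile("StackSearch_2_done.txt", "search finished")
endfunction

private function init takes nothing returns nothing
    // A zero-timeout timer started during map init should only expire
    // once the game loop is running, i.e. after the loading screen.
    call TimerStart(CreateTimer(), 0.0, false, function OnTimer)
endfunction

endlibrary
```

If only the first file exists after a crash, the search itself is what crashed. If neither exists, the crash happened earlier.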

Btw, @Quilnez, are you trying to run it from WEX too? Which OS and WC3 version exactly are you running?
 
Level 3
Joined
May 19, 2010
Messages
35
IIRC the first map also crashed, but I may have removed that return myself. I can't remember whether I did that before or after testing.

If I insert a return in StackSearcher after the line call BJDebugMsg("You are running on Windows. Searching for Jass context...") then it works fine with WEX. So it's the search for the Jass context that crashes.
 