[Concept] Compiling the map script directly to JASS bytecode

leandrotp · Mar 7, 2017

While people discuss whether WC3 is going to die, I, on the other hand, want to give it a new life.

The Bytecode compiler - compiling the map script directly to VM instructions

When I first discovered how to typecast in the latest WC3 patch, the first thing that came to mind was to use I2C to run JASS bytecode stored in arrays. I was very happy to find that it works, and then I went a step further and revived the old Memory exploit from 1.23b.

After that, I gave up on the dream of running a full map script entirely from VM bytecode, because I was sure Blizzard would remove typecasting once again, and none of the ideas I had at the time would work in future patches.

But to everyone's surprise, Blizzard fixed only the real vulnerability in the VM, without removing the ability to typecast values. To me this means that typecasting and bytecode execution are now officially supported, and are expected to keep working in all future patches! So my dream is more alive than ever!

Why bytecode?

The JASS language is a very basic and simple scripting language. It doesn't have many features: it lacks OOP and many other things that are essential for advanced development.

To address this gap, many tools have been developed over the years to add new features and extend the language. JASS has been extended into vJASS, and new languages like Zinc and Wurst have also been created.

But all these tools have one thing in common: internally, they all compile your code to plain JASS. This means that all of their extended capabilities must somehow be implemented with the existing features of basic JASS. Structs are implemented with arrays, and dynamic code execution is implemented with triggers. There isn't much more they can do, as they are limited by the underlying JASS engine.
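To make "structs are implemented with arrays" concrete, here is a rough Python model of the usual lowering: one 8192-slot array per field, plus a free list so that freed instance indices get reused. This is a sketch of the general technique, not the exact code any particular preprocessor emits; the class and field names are made up for illustration.

```python
# Rough model (not real vJASS output) of lowering a struct such as
#   struct UnitData { integer kills; real damage }
# to plain JASS globals: one array per field plus a free list.
JASS_ARRAY_SIZE = 8192  # every JASS array has 8192 slots


class UnitDataStruct:
    def __init__(self) -> None:
        self.kills = [0] * JASS_ARRAY_SIZE     # array backing the `kills` field
        self.damage = [0.0] * JASS_ARRAY_SIZE  # array backing the `damage` field
        self.recycle = [0] * JASS_ARRAY_SIZE   # free-list links; -1 marks "in use"
        self.instance_count = 0                # highest index ever handed out
        self.free_head = 0                     # 0 means the free list is empty

    def allocate(self) -> int:
        """Return a fresh instance index; 0 plays the role of `null`."""
        if self.free_head != 0:                # reuse a freed index first
            this = self.free_head
            self.free_head = self.recycle[this]
        elif self.instance_count < JASS_ARRAY_SIZE - 1:
            self.instance_count += 1
            this = self.instance_count
        else:
            return 0                           # out of instances
        self.recycle[this] = -1
        return this

    def deallocate(self, this: int) -> None:
        """Push `this` onto the free list; ignore null and double frees."""
        if this == 0 or self.recycle[this] != -1:
            return
        self.recycle[this] = self.free_head
        self.free_head = this
```

A field access such as `u.kills` then simply becomes the array read `kills[u]`, which is why every "object" in these languages is really just an integer index.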

Now, what if we could remove all those limitations and compile our code directly for the VM that runs behind the scenes? Working with pointers and memory, accessing the VM registers, and even calling code variables directly, without the overhead of creating a new VM instance every time...

Just as C/C++ is compiled to x86 machine instructions, we can have a development tool that compiles directly to VM bytecode, unlocking the full potential of the JASS VM in a way never seen before.

Benefits of executing bytecode

Ultimate Code Protection
Many WC3 maps use some form of code protection to prevent people from seeing or modifying their code. Whether to prevent cheating or to keep someone's work from being stolen, code protection tools have been developed and widely used by map makers.

But no protection tool can hide the source code of a map completely. They all rely on obfuscation, such as renaming variables and functions, which makes the code difficult, but not impossible, to read and understand.

With bytecode, this is different. When we compile directly to VM instructions, the original source code is discarded, and all that reaches the end user is the compiled bytecode. It's not human-readable, and even if someone developed a "JASS Disassembler", a tool that translates bytecode into a human-readable representation, its output would still be low-level code. There's no way to recover the original source from it: names, comments and high-level structure are gone.
Dramatic Speed Increase
The WC3 internal JASS compiler, which translates the map script into bytecode at runtime, is very inefficient. It produces a lot of overhead, generates many unnecessary instructions, and doesn't expose the full power of the VM.

If we can generate the compiled bytecode ourselves, we will be able to unleash the full power of the engine. Not only will that overhead disappear, but we can also apply optimizations like assigning variables directly to VM registers and inlining constants directly into the code.

You may think that speed is not a real concern these days, as PCs are much more powerful than before, but this has the potential to dramatically reduce map loading times, as well as save a lot of processing power (and, consequently, battery power too).
No More LEAKS!!!
Sounds like a dream come true, doesn't it? Allocating, assigning and destroying local variables is costly: the game needs to compute a hash of the variable's name every time you use it, and you even need to manually clean up your variables at the end of functions if you don't want to produce leaks.

But if we run our entire map from bytecode, we won't use JASS local variables anymore! We will do what every machine compiler already does: assign VM registers to all local variables used in the code. Not only will this result in an incredible speed increase, but there will also be no need to null those variables anymore, since registers are always cleaned up at the end of execution.
Direct memory access
Through typecasting, it's possible to achieve read-only memory access from the regular JASS script. But with bytecode, we can also allocate new blocks of memory and write data to them (we can only write to memory that we allocated ourselves). This unlocks the potential for virtually unlimited data storage, which could have new applications not yet researched.
Dynamic code generation/execution
As demonstrated here, it's possible to generate new chunks of bytecode at runtime, write them to an array and execute them. If we run the entire map from bytecode, this process becomes much easier: we can dynamically allocate memory for these new code blocks and even call them DIRECTLY, without triggers, as well as JUMP to or CALL any part of the code with a single VM instruction.

The implementation

Basically, this tool will work as an external compiler, which could be integrated into Jass NewGen Pack and maybe even replace Jasshelper completely. It would work similarly to WurstScript, which has a compiler and a standard library that provides utility functions. This library would include things like an API for memory allocation, bytecode execution, and other useful stuff.

The resulting map will also import custom common.j and Blizzard.j scripts. Common.j will contain only the native declarations. Because the JASS VM wastes time initializing constants, we will remove all constant declarations and inline their values directly into the bytecode.
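The constant-stripping step can be sketched as two small passes: collect constants that have literal initializers, then rewrite every load of such a constant into a literal load. In this Python sketch the instruction tuples and the opcode names `GETVAR`/`LITERAL` are hypothetical placeholders, not real JASS VM opcodes; constants initialized by calls (such as `ConvertRace(1)` in common.j) are deliberately left alone, since their values are not known at compile time without extra work.

```python
import re

# Matches literal constants such as "constant integer bj_MAX_PLAYERS = 12".
CONST_RE = re.compile(r"^\s*constant\s+(integer|real)\s+(\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_constants(lines):
    """Collect name -> value for integer/real constants with literal initializers."""
    table = {}
    for line in lines:
        m = CONST_RE.match(line)
        if not m:
            continue  # not a literal constant (e.g. initialized by a native call)
        kind, name, text = m.groups()
        if kind == "integer" and "." in text:
            continue  # malformed integer literal; leave it to the real compiler
        table[name] = int(text) if kind == "integer" else float(text)
    return table


def inline_constants(code, constants):
    """Replace loads of known constants with literal loads.
    Instructions are hypothetical (opcode, register, operand) tuples."""
    out = []
    for op, reg, operand in code:
        if op == "GETVAR" and operand in constants:
            out.append(("LITERAL", reg, constants[operand]))
        else:
            out.append((op, reg, operand))
    return out
```

Once every use has been rewritten this way, the declarations themselves can be dropped from common.j and Blizzard.j, so the VM never spends time initializing them.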

Blizzard.j will also be nearly empty: constants removed, and all those unnecessary BJ functions stripped. Only the APIs that are actually referenced by the map script will be kept, but not in that file; instead, they will be reimplemented in pure bytecode and compiled together with the main map script.

Then the map script file (war3map.j) just needs an initialization stub. All this code does is initialize the bytecode array and execute it. It may also contain string literals, since strings must be initialized before bytecode can use them. However, it's also possible to turn all strings into GetObjectName calls, as some map protectors already do, and then GetObjectName can be called directly from bytecode.

Note that the compiled bytecode itself doesn't need to be inserted into the map script like this. It can also be loaded from a custom file inside the map MPQ! In that case it needs to be pre-processed to generate a jump table and patch the CALL/JMP instructions, and then it will be copied to a normal JASS array. All this work will be done by the initialization stub.
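The pre-processing step described above is essentially a tiny two-pass linker. Because the final address of the bytecode is only known once it has been copied into an array at runtime, jump targets can be stored symbolically and resolved against a base address. In this Python sketch the tuple format, the `LABEL` pseudo-instruction and the 8-byte instruction size are assumptions made for illustration, not the actual file or VM format.

```python
def link(code, base_address, instr_size=8):
    """Two-pass linker sketch.

    Pass 1 builds a jump table (label -> absolute address).
    Pass 2 drops LABEL markers and patches CALL/JMP operands.
    `instr_size` is an assumed per-instruction size in bytes.
    """
    jump_table = {}
    index = 0
    for instr in code:
        if instr[0] == "LABEL":
            jump_table[instr[1]] = base_address + index * instr_size
        else:
            index += 1  # only real instructions occupy space

    patched = []
    for instr in code:
        op = instr[0]
        if op == "LABEL":
            continue
        if op in ("CALL", "JMP"):
            target = instr[1]
            if target not in jump_table:
                raise KeyError(f"undefined label: {target}")
            patched.append((op, jump_table[target]))
        else:
            patched.append(instr)
    return jump_table, patched
```

The stub would run the equivalent of `link(code, address_of_array)` once the array's address is known, and only then start executing.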

Conclusion

This thread's intention is to present the idea and concept of running an entire map from pre-compiled bytecode. I am posting it because this is a big project, and it's going to take a really long time to see the light of day, if ever.

Since I lack the time to develop all these things just by myself, I'm making it public so that people can have their own ideas. Soon I will be posting internal details about how the JASS VM works, as well as a detailed explanation of every bytecode instruction of the VM.

I'd very much like to know your opinion about this. I have never written a compiler before, so I'd like to hear some ideas. Which language should I be generating code from? vJass, Wurst, maybe even Lua? Would you like new language features like switch/case or inline functions? If you have any questions or suggestions, feel free to post here.

leandrotp · Mar 7, 2017

Reserved for (near-)future use. Here I will be posting the specifications of the JASS VM and details about every bytecode instruction.

TriggerHappy · Mar 7, 2017

Great idea.

Any chance we could get some sort of data storage back with the memory exploits and 1.27b (and future patches)? I'd like to update Codeless Save and Load (Multiplayer) - v1.3.5 to not require local files, again.

Also, I was kind of waiting for the next big patch to see if they keep the return bug, and whether we can safely use its features, like code arrays through typecasting, in our maps. I saw your last post on how they fixed the bug the way you suggested, but still.

I get that your idea is to move away from traditional JASS, but still having certain features in the original is good.

I was also thinking they should fix the old return bug syntax, which will fix many broken maps.

LeP · Mar 7, 2017

tbh i'm not a fan of bytecode. I feel like every programming aspect is pretty much doable in jass, and reasonably fast too. And i like the discoverability of wc3; if i don't know how a map does something i can just look it up. That is still true with bytecode but much more of a hassle.

DracoL1ch · Mar 7, 2017

small reminder - you can't save the game if you modified its memory. it either leads to a fatal error (if we're talking about extended arrays) or won't save the altered state of the things you patched

fenix140 · Mar 8, 2017

Very interesting.
I assume you could import drives or spells into a map protected by .txt or .j files, to have your own resources for a single map, such as creating real-time objects or decorations, something that cannot be done 100% through the normal editor. As for which programming language to use, in my opinion it should be Lua, since most of the Warcraft community is more familiar with it.
Greetings... (I hope you can solve the saving problem)

leandrotp · Mar 8, 2017

TriggerHappy said:
Great idea.

Any chance we could get some sort of data storage back with the memory exploits and 1.27b (and future patches)? I'd like to update Codeless Save and Load (Multiplayer) - v1.3.5 to not require local files, again.

Also, I was kind of waiting for the next big patch to see if they keep the return bug, and whether we can safely use its features, like code arrays through typecasting, in our maps. I saw your last post on how they fixed the bug the way you suggested, but still.

I get that your idea is to move away from traditional JASS, but still having certain features in the original is good.

I was also thinking they should fix the old return bug syntax, which will fix many broken maps.

Unless someone discovers a new vulnerability that revives the Memory hack in 1.27b, there's no way to get any form of file I/O without Local Files enabled. As for future patches, I'm pretty sure Blizzard doesn't touch anything unless the current state of things demands changes.

I mean, the game is currently safe, so they will not touch anything related to JASS in future patches, and consequently typecasting will continue to work. They will not restore the old return bug syntax either, because there's no real need for it. But if a new vulnerability is found, they will be forced to patch it, and they may not be as generous as they were in this patch and decide to remove typecasting completely.

I also think that using bytecode to bring new features to regular JASS is good, and I plan to release some snippets that demonstrate this, but I wanted to show that moving away from JASS completely is even better.

LeP said:
tbh i'm not a fan of bytecode. I feel like every programming aspect is pretty much doable in jass, and reasonably fast too. And i like the discoverability of wc3; if i don't know how a map does something i can just look it up. That is still true with bytecode but much more of a hassle.

I know that you have the skills to look at bytecode and analyze it, but most people don't. You can never have too much protection: the average mapmaker or hacker doesn't know much about programming, they just copy/paste some snippets and press Compile. By migrating to bytecode, these people won't be able to copy or modify the map script.

As for the other reasons, speed is not the only benefit. Bytecode also introduces some new features to the language.

DracoL1ch said:
small reminder - you can't save the game if you modified its memory. it either leads to a fatal error (if we're talking about extended arrays) or won't save the altered state of the things you patched

WC3's built-in Save/Load system is already buggy by itself. So until Blizzard fixes those bugs, there's no reason to worry about that, since most maps don't use it anyway.

The same goes for the Memory hack: even in read-only mode, things may not work correctly after loading a saved game (for example, if you store the address of an object in a variable, that object might be in a different place after loading), so it would be no different from what we have already.

Waffle · Mar 8, 2017

impl suggestion:

implement a new LLVM backend, then you can use any of the LLVM-supported source languages and get powerful optimisations as well as compatibility.
then anybody can write a new frontend for any jass-preprocessor dialect they desire. (vJASS is probably mandatory if any existing maps are to migrate to this, tho Wurst looks nice too.. but with an LLVM backend you can just as easily use an LLVM python impl; i know at least 1 exists.)

a problem with many of the current preprocessors is that they focus too much on tiny syntax tweaks.. like cjass barely adds anything new and just makes stuff more concise by using c-like syntax instead of the default verbose jass.. and vJASS implements a lot of it as regex replacements as far as i can tell (if somebody could tell me which pascal setup i'd need to actually compile it, that would help tho)

also.. many languages have a hard time mapping to JASS2, if we get direct access to bytecode we might implement more languages, more diverse features at acceptable complexity/performance.

for example.. are jass arrays at the vm level always 8k? or is the jass-> bytecode interpreter just written to always use 8k arrays for some reason? true native bitwise ops possibly? or string ordinal without epic lookup tables?
that would be awesome.

hmm basically we could also make it so that only tiny snippets are bytecode.

eg we have some function that has a very arcane jass impl... eg A2I or I2A (or ord() and chr() in python)
if the bytecode is more flexible than jass then many such functions could be implemented in bytecode for.. potentially much greater efficiency.

i would rather not write bytecode by hand tho, we rly need a way to generate good bytecode reliably and without too much fuss if anything significant is going to be done with it.
also need good docs for the entire bytecode format and supported operations!

IcemanBo · Mar 8, 2017

Won't it be too much effort for making Warcraft 3 maps? Going lower and lower gives you fewer limits, but it's also kind of counterproductive in terms of comfort.

Points like having faster functions such as ord() are maybe unimportant. Protection of (free) wc3 maps is also maybe not a major point I would focus on myself. Much more interesting, I think, is working with pointers, registers and memory blocks.

But really, much respect for the work you do; it's extremely interesting to follow everything about it.

Aniki · Mar 8, 2017

leandrotp said:
Direct memory access

Through the use of type-casting, it's possible to achieve read-only memory access from the regular JASS script. But with bytecode, we can also allocate new blocks of memory, and write data to them (we can only write to memory that we allocated ourselves).

Is it really possible to allocate new memory blocks from Jass2 bytecode though? I wonder how that would work...

PS: I don't know Jass2 bytecode (or any other bytecode) =)

Deleted member 219079 · Mar 8, 2017

Aniki said:
Is it really possible to allocate new memory blocks from Jass2 bytecode though? I wonder how that would work...

I guess by fiddling with stack pointer.

DracoL1ch · Mar 8, 2017

we already have malloc() working

Aniki · Mar 8, 2017

DracoL1ch said:
we already have malloc() working

Does it work with read-only memory access?

DracoL1ch · Mar 8, 2017

definitely not. but it's up to you to decide if you wanna climb the artificial complexity brought by the modern patch or not.
I personally can see blizz patching out typecasting eventually, as even this thread shows there's still a vulnerability of executing code with unknown purposes. WC3 death is really really near

Waffle · Mar 8, 2017

simple bitwise ops are something that many people would expect to have trivial cost (because they are like 1-cycle ops in hardware), but in jass they need expensive software emulation on top of a slow interpreter.
many algorithms are optimised to take advantage of operations that are cheap in hardware.

many algorithms* would have exceptionally poor performance if implemented in jass.
it would be quite confusing if what is supposed to be a fast operation suddenly dominates your runtime and causes perf issues on your map.

these are the sort of operations that you often find in tight loops that dominate your runtime. optimizing them can have a huge impact.

*examples: crypto algorithms (look at the bitwise and/or/xor/nand etc libs on hive, they often need entire sub-libs of their own to manage the complexity!), variable length integer encode/decode (sb wanna implement protobuf maybe?), usage of integers as bitfields (idiomatic code in some languages; people expect this to be trivial and will be surprised at how expensive it is), base-64 encoding/decoding might reasonably use it since its base is a power of 2 and chopping bits is expected to be faster than some other ops eg mul/div.
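To see why a "1-cycle" operation gets so expensive, here is a Python sketch of bitwise AND written with only the arithmetic JASS offers (`+`, `-`, `*`, `/`, plus a modulo built from them). It is a generic illustration of the technique, not code taken from any of the Hive libraries; Python integers don't overflow, so the signed/unsigned mapping below is simpler than what a real JASS implementation must do around the sign bit.

```python
def and_without_bitops(a: int, b: int) -> int:
    """AND of two signed 32-bit integers using only arithmetic, one bit per loop."""
    # Map signed 32-bit values to their unsigned equivalents.
    if a < 0:
        a += 2**32
    if b < 0:
        b += 2**32
    result = 0
    place = 1
    for _ in range(32):          # 32 iterations, each with several mod/div/mul ops
        if a % 2 == 1 and b % 2 == 1:
            result += place      # both low bits set -> set this bit in the result
        a //= 2                  # shift both operands right by one bit
        b //= 2
        place *= 2
    # Map back to a signed 32-bit value.
    if result >= 2**31:
        result -= 2**32
    return result
```

On hardware, the same result is a single AND instruction; here it is a loop of roughly a hundred interpreted operations, which is exactly the kind of cost that ends up dominating tight loops.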

also it would be nice to have (void*), because sometimes being able to point to a piece of data without specifying its type... can be rly fucking valuable.
eg lets say you want the equivalent of this in jass..

my_heterogenious_list = [1, 3.14, "foo", foo]

how would you express that in jass2?
have 4 parallel arrays for data of each possible type and then a 5th array to select things maybe..? you just can't do it elegantly at all. (toy example is silly like all toy examples, but that does not mean there are no more useful use cases)
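The "parallel arrays plus a selector" workaround described above can be modeled in Python like this: one fixed-size array per type, and a tag array recording which of them holds element `i`. The tag constants and class name are hypothetical; the handle column just stores placeholders standing in for units, timers and so on.

```python
# Hypothetical type tags: which parallel array holds element i.
TAG_INTEGER, TAG_REAL, TAG_STRING, TAG_HANDLE = range(4)


class TaggedList:
    """JASS2-style heterogeneous list: one fixed-size array per type plus a tag array."""

    def __init__(self, capacity: int = 8192) -> None:
        self.capacity = capacity
        self.size = 0
        self.tags = [0] * capacity  # the "5th array" that selects the type
        # One parallel array per type; element i lives in columns[tags[i]][i].
        self.columns = [
            [0] * capacity,     # integers
            [0.0] * capacity,   # reals
            [""] * capacity,    # strings
            [None] * capacity,  # handles (units, timers, ...)
        ]

    def push(self, tag: int, value) -> int:
        if self.size >= self.capacity:
            raise IndexError("list is full")
        i = self.size
        self.tags[i] = tag
        self.columns[tag][i] = value
        self.size += 1
        return i

    def get(self, i: int):
        """Return (tag, value) so the caller can branch on the type."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        tag = self.tags[i]
        return tag, self.columns[tag][i]
```

Every consumer then has to branch on the tag by hand, which is the inelegance being complained about; with untyped bytecode a single slot could hold any of these values.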

or maybe you want

def serialize(object: "any type can go here, perhaps you feed this param from a list? like maybe a list of player state consisting of many strings/numbers into the load code") -> str:
    return str(object)  # potentially the logic depends on the type here?

to make a load_save system that is actually elegant to use is bloody difficult in jass.

leandrotp · Mar 8, 2017

Aniki said:
Does it work with read-only memory access?

It definitely does. You're mistaken this time, Draco: memory allocation does not require access to OS functions; it can be implemented by dynamically creating JASS arrays at runtime (the malloc function of the Memory hack is implemented this way).

Basically, with bytecode, we have the power to create new global variables at runtime, as opposed to regular JASS where all globals are statically created at map initialization. Therefore all we need to do is declare a new global array variable, and we instantly get access to a fresh 8k block of memory.
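A minimal allocator on top of "declare a new global array, get a fresh 8k block" could look like the following Python model: a bump allocator that carves requests out of the current 8192-slot block and declares a new one when the request doesn't fit. `new_block` is a hypothetical stand-in for creating a global array at runtime, and this is not how the Memory hack's malloc is actually written; a real allocator would also need a way to free and reuse space.

```python
BLOCK_SIZE = 8192  # every JASS array has 8192 slots


class BlockAllocator:
    """Bump allocator over fresh 8192-slot arrays (never frees)."""

    def __init__(self) -> None:
        self.blocks = []
        self.used = BLOCK_SIZE  # forces a new block on the first request

    def new_block(self) -> None:
        # Stand-in for "declare a new global array variable at runtime".
        self.blocks.append([0] * BLOCK_SIZE)
        self.used = 0

    def malloc(self, n: int):
        """Return (block index, offset) of n consecutive free slots."""
        if not 0 < n <= BLOCK_SIZE:
            raise ValueError("request must fit in one 8192-slot array")
        if self.used + n > BLOCK_SIZE:
            self.new_block()
        handle = (len(self.blocks) - 1, self.used)
        self.used += n
        return handle
```

The design choice here is simplicity: a request never straddles two arrays, so the leftover tail of a block is wasted, but every allocation stays addressable as one array plus an offset.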

Waffle said:
impl suggestion:

implement a new LLVM backend, then you can use any of the llvm supported source languages and get powerful optimisations as well as compatibility.
then anybody can write a new frontend for any jass-preprocessor dialect they desire. (vJASS is probably mandatory if any existing maps would migrate to use this, tho Wurst looks nice too.. but if you have a llvm backend you can just as easily use a llvm python impl (i know at least 1 exists).

a problem with many of the current preprocessors is that they focus too much on tiny syntax tweaks.. like cjass barely adds anything new and just makes stuff more concise by using c-like syntax instead of the default verbose jass.. and vJASS implements a lot of it as regex replacements as far as i can tell (if somebody could tell me which pascal setup i'd need to actually compile it, that would help tho)

also.. many languages have a hard time mapping to JASS2, if we get direct access to bytecode we might implement more languages, more diverse features at acceptable complexity/performance.

Man, I really forgot about the LLVM thing. It sounds like the perfect idea, but I'm not sure the LLVM->JASS bytecode translation can really be done: LLVM is just too complex, with quite a big instruction set, while the JASS VM is very, very simple (only ~40 operations). I am not familiar with LLVM, but I'll take a look at it, and if I find ways to work around this, it will definitely be my choice.

Waffle said:
for example.. are jass arrays at the vm level always 8k? or is the jass-> bytecode interpreter just written to always use 8k arrays for some reason? true native bitwise ops possibly? or string ordinal without epic lookup tables?
that would be awesome.

hmm basically we could also make it so that only tiny snippets are bytecode.

eg we have some function that has a very arcane jass impl... eg A2I or I2A (or ord() and chr() in python)
if the bytecode is more flexible than jass then many such functions could be implemented in bytecode for.. potentially much greater efficiency.

Jass arrays are always 8k. Bitwise operations are not natively supported by the VM, but I have implemented them with natives and memory reading in my Bitwise library. It's certainly better than any alternative that uses complicated arithmetic and array lookups. String ordinal is already implemented too, and it's very straightforward: we just obtain the memory address of a JASS string and read the data there as an integer.
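The string-ordinal trick can be modeled in Python: treat the string's characters as a byte buffer, read 4 bytes as a little-endian integer (x86 byte order), and mask off the low byte. The assumption that the characters sit in a contiguous, null-terminated byte buffer at the address obtained through typecasting is taken from the post; the helper names are made up for illustration.

```python
def read_int32_le(buf: bytes, offset: int) -> int:
    """Read 4 bytes at `offset` as a little-endian signed 32-bit integer,
    the way an x86 memory read of an integer would see them."""
    chunk = buf[offset:offset + 4].ljust(4, b"\0")
    return int.from_bytes(chunk, "little", signed=True)


def string_ordinal(s: str, index: int) -> int:
    """Character code at `index`: read the word starting there, keep the low byte."""
    buf = s.encode("utf-8") + b"\0"  # modeled as a C-style null-terminated buffer
    return read_int32_le(buf, index) & 0xFF
```

Because x86 is little-endian, the character at the read address always lands in the lowest byte of the integer, so a single read plus a mask replaces the usual lookup-table approach.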

Waffle said:
also need good docs for the entire bytecode format and supported operations!

Complete documentation will come soon.

Waffle said:
also it would be nice to have (void*) because somethings being able to point to a piece of data without specifying what the type of said data is... can be rly fucking valuable.

In the bytecode world there are no types, it's just binary data. If I can really make this an LLVM backend, you'll be able to use void* from C just fine. If not, we can certainly add this feature to whatever language we end up using: once it's compiled to bytecode, all type safety disappears.

Flux · Mar 8, 2017

jondrean said:
I guess by fiddling with stack pointer.

Hmm, I didn't expect JASS bytecode to be powerful enough to change the stack pointer itself. Also, the stack is mostly used for function calls and for returning to the instruction where you left off, so I doubt it would be a good place to dynamically allocate memory.

Anyway, nice thread. Hoping this will turn into something good. If this "development tool that compiles to bytecode" can take JASS/vJASS scripts, it would be very promising.

leandrotp · Mar 8, 2017

Flux said:
Hmm, I didn't expect JASS bytecode to be powerful enough to change the stack pointer itself. Also, the stack is mostly used for function calls and for returning to the instruction where you left off, so I doubt it would be a good place to dynamically allocate memory.

Anyway, nice thread. Hoping this will turn into something good. If this "development tool that compiles to bytecode" can take JASS/vJASS scripts, it would be very promising.

Look at the post right above; this has nothing to do with the stack pointer.

fenix140 · Mar 9, 2017

A side note:
(I suppose CS would not exist if there had been no initiative to create a game based on another one using external tools, right?
In fact, this tool would certainly be a great help for the community's modding. Although it is counterproductive in several respects, these can mostly be solved; for example, Blizzard accepted leandrotp's suggestion not to completely remove this capability from its code. Whether or not Blizzard adds more limitations to this in the future, I'd say we finally have technical support for this tool.)
Greetings..

Dr Super Good · Mar 20, 2017

leandrotp said:
We will do what every machine compiler already does: assign VM registers for all local variables used in the code.

Machine compilers do not assign registers to local variables. Instead, they allocate a register to hold the contents of a local variable when required. If a register can hold a local variable's value for the entire duration of its use, then it does not get memory allocated on the stack; otherwise it still needs stack storage. The more suitable registers are available, the more likely it is that a local variable can reside entirely in registers. This is best seen with x86 vs x86-64: in x86, most local variables have to be written to the stack because only 8 general-purpose registers are available, while x86-64 doubles that to 16, which allows some local variables to live entirely in registers and even allows function parameters to be passed in registers, giving a minor speedup as well as improved code density.
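One standard way compilers do this is linear-scan register allocation (Poletto and Sarkar): walk the live ranges of local values in order of their start points, hand out registers as ranges begin and end, and spill the longest-living value when registers run out. The Python sketch below shows that technique; in the scheme discussed here, a "spilled" value would have to fall back to a real JASS local. The interval format is an assumption for illustration.

```python
def linear_scan(intervals, num_registers):
    """Linear-scan register allocation (Poletto & Sarkar).

    intervals: (name, start, end) live ranges of local values.
    Returns (assignment: name -> register index, spilled: set of names).
    """
    free = list(range(num_registers))
    active = []          # (end, name) pairs, kept sorted by end point
    assignment = {}
    spilled = set()
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Release registers of intervals that are already dead.
        while active and active[0][0] < start:
            _, dead = active.pop(0)
            free.append(assignment[dead])
        if free:
            reg = min(free)
            free.remove(reg)
            assignment[name] = reg
            active.append((end, name))
        elif active and active[-1][0] > end:
            # Spill whichever active interval lives longest and take its register.
            last_end, last_name = active[-1]
            assignment[name] = assignment.pop(last_name)
            spilled.add(last_name)
            active[-1] = (end, name)
        else:
            spilled.add(name)
        active.sort()
    return assignment, spilled
```

With only 2 registers, four overlapping values force one spill; with a register file as large as the 256 mentioned later in the thread, typical functions would not spill at all.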

Unless JASS has an insane number of general-purpose registers, it will be necessary to use local variables for some complex functions. However, a good compiler should try to keep local values in registers as much as possible, and if local variables do have to be used, it should choose names that are optimized for fast hashtable lookup.

The problem with this entire approach is how hacky it is. Chances are using it is not very future proof.

Clamp · Mar 20, 2017

Does that mean you are working on a tool that will obfuscate (well, even compile) code into bytecode, or is it only a concept for now?

As far as I can see, there is a lot of hard work involved in creating such a tool, which would literally have to be a standalone IDE.

By the way, IMHO, once a mapmaker reaches such a high level of knowledge, it will hold him back from switching to standalone game development.

leandrotp · Mar 20, 2017

Dr Super Good said:
Unless JASS has an insane number of general purpose registers, it will be necessary to use local variables for some complex functions. However a good compiler should try to use registers to hold local values as much as possible and if local variables do have to be used then it should choose highly optimized values for fast hashtable lookup.

The problem with this entire approach is how hacky it is. Chances are using it is not very future proof.

The VM has 256 registers. No function uses that many locals, so it's certainly possible to put all locals and even function arguments in registers.

Basically, I'm thinking of creating a new calling convention for this machine. In the x86 architecture, we have rules for register usage, like volatile and non-volatile registers, the return value in EAX, and so on. So, this is what I have in mind:

Register R0 is reserved for the return value. This is already the case in the current implementation: all natives and JASS functions already return their values through this register, so we're going to keep it this way.
Registers R1 through R127 will be treated as volatile registers, which means their values are not expected to be preserved across function calls. They will be used for function arguments and for general computations throughout the code. Not sure if I need that many registers, but honestly I don't know what to do with f**king 256 registers.
Registers R128 through R254 will be treated as non-volatile registers. They will be used to hold the local variables of functions, so they are expected to always be preserved: every function that uses any of them must back up the previous value with a PUSH before using the register, and restore it with a POP before returning. Maybe we can change this division to allow for more than 127 locals, since we probably don't need that many volatile registers.
Register R255 will be reserved for the return address of the function. This is a change from the current implementation, because I'm going to implement all function calls with JMP instructions instead. So basically every call will be turned into "MOV R255, <retaddr>" followed by "JMP <dest>", and every return instruction will become "JMP R255".
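The convention above can be read as a lowering pass over pseudo CALL/RET instructions. The following Python sketch follows the proposal as stated: it pushes the non-volatile registers (R128 to R254) a function uses in its prologue, pops them before each return, and turns calls and returns into the MOV/JMP forms. The tuple format and the generated `name$retN` labels are hypothetical. Note that, taken literally, a function that itself makes calls overwrites R255, a problem raised later in the thread; this sketch does not address it.

```python
LINK = "R255"                   # return-address register in the proposal
NONVOLATILE = range(128, 255)   # R128..R254: callee-saved registers for locals


def lower_function(name, body, used_registers):
    """Lower pseudo CALL/RET instructions following the proposed convention.

    body: list of tuples, e.g. ("CALL", "g"), ("RET",), or any other instruction.
    used_registers: register numbers the function writes to.
    """
    saved = [f"R{r}" for r in sorted(used_registers) if r in NONVOLATILE]
    out = [("LABEL", name)]
    out += [("PUSH", r) for r in saved]             # prologue: back up callee-saved regs
    for n, instr in enumerate(body):
        if instr[0] == "CALL":
            ret_label = f"{name}$ret{n}"            # fresh label for the return point
            out += [("MOV", LINK, ret_label), ("JMP", instr[1]), ("LABEL", ret_label)]
        elif instr[0] == "RET":
            out += [("POP", r) for r in reversed(saved)]  # epilogue: restore in reverse
            out.append(("JMP", LINK))
        else:
            out.append(instr)
    # As proposed, a nested CALL overwrites R255; saving R255 itself in
    # non-leaf functions is not handled here.
    return out
```

Volatile registers (R1 to R127) are intentionally never saved, matching the proposal: callers that need them across a call must keep the values elsewhere.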

As for being future-proof, I don't think Blizzard will ever touch anything related to the JASS VM. They follow the "If it ain't broke, don't fix it" method, so it's very unlikely they're gonna change something in the VM that breaks this.

Clamp said:
Does that mean you work on some tool which will obfuscate (well, even compile) code into bytecode, or it's only concept for now?

As I can see, there lies is a lot of hard work in order to create such tool, which should literally be standalone IDE.

By the way, when mapmaker reach such high level of knowledge it's will prevent him from switching to standalone game development, IMHO.

I haven't begun any development yet; I'm just posting my ideas so I can get some feedback before I get to work.

As for the IDE, I intend to make something that integrates with World Editor, not sure if it will complement the functionality of Jasshelper or completely replace it.

Clamp · Mar 20, 2017

leandrotp said:
As for the IDE, I intend to make something that integrates with World Editor, not sure if it will complement the functionality of Jasshelper or completely replace it.

I am not sure you even need WE; your module could be a Sublime Text 3 (ST3) package, for example. What do you think about that approach?

Lizreu · Mar 25, 2017

leandrotp said:
The VM has 256 registers. No function uses that many locals, so it's certainly possible to put all locals and even function arguments in registers.

Register R255 will be reserved for the return address of the function. This is a change from the current implementation, because I'm going to implement all function calls with JMP instructions instead. So basically every call will be turned into "MOV R255, <retaddr>/JMP <dest>", and every return instruction will become "JMP R255".

As for being future-proof, I don't think Blizzard will ever touch anything related to the JASS VM. They follow the "If it ain't broke, don't fix it" method, so it's very unlikely they're gonna change something in the VM that breaks this.

I haven't begun any development yet; I'm just posting my ideas so I can get some feedback before I put my hands to work.

As for the IDE, I intend to make something that integrates with World Editor, not sure if it will complement the functionality of Jasshelper or completely replace it.

Do you have any plans to handle ops limit resetting? Currently, the only way to do heavy computations in JASS is through various, rather hacky, means that start a new "thread", such as ExecuteFunc or TriggerExecute (there are plenty). I know next to nothing about the JASS VM, but I suspect that these methods instantiate a new (sub-?)instance of the VM (or a thread, whichever is the proper term). What do you know about this, and do you know whether registers carry over?

Sorry if I sound clueless. I really am, and I know very little about this whole ordeal, but the ops limit was something that bothered me a lot back when I was mapping for WC3, and it would be REALLY nice to have some language-level way to handle it (perhaps a 'reset' keyword that marks a point where such an operation should occur, transferring registers if needed?)

P.S. On an unrelated note, I don't see how using a single register (rather than, say, a stack) for the return address of a function can work.
Say you call some function F2 from function F1. F1 stores the return address (for simplicity's sake, call it 0x0001) in R255. F2 then calls F3 and overwrites R255 with 0x0002, so the original return address is lost.

There's a reason a 'stack overflow' from too many recursive calls can occur.
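To make the objection concrete, here is a toy register machine in Python; its instruction set is invented for illustration and is not the JASS VM. F1 calls F2, which calls F3. With a single link register, F2's return jumps back into F2 forever. If F2 saves R255 with a PUSH before its nested call and restores it with a POP before returning (the usual approach on link-register architectures such as ARM), the chain unwinds correctly. That fix still relies on a stack, which is exactly the point being made here.

```python
def run(program, max_steps=1000):
    """Run a toy register machine and return the messages from 'trace' ops."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    regs = [0] * 256
    stack, trace, pc = [], [], 0
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "mov":            # mov reg, label: load a code address
            regs[args[0]] = labels[args[1]]
        elif op == "jmp":          # jmp label, or jmp reg (indirect)
            pc = regs[args[0]] if isinstance(args[0], int) else labels[args[0]]
        elif op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "trace":
            trace.append(args[0])
        elif op == "halt":
            return trace
        # "label" entries are no-ops at run time
    raise RuntimeError("step limit reached: the program never halted")


def build(save_link):
    """F1 -> F2 -> F3, optionally saving R255 in F2 around its nested call."""
    save = [("push", 255)] if save_link else []
    restore = [("pop", 255)] if save_link else []
    return [
        ("mov", 255, "back_in_F1"), ("jmp", "F2"),          # F1 calls F2
        ("label", "back_in_F1"), ("trace", "F1 resumed"), ("halt",),
        ("label", "F2"), *save,
        ("mov", 255, "back_in_F2"), ("jmp", "F3"),          # F2 calls F3
        ("label", "back_in_F2"), ("trace", "F2 resumed"), *restore,
        ("jmp", 255),                                       # F2 returns
        ("label", "F3"), ("trace", "in F3"), ("jmp", 255),  # F3 returns
    ]


if __name__ == "__main__":
    print(run(build(save_link=True)))
    try:
        run(build(save_link=False))
    except RuntimeError as err:
        print("single R255:", err)
```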

Dr Super Good · Mar 25, 2017

Is Jass byte code safe? What security risks does it expose? Maybe requesting that WC3 supports compiled JASS files would be a viable request?

Kakerate · Mar 25, 2017

Maybe requesting that WC3 supports compiled JASS files would be a viable request?

good luck with that ;o

This whole project is very interesting, but the massive amount of work it would take to even get a leg up on this is insane. If you have the means of creating this, though, by all means!

Tbh, I'm surprised no one has elaborated on this before. I suppose as far as me using this goes, it really just boils down to what Blizzard can bring to the table in their new patches. If they come up with something more tantalizing than Memory Hack, then I'm all in ;3

Lord of theDing · Apr 13, 2017

I really like that idea. But to even get started with it, two things are probably necessary:

First, documentation of the VM: all existing opcodes, how operands and parameters are handled, how the current compiler does function calls, etc. To get a few more people on board, it's really important to give them a starting point. I may also take a try at it, but I don't want to reverse engineer all of JASS myself. You seem to know the engine pretty well, so please write it down somewhere for others.

Second, I would add some rudimentary functions to call bytecode from JASS: something like a textmacro that "calls" an integer array and stores the result into a local variable. Something like this:

JASS:

library MyAwesomeLib initializer init

    globals
        integer array bytecode
    endglobals

    public function Foo takes integer bar returns integer
        local integer result
        // CALL_BYTECODE is the proposed textmacro; it does not exist yet
        //! runtextmacro CALL_BYTECODE("bytecode", "result")
        return result
    endfunction

    private function init takes nothing returns nothing
        // placeholder values: the encoded instructions go here
        set bytecode[0] = <my>
        set bytecode[1] = <bytecode>
        set bytecode[2] = <here>
        ...
    endfunction
endlibrary

This would enable some madmen who need all the performance they can get to write optimized code for their systems, and it would also give a starting point to test and debug external compilers for JASS. If we have something like this, we don't need fancy magic to insert the bytecode into the map: just dump the "set bytecode[i] = X" lines into a file and copy them into the map. Once that works, we can write more sophisticated tools.
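The "dump the lines into a file" step is easy to script. A minimal sketch in Python, assuming the bytecode is already available as a list of 32-bit words. Since JASS integers are signed 32-bit, words with the top bit set are written as negative decimals; the function names are hypothetical.

```python
def to_jass_int(word):
    """Reinterpret a 32-bit word as the signed value a JASS integer holds."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word >= 0x80000000 else word


def jass_literal(word):
    """Render a word as JASS source text."""
    value = to_jass_int(word)
    # Assumption: a bare -2147483648 may not parse (2147483648 overflows before
    # the minus is applied), so the minimum value is written as an expression.
    return "(-2147483647 - 1)" if value == -0x80000000 else str(value)


def emit_init(array_name, words):
    """Produce the 'set <array>[i] = X' lines to paste into an init function."""
    return [f"set {array_name}[{i}] = {jass_literal(w)}" for i, w in enumerate(words)]


if __name__ == "__main__":
    # Placeholder words; real ones would come from an assembler.
    print("\n".join(emit_init("bytecode", [0x0C010400, 42, 0xFFFFFFFF])))
```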

Don't run before you know how to walk.

Also, in case you haven't noticed by now, I'm also very interested in writing a tool myself.

Lizreu · Apr 15, 2017

Lord of theDing said:
I really like that idea. But to even get started with that two things are probably necessary:

I have a slightly different suggestion: after more info about the JASS VM is released (huge kudos to the people reversing this stuff, btw), it should be relatively easy to draft up some rudimentary asm-style language (so just opcodes), and then modify pjass or whatever the current vJASS compiler is to allow bytecode inserts. Behind the scenes, it would initialize the bytecode array somewhere and do everything else that's necessary, but in the code it would look just like a C-style inline asm block. That would make experimenting with bytecode much easier and lower the entry barrier, meaning more people messing around with it, and thus more progress.

Trigger.edge · Apr 20, 2017

@leandrotp: Can you post more about the bytecode to understand how it works?

EDIT: I am currently reading this JASS - Bytecode and I2C | The Helper

Lord of theDing · Apr 23, 2017

I had a look at the sources of grimoire and found this little gem:

Code:

enum TYPES {
   J_NOTHING=0x0,J_NULL=0x2,J_CODE,J_INTEGER,J_REAL,J_STRING,J_HANDLE,J_BOOLEAN,
   J_INTARRAY=0x9,J_REALARRAY,J_STRARRAY,J_HARRAY,J_BOOLARRAY
};

enum OPCODES { OP_ENDPROGRAM=0x1,
   OP_FUNCTION=0x3, // _ _ rettype funcname
   OP_ENDFUNCTION=0x4,
   OP_LOCAL=0x5, // _ _ type name
   OP_GLOBAL=0x6,OP_CONSTANT=0x7,
   OP_POPFUNCARG=0x8, // _ srcargi type destvar
   OP_CLEANSTACK=0xB, // _ _ nargs _
   OP_LITERAL=0xC, // _ type destreg srcvalue
   OP_SETRET=0xD, // _ srcreg _ _
   OP_GETVAR=0xE, // _ type destreg srcvar
   OP_CODE=0xF,
   OP_GETARRAY=0x10,
   OP_SETVAR=0x11,   // _ _ srcreg destvar
   OP_SETARRAY=0x12,
   OP_PUSH=0x13, // _ _ srcreg _
   OP_SETRIGHT=0x14,
   OP_NATIVE=0x15, // _ _ _ fn
   OP_JASSCALL=0x16, // _ _ _ fn
   OP_I2R=0x17,
   OP_AND = 0x18,
   OP_OR = 0x19,
   OP_EQUAL=0x1A,
   OP_NOTEQUAL=0x1B,       // check
   OP_LESSEREQUAL=0x1C,OP_GREATEREQUAL=0x1D,
   OP_LESSER=0x1E,OP_GREATER=0x1F,
   OP_ADD=0x20,OP_SUB,OP_MUL,OP_DIV,
   OP_MODULO = 0x24,
   OP_NEGATE=0x25,
   OP_NOT = 0x26,
   OP_RETURN=0x27,   // _ _ _ _
   OP_JUMPTARGET=0x28,
   OP_JUMPIFTRUE=0x29,OP_JUMPIFFALSE=0x2A,
   OP_JUMP=0x2B,

   /*
// statically typed mods
   SOP_ADD_INT=0x80,SOP_ADD_REAL,SOP_ADD_STRING,
   SOP_SUB_INT=0x84,SOP_SUB_REAL,
   SOP_MUL_INT=0x88,SOP_MUL_REAL,
   SOP_DIV_INT=0x8C,SOP_DIV_REAL,
   SOP_NEGATE_INT=0x90,SOP_NEGATE_REAL,

   SOP_GE_INT=0x94,SOP_GE_REAL,
   SOP_LE_INT=0x98,SOP_LE_REAL,
   SOP_G_INT=0x9C,SOP_G_REAL,
   SOP_L_INT=0xA0,SOP_L_REAL
   */
};

typedef struct opcode {
   union {
       unsigned char r1;
       unsigned char nfarg;
   };
   union {
       unsigned char r2;
       unsigned char usereturn;
   };
   union {
       unsigned char r3;
       unsigned char rettype;
   };
   unsigned char optype;
   union {
       int arg;
       opcode *dest;
   };
} opcode;

It's not documentation, but it's already a lot of information. It looks like each bytecode instruction consists of 8 bytes, i.e. 2 entries in an integer array. I may just hack together a very basic assembler/disassembler for JASS.
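If the struct is laid out as written on 32-bit little-endian x86 (an assumption: four one-byte fields followed by a 4-byte arg or pointer, with no padding), the first integer holds r1, r2, r3 and optype from the lowest byte up, and the second holds arg. Below is a minimal Python assembler sketch under that assumption. Mapping the enum comments (e.g. `_ type destreg srcvalue` for OP_LITERAL) onto r1, r2, r3 and arg is my reading of the grimoire source, not something verified in-game.

```python
import struct

# grimoire's `opcode` struct, assuming 32-bit little-endian x86 with no padding:
# r1, r2, r3, optype (one byte each), then a 4-byte arg / dest pointer.
OPCODE_LAYOUT = struct.Struct("<BBBBI")


def assemble(optype, r1=0, r2=0, r3=0, arg=0):
    """Encode one instruction as the two signed ints a JASS integer array holds."""
    raw = OPCODE_LAYOUT.pack(r1, r2, r3, optype, arg & 0xFFFFFFFF)
    first, second = struct.unpack("<ii", raw)
    return [first, second]


def assemble_all(instructions):
    """Flatten a list of (optype, r1, r2, r3, arg) tuples into array entries."""
    words = []
    for ins in instructions:
        words.extend(assemble(*ins))
    return words


if __name__ == "__main__":
    OP_LITERAL, OP_RETURN, J_INTEGER = 0x0C, 0x27, 0x4  # values from the enum above
    print(assemble_all([(OP_LITERAL, 0, J_INTEGER, 1, 42), (OP_RETURN, 0, 0, 0, 0)]))
```

For example, assemble(0x0C, r2=4, r3=1, arg=42) would encode a LITERAL that loads the integer 42 (J_INTEGER = 0x4) into register 1, if that reading of the fields is right.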

Chaosy · Apr 23, 2017

Sounds like a pain to develop.

Lord of theDing · Apr 23, 2017

I have written a little script to print the bytecode of a JASS function in-game. It does some rudimentary translation to readable text, but it's neither finished nor very sophisticated. I attached a map with the code.
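The same kind of translation can also be done off-line. A rough Python sketch of the reverse direction, under the same layout assumption (optype in the highest byte of the first integer, arg in the second). The opcode names are copied from the grimoire enum quoted earlier; the output format is made up.

```python
# Opcode values and names copied from the grimoire OPCODES enum.
OPNAMES = {
    0x01: "ENDPROGRAM", 0x03: "FUNCTION", 0x04: "ENDFUNCTION", 0x05: "LOCAL",
    0x06: "GLOBAL", 0x07: "CONSTANT", 0x08: "POPFUNCARG", 0x0B: "CLEANSTACK",
    0x0C: "LITERAL", 0x0D: "SETRET", 0x0E: "GETVAR", 0x0F: "CODE",
    0x10: "GETARRAY", 0x11: "SETVAR", 0x12: "SETARRAY", 0x13: "PUSH",
    0x14: "SETRIGHT", 0x15: "NATIVE", 0x16: "JASSCALL", 0x17: "I2R",
    0x18: "AND", 0x19: "OR", 0x1A: "EQUAL", 0x1B: "NOTEQUAL",
    0x1C: "LESSEREQUAL", 0x1D: "GREATEREQUAL", 0x1E: "LESSER", 0x1F: "GREATER",
    0x20: "ADD", 0x21: "SUB", 0x22: "MUL", 0x23: "DIV", 0x24: "MODULO",
    0x25: "NEGATE", 0x26: "NOT", 0x27: "RETURN", 0x28: "JUMPTARGET",
    0x29: "JUMPIFTRUE", 0x2A: "JUMPIFFALSE", 0x2B: "JUMP",
}


def disassemble(words):
    """Turn pairs of ints (as read from an integer array) into readable lines.

    A trailing unpaired word is ignored.
    """
    lines = []
    for i in range(0, len(words) - 1, 2):
        head, arg = words[i] & 0xFFFFFFFF, words[i + 1]
        r1, r2, r3 = head & 0xFF, (head >> 8) & 0xFF, (head >> 16) & 0xFF
        optype = head >> 24
        name = OPNAMES.get(optype, f"OP_{optype:02X}")
        lines.append(f"{name} r1={r1} r2={r2} r3={r3} arg={arg}")
    return lines


if __name__ == "__main__":
    print("\n".join(disassemble([0x0C010400, 42, 0x27000000, 0])))
```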

Edit: Added screenshot.

Nestharus · May 23, 2017

I've done quite a bit of work on compilers and symbol tables and stuff. I could perhaps help out with some stuff.

Here's a video on some stuff I came up with that I don't think is being used anywhere ->

Lord of theDing · May 24, 2017

Currently we still have a bit of a roadblock.
Long story short: we can't jump (yet). All jump instructions take label IDs as targets, and we can't create new IDs. But @leandrotp said that the table that translates these IDs to addresses has no bounds checking, so we can overflow it and use data we control. The big problem here is finding the address of the jump table.
If you haven't seen it yet, have a look at this thread: JASM - Let's dive into bytecode

Aniki · May 25, 2017

But @leandrotp said, that the table that translates these ids to addresses has no bounds checking, so we can overflow it and use data we can control.

How would one overflow it though?

Lord of theDing · May 25, 2017

If I understood it correctly, a label ID is just the offset into the table, so something like jmp 0x42 would jump to jumptable[0x42].

So let's assume the table starts at address 0x0000beef and we have a JASS array at 0xdeadbeef. The difference between the two is 0xdead0000. If we now have the instruction jmp 0xdead0042, it would jump to jumptable[0xdead0042], and that would be the same as jass_array[0x42] (treating each table entry as a single byte, for simplicity).

We know how to get the address of a JASS array (that's how we execute it as code), but the address of the table is still a problem.
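With one-byte entries, as in the simplified example, the label is just the byte distance. If the entries are 4-byte addresses, as one would expect on 32-bit x86 (an assumption, not checked against the game), the distance has to be divided by the entry size: jass_array[0x42] then sits at 0xdeadbeef + 0x108, which is 0xdead0108 bytes past the table, i.e. label 0x37ab4042. A small Python helper for that calculation, with a hypothetical name and 32-bit wrap-around assumed:

```python
MASK32 = 0xFFFFFFFF


def label_for_array_slot(table_addr, array_addr, slot, entry_size=4):
    """Label ID whose unchecked jump-table lookup lands on array[slot].

    Assumes jumptable[id] is read from table_addr + id * entry_size with
    32-bit wrap-around, and that array elements are entry_size bytes apart.
    """
    target = (array_addr + slot * entry_size) & MASK32
    delta = (target - table_addr) & MASK32
    if delta % entry_size:
        raise ValueError("array slot is not aligned to a table entry")
    return delta // entry_size


if __name__ == "__main__":
    print(hex(label_for_array_slot(0x0000BEEF, 0xDEADBEEF, 0x42, entry_size=1)))
    print(hex(label_for_array_slot(0x0000BEEF, 0xDEADBEEF, 0x42)))
```

The wrap-around also covers an array that lies below the table in memory, provided the VM really does plain 32-bit address arithmetic on the index.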

Aniki · May 25, 2017

If we now have the instruction jmp 0xdead0042 it would jump to jumptable[0xdead0042] and that would be the same as jass_array[0x42].

But how would we modify the jumptable such that its 0xdead0042 entry points to jass_array[0x42]?

Lord of theDing · May 25, 2017

We don't have to modify it. The jump table is a simple array, and the array bounds are not checked. Let's say the table has 1000 entries. If we then access jumptable[100000], Warcraft will happily read that location and try to jump to it. Somewhere after the jump table, the JASS array lies in memory, so if we carefully calculate the index, we can reach the values we put into jass_array by reading jumptable.

Aniki · May 25, 2017

Oh... got it, thanks! =)

csh · May 27, 2017

You can find the jump table address in GetJassContext(1)+0x2870.

By the way, I want to mention that memory writing still works on patch 1.28 by using a fake handle with any handle-related native.

Lord of theDing · May 27, 2017

What is GetJassContext()? Also, is that offset specific to one version of Warcraft? I don't want to use magic offsets that break with each new patch.

csh · May 27, 2017

You can find GetJassContext in DracoL1ch's Memory Hack library; it returns a pointer to the JASS VM structure.
The offset 0x2870 is not version-specific, but the offset used by GetJassContext itself differs between versions.

Lord of theDing · May 27, 2017

Thanks. This should at least help to create a prototype for a compiler. But we still need a more robust way to get the address of that table, maybe as an offset from the first JASS function.

Also, regarding what you said about memory writing: do I understand correctly that I can forge a handle ID (e.g. unit u = 0xdeadbeef) and then use that ID to write at arbitrary addresses (call SetWidgetLife(u, 42.1337))?

csh · May 27, 2017

The jass VM structure can be

Take the address of an integer array and convert it to a handle. The array contains the target address (an integer array structure address), the address of the handle-checking function, and the address of a specific piece of asm code. When a native such as SetWidgetLife fires, it checks the handle's validity, and these addresses in the integer array work together to give that array memory-write access.

Lord of theDing · May 27, 2017

csh said:
The jass VM structure can be

Sorry, but what did you mean by that?

Does this allow the same things as the old memory-writing exploit, especially executing arbitrary code, calling Windows API functions and loading DLLs? If so, we should report that bug to Blizzard.

csh · May 27, 2017

That sentence was unfinished; I meant that the JASS VM structure can be found in JNGP's source code.
Yes, it allows JASS to load DLLs. I mean, Blizzard didn't fix the exploit at all.

Lord of theDing · May 27, 2017

Do you have a link to the source? My google-fu is weak today.

csh · May 27, 2017

Here is the link: W3Grimoire. The open-source version is old, so the offset of each table in the VM will be slightly different in the latest WC3 version, but you can fix that yourself easily.

Lord of theDing · May 27, 2017

Ah thanks. I think you meant this part, right?

Code:

class VM {
   public:
       char unk1[0x20];   // 0x00
       opcode *curop;       // 0x20
       char unk2[0x10];   // 0x24
       int zero_me3;       // 0x34
       int zero_me2;       // 0x38
       int zero_me1;       // 0x3C
       int unk_x2;       // 0x40
       int maxop;       // 0x44
       int unk_x1;       // 0x48
       Variable t[0x100];   // 0x4C t[0] reserved for returns
       char unk3[0x8];
       Symtable *symtable; // 0x2854
       Hashtable *globaltable;   //0x2858
       int b;       //0x285C
       int c;       //0x2860   //circularly linked list
       LocalScope *localtable;   // 0x2864 contains a hashtable of the locally scoped variables
       int e;       //0x2868
       int jumptable;       //0x286C
       NSTABLE *nstable;       //0x2870 used for nstr_of_jString
       char unk5[0x0C];   //0x2874
       Hashtable *functable;    //0x2880
       int g;           //0x2884
       int h;           //0x2888
       int field_288C;   //0x288C   // used for preparing code arguments
       char unk[0xC];
       int DecrementHandleFunc;   //0x289C

       HandleTable **handleTable;   //0x28A0
      
       int unktable1;
       int unktable2;
       int unktable3; //opcode address -->> int  (19?)

       // My utility functions here
       int Execute(opcode *funcstart, int unk1, int maxops, int unk2);
       NS1_T *NativeStrMap(const char *s);

       NS1_T *nstr_of_jString(jString s);
       const char *char_of_jString(jString s);

       const char *get_fn_name(opcode *op);

       Variable *GetLocalStruct(const char *name);
       Variable *GetGlobalStruct(const char *name);
       bool GetVarValue_dumb(int *out, const char *name);
       std::string GetLocalVars();
       std::string GetLocalVarList(Variable *v);
       std::string prettyop(opcode *op);
       std::string GetStringOfVarValue(Variable *v);
};

Currently I'm testing whether we can use the internal stack of the VM to read the address of the jump table (NSTABLE *nstable in the code above). I found that calling a function with 32 parameters crashes the game, but I haven't found a use for that yet.

csh · May 31, 2017

NSTABLE is a string ID table; the ID is the value stored in string variables. NSTABLE's offset is 0x2870 in patch 1.20 but 0x2874 in 1.24 and later versions, so the jump table offset should be 0x2870 in patch 1.28c.
Also, I think it will be hard to find a more robust way to get this offset.
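Combined with GetJassContext from the Memory Hack library, the version dependence could at least be isolated in one place. A sketch in Python that encodes only the figures from this thread: 0x286C for the jump table in the old (1.20-era) Grimoire layout, and 0x2870 from 1.24 on, because the fields moved by 4 bytes. Other patches raise an error rather than guess; the helper names are hypothetical, and actually reading memory at the returned address is left out.

```python
def jumptable_offset(patch):
    """Offset of the jump-table pointer inside the VM structure.

    `patch` is a (major, minor) tuple, e.g. (1, 28) for 1.28c. Only the two
    data points from this thread are known; everything else is rejected.
    """
    if patch == (1, 20):
        return 0x286C          # Grimoire layout: jumptable just before nstable
    if patch >= (1, 24):
        return 0x2870          # same fields, shifted by 4 bytes
    raise ValueError(f"no known jump table offset for patch {patch}")


def jumptable_slot_address(vm_addr, patch):
    """Address of the field holding the jump-table pointer (32-bit)."""
    return (vm_addr + jumptable_offset(patch)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(jumptable_offset((1, 28))))
```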
