In this post we're going to look at everything related to the bytecode.
The bytecode was custom-designed to run reasonably fast in our JASS context,
so that it would actually be usable and a net positive for map
development.
One key insight is that the interpreter works on only one global hashtable.
We use the same technique as many other table libraries: a unique integer
serves as the first key for a table, and our provided key serves as the second.
Another great feature of hashtables that we use extensively is that any
integer, including negative ones, can be used as either key. We will see later why that is useful.
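To make the single-hashtable idea concrete, here is a minimal Haskell model of it. It is only a sketch, assuming a Data.Map keyed by (parentKey, childKey) as a stand-in for the JASS hashtable; the names Store, newTable, save and load are hypothetical and not taken from JHCR. Each logical table gets a fresh integer as its first key, and any Int, negative ones included, works as the second key.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy model of the one global hashtable: every value lives under
-- a (parentKey, childKey) pair, like JASS's SaveInteger/LoadInteger.
-- Values are plain Ints here; the real hashtable has one slot per type.
type Table = Map.Map (Int, Int) Int

data Store = Store
  { nextId :: Int    -- next unused table id (used as the parentKey)
  , global :: Table  -- the one and only hashtable
  } deriving (Show)

emptyStore :: Store
emptyStore = Store { nextId = 1, global = Map.empty }

-- Allocate a new logical table by handing out a fresh unique parentKey.
newTable :: Store -> (Int, Store)
newTable s = (nextId s, s { nextId = nextId s + 1 })

-- Write under (table id, our key); any Int works as the key.
save :: Int -> Int -> Int -> Store -> Store
save tbl key v s = s { global = Map.insert (tbl, key) v (global s) }

-- Read back; like LoadInteger, a missing entry reads as 0.
load :: Int -> Int -> Store -> Int
load tbl key s = Map.findWithDefault 0 (tbl, key) (global s)
```

Two tables allocated with newTable can both use key -1 without clashing, because their first keys differ.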
Now let's first look at the two definitions of our bytecode.
We have one in Haskell, which the compiler targets, and a corresponding file in JASS.
You can find them here and here.
Code:
data Instruction
    -- three regs
    = Lt Type Register Register Register
    | Le Type Register Register Register
    | Gt Type Register Register Register
    | Ge Type Register Register Register
    | Eq Type Register Register Register
    | Neq Type Register Register Register
    | Add Type Register Register Register
    | Sub Type Register Register Register
    | Mul Type Register Register Register
    | Div Type Register Register Register
    -- Mod only works on integers but for ease we still include the Type
    | Mod Type Register Register Register
    | SetGlobalArray Type Register Register Register
    | GetGlobalArray Type Register Register Register
    | SetLocalArray Type Register Register Register
    | GetLocalArray Type Register Register Register
    -- two regs
    | Negate Type Register Register
    | Set Type Register Register
    | SetGlobal Type Register Register
    | GetGlobal Type Register Register
    | Bind Type Register Register
    -- special
    | Literal Type Register (Hot.Ast Var Expr) -- encoded as: lit ty reg len string
    | Call Register Label Name
    | Convert Type Register Type Register
    -- one label
    | Label Label
    | Jmp Label
    | Function Label Name
    | JmpT Label Register
    -- other
    | Not Register Register
    | Ret Type
    deriving (Show)
At a quick glance we can see that the instructions use four different data types:
Type, Register, Label and Name.
Let's look at them one by one.
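To connect the data type with the textual listings later in this post, here is a hypothetical renderer for a subset of the instructions. It assumes Register, Label and Name are plain Int, Int and String synonyms and that Type is a small enum; the real JHCR definitions may differ, and Literal is left out.

```haskell
-- Hypothetical stand-ins for the real JHCR types.
type Register = Int
type Label = Int
type Name = String

data Type = TInteger | TReal | TBoolean | TString | TNothing
  deriving (Show, Eq)

-- A small subset of the Instruction type above.
data Instr
  = Add Type Register Register Register
  | Bind Type Register Register
  | Call Register Label Name
  | Function Label Name
  | Jmp Label
  | JmpT Label Register
  | Label Label
  | Ret Type
  deriving (Show)

typeName :: Type -> String
typeName t = case t of
  TInteger -> "integer"
  TReal    -> "real"
  TBoolean -> "boolean"
  TString  -> "string"
  TNothing -> "nothing"

-- Render one instruction in the human-readable form used in this post.
render :: Instr -> String
render i = unwords $ case i of
  Add t a b c  -> ["add", typeName t, show a, show b, show c]
  Bind t a b   -> ["bind", typeName t, show a, show b]
  Call r l n   -> ["call", show r, show l, n]  -- the name is only for debugging
  Function l n -> ["fun", show l, n]
  Jmp l        -> ["jmp", show l]
  JmpT l r     -> ["jmpt", show l, show r]
  Label l      -> ["label", show l]
  Ret t        -> ["ret", typeName t]
```

Applied to the add function from the example below, this produces the lines fun 123 add, add integer 0 1 2 and ret integer.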
Type
Type is just a plain old JASS type like handle, integer, widget, etc.
There are two reasons we have a typed bytecode. The first one is, once
again, that we use hashtables to store everything, and the hashtable API
forces us to use the native matching the type (SaveInteger/LoadInteger,
SaveReal/LoadReal and so on). The second reason is that it avoids
redundancy in our instruction set (we need only one Add instruction, etc.).
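Here is a minimal sketch of both points, assuming only four JASS types (handles are left out). The type decides which hashtable native pair the interpreter has to call, and a single Add can serve every type that supports +. The Value type and the add function are hypothetical illustrations; the native names are the real common.j ones.

```haskell
data Type = TInteger | TReal | TBoolean | TString
  deriving (Show, Eq)

-- The JASS hashtable API has one save/load native pair per type,
-- so an instruction's type selects which pair the interpreter calls.
natives :: Type -> (String, String)
natives t = case t of
  TInteger -> ("SaveInteger", "LoadInteger")
  TReal    -> ("SaveReal", "LoadReal")
  TBoolean -> ("SaveBoolean", "LoadBoolean")
  TString  -> ("SaveStr", "LoadStr")

-- Hypothetical tagged values. JASS integers are 32-bit; Int is used
-- here for simplicity.
data Value = VInt Int | VReal Double | VBool Bool | VStr String
  deriving (Show, Eq)

-- One Add instruction for every type that supports '+';
-- for strings, JASS '+' is concatenation.
add :: Type -> Value -> Value -> Either String Value
add TInteger (VInt a)  (VInt b)  = Right (VInt (a + b))
add TReal    (VReal a) (VReal b) = Right (VReal (a + b))
add TString  (VStr a)  (VStr b)  = Right (VStr (a ++ b))
add t _ _ = Left ("add: unsupported operands for " ++ show t)
```

Without the Type argument we would need AddInteger, AddReal and AddString as separate instructions.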
Register
The instruction set used in JHCR is register-based, which follows
from our use of hashtables: we effectively have an unlimited number of registers.
This especially helps in the code-generation phase, since we don't have to care
about register allocation (an NP-hard problem in general). It has some more
benefits as well, which I might talk about in a later post.
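Here is a small sketch of why unlimited registers simplify code generation: every subexpression simply gets the next unused negative register, and nothing is ever reused, so no allocation pass is needed. The Expr and Instr types and the compile function are hypothetical and untyped; they only illustrate the technique.

```haskell
-- Toy expressions: literals, variables already living in a register
-- (e.g. parameters), additions and multiplications.
data Expr = Lit Int | Var Int | AddE Expr Expr | MulE Expr Expr
  deriving (Show)

-- Untyped three-address instructions in the spirit of the bytecode.
data Instr
  = ILit Int Int      -- dest, value
  | IAdd Int Int Int  -- dest, lhs, rhs
  | IMul Int Int Int  -- dest, lhs, rhs
  deriving (Show, Eq)

-- 'next' is the next unused temporary (counting down from -1).
-- Temporaries are never reused, so no register allocation is needed.
-- Returns (result register, next unused temporary, code).
compile :: Int -> Expr -> (Int, Int, [Instr])
compile next e = case e of
  Var r    -> (r, next, [])
  Lit n    -> (next, next - 1, [ILit next n])
  AddE a b -> binop IAdd a b
  MulE a b -> binop IMul a b
  where
    binop mk a b =
      let (ra, n1, ca) = compile next a
          (rb, n2, cb) = compile n1 b
      in (n2, n2 - 1, ca ++ cb ++ [mk n2 ra rb])
```

Compiling a + b*3 with a and b in registers 1 and 2, starting at -1, yields lit -1 3, mul -2 2 -1 and add -3 1 -2, with the result in register -3.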
But even though we have more or less unlimited registers, we still have to follow
some rules, which result in the calling convention used in JHCR. To explore this,
let's have a look at a small example.
JASS:
native print_integer takes integer x returns nothing
function add takes integer a, integer b returns integer
    return a+b
endfunction
function main takes nothing returns nothing
    local integer r = add(1, 3)
    call print_integer(r)
endfunction
This snippet could be compiled to the following bytecode:
Code:
fun 123 add
add integer 0 1 2
ret integer
fun 234 main
lit integer -1 1
lit integer -2 3
bind integer 1 -1
bind integer 2 -2
call 1 123 add
bind integer 1 1
call -3 567 print_integer
ret nothing
Let's look at the add function first. From this function we can see that
the parameters to a function are stored in registers 1 to n, where n is the
number of parameters the function takes. We can also see that the return value
is stored in register 0. And this is our calling convention.
So let's now look at the main function. From it we can see a few things
as well. First of all, we have to use the bind instruction to transfer values
from our own "local" registers to the registers of the function about to be called.
So unlike the registers in your CPU, ours are scoped to each function.
The second thing we can notice is that temporary values are assigned to
"negative" registers -1, -2, etc. As I said before, this makes compiling
a bit easier, and it doesn't cost us much since the hashtables are
effectively "endless".
Local variables, on the other hand, use positive registers. In fact,
they use registers n+1 .. n+m, where m is the number of local variables
declared inside that function.
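The register layout can be written down as a pair of functions. This is a sketch of the convention described above, with hypothetical names (Slot, registerOf, slotOf) and 1-based indices for parameters, locals and temporaries.

```haskell
-- Where a value lives in a function's register frame:
-- 0 = return value, 1..n = parameters, n+1..n+m = locals,
-- negative registers = temporaries.
data Slot = Return | Param Int | Local Int | Temp Int
  deriving (Show, Eq)

-- Register of a slot in a function with n parameters (1-based indices).
registerOf :: Int -> Slot -> Int
registerOf n slot = case slot of
  Return  -> 0
  Param i -> i
  Local j -> n + j
  Temp k  -> negate k

-- The inverse: classify a register of a function with n parameters.
slotOf :: Int -> Int -> Slot
slotOf n r
  | r == 0    = Return
  | r < 0     = Temp (negate r)
  | r <= n    = Param r
  | otherwise = Local (r - n)
```

For add (n = 2), registerOf 2 (Param 2) is 2; for main (n = 0), the local r is registerOf 0 (Local 1), which is register 1, matching the listing above.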
Finally, we can see how the call instruction is used: the return value
is copied directly into the register of our local variable:
Code:
call 1 123 add
     ^ ^   ^
     | |   |
     | |   +- The name of the function is omitted in the final bytecode,
     | |   `- but it's useful for debugging.
     | |
     | `- Just the internal id of the function to be called.
     |
     +- The interpreter takes whatever is stored in the called function's
     |  register 0 and puts it into this function's register 1, which is
     |  the local variable r. Technically, the transfer of data happens
     |  when interpreting the ret instruction, since the call instruction
     `- lacks the type information.
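The following sketch simulates the add/main example with per-function register frames. It is untyped and heavily simplified, and all names are hypothetical: bind fills a pending frame for the next callee, call runs the callee in that fresh frame, and ret hands back register 0, which the caller stores in the call's destination register. Native calls such as print_integer are not modelled.

```haskell
import qualified Data.Map.Strict as Map

type Regs = Map.Map Int Int

-- A cut-down, untyped instruction set: just enough for the example.
data Instr
  = Lit Int Int       -- lit dst value
  | AddI Int Int Int  -- add dst lhs rhs
  | Bind Int Int      -- bind calleeReg callerReg
  | Call Int Int      -- call dst functionId
  | Ret               -- return register 0 to the caller
  deriving (Show)

type Program = Map.Map Int [Instr]

-- Run function 'fid' in the given frame. Returns the value handed back
-- by ret (register 0) together with the final frame.
run :: Program -> Int -> Regs -> (Int, Regs)
run prog fid frame = go (prog Map.! fid) Map.empty frame
  where
    get r regs = Map.findWithDefault 0 r regs
    -- 'pending' collects the values bound for the next call.
    go [] _ regs = (get 0 regs, regs)
    go (i : is) pending regs = case i of
      Lit d v    -> go is pending (Map.insert d v regs)
      AddI d a b -> go is pending (Map.insert d (get a regs + get b regs) regs)
      Bind c s   -> go is (Map.insert c (get s regs) pending) regs
      Call d f   ->
        -- The callee gets its own frame, made only of the bound values.
        let (result, _) = run prog f pending
        in go is Map.empty (Map.insert d result regs)
      Ret        -> (get 0 regs, regs)
```

Running function 234 (main) with add registered as 123 leaves 4 in main's register 1, the local r, while add's registers never appear in main's frame.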
Question to the reader: how do you do local arrays with this setup?
Label
To implement loops, conditions and all that good stuff we use labels and jump
instructions. There are actually only three of them: Label, Jmp and JmpT.
The Label instruction just marks a position in the list of instructions with
that label. Note that labels are static, so you can't create dynamic
labels to jump to. At run time the interpreter simply skips over label
instructions. The Jmp instruction, on the other hand, jumps to the
corresponding label instruction. For example, label 1 followed by jmp 1 is an infinite loop.
Finally, the JmpT instruction jumps to the label if and only if the value
stored in the given register is true. For an example, look at the code below:
an if-statement compiled to our bytecode, where the condition
is stored in register -1.
Code:
jmpt 2 -1
<else branch>
jmp 3
label 2
<if branch>
label 3
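Here is a minimal interpreter for just the control-flow instructions, as a sketch. Because labels are static, every label can be resolved to an instruction index once before execution, after which Label is a no-op and Jmp/JmpT are plain index changes. Treating a non-zero integer as true, and the Instr/labelTable/run names, are assumptions of this sketch.

```haskell
import qualified Data.Map.Strict as Map

-- Only the control-flow instructions, plus an untyped lit.
data Instr
  = Label Int
  | Jmp Int
  | JmpT Int Int  -- jump to label if the register is non-zero ("true")
  | Lit Int Int   -- register, value
  deriving (Show)

-- Labels are static, so each one is resolved to its index up front.
labelTable :: [Instr] -> Map.Map Int Int
labelTable code = Map.fromList [ (l, pc) | (pc, Label l) <- zip [0 ..] code ]

-- Execute from index 0, following jumps; returns the final registers.
-- (!!) is O(n); a real interpreter would use an array.
run :: Map.Map Int Int -> [Instr] -> Map.Map Int Int
run regs0 code = go 0 regs0
  where
    labels = labelTable code
    n = length code
    go pc regs
      | pc >= n   = regs
      | otherwise = case code !! pc of
          Label _ -> go (pc + 1) regs  -- labels are skipped at run time
          Jmp l   -> go (labels Map.! l) regs
          JmpT l r
            | Map.findWithDefault 0 r regs /= 0 -> go (labels Map.! l) regs
            | otherwise                         -> go (pc + 1) regs
          Lit r v -> go (pc + 1) (Map.insert r v regs)
```

Encoding the if-statement above with lit 1 10 as the if branch and lit 1 20 as the else branch, a condition of 1 leaves 10 in register 1 and a condition of 0 leaves 20.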
Name
The Name datatype is mostly ignored because we assign unique ids to everything
we can. There is only one case where we actually have to use a real name.
Can you guess it?
If we reload the script with a totally new function that wasn't seen before,
that new function of course also gets an id, but to link the name to the new
id in our already running map we have to transfer both. And we need that
mapping only for ExecuteFunc (I think).
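As a sketch of that one case, assume the running map keeps a plain name-to-id table (the NameTable, reload and executeFuncTarget names are hypothetical). A reload merges in the (name, id) pairs of the new functions, and an ExecuteFunc-style lookup resolves names through that table.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical name -> function id table kept by the running map.
type NameTable = Map.Map String Int

-- Merge the (name, id) pairs shipped with a reload. Map.union is
-- left-biased, so pairs from the reload win over existing entries.
reload :: [(String, Int)] -> NameTable -> NameTable
reload fresh table = Map.union (Map.fromList fresh) table

-- Resolve a name the way a hypothetical ExecuteFunc hook would.
executeFuncTarget :: String -> NameTable -> Maybe Int
executeFuncTarget = Map.lookup
```

All other instructions keep working purely with ids; only this lookup needs the names.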
fin
Now that we have looked at the data type representing the bytecode, we can guess how
many of those instructions actually work. But as you might have guessed, this
is still only a fairly high-level description of what JHCR is doing, so I hope
I can shed some light on the nitty-gritty details in a later post.