
Custom languages?

Status
Not open for further replies.
Level 21
Joined
Mar 27, 2012
Messages
3,232
I would like to make some quality-of-life changes to how I write vJASS, but I don't know where to start.
Ideally the process would take place when external script files have already been imported. It doesn't matter if vJASS to JASS conversion has been done at that point.

Example pseudocode (a weighted random number generator):
JASS:
Weight[1] = 5
Weight[2] = 3
Weight[3] = 6
Weights = 3

WeightSum = 0//Variable declaration without writing the type or scope.
for i in 1 to Weights//A loop that runs a set amount of times
    WeightSum = WeightSum + Weight[i]

r = Random(1,WeightSum)//Variable declarations as before
x = 1
i = 1
loop
    if i > Weights then
        break//The typical "exitwhen true" line.
    x =+ Weight[i]//Adds the number to x, meaning that the line translates to "set x = x + Weight[i]"
    if r < x then
        break
    i++//Translates to set i = i + 1
endloop
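For reference, the selection this pseudocode performs can be pinned down in plain Python before any translation work. This is a sketch: `pick_weighted` is a hypothetical name, list index 0 plays the role of `Weight[1]`, and `random.randint` (inclusive on both ends) stands in for `Random(1,WeightSum)`.

```python
import random


def pick_weighted(weights, r=None):
    """Return the 1-based index picked by the pseudocode's cumulative walk.

    weights: positive integer weights; weights[0] plays the role of Weight[1].
    r: optional roll in [1, sum(weights)]; drawn at random when omitted.
    """
    weight_sum = sum(weights)
    if r is None:
        r = random.randint(1, weight_sum)  # inclusive on both ends
    x = 1  # same starting value as the pseudocode
    for i, w in enumerate(weights, start=1):
        x += w
        if r < x:  # r falls inside this entry's slice
            return i
    raise ValueError("r must lie in [1, sum(weights)]")


print([pick_weighted([5, 3, 6], r) for r in (1, 5, 6, 8, 9, 14)])  # [1, 1, 2, 2, 3, 3]
```

Starting `x` at 1 is what makes the strict `r < x` test right: with weights 5, 3 and 6, rolls 1 to 5 pick the first entry, 6 to 8 the second and 9 to 14 the third.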

How would I make something like this? I am somewhat familiar with Python, so I can write some simpler scripts if I know how to apply them.
 
Usually you'll go line by line and run each line through a tokenizer function: something that splits a line into parts. Intuitively, you would separate by spaces, but since JASS allows you to write things like set x=I2R(1.00), spaces alone aren't enough. It is probably best to split by spaces first, and then split each piece further on smaller things, e.g. '='.
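A minimal sketch of that two-step idea in Python: split on whitespace first, then break each chunk apart on operator characters while keeping the operators as tokens. The operator list is an assumption and not complete, and this naive version knows nothing about strings or comments yet, so `"a b"` would be split apart.

```python
import re

# Longer operators are listed first, so "==" is not split into "=" and "=".
OPERATORS = ["==", "!=", "<=", ">=", "+", "-", "*", "/", "=", "<", ">", "(", ")", "[", "]", ","]
OP_PATTERN = re.compile("(" + "|".join(re.escape(op) for op in OPERATORS) + ")")


def tokenize_line(line):
    tokens = []
    for chunk in line.split():                # step 1: split on whitespace
        for part in OP_PATTERN.split(chunk):  # step 2: split on operators, keeping them
            if part:                          # re.split leaves empty strings at the edges
                tokens.append(part)
    return tokens


print(tokenize_line("set x=I2R(1.00)"))  # ['set', 'x', '=', 'I2R', '(', '1.00', ')']
```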

#1 rule: start off simple! Go through things one step at a time. I would first try to handle variable assignment, e.g. Weights = 3, which you would compile into set Weights = 3. It is fairly simple, but try it with multiple cases, e.g. Weights=3 or Weights= (3). Starting with something simple like that also lets you focus on getting a good layout for your code rather than worrying about the actual functionality.
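A sketch of that first step in Python, assuming a line is a bare assignment when it starts with an identifier (optionally indexed, like `Weight[1]`) followed by a single `=` that is not part of `==`; everything else is passed through unchanged. `compile_assignment` is a hypothetical name.

```python
import re

# indentation, target (identifier with optional [index]), "=" not followed by "=", value
ASSIGNMENT = re.compile(r"^(\s*)([A-Za-z_]\w*(?:\[[^\]]*\])?)\s*=(?!=)\s*(.+?)\s*$")


def compile_assignment(line):
    match = ASSIGNMENT.match(line)
    if match is None:
        return line  # not a bare assignment: leave it untouched
    indent, target, value = match.groups()
    return f"{indent}set {target} = {value}"


for src in ["Weights = 3", "Weights=3", "Weights= (3)", "Weight[1] = 5", "set x = 1"]:
    print(compile_assignment(src))
```

Lines that already start with `set`, `local` or `if` never match, because the keyword is followed by another word rather than by `=`.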

You should have an input file and an output file (don't modify the input file directly, IMO). Take advantage of Python dictionaries! They are immensely useful for this, since they let you keep track of each function's local variables, the map's global variables, and function signatures. Automatically declaring locals is a little tough, since you need to infer the type of the expression on the right-hand side. For that you need the common.j/Blizzard.j functions as well as the signatures of all functions declared above the call in the map. IMO, implement that later.
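A sketch of the dictionary idea for typing an automatically declared local. The `SIGNATURES` table is hand-filled with two real common.j natives (`GetRandomInt` returns integer, `I2R` returns real); a real tool would fill it by scanning common.j, Blizzard.j and the functions declared earlier in the map. Only literals, already known variables and direct calls are recognized, and for a call only the outermost function name is checked, so something like `GetRandomInt(1, 2) * 0.5` would be mistyped as integer.

```python
import re

# Return types by function name. Both entries are real common.j natives; a full tool
# would build this table by scanning common.j, Blizzard.j and the map script.
SIGNATURES = {"GetRandomInt": "integer", "I2R": "real"}


def infer_type(expr, known_vars):
    """Guess the JASS type of a right-hand side, or return None when unsure."""
    expr = expr.strip()
    if re.fullmatch(r"-?\d+", expr):
        return "integer"
    if re.fullmatch(r"-?(\d+\.\d*|\.\d+)", expr):
        return "real"
    if expr.startswith('"'):
        return "string"
    if expr in ("true", "false"):
        return "boolean"
    if expr in known_vars:
        return known_vars[expr]
    call = re.match(r"([A-Za-z_]\w*)\s*\(", expr)
    if call and call.group(1) in SIGNATURES:
        return SIGNATURES[call.group(1)]  # only the outermost call name is checked
    return None


declared = {}  # locals seen so far in the current function: name -> type
for name, expr in [("WeightSum", "0"), ("r", "GetRandomInt(1, WeightSum)"), ("y", "I2R(r)")]:
    declared[name] = infer_type(expr, declared)
    print(f"local {declared[name]} {name} = {expr}")
```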

Once you establish the file I/O and some basic features, you can tackle the harder ones.
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
I guess the first and easiest thing to do would be making a new format and a small program that would convert it into vJASS. At this point there is not yet a difference between JASS and vJASS.

So my idea now is to abuse the import feature of the JNGPE editor and have my files converted before they get imported (which means I can't write the new code in the editor just yet).

How would I do the tokenizing? I think mostly the spaces don't even matter, because JASS doesn't do anything with them.

What I'm confused about is how I would decipher the meaning of things. For instance, in the case of i++, I guess I can make it recognize this by the double plus and then move back one token to retrieve the variable (VAR). Then I'd make it replace the whole line with "set VAR = VAR + 1". This seems easy in description, but I'm a bit confused.
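A token-based sketch of that idea in Python: find the `++` (or `--`) token and take the token before it as the variable. As an extra safeguard that is an assumption rather than part of the post, it only rewrites a statement that consists of exactly `VAR ++`, so expressions such as `a = b++` are left alone instead of being mangled. The one-line tokenizer is deliberately tiny, and `compile_increment` is a hypothetical name.

```python
import re

# "++" and "--" are matched before single characters, so "i++" becomes ["i", "++"].
TOKEN = re.compile(r"\s*(\+\+|--|\w+|\S)")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def compile_increment(line):
    indent = line[: len(line) - len(line.lstrip())]
    toks = TOKEN.findall(line)
    # Only a statement that is exactly "VAR ++" or "VAR --" is rewritten.
    if len(toks) == 2 and toks[1] in ("++", "--") and IDENTIFIER.fullmatch(toks[0]):
        var, op = toks[0], toks[1][0]  # "++" -> "+", "--" -> "-"
        return f"{indent}set {var} = {var} {op} 1"
    return line


for src in ["i++", "    count--", "a = b++", "func(a++)", "1++"]:
    print(compile_increment(src))
```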
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
When you run jasshelper, it doesn't only process code-generating features like import or macros but everything else, too. So I would suggest resolving those yourself. For the tokenizing, you can use regular expressions. vJass is rather verbose and restrictive, though, so searching for the keywords, or for the = in the example above, identifies a statement quite reliably. Leading/trailing whitespace can be stripped immediately. Your i++ could look like:

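\s*(\w+)\+\+\s*

\s whitespace, \w word character (letter, digit or underscore)
+ one or more, * zero or more; both are greedy, i.e. they take as much of the token set in front of them as possible
\ for escaping (\+ is a literal plus)
() for capturing

This is written in Python's syntax; other regex engines may look a bit different. See https://docs.python.org/2/library/re.html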
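The pattern can be tried directly with Python's `re`. This sketch uses `re.fullmatch` (Python 3.4+) so the whole line has to be `VAR++`, which already rejects `a = b++`. Note that `\w` also matches digits, so `1++` still slips through and an identifier check is needed on top of it. `rewrite` is a hypothetical helper name.

```python
import re

# The pattern above: optional whitespace, a captured word, a literal "++".
INCREMENT = re.compile(r"\s*(\w+)\+\+\s*")


def rewrite(line):
    match = INCREMENT.fullmatch(line)  # the whole line must be "VAR++"
    if match is None:
        return line
    var = match.group(1)
    return f"set {var} = {var} + 1"


print(rewrite("    i++  "))  # set i = i + 1
print(rewrite("a = b++"))    # a = b++  (unchanged: fullmatch fails)
print(rewrite("1++"))        # set 1 = 1 + 1  (wrong: \w also matches digits)
```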
 
Level 23
Joined
Apr 16, 2012
Messages
4,041
x =+ Weight[i]//Adds the number to x, meaning that the line translates to "set x = x + Weight[i]"

If you want to do this, change the syntax to +=.

The =+ form was also used by very early C, but it was later changed (in K&R C, I believe) to +=, because what if I have the following:

JASS:
x =- Weight[i]

Should this compile into set x = x - Weight[i] or into set x = -Weight[i]?
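With `+=` the operator is a single token and the ambiguity disappears. Below is a sketch of desugaring `x op= expr` into a plain `set`, in Python with `re`. Wrapping the right-hand side in parentheses is a design choice made here, not something from the post: it keeps `x *= a + b` meaning `x * (a + b)` instead of `x * a + b`. The old `=-` spelling simply doesn't match and is passed through unchanged. `desugar` is a hypothetical name.

```python
import re

# indentation, target (identifier with optional [index]), operator, "=", right-hand side
COMPOUND = re.compile(r"^(\s*)([A-Za-z_]\w*(?:\[[^\]]*\])?)\s*([-+*/])=\s*(.+?)\s*$")


def desugar(line):
    match = COMPOUND.match(line)
    if match is None:
        return line  # not a compound assignment
    indent, target, op, expr = match.groups()
    # Parenthesize the right-hand side so operator precedence is preserved.
    return f"{indent}set {target} = {target} {op} ({expr})"


for src in ["x += Weight[i]", "x -= -Weight[i]", "x *= a + b", "x =- Weight[i]"]:
    print(desugar(src))
```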
 
Level 15
Joined
Aug 7, 2013
Messages
1,338
I suspect most of the syntactic sugar you add can be done with regular expressions. I have actually done this myself for my own project. Basically, every time I test the new script, I recompile everything by calling a Python script that goes through every .j file and applies the appropriate transformations. The output of this compilation is then a pure vJASS script which jasshelper can compile.
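A sketch of such a build step with `pathlib`: every `.j` file under a source folder is transformed and written to the same relative path under an output folder, so the input files are never overwritten. The folder names and the `transform` placeholder are hypothetical; the actual rewrite passes would be plugged in there.

```python
import tempfile
from pathlib import Path


def transform(source: str) -> str:
    """Placeholder for the actual rewrite passes (assignment, ++, for loops, ...)."""
    return source


def build(src_dir: str, out_dir: str) -> list:
    """Rewrite every .j file below src_dir into the same relative path below out_dir."""
    written = []
    for src in sorted(Path(src_dir).rglob("*.j")):
        target = Path(out_dir) / src.relative_to(src_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(transform(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Demo in a throwaway directory; "sugared" and "generated" are hypothetical folder names.
    with tempfile.TemporaryDirectory() as tmp:
        sugared = Path(tmp, "sugared")
        sugared.mkdir()
        (sugared / "Weights.j").write_text("Weights = 3\n", encoding="utf-8")
        for path in build(str(sugared), str(Path(tmp, "generated"))):
            print("wrote", path.relative_to(tmp))
```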

Here's an example of a (quick and dirty) regex that can capture your simple for-loop syntax:

JASS:
for i in 1 to Weights//A loop that runs a set amount of times
    WeightSum = WeightSum + Weight[i]
endfor

Code:
import re
forPattern = re.compile(r'for(?P<forHeader>[^\n]+)\n(?P<forBody>.+(?=endfor))', re.DOTALL)

Output of forPattern.findall() on the snippet above:
Code:
[(' i in 1 to Weights//A loop that runs a set amount of times', '    WeightSum = WeightSum + Weight[i]\n')]

This regex pattern finds the for-loop header and all of the body statements. Note that I added a terminating word (i.e. the equivalent of a closing bracket); you could use indentation instead. Depending on your implementation, you may also have to handle comments. Once you have the header and body statements, it's just a matter of parsing the header, turning it into a loop structure, initializing the loop variable, and then placing the body statements inside. This could also be done recursively (e.g. for nested for loops).
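Once the header and body are captured, turning them into a JASS loop might look like the sketch below. It assumes the header has exactly the shape `VAR in START to END` (with an optional trailing `//` comment), that the loop variable is declared elsewhere, and that END is inclusive as in the original pseudocode. `compile_for` is a hypothetical name, and the body lines are copied unchanged, so they still need the other rewrite passes.

```python
import re

# "VAR in START to END", optionally followed by a // comment
HEADER = re.compile(r"\s*([A-Za-z_]\w*)\s+in\s+(.+?)\s+to\s+(.+?)\s*(//.*)?$")


def compile_for(header, body):
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"unrecognised for header: {header!r}")
    var, start, end, _comment = match.groups()
    lines = [f"set {var} = {start}", "loop", f"    exitwhen {var} > {end}"]
    lines += [line for line in body.splitlines() if line.strip()]  # body kept as written
    lines += [f"    set {var} = {var} + 1", "endloop"]
    return "\n".join(lines)


print(compile_for(" i in 1 to Weights//A loop that runs a set amount of times",
                  "    WeightSum = WeightSum + Weight[i]\n"))
```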
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
In before you realize that you can't parse languages with regular expressions, and that you are going to end up with endless bugs (even more so because you are adding things that simply won't work in the context of Jass, like the ++/-- operators; the idea of C++ing Jass was dropped for a reason).
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
As in simple string replacements.
The obvious cases are those where the user can write anything without it being a language construct, such as strings and comments.
There are also others though.

Now try to parse this:
JASS:
a = b++
func(a++)
a--b // as in a-(-b), will be broken
1++ // can be fixed with pattern matching to see if the token is a valid identifier
local int breaker // without proper pattern matching, this will be replaced
if x == y // ops, injected a second if and introduced a syntax error, because this injection makes no sense
if x == y and z == w then // do you see what I am getting at?

String replacements go a long way, but you also need actual parsing first.
And adding "features" that make no sense (like the implicit if in conditions) doesn't help.
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
So far it all works and I don't see any bugs. Of course, I'm also not using regular expressions.
Yes, the "set", "if" and "then" keywords can be omitted in the current version.

Do these things work?

JASS:
function test takes nothing returns nothing
    local integer break = 5
    break += -(2 + 3)*(-1)
    
    if /*if*/test == break /*then*/
        break--
    endif
endfunction
 
Level 15
Joined
Aug 7, 2013
Messages
1,338
@looking_for_help

Well I think it would be wise to just make break a reserved word in the language. That's how it is in Python at least.

Code:
>>> break = 5
SyntaxError: invalid syntax
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
Okay, I will have to reconsider how I'm making this.
For the record, those are the results when run through the compiler:

GhostWolf - An utter mess. You specifically tried to break it.
JASS:
set a = b = a = b + 1//I only considered i++ for use in loops, so this is what happens
set func(a = func(a + 1//same
set a = a - 1
set 1 = 1 + 1
local int exitwhen trueer // without proper pattern matching, this will be replaced
if x == y // ops, injected a second if and introduced a syntax error, because this injection makes no sense then
if if x == y and z == w then // do you see what I am getting at?

looking_for_help - This went better than I expected actually. Only 1 mistake.
JASS:
function test takes nothing returns nothing
    set = 5
     set break = break +  -(2 + 3)*(-1)
   
    if /*if*/test == break /*then*/ then
        set break = break - 1
    endif
endfunction

What I will have to do:
Add some context sensitivity (allow "break" as a variable name, ignore comments); clearly I need to write my own parsing functions.
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
@looking_for_help

Well I think it would be wise to just make break a reserved word in the language.

Unfortunately, naming a variable "break" is perfectly valid Jass/vJass syntax; such restrictions should be added with care because they break valid code.

GhostWolf - An utter mess. You specifically tried to break it.

That's no argument ;)

Such language extensions have to work under all circumstances, otherwise they will cause more problems than they solve (IMO).

And that's not possible with just string replacements; you will have to build a lexer that tokenizes the code and then a parser that "understands" the code.
 
Level 26
Joined
Aug 18, 2009
Messages
4,097
Such language extensions have to work under all circumstances, otherwise they will cause more problems than they will solve (IMO).

From my experience, no, not at all. It's not that I make many typos or non-obvious syntax errors. I just like to write my code more neatly, avoid redundancy and gain readability. Also, it is not limited to syntactic sugar.
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
From my experience no, not at all. It's not that I make so many typos or non-obvious syntax errors.

Sorry, but the examples from GhostWolf were really simple and already broke everything while most of them should be valid syntax (not talking about typos).

Such a system is very brittle and likely introduces many problems. If that's ok for you, ok. But he asked how to do it, so it's best to point out the problems of such an approach.

And omitting both if and then at the same time really isn't a good idea IMO. What about lines like these:

JASS:
local boolean b = true or false // If or assignment? Does your regex check this?
string s="\\\\\"\"break\\" /*/* break*/*/ // Valid syntax, quite hard with regex
local integer i = 1--/*also valid*/----1
if string_array[int_array[i++]] == ""
local string s += "bbb
	bbb" // Multiline strings?
loop
    break and false // What happens here?
endloop

If you don't care about all that, it's fine.
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
Unfortunately, naming a variable "break" is perfectly valid Jass/vJass syntax; such restrictions should be added with care because they break valid code.

That's no argument ;)

Such language extensions have to work under all circumstances, otherwise they will cause more problems than they solve (IMO).

And that's not possible with just string replacements; you will have to build a lexer that tokenizes the code and then a parser that "understands" the code.

How would I build a lexer?
Sorry, but the examples from GhostWolf were really simple and already broke everything while most of them should be valid syntax (not talking about typos).

Such a system is very brittle and likely introduces many problems. If that's ok for you, ok. But he asked how to do it, so it's best to point out the problems of such an approach.

And omitting both if and then at the same time really isn't a good idea IMO. What about lines like these:

JASS:
local boolean b = true or false // If or assignment? Does your regex check this?
string s="\\\\\"\"break\\" /*/* break*/*/ // Valid syntax, quite hard with regex
local integer i = 1--/*also valid*/----1
if string_array[int_array[i++]] == ""
local string s += "bbb
	bbb" // Multiline strings?
loop
    break and false // What happens here?
endloop

If you don't care about all that, it's fine.

JASS:
set = true or false //When writing the compiler I didn't remember to consider local declarations.
string s="\\\\\"\"exitwhen true\\" /*/* exitwhen true*/*/ // Valid syntax, quite hard with regex. I don't think it's that hard because I just need to avoid reading anything inside comments.
set local integer i = 1 = local integer i = 1 - 1//Same problem with i++ as in previous examples. 
set if string_array[int_array[i = if string_array[int_array[i + 1
 set local string s = local string s +  "bbb
    bbb" // Multiline strings? Are those even allowed in usual JASS?
loop
    exitwhen true and false // What happens here? Never runs.
endloop
Omitting if is bad: if you don't have a comparison there but just a call to a boolean function, it becomes ambiguous. It also lowers readability.

In that case I would only omit the "then" keyword.
Please use Wurst or I cannot take you seriously any longer.
Writing Jass sucks and it will not improve with your String replacements.
This is a fact.
That's your personal opinion. I choose to improve myself with this kind of stuff, though. I said before why I am not using Wurst.

I haven't rewritten the thing yet, but I'm taking notes of what needs to be changed.
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
How would I build a lexer?

Look here for a high-quality implementation of a lexer (also called a scanner). Try to understand the "NextToken" method.
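The linked scanner isn't reproduced here. As a stand-in, this is a hypothetical minimal JASS lexer in Python organised around the same idea: a `next_token` method that reads one token from a cursor position and advances the cursor. It is a sketch, not a complete implementation: the keyword list is partial, and hex literals, rawcodes such as `'hfoo'`, vJass syntax and nested block comments are not handled.

```python
from dataclasses import dataclass

# Partial keyword list (an assumption; extend it from the JASS grammar as needed).
KEYWORDS = {
    "function", "takes", "returns", "nothing", "endfunction", "local", "set", "call",
    "if", "then", "elseif", "else", "endif", "loop", "exitwhen", "endloop", "return",
    "and", "or", "not", "true", "false", "null",
    "integer", "real", "boolean", "string", "handle", "code",
}
# Two-character operators come first so that "==" wins over "=" (maximal munch).
OPERATORS = ["==", "!=", "<=", ">=", "+", "-", "*", "/", "=", "<", ">", "(", ")", "[", "]", ","]


@dataclass
class Token:
    kind: str  # keyword, ident, int, real, string, op, newline or eof
    text: str
    line: int


class Lexer:
    def __init__(self, source):
        self.src, self.pos, self.line = source, 0, 1

    def peek(self, offset=0):
        i = self.pos + offset
        return self.src[i] if i < len(self.src) else ""

    def next_token(self):
        # Skip spaces, tabs and // comments; newlines matter in JASS, so they are tokens.
        while True:
            if self.peek() in (" ", "\t", "\r"):
                self.pos += 1
            elif self.peek() == "/" and self.peek(1) == "/":
                while self.peek() not in ("\n", ""):
                    self.pos += 1
            else:
                break
        start, c = self.pos, self.peek()
        if c == "":
            return Token("eof", "", self.line)
        if c == "\n":
            self.pos += 1
            self.line += 1
            return Token("newline", "\n", self.line - 1)
        if c.isalpha() or c == "_":
            while self.peek().isalnum() or self.peek() == "_":
                self.pos += 1
            text = self.src[start:self.pos]
            return Token("keyword" if text in KEYWORDS else "ident", text, self.line)
        if c.isdigit() or (c == "." and self.peek(1).isdigit()):
            kind = "int"
            while self.peek().isdigit():
                self.pos += 1
            if self.peek() == ".":
                kind = "real"
                self.pos += 1
                while self.peek().isdigit():
                    self.pos += 1
            return Token(kind, self.src[start:self.pos], self.line)
        if c == '"':
            first_line = self.line
            self.pos += 1
            while self.peek() not in ('"', ""):
                if self.peek() == "\\":
                    self.pos += 1  # an escape: also consume the next character
                if self.peek() == "\n":
                    self.line += 1  # strings may span several lines
                self.pos += 1
            self.pos += 1  # closing quote
            return Token("string", self.src[start:self.pos], first_line)
        for op in OPERATORS:
            if self.src.startswith(op, self.pos):
                self.pos += len(op)
                return Token("op", op, self.line)
        raise ValueError(f"unexpected character {c!r} on line {self.line}")

    def tokens(self):
        """Yield tokens until (and including) the eof token."""
        while True:
            token = self.next_token()
            yield token
            if token.kind == "eof":
                return


for token in Lexer('set x=I2R(1.00) // comment\ncall Foo("a\\"b")\n').tokens():
    print(token.kind, repr(token.text), token.line)
```

Because the string branch consumes escapes itself, a `"` preceded by a backslash never ends the string, which is exactly the case that plain string replacement gets wrong.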

JASS:
string s="\\\\\"\"exitwhen true\\" /*/* exitwhen true*/*/ // Valid syntax, quite hard with regex. I don't think it's that hard because I just need to avoid reading anything inside comments.

Yes, but such comments can be nested and you have to count the nest-levels properly to detect whether a nested comment was closed or is still open.
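Counting nest levels comes down to a depth counter. A sketch in Python (hypothetical helper name) that finds where a `/* ... */` comment starting at `pos` really ends, assuming vJass-style nesting; string literals inside the comment are not treated specially.

```python
def skip_block_comment(src, pos):
    """Return the index just past the (possibly nested) block comment starting at pos."""
    if not src.startswith("/*", pos):
        raise ValueError("no block comment starts at pos")
    depth, i = 0, pos
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # a nested comment opens
            i += 2
        elif src.startswith("*/", i):
            depth -= 1  # the innermost open comment closes
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


code = 'string s="x" /*/* break*/*/ // rest'
start = code.index("/*")
end = skip_block_comment(code, start)
print(code[start:end])   # /*/* break*/*/
print(repr(code[end:]))  # ' // rest'
```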

JASS:
set local string s = local string s +  "bbb
    bbb" // Multiline strings? Are those even allowed in usual JASS?

Yes, they are allowed in vanilla Jass. Block comments are not, however, and since you didn't notice/were able to compile those, I guess you are using vJass anyway, right? It wouldn't make much sense to make a language extension for more comfort and not use vJass...

JASS:
loop
    exitwhen true and false // What happens here? Never runs.
endloop

Almost right: it runs infinitely and crashes the thread ;)

In that case I would only omit the "then" keyword.

That's a good idea.
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
JASS:
string s="\\\\\"\"exitwhen true\\" /*/* exitwhen true*/*/ // Valid syntax, quite hard with regex. I don't think it's that hard because I just need to avoid reading anything inside comments.

Yes, but such comments can be nested and you have to count the nest-levels properly to detect whether a nested comment was closed or is still open.

JASS:
set local string s = local string s +  "bbb
    bbb" // Multiline strings? Are those even allowed in usual JASS?

Yes, they are allowed in vanilla Jass. Block comments are not, however, and since you didn't notice/were able to compile those, I guess you are using vJass anyway, right? It wouldn't make much sense to make a language extension for more comfort and not use vJass...

JASS:
loop
    exitwhen true and false // What happens here? Never runs.
endloop

Almost right: it runs infinitely and crashes the thread ;)

I think nesting is fairly easy to handle with a stack. I need to do that anyway if I am ever going to omit end-block keywords (which I intend to do). Probably the hardest part for me is writing the actual parsing that would use those stacks or run in parallel with them.
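A sketch of that stack in Python for the case where end-block keywords are left out and indentation marks the blocks instead (an assumption; the post doesn't say how blocks would be delimited). Each opener pushes its closing keyword together with its indentation; a line indented at or left of an open block pops it and emits the closer, except that `else`/`elseif` at the same level keeps the `if` open. Consistent space indentation is assumed, comment lines are treated like code, and `close_blocks` is a hypothetical name.

```python
# Block openers and the keyword that closes them.
OPENERS = {"if": "endif", "loop": "endloop", "function": "endfunction"}


def close_blocks(lines):
    """Insert end keywords based on indentation (spaces only, consistent width assumed)."""
    out, stack = [], []  # stack holds (indent, closing keyword) for each open block
    for line in lines:
        if not line.strip():
            out.append(line)  # keep blank lines as they are
            continue
        indent = len(line) - len(line.lstrip())
        first = line.split()[0]
        while stack and stack[-1][0] >= indent:
            if first in ("else", "elseif") and stack[-1] == (indent, "endif"):
                break  # else/elseif continues the open if instead of closing it
            closer_indent, closer = stack.pop()
            out.append(" " * closer_indent + closer)
        out.append(line)
        if first in OPENERS:
            stack.append((indent, OPENERS[first]))
    while stack:  # close whatever is still open at the end of the input
        closer_indent, closer = stack.pop()
        out.append(" " * closer_indent + closer)
    return out


source = """function test takes nothing returns nothing
    loop
        exitwhen i > 3
        if i == 2 then
            call Foo()
        else
            call Bar()
        set i = i + 1"""
print("\n".join(close_blocks(source.splitlines())))
```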

Yes, I have so far not added anything that conflicts with vJASS. In fact, I use this kind of code in external files and combine it with the import feature of vJASS.

The loop would indeed run endlessly, but the break statement isn't supposed to be used like this in the first place. Nevertheless, I will make the break statement only work if it is the only thing on the line. If it isn't, then it will be considered a variable.
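A sketch of that rule in Python (hypothetical helper name): compare the whole statement, minus indentation and a trailing `//` comment, against `break`, so `breaker`, `local integer break = 5` and `break and false` all pass through untouched. The comment split is naive and would be fooled by `//` inside a string literal.

```python
def compile_break(line):
    """Rewrite a statement that is exactly `break` into `exitwhen true`."""
    code, sep, comment = line.partition("//")  # naive: a "//" inside a string would confuse this
    if code.strip() != "break":
        return line  # break used as a variable, part of an expression, etc.
    indent = code[: len(code) - len(code.lstrip())]
    return f"{indent}exitwhen true" + (f" {sep}{comment}" if sep else "")


for src in ["    break", "    break // done", "    break and false",
            "local integer break = 5", "local integer breaker"]:
    print(compile_break(src))
```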

Btw, how do block comments work exactly? Does being able to use them mean that \n is not necessary?
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
I think nesting is fairly easy to handle with a stack. I need to do that anyway if I am ever going to omit endblock keywords(Which I intend to do).

Yes it is easy, with a proper lexer.

Btw, how do block comments work exactly? Does being able to use them mean that \n is not necessary?

They work like braces: /* first level /* second level /* third level */ second level */ first level */
and yes, you don't need \n with them. Unfortunately, even perverted things like this are possible in vJass:

JASS:
local int/*
some random comments
*/eger i = 3
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
Yes it is easy, with a proper lexer.



They work like braces: /* first level /* second level /* third level */ second level */ first level */
and yes, you don't need \n with them. Unfortunately, even perverted things like this are possible in vJass:

JASS:
local int/*
some random comments
*/eger i = 3

I miswrote it. I wanted to write block strings. How about those?
 
Level 6
Joined
Jul 30, 2013
Messages
282
well vjass is kind of dead and could use some love.. maybe you'd bother taking that up?
if you're serious that is..

also.. cjass is not as evil as ppl say it is... it's just got crappy docs. (and you can't use modules.... :(.. but I've only seen very little map-side code that bothers with those anyway)
it comes with lots of those shorthands tho, plus for/foreach, lambda ..

just my 2 cents.
 
Level 17
Joined
Jul 17, 2011
Messages
1,863
Parsing a language with C++-like syntax is very hard; you need to know how to build a lot of things to make it work. Basically, if anyone possessed that kind of knowledge, they would not be hanging around a site like THW. But just removing things like "set", "then" and "endif" would be beneficial for me :thumbs_up::thumbs_up:
 
Level 6
Joined
Jul 30, 2013
Messages
282
like i said cjass already does that..
if all u want is to get rid of call/set/local then why bother making your own thing that 99% will be bugged anyway..

also.. don't parse programming languages with regex..(or not-so-programming-languages for that matter )

finding a substring is one thing.. but if u actually want to generate code you will be standing knee deep in a puddle of blood in no time.
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
like i said cjass already does that..
if all u want is to get rid of call/set/local then why bother making your own thing that 99% will be bugged anyway..

also.. don't parse programming languages with regex..(or not-so-programming-languages for that matter )

finding a substring is one thing.. but if u actually want to generate code you will be standing knee deep in a puddle of blood in no time.
The main point for me personally is not doing things better, but learning more about how to program various things. I'm the kind of person that learns many things aside from my own specialty in order to be versatile.
Well you can easily find out such things by just trying them out yourself, right?
I assumed that you can simply tell me that, as I usually don't want to bother with tests when the question is about a specific and probably widely-known detail.

About this project, I think I will not do anything big for some time, because I have motivation and health problems (those usually come together in my case).
 
Level 14
Joined
Dec 12, 2012
Messages
1,007
About this project, I think I will not do anything big for some time, because I have motivation and health problems (those usually come together in my case).

Ok, this is sad. Then I hope you have a quick recovery ;)

The main point for me personally is not doing things better, but learning more about how to program various things. I'm the kind of person that learns many things aside from my own specialty in order to be versatile.

Then write a lexer. It's very good practice and also teaches a lot about how code is "understood" by the compiler.
 
Level 6
Joined
Jul 30, 2013
Messages
282
The main point for me personally is not doing things better, but learning more about how to program various things. I'm the kind of person that learns many things aside from my own specialty in order to be versatile.

I assumed that you can simply tell me that, as I usually don't want to bother with tests when the question is about a specific and probably widely-known detail.

About this project, I think I will not do anything big for some time, because I have motivation and health problems (those usually come together in my case).

k.. that changes stuff considerably..
tho i'd still be careful about how much regex i use, if any. regex may be nice and convenient, but some things are really awkward or even impossible to do with it.

also it's rather hard to keep context using regex. without context you will be severely limited.
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
At the very least, you have to remove every string and comment from the source, before replacements happen.
An easy way to do this is to make a map of all the strings and comments, where each one is mapped to a number, and then replace every occurrence with that number plus a character that the user can't input, giving placeholders such as \0, \1, etc.
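A sketch of that masking step in Python, as one possible reading of the idea: every string literal and comment is stashed in a list and replaced by its index wrapped in NUL characters (`\x00`), which cannot be typed into a script; the rewrites run on the masked text and the originals are put back afterwards. The regex handles backslash escapes in strings but not nested vJass block comments, and `mask`/`unmask` are hypothetical names.

```python
import re

# String literals (with backslash escapes), // line comments, non-nested /* */ comments.
PROTECTED = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def mask(source):
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"  # NUL cannot be typed into a script

    return PROTECTED.sub(stash, source), saved


def unmask(masked, saved):
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], masked)


src = 'i++ // i++ stays\ncall BJDebugMsg("i++")\n'
masked, saved = mask(src)
rewritten = re.sub(r"\b(\w+)\+\+", r"set \1 = \1 + 1", masked)  # only real code is touched
print(unmask(rewritten, saved))
```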

In addition, not checking for actual tokens (e.g. breaker is not break) isn't ideal.
 
Level 21
Joined
Mar 27, 2012
Messages
3,232
Yeah, at first I thought I could get away with checking for specific strings, but apparently not. If and when I work on this again, I will probably try writing a tokenizer that at least knows when "break" is the only thing on the line, along with some other features.
 

Deleted member 219079


You'll need to contact Vexorian for the JassHelper source code and make your changes to it.
 
Level 6
Joined
Jul 30, 2013
Messages
282
cohadar was crying over the vjass source at one point, i recall..
unfortunately cohadar's changes broke a ton of stuff, so i can't use all the nice stuff he added :'(
 