Parsing a JASS file with C++

Status
Not open for further replies.
Level 10
Joined
Jun 6, 2007
Messages
392
Hello! I'm writing a simple tool to process (e.g. combine) JASS files in C++. I'd like to find a simple way to detect duplicate definitions.

So let's say, for example, that I want to combine files a.j and b.j into c.j. Both a.j and b.j contain a global variable called "udg_index" (quite a common name). What I want to do is copy that variable to c.j only once (if both declarations have the same type) or display an error message (if the types differ). In a similar way, I want to prevent function redeclarations. What would be the easiest approach to this problem?
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
If you do actual parsing (as opposed to some basic string manipulation), you'll end up with a map of globals and a map of functions, and by map I mean std::map (or whatever implementation you prefer).
While you add objects to these maps, you can check whether they already exist and act accordingly.
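A minimal sketch of that map-based check, assuming C++17. The names `SymbolTable` and `AddResult` are hypothetical; globals map name to type so that the "same type: copy once, different type: error" rule from the first post falls out of a single `emplace`, and functions are tracked by name only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Outcome of registering a global (hypothetical names, not from the thread).
enum class AddResult { Added, Duplicate, TypeConflict };

struct SymbolTable {
    std::map<std::string, std::string> globals;  // name -> type
    std::set<std::string> functions;             // function names

    // Added: first definition, copy it to the output.
    // Duplicate: same name and type, skip it.
    // TypeConflict: same name, different type, report an error.
    AddResult addGlobal(const std::string& name, const std::string& type) {
        auto [it, inserted] = globals.emplace(name, type);
        if (inserted) return AddResult::Added;
        return it->second == type ? AddResult::Duplicate : AddResult::TypeConflict;
    }

    // Returns true only the first time a function name is seen.
    bool addFunction(const std::string& name) {
        return functions.insert(name).second;
    }
};
```

`emplace` never overwrites an existing entry, so the first file's declaration always wins and later ones are only compared against it.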

Not sure what the purpose of this is, though.
 
Level 10
Joined
Jun 6, 2007
Messages
392
Well, I guess parsing was a bad choice of words. My goal is simply to copy the contents of two files into one so that there are no name conflicts. The best option would be to rename all conflicting symbols in one of the files.
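For the renaming idea, a minimal sketch that renames whole-word occurrences of one identifier with `std::regex`. `renameSymbol` is a hypothetical helper; it assumes the old name contains only letters, digits and underscores (true for JASS identifiers), so it needs no regex escaping. It also rewrites matches inside strings and comments, so it is only a starting point.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces every whole-word occurrence of oldName with newName.
// "\b" prevents partial matches such as udg_index inside udg_index2.
// Assumes newName contains no '$', which regex_replace would interpret.
std::string renameSymbol(const std::string& src, const std::string& oldName,
                         const std::string& newName) {
    const std::regex word("\\b" + oldName + "\\b");
    return std::regex_replace(src, word, newName);
}
```

For example, renaming `udg_index` to `b_udg_index` in `set udg_index = udg_index + 1` rewrites both occurrences but leaves `udg_index2` alone.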

This is how I thought it could work (without renaming, simply leaving out conflicting definitions):
Code:
1. copy everything before "globals" in file 1 to output
2. copy everything before "globals" in file 2 to output
3. write "globals" to output
4. for each line in file 1 until "endglobals"
  4.1. read words from line until word isn't a reserved word
  4.2. store word in globals-vector
  4.3. copy line to output
5. for each line in file 2 until "endglobals"
  5.1. read words from line until word isn't a reserved word
  5.2. if word exists in globals-vector
    5.2.1. then skip the line
    5.2.2. else copy line to output
6. write "endglobals" to output

7. for each line in file 1 until end of file
  7.1. read words until word == "function" or end of line
  7.2. if not end of line
    7.2.1. then word = next word (the function name)
    7.2.2. store word in functions-vector
    7.2.3. copy line, then read and copy lines to output up to and including "endfunction"
  7.3. else copy line to output

8. for each line in file 2 until end of file
  8.1. read words until word == "function" or end of line
  8.2. if not end of line
    8.2.1. then word = next word (the function name)
    8.2.2. if word exists in functions-vector
      8.2.2.1. then skip lines up to and including "endfunction"
      8.2.2.2. else copy line, then read and copy lines to output up to and including "endfunction"
  8.3. else copy line to output
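The steps above could be sketched in C++ roughly like this. It is a line-based sketch under several assumptions: "globals", "endglobals", "function" and "endfunction" start their own lines; only `function X` or `constant function X` at the start of a line opens a function (so `function X` passed as a code argument is not matched); comments and strings are not handled; and, like the pseudocode, it simply leaves out duplicates instead of reporting type conflicts. All function names are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

static Lines splitLines(const std::string& text) {
    Lines lines;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) lines.push_back(line);
    return lines;
}

static Lines words(const std::string& line) {
    Lines out;
    std::istringstream in(line);
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

static std::string firstWord(const std::string& line) {
    const Lines w = words(line);
    return w.empty() ? "" : w[0];
}

// Steps 4.1/5.1: skip "constant", the type and "array"; the next word is the
// variable name. Blank and comment lines yield "".
static std::string globalName(const std::string& line) {
    const Lines w = words(line);
    if (w.empty() || w[0].rfind("//", 0) == 0) return "";
    size_t i = (w[0] == "constant") ? 2 : 1;
    if (i < w.size() && w[i] == "array") ++i;
    return i < w.size() ? w[i].substr(0, w[i].find('=')) : "";
}

// Steps 7.1/8.1: name of a function starting on this line, or "".
static std::string functionName(const std::string& line) {
    const Lines w = words(line);
    const size_t i = (!w.empty() && w[0] == "constant") ? 1 : 0;
    return (i + 1 < w.size() && w[i] == "function") ? w[i + 1] : "";
}

struct Parts { Lines prefix, globals, rest; };  // before / inside / after globals

static Parts splitParts(const std::string& text) {
    const Lines lines = splitLines(text);
    Parts p;
    size_t g = 0;
    while (g < lines.size() && firstWord(lines[g]) != "globals") ++g;
    if (g == lines.size()) { p.rest = lines; return p; }  // no globals block
    for (size_t k = 0; k < g; ++k) p.prefix.push_back(lines[k]);
    size_t i = g + 1;
    for (; i < lines.size() && firstWord(lines[i]) != "endglobals"; ++i)
        p.globals.push_back(lines[i]);
    for (++i; i < lines.size(); ++i) p.rest.push_back(lines[i]);  // skip endglobals
    return p;
}

// Steps 4-5: copy a declaration only if its name is new.
static void copyGlobals(const Lines& lines, std::set<std::string>& seen, std::ostream& out) {
    for (const std::string& line : lines) {
        const std::string name = globalName(line);
        if (name.empty() || seen.insert(name).second) out << line << '\n';
    }
}

// Steps 7-8: copy or skip whole function blocks; copy other lines as-is.
static void copyRest(const Lines& lines, std::set<std::string>& seen, std::ostream& out) {
    for (size_t i = 0; i < lines.size(); ++i) {
        const std::string name = functionName(lines[i]);
        if (name.empty()) { out << lines[i] << '\n'; continue; }
        const bool keep = seen.insert(name).second;
        for (; i < lines.size(); ++i) {  // up to and including "endfunction"
            if (keep) out << lines[i] << '\n';
            if (firstWord(lines[i]) == "endfunction") break;
        }
    }
}

std::string mergeJass(const std::string& a, const std::string& b) {
    const Parts pa = splitParts(a), pb = splitParts(b);
    std::ostringstream out;
    std::set<std::string> globals, functions;
    for (const auto& l : pa.prefix) out << l << '\n';  // step 1
    for (const auto& l : pb.prefix) out << l << '\n';  // step 2
    out << "globals\n";                                // step 3
    copyGlobals(pa.globals, globals, out);             // step 4
    copyGlobals(pb.globals, globals, out);             // step 5
    out << "endglobals\n";                             // step 6
    copyRest(pa.rest, functions, out);                 // step 7
    copyRest(pb.rest, functions, out);                 // step 8
    return out.str();
}
```

Because a function body is consumed as a whole block, words like `function` that appear inside a body never reach the "is this a function header?" check.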
 
Level 15
Joined
Oct 18, 2008
Messages
1,591
I think that instead of a line-by-line approach, you should use an object-oriented, data-driven approach. Use a data structure like this (note: I'm using C# definitions since I'm more familiar with C# than with C++):

Code:
class ClassData
{
    string Name;
    string Fields;
    FunctionData[] Functions;
    ClassData[] SubClassesAndStructs;
}

Then recursively parse your code into this structure (I'd advise using regex for it, but you can also use a line-by-line approach). First parse the subclasses/structs recursively until you reach the innermost one. Then parse the functions; remove the empty lines and save the rest into the Fields variable. I'm not sure whether JASS has subclasses or structs; I haven't touched JASS code in 3 years, and even then I only used it where necessary, but for this process it shouldn't really matter.
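Since regex is suggested above, here is a minimal C++ sketch that extracts function headers with `std::regex`. The `FunctionHeader` record and `findFunctionHeaders` are hypothetical names. It assumes one header per line and does not skip comments or strings; the input is searched line by line so that `^` means "start of line" without relying on multiline regex flags.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one function header.
struct FunctionHeader {
    std::string name, takes, returns;
};

std::vector<FunctionHeader> findFunctionHeaders(const std::string& src) {
    // [constant] function <name> takes <params> returns <type>
    static const std::regex header(
        R"(^\s*(?:constant\s+)?function\s+(\w+)\s+takes\s+(.*?)\s+returns\s+(\w+))");
    std::vector<FunctionHeader> out;
    std::istringstream in(src);
    for (std::string line; std::getline(in, line);) {
        std::smatch m;
        if (std::regex_search(line, m, header))
            out.push_back({m[1].str(), m[2].str(), m[3].str()});
    }
    return out;
}
```

The lazy `(.*?)` stops at the first ` returns `, so a parameter list such as `integer a, unit u` is captured whole.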
 
Level 10
Joined
Jun 6, 2007
Messages
392
Thanks! With JASS it's actually simpler, because there are no classes/structs (except in vJass). Basically, JASS files look like this:
Code:
globals
  type1 name1 = value1
  type2 name2
  ...
endglobals
function f1 takes params1 returns returntype1
  ...
endfunction
function f2 takes params2 returns returntype2
  ...
endfunction
...
So there are only globals and functions, which makes reading JASS files easier.
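Following the data-driven suggestion, the flat layout above could map onto a small in-memory model in C++. This sketch assumes the files have already been parsed into it; `GlobalDecl`, `FunctionDecl`, `JassFile` and `mergeModels` are hypothetical names. Merging then works on names and types instead of lines, and a type conflict becomes an error message instead of a silently dropped line.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct GlobalDecl { std::string type, name, init; };  // init may be empty
struct FunctionDecl { std::string name, text; };      // text = whole block
struct JassFile {
    std::vector<GlobalDecl> globals;
    std::vector<FunctionDecl> functions;
};

// Merges b into a copy of a. Same name + same type: keep a's global.
// Same name + different type: record an error. Same function name: keep a's.
JassFile mergeModels(const JassFile& a, const JassFile& b,
                     std::vector<std::string>& errors) {
    JassFile out = a;
    std::map<std::string, std::string> globalTypes;
    for (const GlobalDecl& g : a.globals) globalTypes.emplace(g.name, g.type);
    for (const GlobalDecl& g : b.globals) {
        auto [it, inserted] = globalTypes.emplace(g.name, g.type);
        if (inserted) out.globals.push_back(g);
        else if (it->second != g.type)
            errors.push_back("global " + g.name + ": " + it->second + " vs " + g.type);
    }
    std::set<std::string> functionNames;
    for (const FunctionDecl& f : a.functions) functionNames.insert(f.name);
    for (const FunctionDecl& f : b.functions)
        if (functionNames.insert(f.name).second) out.functions.push_back(f);
    return out;
}
```

Writing the result back out is then just "globals", each declaration, "endglobals", followed by each function's text.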
 
Level 10
Joined
Jun 6, 2007
Messages
392
Ah, thanks for your post. I realized that I shouldn't have mentioned renaming, because of one issue:

Let's say that I again have files a.j and b.j, and I want to extend a with functionality from b. File a contains libraries (not in the vJass sense, just collections of functions and globals) lib_x and lib_y, and file b contains libraries lib_y and lib_z. I'd like the resulting file to contain exactly one instance each of lib_x, lib_y and lib_z. Renaming would produce two instances of lib_y. If, for example, lib_y is a spell, it would be executed twice on the cast event.
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
Can you give an actual real world case where this would happen?
What are you trying to do?

Either way, if you want to avoid collisions, you'd obviously need to do some parsing.
It's not as easy as just looking for the globals/endglobals and function/endfunction keywords, because you also need to take comments and strings into consideration. You'll need to walk through the characters and try to match the start of some scope (e.g. "function", "globals", "/*", "//"). Once a scope is found, keep reading until its ending token is found ("endfunction", "endglobals", "*/", "\n").
While you're doing that, you can grab additional data, like the name of a global/function you are processing, which you can use to avoid collisions.
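A minimal C++ sketch of that character walk (a separate illustration, not a port of any Ruby code). `collectNames` and `Names` are hypothetical. It assumes `//` and `/* */` comments, double-quoted strings with backslash escapes and single-quoted rawcodes, and it takes the last identifier before `=` (or before the end of the line) inside globals as the variable name.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Declaration names found outside comments and literals.
struct Names {
    std::vector<std::string> globals, functions;
};

Names collectNames(const std::string& s) {
    enum class Scope { Top, Globals, Function };
    Scope scope = Scope::Top;
    Names names;
    std::string lastIdent;   // candidate global name on the current line
    bool sawAssign = false;  // passed '=' on the current line
    bool wantFunctionName = false;

    auto finishLine = [&] {
        if (scope == Scope::Globals && !lastIdent.empty())
            names.globals.push_back(lastIdent);
        lastIdent.clear();
        sawAssign = false;
    };
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };

    size_t i = 0;
    while (i < s.size()) {
        const char c = s[i];
        if (s.compare(i, 2, "//") == 0) {                // line comment
            while (i < s.size() && s[i] != '\n') ++i;    // keep the '\n'
        } else if (s.compare(i, 2, "/*") == 0) {         // block comment
            const size_t end = s.find("*/", i + 2);
            i = (end == std::string::npos) ? s.size() : end + 2;
        } else if (c == '"' || c == '\'') {              // string or rawcode
            ++i;
            while (i < s.size() && s[i] != c) i += (s[i] == '\\') ? 2 : 1;
            ++i;                                         // closing quote
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            const size_t start = i;
            while (i < s.size() && isWordChar(s[i])) ++i;
            const std::string word = s.substr(start, i - start);
            if (scope == Scope::Top) {
                if (wantFunctionName) {
                    names.functions.push_back(word);
                    scope = Scope::Function;
                    wantFunctionName = false;
                } else if (word == "globals") {
                    scope = Scope::Globals;
                } else if (word == "function") {
                    wantFunctionName = true;
                }
            } else if (scope == Scope::Globals) {
                if (word == "endglobals") { scope = Scope::Top; lastIdent.clear(); }
                else if (!sawAssign) lastIdent = word;
            } else if (word == "endfunction") {
                scope = Scope::Top;
            }
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < s.size() && (isWordChar(s[i]) || s[i] == '.')) ++i;  // number
        } else {
            if (c == '=') sawAssign = true;
            if (c == '\n') finishLine();
            ++i;
        }
    }
    finishLine();
    return names;
}
```

Keywords inside comments or strings (such as `"endglobals"` in a string initializer) are skipped before they can change the scope, which is the point made above.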

Not sure if it's of any help, but this is how I did it in Ruby.
 
Level 10
Joined
Jun 6, 2007
Messages
392
Well, I have this hobby of creating new abilities and adding them to existing maps (only for fun and personal use). This tool could be used to merge the original map script with the spell script.

I believe I now understand how to approach this problem. Big thanks to you both!
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,255
Consider using a parser generator like YACC rather than a general-purpose programming language.

Obviously you will need to find one appropriate to your platform.

I am not entirely sure how one would write a well-designed parser in C/C++. It will obviously need a whole lot of tables for the detected state, and also some kind of stack to handle the parsing. It will also need to loop until the end of the file, and probably needs some method to get the "next argument".
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
It's actually not that hard to write a specialized parser, especially for JASS, since it's so simple.
Granted, without regular expressions it takes more work to get tokens.
With them, the Ruby code I posted does it just fine.
As long as you have a working function that gets the next token from some position, you are good to go.
The state stack is very simple.
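A minimal C++ sketch of both pieces: a next-token function that works from a position, and a state stack that checks block nesting. `nextToken` and `blocksBalanced` are hypothetical names. The sketch assumes plain JASS `//` comments only, returns `'\n'` as its own token, recognizes only a few two-character operators, and checks only globals, function, if and loop blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns the token at or after `pos` and moves `pos` past it.
// Spaces, tabs, '\r' and "//" comments are skipped. Returns "" at the end.
std::string nextToken(const std::string& s, size_t& pos) {
    while (pos < s.size()) {
        if (s[pos] == ' ' || s[pos] == '\t' || s[pos] == '\r') {
            ++pos;
        } else if (s.compare(pos, 2, "//") == 0) {
            while (pos < s.size() && s[pos] != '\n') ++pos;
        } else {
            break;
        }
    }
    if (pos >= s.size()) return "";
    const size_t start = pos;
    const char c = s[pos];
    if (isWordChar(c)) {                                  // identifier or number
        const bool number = std::isdigit(static_cast<unsigned char>(c)) != 0;
        while (pos < s.size() && (isWordChar(s[pos]) || (number && s[pos] == '.'))) ++pos;
    } else if (c == '"' || c == '\'') {                   // string or rawcode
        ++pos;
        while (pos < s.size() && s[pos] != c) pos += (s[pos] == '\\') ? 2 : 1;
        pos = std::min(pos + 1, s.size());
    } else if (s.compare(pos, 2, "==") == 0 || s.compare(pos, 2, "!=") == 0 ||
               s.compare(pos, 2, "<=") == 0 || s.compare(pos, 2, ">=") == 0) {
        pos += 2;
    } else {
        ++pos;                                            // one-char token or '\n'
    }
    return s.substr(start, pos - start);
}

// The state stack holds the closing keyword each open block expects.
bool blocksBalanced(const std::string& src) {
    static const std::map<std::string, std::string> closerFor = {
        {"globals", "endglobals"}, {"function", "endfunction"},
        {"if", "endif"}, {"loop", "endloop"}};
    std::vector<std::string> expected;
    size_t pos = 0;
    for (std::string tok = nextToken(src, pos); !tok.empty(); tok = nextToken(src, pos)) {
        const auto open = closerFor.find(tok);
        if (open != closerFor.end()) {
            // Inside a block, "function F" is a code reference, not a new block.
            if (tok == "function" && !expected.empty()) continue;
            expected.push_back(open->second);
        } else if (tok == "endglobals" || tok == "endfunction" ||
                   tok == "endif" || tok == "endloop") {
            if (expected.empty() || expected.back() != tok) return false;
            expected.pop_back();
        }
    }
    return expected.empty();
}
```

Every branch of `nextToken` advances `pos` by at least one character, so the loop in `blocksBalanced` always terminates.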
 