
Antlr4: How's my grammar? (functions + expressions + scalars + arrays)

Status
Not open for further replies.
Level 31
Joined
Jul 10, 2007
Messages
6,306
Code:
grammar Grammar;

import Operation;

firstrule: (assignment | function)*;      //nothing

assignment: (Identifier | array) '=' expression;                                //onEnter, if var == Identifier, set map, otherwise set double map?
                                                                                //visit expression with a visitor

expression: expression op=EXP<assoc=right> expression                           #OpExponent         //return exponent, visit expression
          | expression op=(MUL|DIV) expression                                  #OpMulDiv           //return mult or div, visit expression
          | expression '(' expression ')'                                       #OpExpressionMul1   //return mult, visit expression
          | function                                                            #LFunction          //visit function
          | expression Identifier                                               #OpExpressionMul2   //return mult, visit expression
          | expression op=(ADD|SUB) expression                                  #OpAddSub           //return
          | array                                                               #LArray             //visit array
          | Identifier                                                          #LIdentifier
          | Integer                                                             #LInteger
          | Float                                                               #LFloat
          | '(' expression ')'                                                  #OpExpression
          ;

array: Identifier '[' expression ']';                                           //read value given expression, validate that the expression is an integer

argumentList: expression (',' expression)*;

function: Identifier '(' argumentList ')';

COMMENT:                '//' ~[\r\n]* -> skip;
COMMENT_DELIMITED:      '/*' .*? '*/' -> skip;
Integer:                '0' | ([1-9][0-9]*);
Float:                  ('0' | ([1-9][0-9]*)) ('.' [0-9]*)?;
Identifier:             [a-zA-Z_][a-zA-Z0-9_]*;
WS:                     [ \t\r\n] -> skip;

Code:
lexer grammar Operation;

EXP: '**';
MUL: '*';
DIV: '/';
ADD: '+';
SUB: '-';
 
From a quick look it appears to be ambiguous, but I'm not sure how your parser generator handles the rules, so I'm only guessing that it's ambiguous from a theoretical point of view.

For instance, according to the rules, we don't know whether 3 * 4 + 2 is built into:
  *
 / \
3   +
   / \
  4   2

or

    +
   / \
  *   2
 / \
3   4

In an LR-style generator this shows up as shift-reduce conflicts when generating the parser. Fixing ambiguity by hand is a bit of a pain and can really make your grammar look ugly; most parser libraries have ways to help resolve ambiguities by defining "precedence" or selection rules.
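To make the precedence idea concrete, here is a minimal precedence-climbing sketch in plain Python. It is not ANTLR output and not the poster's code: the names `PRECEDENCE`, `RIGHT_ASSOC`, `tokenize` and `parse` are made up for illustration, and it only handles integers, parentheses and the five operators from the Operation lexer grammar.

```python
import re

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}
RIGHT_ASSOC = {"**"}  # exponentiation groups to the right

def tokenize(text):
    # '**' is tried before '*' so it is not split into two MUL tokens.
    return re.findall(r"\d+|\*\*|[-+*/()]", text)

def parse(tokens, min_prec=1):
    """Precedence climbing: returns an int or a nested (op, left, right) tuple."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 1)
        tokens.pop(0)  # discard the closing ')'
    else:
        left = int(tok)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Left-associative operators need a strictly tighter operator on the right.
        next_prec = PRECEDENCE[op] + (0 if op in RIGHT_ASSOC else 1)
        right = parse(tokens, next_prec)
        left = (op, left, right)
    return left
```

With this table, `parse(tokenize("3 * 4 + 2"))` yields `('+', ('*', 3, 4), 2)`, i.e. the second tree above, because `*` has a higher level than `+`.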
 
Level 29
Joined
Jul 29, 2007
Messages
5,174
  • array: Identifier '[' expression ']'; should be array: '[' argumentList ']'; (what's the identifier even for in this context?)
  • It is ambiguous whether 0 is an integer or float.
  • [a-zA-Z0-9_] = [\w]
  • [ \t\r\n] = [\s]
  • I would imagine that putting all the constant expressions (function, Identifier, ...) before the ones that themselves begin in an expression would make more sense, and probably make it faster too, but that's just a guess.

By the way, why do you even want a distinction between integers and floats? A single Number class is nicer and more abstract.
 
Level 31
Joined
Jul 10, 2007
Messages
6,306
It's ANTLR, which resolves ambiguities through precedence: within a left-recursive rule, the alternatives listed first bind tighter. It also places all lexer rules above parser rules.

0 would be an Integer: both rules match the same length, and Integer is defined before Float.
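A rough way to see that tie-break is to simulate it. The sketch below is plain Python, not the ANTLR runtime; the two regexes are my translation of the Integer and Float rules from the grammar, and `RULES` and `classify` are hypothetical names. It applies the longest-match rule, with ties going to the rule defined first.

```python
import re

# Rules in grammar order (assumption: hand-translated into Python regexes).
RULES = [
    ("Integer", re.compile(r"0|[1-9][0-9]*")),
    ("Float", re.compile(r"(?:0|[1-9][0-9]*)(?:\.[0-9]*)?")),
]

def classify(text):
    """Return the name of the rule chosen for the token at the start of text."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text)
        # Only a strictly longer match replaces the current pick,
        # so on a tie the earlier rule is kept.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name
```

Here `classify("0")` and `classify("12")` both give `"Integer"`, while `classify("0.5")` gives `"Float"` because the Float rule matches more characters.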

[a-zA-Z0-9_] = [\w]
[ \t\r\n] = [\s]

dunno if that'll work, it's not full regex, but cool ;)

I would imagine that putting all the constant expressions (function, Identifier, ...) before the ones that themselves begin in an expression would make more sense, and probably make it faster too, but that's just a guess.

it won't ;p

By the way, why do you even want a distinction between integers and floats? A single Number class is nicer and more abstract.

typechecking
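As a rough idea of what that buys you (the array rule's comment already asks to "validate that the expression is an integer"), here is a small Python sketch. It is only an illustration under my own assumptions: expressions are nested `(op, left, right)` tuples with int/float leaves, any float operand makes the result a float, and `infer_type` and `check_index` are hypothetical names, not part of the grammar or of ANTLR.

```python
def infer_type(node):
    """Return 'int' or 'float' for a literal or a nested (op, left, right) tuple."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    _op, left, right = node
    # Assumption: a float on either side makes the whole expression a float.
    kinds = {infer_type(left), infer_type(right)}
    return "float" if "float" in kinds else "int"

def check_index(node):
    """Reject array subscripts whose inferred type is not int."""
    if infer_type(node) != "int":
        raise TypeError("array index must be an integer expression")
    return True
```

With a single Number token, a subscript like `a[1.5 * 2]` could only be rejected at run time; with separate tokens, `check_index(("*", 1.5, 2))` can raise before anything is evaluated.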
 