Building an RPN to Equation Parser

In the final part of the examination of lex and yacc, here are the rules for building a parser that translates RPN into equation input (the reverse of the Equation to RPN parser.Translating RPN into standard equation format is a lot more difficult. Although the fundamentals are similar to the RPN parser (we still use a stack for values that are popped off when we see an operand), it is the recording of that process is much more difficult. In the RPN calculator, we can place the result of the calculation back onto the stack so that the value can be used. To resolve something into the equation format we need to record the equivalent expression, not the value. For that, we use a temporary string, and then check if the temporary string has a value and append further expressions to that string. Also, to help precedence in the final calculation (a process handled automatically by the sequence of numbers an operands in RPN) we also enclose each stage of the calculation in parentheses. The resulting rules are shown below. Note that for the example, only the basic operands (+ – * /) are supported, but the principles are valid for any combination.

%%list:   /* nothing */        | list EOLN        | list expr EOLN        { printf( "%sn",exprstring); }        ;expr:   primary        | expr primary MUL          {            if (strlen(exprstring) > 0)              {                sprintf(tmpstring,"(%s * %g)",exprstring, pop());              }            else              {                sprintf(tmpstring,"( %g * %g )",pop(),pop());              }            strcpy(exprstring,tmpstring);          }        | expr primary DIV          {            temp=pop();            if (strlen(exprstring) > 0)              {                sprintf(tmpstring,"(%s / %g)",exprstring, temp);              }            else              {                sprintf(tmpstring,"( %g / %g )",pop(),temp);              }            strcpy(exprstring,tmpstring);          }        | expr primary PLUS          {            if (strlen(exprstring) > 0)              {                sprintf(tmpstring,"(%s + %g)",exprstring, pop());              }            else              {                sprintf(tmpstring,"( %g + %g )",pop(),pop());              }            strcpy(exprstring,tmpstring);          }        | expr primary MINUS          {            temp=pop();            if (strlen(exprstring) > 0)              {                sprintf(tmpstring,"(%s - %g)",exprstring, temp);              }            else              {                sprintf(tmpstring,"( %g - %g )",pop(),temp);              }            strcpy(exprstring,tmpstring);          }        ;primary: NUMBER { push($1); }        ;%%

You can see the resulting output below:

4 5 + 6 *(( 4 + 5 ) * 6)

As mentioned in the original IBM article, we can pipe sequences together to show the parsing and calculation of an expression from different formats. For example:

$ rpntoequ|calc    4 5 + 6 *54

And even rpntoequ and equtorpn:

$ rpntoequ|equtorpn  4 5 + 6 *4 5 + 6 *

The current RPN translator as shown here is not as advanced as the main RPN system, and so it doesn’t support all the options, or expression formats, but you can get the general idea. You can download the code for this example: rpntoequ.tar.gz (Unix).

Building an Equation to RPN Parser

As part of the continuing examination of lex and yacc, here are the rules for building a parser that translates equations into RPN format.The process is actually very simple. Because of the way the parser works, all you have to do is print out whatever component we see at each stage. For example, when you see a number, print it out, and when you see a operand, also print it out. The basic ruleset is shown below:

%%list:   /* nothing */        | list EOLN        | list expr EOLN        { printf( "n" ); }        ;expr:   shift_expr        ;shift_expr: pow_expr        | shift_expr LEFTSHIFT pow_expr { printf("> "); }        ;pow_expr: add_expr        | pow_expr POW add_expr { printf("^ "); }        ;add_expr: mul_expr        | add_expr PLUS mul_expr  { printf("+ "); }        | add_expr MINUS mul_expr { printf("- "); }        ;mul_expr: unary_expr        | mul_expr MUL unary_expr { printf("* "); }        | mul_expr DIV unary_expr { printf("/ "); }        | mul_expr MOD unary_expr { printf("% "); }        ;unary_expr: postfix_expr        | MINUS primary %prec UNARYMINUS { printf("-"); }        | INC unary_expr { printf("++ "); }        | DEC unary_expr { printf("-- "); }        ;postfix_expr: primary        | postfix_expr INC { printf("++ "); }        | postfix_expr DEC { printf("-- "); }        | postfix_expr FACT { printf("! "); }        ; primary: NUMBER { printf("%g ",$1); }        | PI { printf("%g ", M_PI); }        | OPENBRACKET expr CLOSEBRACKET { }        | function_call        ;function_call: SIN OPENBRACKET expr CLOSEBRACKET { printf("sin "); }        | COS OPENBRACKET expr CLOSEBRACKET { printf("cos "); }        | TAN OPENBRACKET expr CLOSEBRACKET { printf("tan "); }        | ASIN OPENBRACKET expr CLOSEBRACKET { printf("asin "); }        | ACOS OPENBRACKET expr CLOSEBRACKET { printf("acos "); }        | ATAN OPENBRACKET expr CLOSEBRACKET { printf("atan "); }        ;%%

Why does it work? It has to do with the parser evaluates the different components. When, for example, the parser identifies an addition with this rule:

add_expr: mul_expr        | add_expr PLUS mul_expr  { printf("+ "); }

The code that the parser generates evaluates the sub-rules first, and in both cases the rules will ultimately lead to the numerical value. Each time the number is seen, the value is printed. Once both rules have been resolved, it then matches the full expression and outputs the plus sign. In use, the parser generates all of the necessary RPN:

4+5*64 5 6 * + (4+5)*64 5 + 6 *

You can download the source for the equation to RPN parser: equtorpn.tar.gz (Unix)

Building an RPN Calculator

There’s an tutorial shortly due to appear at IBM developerWorks that covers the process behind building a calculator using the lex and yacc (or flex and bison) tools to build a parser. The tutorial covers a natural expression parser, i.e. one capable of processing:


I suggest a couple of extensions in the tutorial, namely a Reverse Polish Notation (RPN) calculator, and translators that convert to/from RPN and standard equation format. Here, we’re going to start with looking at the RPN calculator. The RPN system is more straightforward for people to learn when you think about typical equations, for example you might write:

4563 +

In RPN, you would enter this as:

45 63 +

From a parsing point of view, the process is also easier, because you can perform the calculation by pushing the numbers on to the stack and then performing a calculation with those two numbers. This hugely simplifies the parser, but it only has to push numbers and pop them off when it sees the operand, rather than having to extract both numbers and parser from the input text. Even better, compound calculations can be made easier because the result of one calculation can be pushed back on to the stack for the next part. For example, the following equation:

45 63 + 23 *

Can be read as:

  1. Push 45 on to stack
  2. Push 63 on to stack
  3. Pop value off stack, add it to another value popped off the stack
  4. Push result to stack
  5. Pop value off stack, multiply by value popped off stack

The lexical analysis component (i.e. the lex definitions) remain the same, it’s only the parser that changes. Before we examine the yacc rules, you need to see the simple stack system. It provides two functions, one pushes values on, and the other pops values off. All the values are stored in a simple array and a global stack pointer holds the current storage location so that values can be popped off or pushed back:

#include "globdefs.h"int sp=0;double val[MAXVAL];void push(f)double f;{  if (sp  0)    return(val[--sp]);  else  {    printf("error: stack emptyn");    return 0.0;  }}

The yacc rules for a simple RPN parser are shown below (the rest of the surrounding code is identical).

%%list:   /* nothing */        | list EOLN                       { printf( "%gn" , pop()); }        | list exprlist EOLN          { printf( "%gn" , pop()); }        ;exprlist: shift_expr        | exprlist shift_expr        ;shift_expr: add_expr        | shift_expr LEFTSHIFT           {             temp=pop();            push(((int)pop()) > ((int)temp));           }        ;add_expr: mul_expr        | add_expr PLUS           { push(pop()+pop()); }        | add_expr MINUS           {             temp=pop();             push(pop()-temp);           }        ;mul_expr: unary_expr        | add_expr MUL           { push(pop()*pop()); }        | add_expr DIV           {             temp=pop();            push(pop()/temp);           }        | add_expr MOD           {             temp=pop();            push(fmod(pop(),temp));           }        ;  unary_expr: primary        | MINUS primary %prec UNARYMINUS { push(-pop()); }        | unary_expr INC { push(pop()+1); }        | unary_expr DEC { push(pop()-1); }        ; primary: NUMBER { push($1); }        | PI { push(M_PI); }        ;%%

You can see here that numbers are simply pushed onto the stack:

primary: NUMBER { push($1); }        | PI { push(M_PI); }        ;

While any calculation is a case of popping off the values and putting them back on the stack:

add_expr: mul_expr        | add_expr PLUS { push(pop()+pop()); }        | add_expr MINUS { temp=pop(); push(pop()-temp); }        ;

The ruleset is shorter, partially because this RPN calculator is not as advanced, but also because the process is much simpler because the rules don’t need to take into account the complex structure of a typical equation line. In a future post we’ll cover the RPN to equation and equation to RPN parsers. You can download the complete code for the RPN calculator as rpn.tar.gz (Unix).