## Building an RPN to Equation Parser

In the final part of this examination of lex and yacc, here are the rules for building a parser that translates RPN into equation input (the reverse of the Equation to RPN parser). Translating RPN into standard equation format is a lot more difficult. The fundamentals are similar to the RPN calculator: we still use a stack for values, which are popped off when we see an operator. What is much more difficult is recording that process. In the RPN calculator, we can place the result of a calculation back onto the stack so that the value can be used again. To resolve something into equation format, we need to record the equivalent expression, not the value. For that, we use a temporary string. If the expression string already holds an expression, we wrap it together with the next operator and value. Otherwise we pop two values and start a new expression.

Also, to preserve precedence in the final calculation (something handled automatically by the sequence of numbers and operators in RPN), we enclose each stage of the calculation in parentheses. The resulting rules are shown below. Note that for the example, only the basic operators (+ - * /) are supported, but the principles are valid for any combination.

```
%%
list:   /* nothing */
        | list EOLN
        | list expr EOLN
          {
            printf("%s\n", exprstring);
            exprstring[0] = '\0';   /* start afresh on the next line */
          }
        ;
expr:   primary
        | expr primary MUL
          {
            temp = pop();           /* right-hand operand, popped first */
            if (strlen(exprstring) > 0)
              sprintf(tmpstring, "(%s * %g)", exprstring, temp);
            else
              sprintf(tmpstring, "( %g * %g )", pop(), temp);
            strcpy(exprstring, tmpstring);
          }
        | expr primary DIV
          {
            temp = pop();
            if (strlen(exprstring) > 0)
              sprintf(tmpstring, "(%s / %g)", exprstring, temp);
            else
              sprintf(tmpstring, "( %g / %g )", pop(), temp);
            strcpy(exprstring, tmpstring);
          }
        | expr primary PLUS
          {
            temp = pop();
            if (strlen(exprstring) > 0)
              sprintf(tmpstring, "(%s + %g)", exprstring, temp);
            else
              sprintf(tmpstring, "( %g + %g )", pop(), temp);
            strcpy(exprstring, tmpstring);
          }
        | expr primary MINUS
          {
            temp = pop();
            if (strlen(exprstring) > 0)
              sprintf(tmpstring, "(%s - %g)", exprstring, temp);
            else
              sprintf(tmpstring, "( %g - %g )", pop(), temp);
            strcpy(exprstring, tmpstring);
          }
        ;
primary: NUMBER { push($1); }
        ;
%%
```

You can see the resulting output below:

```
4 5 + 6 *
(( 4 + 5 ) * 6)
```
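The single expression string works for left-to-right chains such as `4 5 + 6 *`, but an input like `4 5 + 6 7 + *` contains two independent sub-expressions. A common generalization, sketched here as a standalone C function rather than as yacc rules, keeps a stack of strings instead of numbers: each number is pushed as text, and each operator pops two strings and pushes their parenthesized combination. The function name `rpn_to_infix`, the fixed buffer sizes and the whitespace-separated input are assumptions for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXDEPTH 32   /* assumed maximum stack depth */
#define MAXLEN   256  /* assumed maximum length of any sub-expression */

/* Translate a whitespace-separated RPN string (numbers and + - * /) into a
 * fully parenthesized infix string. Returns 0 on success, -1 on bad input. */
int rpn_to_infix(const char *rpn, char *out, size_t outlen)
{
    char stack[MAXDEPTH][MAXLEN];   /* a stack of sub-expressions, not values */
    int sp = 0;
    char buf[MAXLEN];
    char *tok;

    assert(rpn != NULL && out != NULL && outlen > 0);
    if (strlen(rpn) >= sizeof buf)
        return -1;
    strcpy(buf, rpn);                /* strtok needs a writable copy */

    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (strlen(tok) == 1 && strchr("+-*/", tok[0]) != NULL) {
            char tmp[MAXLEN];
            const char *left, *right;
            if (sp < 2)
                return -1;           /* operator without two operands */
            right = stack[--sp];     /* top of stack is the right operand */
            left  = stack[--sp];
            snprintf(tmp, sizeof tmp, "(%s %c %s)", left, tok[0], right);
            strcpy(stack[sp++], tmp);
        } else {
            if (sp == MAXDEPTH)
                return -1;
            snprintf(stack[sp++], MAXLEN, "%s", tok);  /* numbers go on as text */
        }
    }
    if (sp != 1)
        return -1;                   /* leftover operands: malformed input */
    snprintf(out, outlen, "%s", stack[0]);
    return 0;
}
```

With this approach `4 5 + 6 7 + *` becomes `((4 + 5) * (6 + 7))`, which the single-string rules cannot produce.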

As mentioned in the original IBM article, we can pipe sequences together to show the parsing and calculation of an expression from different formats. For example:

```
$ rpntoequ|calc
4 5 + 6 *
54
```

And even rpntoequ and equtorpn:

```
$ rpntoequ|equtorpn
4 5 + 6 *
4 5 + 6 *
```

The current RPN translator as shown here is not as advanced as the main RPN system, and so it doesn't support all the options or expression formats, but it shows the general idea. You can download the code for this example: rpntoequ.tar.gz (Unix).

## Building an Equation to RPN Parser

As part of the continuing examination of lex and yacc, here are the rules for building a parser that translates equations into RPN format. The process is actually very simple. Because of the way the parser works, all you have to do is print out whatever component you see at each stage. For example, when you see a number, print it out, and when you see an operator, also print it out. The basic ruleset is shown below:

```
%%
list:   /* nothing */
        | list EOLN
        | list expr EOLN        { printf("\n"); }
        ;
expr:   shift_expr
        ;
shift_expr: pow_expr
        | shift_expr LEFTSHIFT pow_expr { printf("<< "); }
        ;
pow_expr: add_expr
        | pow_expr POW add_expr { printf("^ "); }
        ;
add_expr: mul_expr
        | add_expr PLUS mul_expr  { printf("+ "); }
        | add_expr MINUS mul_expr { printf("- "); }
        ;
mul_expr: unary_expr
        | mul_expr MUL unary_expr { printf("* "); }
        | mul_expr DIV unary_expr { printf("/ "); }
        | mul_expr MOD unary_expr { printf("%% "); }  /* %% prints a literal % */
        ;
unary_expr: postfix_expr
        | MINUS primary %prec UNARYMINUS { printf("-"); }
        | INC unary_expr { printf("++ "); }
        | DEC unary_expr { printf("-- "); }
        ;
postfix_expr: primary
        | postfix_expr INC { printf("++ "); }
        | postfix_expr DEC { printf("-- "); }
        | postfix_expr FACT { printf("! "); }
        ;
primary: NUMBER { printf("%g ", $1); }
        | PI { printf("%g ", M_PI); }
        | OPENBRACKET expr CLOSEBRACKET { }
        | function_call
        ;
function_call: SIN OPENBRACKET expr CLOSEBRACKET { printf("sin "); }
        | COS OPENBRACKET expr CLOSEBRACKET { printf("cos "); }
        | TAN OPENBRACKET expr CLOSEBRACKET { printf("tan "); }
        | ASIN OPENBRACKET expr CLOSEBRACKET { printf("asin "); }
        | ACOS OPENBRACKET expr CLOSEBRACKET { printf("acos "); }
        | ATAN OPENBRACKET expr CLOSEBRACKET { printf("atan "); }
        ;
%%
```

Why does it work? It comes down to the order in which the parser evaluates the different components. Consider how the parser identifies an addition with this rule:

```
add_expr: mul_expr
        | add_expr PLUS mul_expr  { printf("+ "); }
```

The code that the parser generates reduces the sub-rules (`add_expr` and `mul_expr`) first, and in both cases those rules ultimately lead down to a numerical value. Each time a number is seen, its action prints the value. Only once both sub-rules have been resolved does the parser match the full expression and run its action, outputting the plus sign after both operands. In use, the parser generates all of the necessary RPN:

```
4+5*6
4 5 6 * +
(4+5)*6
4 5 + 6 *
```
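The same post-order principle can be seen without yacc in a hand-written recursive-descent parser. This is a swapped-in technique, not the yacc-generated parser: each function parses its operands first and only then emits its operator, just as the yacc actions fire after their sub-rules have been reduced. The sketch handles only numbers, `+ - * /` and parentheses, and the names `equ_to_rpn`, `emit` and the fixed output buffer are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *in;     /* hypothetical input cursor */
static char rpn[256];      /* hypothetical output buffer */

/* Append a token followed by a space, as the yacc actions do. */
static void emit(const char *s)
{
    strncat(rpn, s, sizeof rpn - strlen(rpn) - 1);
    strncat(rpn, " ", sizeof rpn - strlen(rpn) - 1);
}

static void skip_spaces(void)
{
    while (isspace((unsigned char)*in))
        in++;
}

static void add_expr(void);

/* primary: NUMBER | '(' add_expr ')' */
static void primary(void)
{
    skip_spaces();
    if (*in == '(') {
        in++;
        add_expr();
        skip_spaces();
        if (*in == ')')
            in++;              /* brackets produce no RPN output */
    } else {
        char *end;
        char num[32];
        double v = strtod(in, &end);
        in = end;
        snprintf(num, sizeof num, "%g", v);
        emit(num);             /* numbers are printed as soon as they are seen */
    }
}

/* mul_expr: primary (('*' | '/') primary)* */
static void mul_expr(void)
{
    primary();
    for (;;) {
        skip_spaces();
        char op = *in;
        if (op != '*' && op != '/')
            return;
        in++;
        primary();                     /* right operand first... */
        emit(op == '*' ? "*" : "/");   /* ...then the operator */
    }
}

/* add_expr: mul_expr (('+' | '-') mul_expr)* */
static void add_expr(void)
{
    mul_expr();
    for (;;) {
        skip_spaces();
        char op = *in;
        if (op != '+' && op != '-')
            return;
        in++;
        mul_expr();
        emit(op == '+' ? "+" : "-");
    }
}

/* Translate one infix expression and return the RPN text. */
const char *equ_to_rpn(const char *expr)
{
    assert(expr != NULL);
    in = expr;
    rpn[0] = '\0';
    add_expr();
    return rpn;
}
```

Precedence falls out of the call structure: `add_expr` calls `mul_expr`, so multiplications are fully emitted before the enclosing addition's operator, giving `4 5 6 * +` for `4+5*6`.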

You can download the source for the equation to RPN parser: equtorpn.tar.gz (Unix)

## O’Reilly RSS Feeds getting annoying

For some strange reason at the moment, O'Reilly feeds are creating duplicates. It's obviously deliberate, but I can't see the logic behind the process. The main culprit (but not the only one) is a series on refactoring (called Refactoring Everything). Refactoring is an important part of any programming process, but the application they are looking at is specifically Perl based. However, of the feeds I subscribe to, the latest issue (7) and all previous issues, as well as a number of other articles, have appeared on:

Every single one of those links is unique. The ONLamp link is obviously the key one, and I can, at a stretch, understand why it appears on the PHP and Python areas. The Apache one also has a little relevance (Apache is key to the LAMP stack), but while the application is web based, the refactoring has nothing to do with Apache. But why does it also appear on the BSD feed? I guess you could be refactoring on a BSD platform, but why pollute a BSD-focused feed with a programming story?

## Building an RPN Calculator

There's a tutorial due to appear shortly at IBM developerWorks that covers the process of building a calculator, using the lex and yacc (or flex and bison) tools to build a parser. The tutorial covers a natural expression parser, i.e. one capable of processing:

(4+5)*6

I suggest a couple of extensions in the tutorial, namely a Reverse Polish Notation (RPN) calculator, and translators that convert to/from RPN and standard equation format. Here, we're going to start by looking at the RPN calculator. The RPN system is straightforward for people to learn when you think about typical equations. For example, you might write:

`45 + 63`

In RPN, you would enter this as:

45 63 +

From a parsing point of view, the process is also easier, because you can perform the calculation by pushing the numbers on to the stack and then, when the operator arrives, performing the calculation with those two numbers. This hugely simplifies the parser, because it only has to push numbers and pop them off when it sees an operator, rather than having to extract both numbers and the operator from the input text. Even better, compound calculations are easier because the result of one calculation can be pushed back on to the stack for the next part. For example, the following expression:

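`45 63 + 23 *`

is processed in these steps:

1. Push 45 on to the stack
2. Push 63 on to the stack
3. Pop a value off the stack and add it to another value popped off the stack
4. Push the result (108) on to the stack
5. Push 23 on to the stack
6. Pop a value off the stack and multiply it by another value popped off the stack
7. Push the result (2484) on to the stack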
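These steps can be sketched as a small standalone evaluator in C, independent of lex and yacc: split the input on whitespace, push each number, and for each operator pop the right-hand operand, pop the left-hand operand, and push the result. The function name `eval_rpn`, the fixed stack size and the restriction to `+ - * /` are assumptions for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACKSIZE 64   /* assumed maximum stack depth */

/* Evaluate a whitespace-separated RPN expression.
 * Returns 0 and stores the value in *result on success, -1 on error. */
int eval_rpn(const char *expr, double *result)
{
    double stack[STACKSIZE];
    int sp = 0;
    char buf[256];
    char *tok;

    assert(expr != NULL && result != NULL);
    if (strlen(expr) >= sizeof buf)
        return -1;
    strcpy(buf, expr);                      /* strtok needs a writable copy */

    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (strlen(tok) == 1 && strchr("+-*/", tok[0]) != NULL) {
            double right, left;
            if (sp < 2)
                return -1;                  /* stack underflow */
            right = stack[--sp];            /* top of stack is the right operand */
            left  = stack[--sp];
            switch (tok[0]) {
            case '+': stack[sp++] = left + right; break;
            case '-': stack[sp++] = left - right; break;
            case '*': stack[sp++] = left * right; break;
            case '/': stack[sp++] = left / right; break;
            }
        } else {
            char *end;
            double v = strtod(tok, &end);
            if (*end != '\0' || sp == STACKSIZE)
                return -1;                  /* not a number, or stack full */
            stack[sp++] = v;                /* numbers are simply pushed */
        }
    }
    if (sp != 1)
        return -1;                          /* leftover values: malformed input */
    *result = stack[0];
    return 0;
}
```

Calling `eval_rpn("45 63 + 23 *", &r)` follows exactly the seven steps above and leaves 2484 in `r`.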

The lexical analysis component (i.e. the lex definitions) remains the same; only the parser changes. Before we examine the yacc rules, you need to see the simple stack system. It provides two functions: one pushes values on, and the other pops values off. All the values are stored in a simple array, and a global stack pointer holds the current storage location so that values can be popped off or pushed back:

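```c
#include <stdio.h>
#include "globdefs.h"   /* provides MAXVAL */

int sp = 0;             /* next free slot in the stack */
double val[MAXVAL];     /* stack storage */

/* push: push f on to the value stack */
void push(double f)
{
  if (sp < MAXVAL)
    val[sp++] = f;
  else
    printf("error: stack full, can't push %g\n", f);
}

/* pop: pop and return the top value from the stack */
double pop(void)
{
  if (sp > 0)
    return(val[--sp]);
  else
  {
    printf("error: stack empty\n");
    return 0.0;
  }
}
```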

The yacc rules for a simple RPN parser are shown below (the rest of the surrounding code is identical).

```
%%
list:   /* nothing */
        | list EOLN                 { printf("%g\n", pop()); }
        | list exprlist EOLN        { printf("%g\n", pop()); }
        ;
exprlist: shift_expr
        | exprlist shift_expr
        ;
shift_expr: add_expr
        | shift_expr LEFTSHIFT
          {
            temp = pop();
            push(((int)pop()) << ((int)temp));
          }
        ;
add_expr: mul_expr
        | add_expr PLUS
          { push(pop()+pop()); }
        | add_expr MINUS
          {
            temp = pop();
            push(pop()-temp);
          }
        ;
mul_expr: unary_expr
        | add_expr MUL
          { push(pop()*pop()); }
        | add_expr DIV
          {
            temp = pop();
            push(pop()/temp);
          }
        | add_expr MOD
          {
            temp = pop();
            push(fmod(pop(),temp));
          }
        ;
unary_expr: primary
        | MINUS primary %prec UNARYMINUS { push(-pop()); }
        | unary_expr INC { push(pop()+1); }
        | unary_expr DEC { push(pop()-1); }
        ;
primary: NUMBER { push($1); }
        | PI { push(M_PI); }
        ;
%%
```

You can see here that numbers are simply pushed onto the stack:

```
primary: NUMBER { push($1); }
        | PI { push(M_PI); }
        ;
```

Any calculation, meanwhile, is a case of popping off the values and pushing the result back on to the stack:

```
add_expr: mul_expr
        | add_expr PLUS { push(pop()+pop()); }
        | add_expr MINUS { temp=pop(); push(pop()-temp); }
        ;
```
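The `temp` variable in the MINUS rule is not just style. C does not specify which operand of `-` is evaluated first, so `push(pop() - pop())` could compute either `a - b` or `b - a`; addition and multiplication only get away with it because they are commutative. A minimal sketch of the safe pattern follows, using its own hypothetical stack (`val`, `sp`) rather than the `globdefs.h` one, with hypothetical helper names `rpn_minus` and `rpn_divide`.

```c
#include <assert.h>

/* A tiny stand-alone stack (hypothetical, for illustration only). */
static double val[16];
static int sp = 0;

static void push(double f) { assert(sp < 16); val[sp++] = f; }
static double pop(void)    { assert(sp > 0);  return val[--sp]; }

/* Subtract the top of the stack from the value beneath it.
 * The two pops happen in separate statements, so their order is fixed:
 * the first pop() yields the right-hand operand (the one pushed last). */
void rpn_minus(void)
{
    double temp = pop();
    push(pop() - temp);
}

/* Division has exactly the same ordering requirement. */
void rpn_divide(void)
{
    double temp = pop();
    push(pop() / temp);
}
```

For the input `10 4 -`, pushing 10 then 4 and calling `rpn_minus()` always leaves 6, whatever order the compiler would have chosen for `pop() - pop()`.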

The ruleset is shorter, partly because this RPN calculator is not as advanced, but also because the rules don't need to take into account the complex structure of a typical equation line. In a future post we'll cover the RPN to equation and equation to RPN parsers. You can download the complete code for the RPN calculator as rpn.tar.gz (Unix).

## LoCA 2006 Keynote

I will be doing the opening keynote presentation for the 2nd International Workshop on Location and Context-Awareness (LoCA 2006) in Dublin, on May 10th. The focus of the keynote is Google Maps and Google Earth, which of course ties in nicely with my new book, Hacking Google Maps and Google Earth. I'll be announcing that book properly once it's all been finalized; at the moment we're going through the final stages of editing and proofing. There is, however, a website dedicated to the new book (and to mapping technology, Google Maps, etc. in general) called MCslp Maps. I'll be posting the examples and the code from the book over the next two weeks. Back to LoCA 2006: it looks like an interesting workshop, covering issues from Google Maps-style location and information through to the identification and location of smaller items, like computers and hardware within offices. Registration is still open, but if you are unable to attend, I'll probably be posting the keynote after the conference.

## MCslp Coalface is live

I’ve started up a new blog designed to handle the day-to-day thoughts and issues, programming notes, IT tricks and so on that I come across (and use/develop) each day. Called MCslp Coalface, the blog is designed to contain the issues and notes that really do come straight from the work-end of what I do day by day. Expect programs, scripts, and usage notes on what I’m doing, as well as follow up thoughts and notes for certain articles and ideas that don’t fall into the realm of MCslp. You can find the new site at http://coalface.mcslp.com and there are RSS and Atom feeds available. As with most of my sites, the content is also included as part of Planet MCslp. You may also have noticed that I’ve changed the theme for the site, now using the excellent Red Train by Vladimir Simovic. The same theme is now used on both the main MCslp and the MCslp Coalface.

## Using awk with different input/output separators

I had to reformat some material from the man pages for inclusion in another document, where it would be converted into a proper table. Here's a trick for using awk/gawk with different input and output separators: the input columns are separated by multiple spaces, and the output columns are separated by tabs.

```
BEGIN { OFS = "\t"; FS = "[ ][ ]+" }
{ print $1, $2, $3, $4 }
```

I only wanted the four columns from the original table, hence why I specified them explicitly here.
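The `FS` regular expression `[ ][ ]+` treats a run of two or more spaces as a separator, so a single space inside a column (such as a multi-word description) survives. For anyone not using awk, the same rule can be sketched in C. The function name `split_to_tabs` and its `maxfields` parameter (4 here, mirroring `$1` to `$4` above) are hypothetical, and only literal spaces count as separators, as in the awk pattern.

```c
#include <assert.h>
#include <string.h>

/* Copy the first `maxfields` fields of `line` into `out`, joined by tabs.
 * As with FS = "[ ][ ]+", a run of two or more spaces ends a field,
 * while a single space stays part of the field. Missing fields are
 * output as empty, as awk does when printing $3 or $4 that don't exist. */
void split_to_tabs(const char *line, char *out, size_t outlen, int maxfields)
{
    size_t o = 0;
    int field = 1;

    assert(line != NULL && out != NULL && outlen > 0);
    while (*line != '\0' && *line != '\n') {
        if (line[0] == ' ' && line[1] == ' ') {
            while (*line == ' ')        /* swallow the whole separator run */
                line++;
            if (++field > maxfields)
                break;                  /* like printing only $1..$4 */
            if (o + 1 < outlen)
                out[o++] = '\t';        /* OFS */
        } else {
            if (o + 1 < outlen)
                out[o++] = *line;
            line++;
        }
    }
    while (field < maxfields) {         /* pad absent fields with empty ones */
        if (o + 1 < outlen)
            out[o++] = '\t';
        field++;
    }
    out[o] = '\0';
}
```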