SE250:HTTB:Parsing:Syntactic Analysis

From Marks Wiki
Jump to navigation Jump to search
Previous Page Contents Next Page



Parser

A parser takes a sequence of tokens, and attempts to construct a valid structure out of them. As it reads through the tokens, it compairs what it finds to defined formal grammar. When it find what seems to be a valid sequence, it adds the tokens to the parse tree, in the structure defined for that sequence.

As the formal grammar the Parser uses is defined specifically to find allowable combinations of token types, the parser is able to detect localised syntactic errors. For example, it will detect if a "for" statement has an incorrect number of arguments. However, as most parser's don't keep an overall picture of the program "in mind", most cannot detect errors on a larger scale. For example, they will not pick up if a variable is used befor initialization. Things like this are usually checked later in the process.

Parse Trees

A parse tree is produced by recursively following the rules of a grammar. Each level represents a matched production. Just as multiple grammars can be written to parse the same language, there can be multiple equivalent parse trees for the same expression.

Example of a parsetree:

<html> <img src="http://www.resisty.com/upload/parsetree.jpg" width="543" height="676" alt="Parse Tree" /> </html>

Abstract Syntax Trees

Abstract syntax trees are simplified trees which unambiguously show the structure of a text.

Example (same expression as above for comparison):

<html><img src="http://img151.imageshack.us/img151/6968/astdv8.png"/></html>

Grammars

A grammar formally defines the acceptable syntax of a language. We specify a grammar using productions. Productions look like this:

<S> -> <C>
<S> -> <C><S>
<SC> -> "<S>"

To understand these, try reading them like this:
A string can be a character.
A string can be a character followed by a string.
A string constant can be a quote, followed by a string, followed by a quote.

Grammars for mathematical expressions tend to be a bit more complex. Here is a cut down example (where N is a number):

<E> -> <N>
<E> -> ( <E> )
<E> -> <E> + <E>
<E> -> <E> - <E>
<E> -> <E> * <E>
<E> -> <E> / <E>

There are two (related) problems with this grammar. The first is that it's ambiguous. It could match 1 - 2 + 3 as (1 - 2) + 3 = -1 + 3 = 2. Or, it could match it as 1 - (2 + 3) = 1 - 5 = -4. The second is that order of operations is ignored. For example, multiplication should take precedence over addition.

Example Grammar

Balanced Parentheses

< B > -> €
<B> -> (<B>) <B>

Statements

<St> -> if(<condition>) <St>
<St> -> while(<condition>) <St>
<St> -> if(<condition>) <St> else <St>
<Stlist> -> E
<Stlist> -> <St><Stlist>
<St> -> {<Stlist>}

Lexical Analyzer - Flex

Lexical analysis is the complete structure that has all the above operations in it, it consists of a scanner and a tokenizer. They are the functions that perform the analysis for it and have many operations within them. Although these functions make the analyser, lexical analysis can be done is one step instead of going through each function. This is possible if only one character is read into it at a time. This can be done using a flex a parser generator, this is a faster and more efficient track than the normal route of operation. A flex uses a table and goto statements to jump into statements and instructions. What it does is pick up on lexical patterns in text and collects regular occurrences those terms. Flex uses the commands given by the user and interprets which terms to be collected and which ones not to be. More modern flex analysers have libraries in which the terms to be collected don’t need to be stated. After doing its operations the flex generator outputs the file in a c format.


Previous Page Contents Next Page