FreeBSD Mail Archives

Date:      Tue, 26 Feb 2002 14:06:53 +0000
From:      Tony Finch <dot@dotat.at>
To:        FreeBSD-gnats-submit@freebsd.org
Cc:        Tony Finch <dot@dotat.at>
Subject:   docs/35345: Restore old yacc documentation
Message-ID:  <E16fiFp-000OEy-00@hand.dotat.at>

next in thread | raw e-mail | index | archive | help


>Number:         35345
>Category:       docs
>Synopsis:       Restore old yacc documentation
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-doc
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          update
>Submitter-Id:   current-users
>Arrival-Date:   Tue Feb 26 06:10:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Tony Finch
>Release:        FreeBSD 4.5-STABLE-20020220 i386
>Organization:
dotat labs
>Environment:
System: FreeBSD hand.dotat.at 4.5-STABLE-20020220 FreeBSD 4.5-STABLE-20020220 #15: Wed Feb 20 07:46:52 GMT 2002 fanf@hand.dotat.at:/FreeBSD/obj/FreeBSD/releng4/sys/SHARP i386
>Description:
The patch below adds the yacc documentation obtained from 4.3BSD-Reno via
TUHS's archives. There are some incompatibilities between the -msU macros
that this documentation was written in and the -ms macros, but I am not
a troff person so I haven't attempted to fix it.

There are of course lots of other sections of the documentation that
are missing because they have in the past been encumbered. I can prepare
patches for adding them as well, if you want.
>How-To-Repeat:
>Fix:

--- /usr/src/share/doc/psd/Makefile	Fri Nov 26 09:27:36 1999
+++ /usr/src/share/doc/psd.new/Makefile	Tue Feb 26 13:50:20 2002
@@ -4,15 +4,15 @@
 # The following modules do not build/install:
 # 10.gdb
 
-# The following modules are encumbered:
+# The following modules are missing because they were encumbered:
 # 01.cacm 02.implement 03.iosys 04.uprog 06.Clang 11.adb 14.sccs
-# 15.yacc 16.lex 17.m4
+# 16.lex 17.m4
 
 # The following modules do not apply to FreeBSD:
 # 07.pascal 08.f77 09.f77io
 
 SUBDIR=	 title contents 
-SUBDIR+= 05.sysman 12.make 13.rcs 18.gprof 20.ipctut 21.ipc
+SUBDIR+= 05.sysman 12.make 13.rcs 15.yacc 18.gprof 20.ipctut 21.ipc
 
 # The following modules are new in FreeBSD:
 SUBDIR+= 22.rpcgen 23.rpc 24.xdr 25.xdrrfc 26.rpcrfc 27.nfsrpc
--- /usr/src/share/doc/psd/15.yacc/Makefile	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/Makefile	Tue Feb 26 13:53:48 2002
@@ -0,0 +1,10 @@
+#	From: @(#)Makefile	8.1 (Berkeley) 6/8/93
+# $FreeBSD$
+
+VOLUME=	psd/15.yacc
+SRCS=	ss.. ss0 ss1 ss2 ss3 ss4 ss5 ss6 ss7 ss8 ss9 ssA ssB ssa ssb ssc ssd
+MACROS=	-ms
+
+USE_REFER=	yes
+
+.include <bsd.doc.mk>
--- /usr/src/share/doc/psd/15.yacc/ss..	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss..	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,56 @@
+.\"	@(#)ss..	6.1 (Berkeley) 5/8/86
+.\"
+.EH 'PSD:15-%''Yacc: Yet Another Compiler-Compiler'
+.OH 'Yacc: Yet Another Compiler-Compiler''PSD:15-%'
+.\".RP
+.ND "July 31, 1978"
+.TL
+Yacc:
+Yet Another Compiler-Compiler
+.AU "MH 2C-559" 3968
+Stephen C. Johnson
+.AI
+.MH
+.AB
+.PP
+Computer program input generally has some structure;
+in fact, every computer program that does input can be thought of as defining
+an ``input language'' which it accepts.
+An input language may be as complex as a programming language, or as simple as
+a sequence of numbers.
+Unfortunately, usual input facilities
+are limited, difficult to use,
+and often are lax about checking their inputs for validity.
+.PP
+Yacc provides a general tool for describing
+the input to a computer program.
+The Yacc user specifies the structures
+of his input, together with code to be invoked as
+each such structure is recognized.
+Yacc turns such a specification into a subroutine that
+handles the input process;
+frequently, it is convenient and appropriate to have most
+of the flow of control in the user's application
+handled by this subroutine.
+.PP
+The input subroutine produced by Yacc calls a user-supplied routine to
+return the next basic input item.
+Thus, the user can specify his input in terms of individual input characters, or
+in terms of higher level constructs such as names and numbers.
+The user-supplied routine may also handle idiomatic features such as
+comment and continuation conventions, which typically defy easy grammatical specification.
+.PP
+Yacc is written in portable C.
+The class of specifications accepted is a very general one: LALR(1)
+grammars with disambiguating rules.
+.PP
+In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc
+has also been used for less conventional languages,
+including a phototypesetter language, several desk calculator languages, a document retrieval system,
+and a Fortran debugging system.
+.AE
+.OK
+.\"Computer Languages
+.\"Compilers
+.\"Formal Language Theory
+.CS 23 11 34 0 0 8
--- /usr/src/share/doc/psd/15.yacc/ss0	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss0	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,201 @@
+.\"	@(#)ss0	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+0: Introduction
+.PP
+Yacc provides a general tool for imposing structure on the input to a computer program.
+The Yacc user prepares a
+specification of the input process; this includes rules
+describing the input structure, code to be invoked when these
+rules are recognized, and a low-level routine to do the
+basic input.
+Yacc then generates a function to control the input process.
+This function, called a
+.I parser ,
+calls the user-supplied low-level input routine
+(the
+.I "lexical analyzer" )
+to pick up the basic items
+(called
+.I tokens )
+from the input stream.
+These tokens are organized according to the input structure rules,
+called
+.I "grammar rules" \|;
+when one of these rules has been recognized,
+then user code supplied for this rule, an
+.I action ,
+is invoked; actions have the ability to return values and
+make use of the values of other actions.
+.PP
+Yacc is written in a portable dialect of C
+.[
+Ritchie Kernighan Language Prentice
+.]
+and the actions, and output subroutine, are in C as well.
+Moreover, many of the syntactic conventions of Yacc follow C.
+.PP
+The heart of the input specification is a collection of grammar rules.
+Each rule describes an allowable structure and gives it a name.
+For example, one grammar rule might be
+.DS
+date  :  month\_name  day  \',\'  year   ;
+.DE
+Here,
+.I date ,
+.I month\_name ,
+.I day ,
+and
+.I year
+represent structures of interest in the input process;
+presumably,
+.I month\_name ,
+.I day ,
+and
+.I year
+are defined elsewhere.
+The comma ``,'' is enclosed in single quotes; this implies that the
+comma is to appear literally in the input.
+The colon and semicolon merely serve as punctuation in the rule, and have
+no significance in controlling the input.
+Thus, with proper definitions, the input
+.DS
+July  4, 1776
+.DE
+might be matched by the above rule.
+.PP
+An important part of the input process is carried out by the
+lexical analyzer.
+This user routine reads the input stream, recognizing the lower level structures,
+and communicates these tokens
+to the parser.
+For historical reasons, a structure recognized by the lexical analyzer is called a
+.I "terminal symbol" ,
+while the structure recognized by the parser is called a
+.I "nonterminal symbol" .
+To avoid confusion, terminal symbols will usually be referred to as
+.I tokens .
+.PP
+There is considerable leeway in deciding whether to recognize structures using the lexical
+analyzer or grammar rules.
+For example, the rules
+.DS
+month\_name  :  \'J\' \'a\' \'n\'   ;
+month\_name  :  \'F\' \'e\' \'b\'   ;
+
+         . . .
+
+month\_name  :  \'D\' \'e\' \'c\'   ;
+.DE
+might be used in the above example.
+The lexical analyzer would only need to recognize individual letters, and
+.I month\_name
+would be a nonterminal symbol.
+Such low-level rules tend to waste time and space, and may
+complicate the specification beyond Yacc's ability to deal with it.
+Usually, the lexical analyzer would
+recognize the month names,
+and return an indication that a
+.I month\_name
+was seen; in this case,
+.I month\_name
+would be a token.
+.PP
+Literal characters such as ``,'' must also be passed through the lexical
+analyzer, and are also considered tokens.
+.PP
+Specification files are very flexible.
+It is realively easy to add to the above example the rule
+.DS
+date  :  month \'/\' day \'/\' year   ;
+.DE
+allowing
+.DS
+7 / 4 / 1776
+.DE
+as a synonym for
+.DS
+July 4, 1776
+.DE
+In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
+and little danger of disrupting existing input.
+.PP
+The input being read may not conform to the
+specifications.
+These input errors are detected as early as is theoretically possible with a
+left-to-right scan;
+thus, not only is the chance of reading and computing with bad
+input data substantially reduced, but the bad data can usually be quickly found.
+Error handling,
+provided as part of the input specifications,
+permits the reentry of bad data,
+or the continuation of the input process after skipping over the bad data.
+.PP
+In some cases, Yacc fails to produce a parser when given a set of
+specifications.
+For example, the specifications may be self contradictory, or they may
+require a more powerful recognition mechanism than that available to Yacc.
+The former cases represent design errors;
+the latter cases
+can often be corrected
+by making
+the lexical analyzer
+more powerful, or by rewriting some of the grammar rules.
+While Yacc cannot handle all possible specifications, its power
+compares favorably with similar systems;
+moreover, the
+constructions which are difficult for Yacc to handle are
+also frequently difficult for human beings to handle.
+Some users have reported that the discipline of formulating valid
+Yacc specifications for their input revealed errors of
+conception or design early in the program development.
+.PP
+The theory underlying Yacc has been described elsewhere.
+.[
+Aho Johnson Surveys LR Parsing
+.]
+.[
+Aho Johnson Ullman Ambiguous Grammars
+.]
+.[
+Aho Ullman Principles Compiler Design
+.]
+Yacc has been extensively used in numerous practical applications,
+including
+.I lint ,
+.[
+Johnson Lint
+.]
+the Portable C Compiler,
+.[
+Johnson Portable Compiler Theory
+.]
+and a system for typesetting mathematics.
+.[
+Kernighan Cherry typesetting system CACM
+.]
+.PP
+The next several sections describe the
+basic process of preparing a Yacc specification;
+Section 1 describes the preparation of grammar rules,
+Section 2 the preparation of the user supplied actions associated with these rules,
+and Section 3 the preparation of lexical analyzers.
+Section 4 describes the operation of the parser.
+Section 5 discusses various reasons why Yacc may be unable to produce a
+parser from a specification, and what to do about it.
+Section 6 describes a simple mechanism for
+handling operator precedences in arithmetic expressions.
+Section 7 discusses error detection and recovery.
+Section 8 discusses the operating environment and special features
+of the parsers Yacc produces.
+Section 9 gives some suggestions which should improve the
+style and efficiency of the specifications.
+Section 10 discusses some advanced topics, and Section 11 gives
+acknowledgements.
+Appendix A has a brief example, and Appendix B gives a
+summary of the Yacc input syntax.
+Appendix C gives an example using some of the more advanced
+features of Yacc, and, finally,
+Appendix D describes mechanisms and syntax
+no longer actively supported, but
+provided for historical continuity with older versions of Yacc.
--- /usr/src/share/doc/psd/15.yacc/ss1	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss1	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,138 @@
+.\"	@(#)ss1	6.1 (Berkeley) 5/8/86
+.\"
+.tr *\(**
+.tr |\(or
+.SH
+1: Basic Specifications
+.PP
+Names refer to either tokens or nonterminal symbols.
+Yacc requires
+token names to be declared as such.
+In addition, for reasons discussed in Section 3, it is often desirable
+to include the lexical analyzer as part of the specification file;
+it may be useful to include other programs as well.
+Thus, every specification file consists of three sections:
+the
+.I declarations ,
+.I "(grammar) rules" ,
+and
+.I programs .
+The sections are separated by double percent ``%%'' marks.
+(The percent ``%'' is generally used in Yacc specifications as an escape character.)
+.PP
+In other words, a full specification file looks like
+.DS
+declarations
+%%
+rules
+%%
+programs
+.DE
+.PP
+The declaration section may be empty.
+Moreover, if the programs section is omitted, the second %% mark may be omitted also;
+thus, the smallest legal Yacc specification is
+.DS
+%%
+rules
+.DE
+.PP
+Blanks, tabs, and newlines are ignored except
+that they may not appear in names or multi-character reserved symbols.
+Comments may appear wherever a name is legal; they are enclosed
+in /* . . . */, as in C and PL/I.
+.PP
+The rules section is made up of one or more grammar rules.
+A grammar rule has the form:
+.DS
+A  :  BODY  ;
+.DE
+A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
+The colon and the semicolon are Yacc punctuation.
+.PP
+Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
+non-initial digits.
+Upper and lower case letters are distinct.
+The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
+.PP
+A literal consists of a character enclosed in single quotes ``\'''.
+As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
+are recognized.
+Thus
+.DS
+\'\en\'	newline
+\'\er\'	return
+\'\e\'\'	single quote ``\'''
+\'\e\e\'	backslash ``\e''
+\'\et\'	tab
+\'\eb\'	backspace
+\'\ef\'	form feed
+\'\exxx\'	``xxx'' in octal
+.DE
+For a number of technical reasons, the
+\s-2NUL\s0
+character (\'\e0\' or 0) should never
+be used in grammar rules.
+.PP
+If there are several grammar rules with the same left hand side, the vertical bar ``|''
+can be used to avoid rewriting the left hand side.
+In addition,
+the semicolon at the end of a rule can be dropped before a vertical bar.
+Thus the grammar rules
+.DS
+A	:	B  C  D   ;
+A	:	E  F   ;
+A	:	G   ;
+.DE
+can be given to Yacc as
+.DS
+A	:	B  C  D
+	|	E  F
+	|	G
+	;
+.DE
+It is not necessary that all grammar rules with the same left side appear together in the grammar rules section,
+although it makes the input much more readable, and easier to change.
+.PP
+If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
+.DS
+empty :   ;
+.DE
+.PP
+Names representing tokens must be declared; this is most simply done by writing
+.DS
+%token   name1  name2 . . .
+.DE
+in the declarations section.
+(See Sections 3 , 5, and 6 for much more discussion).
+Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
+Every nonterminal symbol must appear on the left side of at least one rule.
+.PP
+Of all the nonterminal symbols, one, called the
+.I "start symbol" ,
+has particular importance.
+The parser is designed to recognize the start symbol; thus,
+this symbol represents the largest,
+most general structure described by the grammar rules.
+By default,
+the start symbol is taken to be the left hand side of the first
+grammar rule in the rules section.
+It is possible, and in fact desirable, to declare the start
+symbol explicitly in the declarations section using the %start keyword:
+.DS
+%start   symbol
+.DE
+.PP
+The end of the input to the parser is signaled by a special token, called the
+.I endmarker .
+If the tokens up to, but not including, the endmarker form a structure
+which matches the start symbol, the parser function returns to its caller
+after the endmarker is seen; it
+.I accepts
+the input.
+If the endmarker is seen in any other context, it is an error.
+.PP
+It is the job of the user-supplied lexical analyzer
+to return the endmarker when appropriate; see section 3, below.
+Usually the endmarker represents some reasonably obvious 
+I/O status, such as ``end-of-file'' or ``end-of-record''.
--- /usr/src/share/doc/psd/15.yacc/ss2	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss2	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,151 @@
+.\"	@(#)ss2	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+2: Actions
+.PP
+With each grammar rule, the user may associate actions to be performed each time
+the rule is recognized in the input process.
+These actions may return values, and may obtain the values returned by previous
+actions.
+Moreover, the lexical analyzer can return values
+for tokens, if desired.
+.PP
+An action is an arbitrary C statement, and as such can do
+input and output, call subprograms, and alter
+external vectors and variables.
+An action is specified by
+one or more statements, enclosed in curly braces ``{'' and ``}''.
+For example,
+.DS
+A	:	\'(\'  B  \')\'
+			{	hello( 1, "abc" );  }
+.DE
+and
+.DS
+XXX	:	YYY  ZZZ
+			{	printf("a message\en");
+				flag = 25;   }
+.DE
+are grammar rules with actions.
+.PP
+To facilitate easy communication between the actions and the parser, the action statements are altered
+slightly.
+The symbol ``dollar sign'' ``$'' is used as a signal to Yacc in this context.
+.PP
+To return a value, the action normally sets the
+pseudo-variable ``$$'' to some value.
+For example, an action that does nothing but return the value 1 is
+.DS
+	{  $$ = 1;  }
+.DE
+.PP
+To obtain the values returned by previous actions and the lexical analyzer, the
+action may use the pseudo-variables $1, $2, . . .,
+which refer to the values returned by the
+components of the right side of a rule, reading from left to right.
+Thus, if the rule is
+.DS
+A	:	B  C  D   ;
+.DE
+for example, then $2 has the value returned by C, and $3 the value returned by D.
+.PP
+As a more concrete example, consider the rule
+.DS
+expr	:	\'(\'  expr  \')\'   ;
+.DE
+The value returned by this rule is usually the value of the
+.I expr
+in parentheses.
+This can be indicated by
+.DS
+expr	:	 \'(\'  expr  \')\'		{  $$ = $2 ;  }
+.DE
+.PP
+By default, the value of a rule is the value of the first element in it ($1).
+Thus, grammar rules of the form
+.DS
+A	:	B    ;
+.DE
+frequently need not have an explicit action.
+.PP
+In the examples above, all the actions came at the end of their rules.
+Sometimes, it is desirable to get control before a rule is fully parsed.
+Yacc permits an action to be written in the middle of a rule as well
+as at the end.
+This rule is assumed to return a value, accessible
+through the usual \$ mechanism by the actions to
+the right of it.
+In turn, it may access the values
+returned by the symbols to its left.
+Thus, in the rule
+.DS
+A	:	B
+			{  $$ = 1;  }
+		C
+			{   x = $2;   y = $3;  }
+	;
+.DE
+the effect is to set
+.I x
+to 1, and
+.I y
+to the value returned by C.
+.PP
+Actions that do not terminate a rule are actually
+handled by Yacc by manufacturing a new nonterminal
+symbol name, and a new rule matching this
+name to the empty string.
+The interior action is the action triggered off by recognizing
+this added rule.
+Yacc actually treats the above example as if
+it had been written:
+.DS
+$ACT	:	/* empty */
+			{  $$ = 1;  }
+	;
+
+A	:	B  $ACT  C
+			{   x = $2;   y = $3;  }
+	;
+.DE
+.PP
+In many applications, output is not done directly by the actions;
+rather, a data structure, such as a parse tree, is constructed in memory,
+and transformations are applied to it before output is generated.
+Parse trees are particularly easy to
+construct, given routines to build and maintain the tree
+structure desired.
+For example, suppose there is a C function
+.I node ,
+written so that the call
+.DS
+node( L, n1, n2 )
+.DE
+creates a node with label L, and descendants n1 and n2, and returns the index of
+the newly created node.
+Then parse tree can be built by supplying actions such as:
+.DS
+expr	:	expr  \'+\'  expr  
+			{  $$ = node( \'+\', $1, $3 );  }
+.DE
+in the specification.
+.PP
+The user may define other variables to be used by the actions.
+Declarations and definitions can appear in
+the declarations section,
+enclosed in the marks ``%{'' and ``%}''.
+These declarations and definitions have global scope, 
+so they are known to the action statements and the lexical analyzer.
+For example,
+.DS
+%{   int variable = 0;   %}
+.DE
+could be placed in the declarations section,
+making
+.I variable
+accessible to all of the actions.
+The Yacc parser uses only names beginning in ``yy'';
+the user should avoid such names.
+.PP
+In these examples, all the values are integers: a discussion of
+values of other types will be found in Section 10.
--- /usr/src/share/doc/psd/15.yacc/ss3	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss3	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,104 @@
+.\"	@(#)ss3	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+3: Lexical Analysis
+.PP
+The user must supply a lexical analyzer to read the input stream and communicate tokens
+(with values, if desired) to the parser.
+The lexical analyzer is an integer-valued function called
+.I yylex .
+The function returns an integer, the
+.I "token number" ,
+representing the kind of token read.
+If there is a value associated with that token, it should be assigned
+to the external variable
+.I yylval .
+.PP
+The parser and the lexical analyzer must agree on these token numbers in order for
+communication between them to take place.
+The numbers may be chosen by Yacc, or chosen by the user.
+In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer
+to return these numbers symbolically.
+For example, suppose that the token name DIGIT has been defined in the declarations section of the
+Yacc specification file.
+The relevant portion of the lexical analyzer might look like:
+.DS
+yylex(){
+	extern int yylval;
+	int c;
+	. . .
+	c = getchar();
+	. . .
+	switch( c ) {
+		. . .
+	case \'0\':
+	case \'1\':
+	  . . .
+	case \'9\':
+		yylval = c\-\'0\';
+		return( DIGIT );
+		. . .
+		}
+	. . .
+.DE
+.PP
+The intent is to return a token number of DIGIT, and a value equal to the numerical value of the
+digit.
+Provided that the lexical analyzer code is placed in the programs section of the specification file,
+the identifier DIGIT will be defined as the token number associated
+with the token DIGIT.
+.PP
+This mechanism leads to clear,
+easily modified lexical analyzers; the only pitfall is the need
+to avoid using any token names in the grammar that are reserved
+or significant in C or the parser; for example, the use of
+token names
+.I if
+or
+.I while
+will almost certainly cause severe
+difficulties when the lexical analyzer is compiled.
+The token name
+.I error
+is reserved for error handling, and should not be used naively
+(see Section 7).
+.PP
+As mentioned above, the token numbers may be chosen by Yacc or by the user.
+In the default situation, the numbers are chosen by Yacc.
+The default token number for a literal
+character is the numerical value of the character in the local character set.
+Other names are assigned token numbers
+starting at 257.
+.PP
+To assign a token number to a token (including literals),
+the first appearance of the token name or literal
+.I
+in the declarations section
+.R
+can be immediately followed by
+a nonnegative integer.
+This integer is taken to be the token number of the name or literal.
+Names and literals not defined by this mechanism retain their default definition.
+It is important that all token numbers be distinct.
+.PP
+For historical reasons, the endmarker must have token
+number 0 or negative.
+This token number cannot be redefined by the user; thus, all
+lexical analyzers should be prepared to return 0 or negative as a token number
+upon reaching the end of their input.
+.PP
+A very useful tool for constructing lexical analyzers is
+the
+.I Lex
+program developed by Mike Lesk.
+.[
+Lesk Lex
+.]
+These lexical analyzers are designed to work in close
+harmony with Yacc parsers.
+The specifications for these lexical analyzers
+use regular expressions instead of grammar rules.
+Lex can be easily used to produce quite complicated lexical analyzers,
+but there remain some languages (such as FORTRAN) which do not
+fit any theoretical framework, and whose lexical analyzers
+must be crafted by hand.
--- /usr/src/share/doc/psd/15.yacc/ss4	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss4	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,330 @@
+.\"	@(#)ss4	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+4: How the Parser Works
+.PP
+Yacc turns the specification file into a C program, which
+parses the input according to the specification given.
+The algorithm used to go from the
+specification to the parser is complex, and will not be discussed
+here (see
+the references for more information).
+The parser itself, however, is relatively simple,
+and understanding how it works, while
+not strictly necessary, will nevertheless make
+treatment of error recovery and ambiguities much more
+comprehensible.
+.PP
+The parser produced by Yacc consists
+of a finite state machine with a stack.
+The parser is also capable of reading and remembering the next
+input token (called the
+.I lookahead
+token).
+The
+.I "current state"
+is always the one on the top of the stack.
+The states of the finite state machine are given
+small integer labels; initially, the machine is in state 0,
+the stack contains only state 0, and no lookahead token has been read.
+.PP
+The machine has only four actions available to it, called
+.I shift ,
+.I reduce ,
+.I accept ,
+and
+.I error .
+A move of the parser is done as follows:
+.IP 1.
+Based on its current state, the parser decides
+whether it needs a lookahead token to decide
+what action should be done; if it needs one, and does
+not have one, it calls
+.I yylex
+to obtain the next token.
+.IP 2.
+Using the current state, and the lookahead token
+if needed, the parser decides on its next action, and
+carries it out.
+This may result in states being pushed onto the stack, or popped off of
+the stack, and in the lookahead token being processed
+or left alone.
+.PP
+The
+.I shift
+action is the most common action the parser takes.
+Whenever a shift action is taken, there is always
+a lookahead token.
+For example, in state 56 there may be
+an action:
+.DS
+	IF	shift 34
+.DE
+which says, in state 56, if the lookahead token is IF,
+the current state (56) is pushed down on the stack,
+and state 34 becomes the current state (on the
+top of the stack).
+The lookahead token is cleared.
+.PP
+The
+.I reduce
+action keeps the stack from growing without
+bounds.
+Reduce actions are appropriate when the parser has seen
+the right hand side of a grammar rule,
+and is prepared to announce that it has seen
+an instance of the rule, replacing the right hand side
+by the left hand side.
+It may be necessary to consult the lookahead token
+to decide whether to reduce, but
+usually it is not; in fact, the default
+action (represented by a ``.'') is often a reduce action.
+.PP
+Reduce actions are associated with individual grammar rules.
+Grammar rules are also given small integer
+numbers, leading to some confusion.
+The action
+.DS
+	\fB.\fR	reduce 18
+.DE
+refers to
+.I "grammar rule"
+18, while the action
+.DS
+	IF	shift 34
+.DE
+refers to
+.I state
+34.
+.PP
+Suppose the rule being reduced is
+.DS
+A	\fB:\fR	x  y  z    ;
+.DE
+The reduce action depends on the
+left hand symbol (A in this case), and the number of
+symbols on the right hand side (three in this case).
+To reduce, first pop off the top three states
+from the stack
+(In general, the number of states popped equals the number of symbols on the
+right side of the rule).
+In effect, these states were the ones
+put on the stack while recognizing
+.I x ,
+.I y ,
+and
+.I z ,
+and no longer serve any useful purpose.
+After popping these states, a state is uncovered
+which was the state the parser was in before beginning to
+process the rule.
+Using this uncovered state, and the symbol
+on the left side of the rule, perform what is in
+effect a shift of A.
+A new state is obtained, pushed onto the stack, and parsing continues.
+There are significant differences between the processing of
+the left hand symbol and an ordinary shift of a token,
+however, so this action is called a
+.I goto
+action.
+In particular, the lookahead token is cleared by a shift, and
+is not affected by a goto.
+In any case, the uncovered state contains an entry such as:
+.DS
+	A	goto 20
+.DE
+causing state 20 to be pushed
+onto the stack, and become the current state.
+.PP
+In effect, the reduce action ``turns back the clock'' in the parse,
+popping the states off the stack to go back to the
+state where the right hand side of the rule was first seen.
+The parser then behaves as if it had seen the left side at that time.
+If the right hand side of the rule is empty,
+no states are popped off of the stack: the uncovered state
+is in fact the current state.
+.PP
+The reduce action is also important in the treatment of user-supplied
+actions and values.
+When a rule is reduced, the code supplied with the rule is executed
+before the stack is adjusted.
+In addition to the stack holding the states, another stack,
+running in parallel with it, holds the values returned
+from the lexical analyzer and the actions.
+When a shift takes place, the external variable
+.I yylval
+is copied onto the value stack.
+After the return from the user code, the reduction is carried out.
+When the
+.I goto
+action is done, the external variable
+.I yyval
+is copied onto the value stack.
+The pseudo-variables $1, $2, etc., refer to the value stack.
+.PP
+The other two parser actions are conceptually much simpler.
+The
+.I accept
+action indicates that the entire input has been seen and
+that it matches the specification.
+This action appears only when the lookahead token is 
+the endmarker, and indicates that the parser has successfully
+done its job.
+The
+.I error
+action, on the other hand, represents a place where the parser
+can no longer continue parsing according to the specification.
+The input tokens it has seen, together with the lookahead token,
+cannot be followed by anything that would result
+in a legal input.
+The parser reports an error, and attempts to recover the situation and
+resume parsing: the error recovery (as opposed to the detection of error)
+will be covered in Section 7.
+.PP
+It is time for an example!
+Consider the specification
+.DS
+%token  DING  DONG  DELL
+%%
+rhyme	:	sound  place
+	;
+sound	:	DING  DONG
+	;
+place	:	DELL
+	;
+.DE
+.PP
+When Yacc is invoked with the
+.B \-v
+option, a file called
+.I y.output
+is produced, with a human-readable description of the parser.
+The
+.I y.output
+file corresponding to the above grammar (with some statistics
+stripped off the end) is:
+.DS
+state 0
+	$accept  :  \_rhyme  $end 
+
+	DING  shift 3
+	.  error
+
+	rhyme  goto 1
+	sound  goto 2
+
+state 1
+	$accept  :   rhyme\_$end 
+
+	$end  accept
+	.  error
+
+state 2
+	rhyme  :   sound\_place 
+
+	DELL  shift 5
+	.  error
+
+	place   goto 4
+
+state 3
+	sound   :   DING\_DONG 
+
+	DONG  shift 6
+	.  error
+
+state 4
+	rhyme  :   sound  place\_    (1)
+
+	.   reduce  1
+
+state 5
+	place  :   DELL\_    (3)
+
+	.   reduce  3
+
+state 6
+	sound   :   DING  DONG\_    (2)
+
+	.   reduce  2
+.DE
+Notice that, in addition to the actions for each state, there is a
+description of the parsing rules being processed in each
+state.  The \_ character is used to indicate
+what has been seen, and what is yet to come, in each rule.
+Suppose the input is
+.DS
+DING  DONG  DELL
+.DE
+It is instructive to follow the steps of the parser while
+processing this input.
+.PP
+Initially, the current state is state 0.
+The parser needs to refer to the input in order to decide
+between the actions available in state 0, so
+the first token,
+.I DING ,
+is read, becoming the lookahead token.
+The action in state 0 on
+.I DING
+is
+is ``shift 3'', so state 3 is pushed onto the stack,
+and the lookahead token is cleared.
+State 3 becomes the current state.
+The next token,
+.I DONG ,
+is read, becoming the lookahead token.
+The action in state 3 on the token
+.I DONG
+is ``shift 6'',
+so state 6 is pushed onto the stack, and the lookahead is cleared.
+The stack now contains 0, 3, and 6.
+In state 6, without even consulting the lookahead,
+the parser reduces by rule 2.
+.DS
+	sound  :   DING  DONG
+.DE
+This rule has two symbols on the right hand side, so
+two states, 6 and 3, are popped off of the stack, uncovering state 0.
+Consulting the description of state 0, looking for a goto on 
+.I sound ,
+.DS
+	sound	goto 2
+.DE
+is obtained; thus state 2 is pushed onto the stack,
+becoming the current state.
+.PP
+In state 2, the next token,
+.I DELL ,
+must be read.
+The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
+0, 2, and 5 on it, and the lookahead token is cleared.
+In state 5, the only action is to reduce by rule 3.
+This has one symbol on the right hand side, so one state, 5,
+is popped off, and state 2 is uncovered.
+The goto in state 2 on
+.I place ,
+the left side of rule 3,
+is state 4.
+Now, the stack contains 0, 2, and 4.
+In state 4, the only action is to reduce by rule 1.
+There are two symbols on the right, so the top two states are popped off,
+uncovering state 0 again.
+In state 0, there is a goto on
+.I rhyme
+causing the parser to enter state 1.
+In state 1, the input is read; the endmarker is obtained,
+indicated by ``$end'' in the
+.I y.output
+file.
+The action in state 1 when the endmarker is seen is to accept,
+successfully ending the parse.
+.PP
+The reader is urged to consider how the parser works
+when confronted with such incorrect strings as
+.I "DING DONG DONG" ,
+.I "DING DONG" ,
+.I "DING DONG DELL DELL" ,
+etc.
+A few minutes spend with this and other simple examples will
+probably be repaid when problems arise in more complicated contexts.
--- /usr/src/share/doc/psd/15.yacc/ss5	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss5	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,302 @@
+.\"	@(#)ss5	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+5: Ambiguity and Conflicts
+.PP
+A set of grammar rules is
+.I ambiguous
+if there is some input string that can be structured in two or more different ways.
+For example, the grammar rule
+.DS
+expr	:	expr  \'\-\'  expr
+.DE
+is a natural way of expressing the fact that one way of forming an arithmetic expression
+is to put two other expressions together with a minus sign between them.
+Unfortunately, this grammar rule does not
+completely specify the way that all complex inputs
+should be structured.
+For example, if the input is
+.DS
+expr  \-  expr  \-  expr
+.DE
+the rule allows this input to be structured as either
+.DS
+(  expr  \-  expr  )  \-  expr
+.DE
+or as
+.DS
+expr  \-  (  expr  \-  expr  )
+.DE
+(The first is called
+.I "left association" ,
+the second
+.I "right association" ).
+.PP
+Yacc detects such ambiguities when it is attempting to build the parser.
+It is instructive to consider the problem that confronts the parser when it is
+given an input such as
+.DS
+expr  \-  expr  \-  expr
+.DE
+When the parser has read the second expr, the input that it has seen:
+.DS
+expr  \-  expr
+.DE
+matches the right side of the grammar rule above.
+The parser could
+.I reduce
+the input by applying this rule;
+after applying the rule;
+the input is reduced to
+.I expr (the
+left side of the rule).
+The parser would then read the final part of the input:
+.DS
+\-  expr
+.DE
+and again reduce.
+The effect of this is to take the left associative interpretation.
+.PP
+Alternatively, when the parser has seen
+.DS
+expr  \-  expr
+.DE
+it could defer the immediate application of the rule, and continue reading
+the input until it had seen
+.DS
+expr  \-  expr  \-  expr
+.DE
+It could then apply the rule to the rightmost three symbols, reducing them to
+.I expr
+and leaving
+.DS
+expr  \-  expr
+.DE
+Now the rule can be reduced once more; the effect is to
+take the right associative interpretation.
+Thus, having read
+.DS
+expr  \-  expr
+.DE
+the parser can do two legal things, a shift or a reduction, and has no way of
+deciding between them.
+This is called a
+.I "shift / reduce conflict" .
+It may also happen that the parser has a choice of two legal reductions;
+this is called a
+.I "reduce / reduce conflict" .
+Note that there are never any ``Shift/shift'' conflicts.
+.PP
+When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
+It does this by selecting one of the valid steps wherever it has a choice.
+A rule describing which choice to make in a given situation is called
+a
+.I "disambiguating rule" .
+.PP
+Yacc invokes two disambiguating rules by default:
+.IP 1.
+In a shift/reduce conflict, the default is to do the shift.
+.IP 2.
+In a reduce/reduce conflict, the default is to reduce by the
+.I earlier
+grammar rule (in the input sequence).
+.PP
+Rule 1 implies that reductions are deferred whenever there is a choice,
+in favor of shifts.
+Rule 2 gives the user rather crude control over the behavior of the parser
+in this situation, but reduce/reduce conflicts should be avoided whenever possible.
+.PP
+Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
+require a more complex parser than Yacc can construct.
+The use of actions within rules can also cause conflicts, if the action must
+be done before the parser can be sure which rule is being recognized.
+In these cases, the application of disambiguating rules is inappropriate,
+and leads to an incorrect parser.
+For this reason, Yacc
+always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
+.PP
+In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
+possible to rewrite the grammar rules so that the same inputs are read but there are no
+conflicts.
+For this reason, most previous parser generators
+have considered conflicts to be fatal errors.
+Our experience has suggested that this rewriting is somewhat unnatural,
+and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
+.PP
+As an example of the power of disambiguating rules, consider a fragment from a programming
+language involving an ``if-then-else'' construction:
+.DS
+stat	:	IF  \'(\'  cond  \')\'  stat
+	|	IF  \'(\'  cond  \')\'  stat  ELSE  stat
+	;
+.DE
+In these rules,
+.I IF
+and
+.I ELSE
+are tokens,
+.I cond
+is a nonterminal symbol describing
+conditional (logical) expressions, and
+.I stat
+is a nonterminal symbol describing statements.
+The first rule will be called the
+.ul
+simple-if
+rule, and the
+second the
+.ul
+if-else
+rule.
+.PP
+These two rules form an ambiguous construction, since input of the form
+.DS
+IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
+.DE
+can be structured according to these rules in two ways:
+.DS
+IF  (  C1  )  {
+	IF  (  C2  )  S1
+	}
+ELSE  S2
+.DE
+or
+.DS
+IF  (  C1  )  {
+	IF  (  C2  )  S1
+	ELSE  S2
+	}
+.DE
+The second interpretation is the one given in most programming languages
+having this construct.
+Each
+.I ELSE
+is associated with the last preceding
+``un-\fIELSE'\fRd''
+.I IF .
+In this example, consider the situation where the parser has seen
+.DS
+IF  (  C1  )  IF  (  C2  )  S1
+.DE
+and is looking at the
+.I ELSE .
+It can immediately
+reduce
+by the simple-if rule to get
+.DS
+IF  (  C1  )  stat
+.DE
+and then read the remaining input,
+.DS
+ELSE  S2
+.DE
+and reduce
+.DS
+IF  (  C1  )  stat  ELSE  S2
+.DE
+by the if-else rule.
+This leads to the first of the above groupings of the input.
+.PP
+On the other hand, the
+.I ELSE
+may be shifted,
+.I S2
+read, and then the right hand portion of
+.DS
+IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
+.DE
+can be reduced by the if-else rule to get
+.DS
+IF  (  C1  )  stat
+.DE
+which can be reduced by the simple-if rule.
+This leads to the second of the above groupings of the input, which
+is usually desired.
+.PP
+Once again the parser can do two valid things \- there is a shift/reduce conflict.
+The application of disambiguating rule 1 tells the parser to shift in this case,
+which leads to the desired grouping.
+.PP
+This shift/reduce conflict arises only when there is a particular current input symbol,
+.I ELSE ,
+and particular inputs already seen, such as
+.DS
+IF  (  C1  )  IF  (  C2  )  S1
+.DE
+In general, there may be many conflicts, and each one
+will be associated with an input symbol and
+a set of previously read inputs.
+The previously read inputs are characterized by the
+state
+of the parser.
+.PP
+The conflict messages of Yacc are best understood
+by examining the verbose (\fB\-v\fR) option output file.
+For example, the output corresponding to the above
+conflict state might be:
+.DS L
+23: shift/reduce conflict (shift 45, reduce 18) on ELSE
+
+state 23
+
+	  stat  :  IF  (  cond  )  stat\_         (18)
+	  stat  :  IF  (  cond  )  stat\_ELSE  stat
+
+	 ELSE     shift 45
+	 .        reduce 18
+
+.DE
+The first line describes the conflict, giving the state and the input symbol.
+The ordinary state description follows, giving
+the grammar rules active in the state, and the parser actions.
+Recall that the underline marks the
+portion of the grammar rules which has been seen.
+Thus in the example, in state 23 the parser has seen input corresponding
+to
+.DS
+IF  (  cond  )  stat
+.DE
+and the two grammar rules shown are active at this time.
+The parser can do two possible things.
+If the input symbol is
+.I ELSE ,
+it is possible to shift into state
+45.
+State 45 will have, as part of its description, the line
+.DS
+stat  :  IF  (  cond  )  stat  ELSE\_stat
+.DE
+since the
+.I ELSE
+will have been shifted in this state.
+Back in state 23, the alternative action, described by ``\fB.\fR'',
+is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
+in this case, if the input symbol is not
+.I ELSE ,
+the parser reduces by grammar rule 18:
+.DS
+stat  :  IF  \'(\'  cond  \')\'  stat
+.DE
+Once again, notice that the numbers following ``shift'' commands refer to other states,
+while the numbers following ``reduce'' commands refer to grammar
+rule numbers.
+In the
+.I y.output
+file, the rule numbers are printed after those rules which can be reduced.
+In most one states, there will be at most reduce action possible in the
+state, and this will be the default command.
+The user who encounters unexpected shift/reduce conflicts will probably want to
+look at the verbose output to decide whether the default actions are appropriate.
+In really tough cases, the user might need to know more about
+the behavior and construction of the parser than can be covered here.
+In this case, one of the theoretical references
+.[
+Aho Johnson Surveys Parsing
+.]
+.[
+Aho Johnson Ullman Deterministic Ambiguous
+.]
+.[
+Aho Ullman Principles Design
+.]
+might be consulted; the services of a local guru might also be appropriate.
--- /usr/src/share/doc/psd/15.yacc/ss6	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss6	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,146 @@
+.\"	@(#)ss6	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+6: Precedence
+.PP
+There is one common situation
+where the rules given above for resolving conflicts are not sufficient;
+this is in the parsing of arithmetic expressions.
+Most of the commonly used constructions for arithmetic expressions can be naturally
+described by the notion of
+.I precedence
+levels for operators, together with information about left
+or right associativity.
+It turns out that ambiguous grammars with appropriate disambiguating rules
+can be used to create parsers that are faster and easier to
+write than parsers constructed from unambiguous grammars.
+The basic notion is to write grammar rules
+of the form
+.DS
+expr  :  expr  OP  expr
+.DE
+and
+.DS
+expr  :  UNARY  expr
+.DE
+for all binary and unary operators desired.
+This creates a very ambiguous grammar, with many parsing conflicts.
+As disambiguating rules, the user specifies the precedence, or binding
+strength, of all the operators, and the associativity
+of the binary operators.
+This information is sufficient to allow Yacc to resolve the parsing conflicts
+in accordance with these rules, and construct a parser that realizes the desired
+precedences and associativities.
+.PP
+The precedences and associativities are attached to tokens in the declarations section.
+This is done by a series of lines beginning with a Yacc keyword: %left, %right,
+or %nonassoc, followed by a list of tokens.
+All of the tokens on the same line are assumed to have the same precedence level
+and associativity; the lines are listed in
+order of increasing precedence or binding strength.
+Thus,
+.DS
+%left  \'+\'  \'\-\'
+%left  \'*\'  \'/\'
+.DE
+describes the precedence and associativity of the four arithmetic operators.
+Plus and minus are left associative, and have lower precedence than
+star and slash, which are also left associative.
+The keyword %right is used to describe right associative operators,
+and the keyword %nonassoc is used to describe operators, like
+the operator .LT. in Fortran, that may not associate with themselves; thus,
+.DS
+A  .LT.  B  .LT.  C
+.DE
+is illegal in Fortran, and such an operator would be described with the keyword
+%nonassoc in Yacc.
+As an example of the behavior of these declarations, the description
+.DS
+%right  \'=\'
+%left  \'+\'  \'\-\'
+%left  \'*\'  \'/\'
+
+%%
+
+expr	:	expr  \'=\'  expr
+	|	expr  \'+\'  expr
+	|	expr  \'\-\'  expr
+	|	expr  \'*\'  expr
+	|	expr  \'/\'  expr
+	|	NAME
+	;
+.DE
+might be used to structure the input
+.DS
+a  =  b  =  c*d  \-  e  \-  f*g
+.DE
+as follows:
+.DS
+a = ( b = ( ((c*d)\-e) \- (f*g) ) )
+.DE
+When this mechanism is used,
+unary operators must, in general, be given a precedence.
+Sometimes a unary operator and a binary operator
+have the same symbolic representation, but different precedences.
+An example is unary and binary \'\-\'; unary minus may be given the same
+strength as multiplication, or even higher, while binary minus has a lower strength than
+multiplication.
+The keyword, %prec, changes the precedence level associated with a particular grammar rule.
+%prec appears immediately after the body of the grammar rule, before the action or closing semicolon,
+and is followed by a token name or literal.
+It
+causes the precedence of the grammar rule to become that of the following token name or literal.
+For example, to make unary minus have the same precedence as multiplication the rules might resemble:
+.DS
+%left  \'+\'  \'\-\'
+%left  \'*\'  \'/\'
+
+%%
+
+expr	:	expr  \'+\'  expr
+	|	expr  \'\-\'  expr
+	|	expr  \'*\'  expr
+	|	expr  \'/\'  expr
+	|	\'\-\'  expr      %prec  \'*\'
+	|	NAME
+	;
+.DE
+.PP
+A token declared
+by %left, %right, and %nonassoc need not be, but may be, declared by %token as well.
+.PP
+The precedences and associativities are used by Yacc to
+resolve parsing conflicts; they give rise to disambiguating rules.
+Formally, the rules work as follows:
+.IP 1.
+The precedences and associativities are recorded for those tokens and literals
+that have them.
+.IP 2.
+A precedence and associativity is associated with each grammar rule; it is the precedence
+and associativity of the last token or literal in the body of the rule.
+If the %prec construction is used, it overrides this default.
+Some grammar rules may have no precedence and associativity associated with them.
+.IP 3.
+When there is a reduce/reduce conflict, or there is a shift/reduce conflict
+and either the input symbol or the grammar rule has no precedence and associativity,
+then the two disambiguating rules given at the beginning of the section are used,
+and the conflicts are reported.
+.IP 4.
+If there is a shift/reduce conflict, and both the grammar rule and the input character
+have precedence and associativity associated with them, then the conflict is resolved
+in favor of the action (shift or reduce) associated with the higher precedence.
+If the precedences are the same, then the associativity is used; left
+associative implies reduce, right associative implies shift, and nonassociating
+implies error.
+.PP
+Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce
+conflicts reported by Yacc.
+This means that mistakes in the specification of precedences may
+disguise errors in the input grammar; it is a good idea to be sparing
+with precedences, and use them in an essentially ``cookbook'' fashion,
+until some experience has been gained.
+The
+.I y.output
+file
+is very useful in deciding whether the parser is actually doing
+what was intended.
--- /usr/src/share/doc/psd/15.yacc/ss7	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss7	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,124 @@
+.\"	@(#)ss7	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+7: Error Handling
+.PP
+Error handling is an extremely difficult area, and many of the problems are semantic ones.
+When an error is found, for example, it may be necessary to reclaim parse tree storage,
+delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
+.PP
+It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
+scanning the input to find further syntax errors.
+This leads to the problem of getting the parser ``restarted'' after an error.
+A general class of algorithms to do this involves discarding a number of tokens
+from the input string, and attempting to adjust the parser so that input can continue.
+.PP
+To allow the user some control over this process,
+Yacc provides a simple, but reasonably general, feature.
+The token name ``error'' is reserved for error handling.
+This name can be used in grammar rules;
+in effect, it suggests places where errors are expected, and recovery might take place.
+The parser pops its stack until it enters a state where the token ``error'' is legal.
+It then behaves as if the token ``error'' were the current lookahead token,
+and performs the action encountered.
+The lookahead token is then reset to the token that caused the error.
+If no special error rules have been specified, the processing halts when an error is detected.
+.PP
+In order to prevent a cascade of error messages, the parser, after
+detecting an error, remains in error state until three tokens have been successfully
+read and shifted.
+If an error is detected when the parser is already in error state,
+no message is given, and the input token is quietly deleted.
+.PP
+As an example, a rule of the form
+.DS
+stat	:	error
+.DE
+would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
+in which the error was seen.
+More precisely, the parser will
+scan ahead, looking for three tokens that might legally follow
+a statement, and start processing at the first of these; if
+the beginnings of statements are not sufficiently distinctive, it may make a
+false start in the middle of a statement, and end up reporting a
+second error where there is in fact no error.
+.PP
+Actions may be used with these special error rules.
+These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
+.PP
+Error rules such as the above are very general, but difficult to control.
+Somewhat easier are rules such as
+.DS
+stat	:	error  \';\'
+.DE
+Here, when there is an error, the parser attempts to skip over the statement, but
+will do so by skipping to the next \';\'.
+All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
+When the \';\' is seen, this rule will be reduced, and any ``cleanup''
+action associated with it performed.
+.PP
+Another form of error rule arises in interactive applications, where
+it may be desirable to permit a line to be reentered after an error.
+A possible error rule might be
+.DS
+input	:	error  \'\en\'  {  printf( "Reenter last line: " );  }  input
+			{	$$  =  $4;  }
+.DE
+There is one potential difficulty with this approach;
+the parser must correctly process three input tokens before it
+admits that it has correctly resynchronized after the error.
+If the reentered line contains an error
+in the first two tokens, the parser deletes the offending tokens,
+and gives no message; this is clearly unacceptable.
+For this reason, there is a mechanism that
+can be used to force the parser
+to believe that an error has been fully recovered from.
+The statement
+.DS
+yyerrok ;
+.DE
+in an action
+resets the parser to its normal mode.
+The last example is better written
+.DS
+input	:	error  \'\en\'
+			{	yyerrok;
+				printf( "Reenter last line: " );   }
+		input
+			{	$$  =  $4;  }
+	;
+.DE
+.PP
+As mentioned above, the token seen immediately
+after the ``error'' symbol is the input token at which the
+error was discovered.
+Sometimes, this is inappropriate; for example, an
+error recovery action might
+take upon itself the job of finding the correct place to resume input.
+In this case,
+the previous lookahead token must be cleared.
+The statement
+.DS
+yyclearin ;
+.DE
+in an action will have this effect.
+For example, suppose the action after error
+were to call some sophisticated resynchronization routine,
+supplied by the user, that attempted to advance the input to the
+beginning of the next valid statement.
+After this routine was called, the next token returned by yylex would presumably
+be the first token in a legal statement;
+the old, illegal token must be discarded, and the error state reset.
+This could be done by a rule like
+.DS
+stat	:	error 
+			{	resynch();
+				yyerrok ;
+				yyclearin ;   }
+	;
+.DE
+.PP
+These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
+from many errors;
+moreover, the user can get control to deal with
+the error actions required by other portions of the program.
--- /usr/src/share/doc/psd/15.yacc/ss8	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss8	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,93 @@
+.\"	@(#)ss8	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+8: The Yacc Environment
+.PP
+When the user inputs a specification
+to Yacc, the output is a file of C programs, called
+.I y.tab.c
+on most
+systems
+(due to local file system conventions, the names may differ from
+installation to installation).
+The function produced by Yacc is called
+.I yyparse \|;
+it is an integer valued function.
+When it is called, it in turn repeatedly calls
+.I yylex ,
+the lexical analyzer
+supplied by the user (see Section 3)
+to obtain input tokens.
+Eventually, either an error is detected, in which case
+(if no error recovery is possible)
+.I yyparse
+returns the value 1,
+or the lexical analyzer returns the endmarker token
+and the parser accepts.
+In this case,
+.I yyparse
+returns the value 0.
+.PP
+The user must provide a certain amount of environment for this
+parser in order to obtain a working program.
+For example, as with every C program, a program called
+.I main
+must be defined, that eventually calls
+.I yyparse .
+In addition, a routine called
+.I yyerror
+prints a message
+when a syntax error is detected.
+.PP
+These two routines must be supplied in one form or another by the
+user.
+To ease the initial effort of using Yacc, a library has been
+provided with default versions of
+.I main
+and
+.I yyerror .
+The name of this library is system dependent;
+on many systems the library is accessed by a
+.B \-ly
+argument to the loader.
+To show the triviality of these default programs, the source is
+given below:
+.DS
+main(){
+	return( yyparse() );
+	}
+.DE
+and
+.DS
+# include <stdio.h>
+
+yyerror(s) char *s; {
+	fprintf( stderr, "%s\en", s );
+	}
+.DE
+The argument to
+.I yyerror
+is a string containing an error message, usually
+the string ``syntax error''.
+The average application will want to do better than this.
+Ordinarily, the program should keep track of the input line number, and print it
+along with the message when a syntax error is detected.
+The external integer variable
+.I yychar
+contains the lookahead token number at the time the error was detected;
+this may be of some interest in giving better diagnostics.
+Since the
+.I main
+program is probably supplied by the user (to read arguments, etc.)
+the Yacc library is useful only in small
+projects, or in the earliest stages of larger ones.
+.PP
+The external integer variable
+.I yydebug
+is normally set to 0.
+If it is set to a nonzero value, the parser will output a
+verbose description of its actions, including
+a discussion of which input symbols have been read, and
+what the parser actions are.
+Depending on the operating environment,
+it may be possible to set this variable by using a debugging system.
--- /usr/src/share/doc/psd/15.yacc/ss9	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ss9	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,169 @@
+.\"	@(#)ss9	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+9: Hints for Preparing Specifications
+.PP
+This section contains miscellaneous hints on preparing efficient, easy to change,
+and clear specifications.
+The individual subsections are more or less
+independent.
+.SH
+Input Style
+.PP
+It is difficult to
+provide rules with substantial actions
+and still have a readable specification file.
+The following style hints owe much to Brian Kernighan.
+.IP a.
+Use all capital letters for token names, all lower case letters for
+nonterminal names.
+This rule comes under the heading of ``knowing who to blame when
+things go wrong.''
+.IP b.
+Put grammar rules and actions on separate lines.
+This allows either to be changed without
+an automatic need to change the other.
+.IP c.
+Put all rules with the same left hand side together.
+Put the left hand side in only once, and let all
+following rules begin with a vertical bar.
+.IP d.
+Put a semicolon only after the last rule with a given left hand side,
+and put the semicolon on a separate line.
+This allows new rules to be easily added.
+.IP e.
+Indent rule bodies by two tab stops, and action bodies by three
+tab stops.
+.PP
+The example in Appendix A is written following this style, as are
+the examples in the text of this paper (where space permits).
+The user must make up his own mind about these stylistic questions;
+the central problem, however, is to make the rules visible through
+the morass of action code.
+.SH
+Left Recursion
+.PP
+The algorithm used by the Yacc parser encourages so called ``left recursive''
+grammar rules: rules of the form
+.DS
+name	:	name  rest_of_rule  ;
+.DE
+These rules frequently arise when
+writing specifications of sequences and lists:
+.DS
+list	:	item
+	|	list  \',\'  item
+	;
+.DE
+and
+.DS
+seq	:	item
+	|	seq  item
+	;
+.DE
+In each of these cases, the first rule
+will be reduced for the first item only, and the second rule
+will be reduced for the second and all succeeding items.
+.PP
+With right recursive rules, such as
+.DS
+seq	:	item
+	|	item  seq
+	;
+.DE
+the parser would be a bit bigger, and the items would be seen, and reduced,
+from right to left.
+More seriously, an internal stack in the parser
+would be in danger of overflowing if a very long sequence were read.
+Thus, the user should use left recursion wherever reasonable.
+.PP
+It is worth considering whether a sequence with zero
+elements has any meaning, and if so, consider writing
+the sequence specification with an empty rule:
+.DS
+seq	:	/* empty */
+	|	seq  item
+	;
+.DE
+Once again, the first rule would always be reduced exactly once, before the
+first item was read,
+and then the second rule would be reduced once for each item read.
+Permitting empty sequences
+often leads to increased generality.
+However, conflicts might arise if Yacc is asked to decide
+which empty sequence it has seen, when it hasn't seen enough to
+know!
+.SH
+Lexical Tie-ins
+.PP
+Some lexical decisions depend on context.
+For example, the lexical analyzer might want to
+delete blanks normally, but not within quoted strings.
+Or names might be entered into a symbol table in declarations,
+but not in expressions.
+.PP
+One way of handling this situation is
+to create a global flag that is
+examined by the lexical analyzer, and set by actions.
+For example, suppose a program
+consists of 0 or more declarations, followed by 0 or more statements.
+Consider:
+.DS
+%{
+	int dflag;
+%}
+  ...  other declarations ...
+
+%%
+
+prog	:	decls  stats
+	;
+
+decls	:	/* empty */
+			{	dflag = 1;  }
+	|	decls  declaration
+	;
+
+stats	:	/* empty */
+			{	dflag = 0;  }
+	|	stats  statement
+	;
+
+    ...  other rules ...
+.DE
+The flag
+.I dflag
+is now 0 when reading statements, and 1 when reading declarations,
+.ul
+except for the first token in the first statement.
+This token must be seen by the parser before it can tell that
+the declaration section has ended and the statements have
+begun.
+In many cases, this single token exception does not
+affect the lexical scan.
+.PP
+This kind of ``backdoor'' approach can be elaborated
+to a noxious degree.
+Nevertheless, it represents a way of doing some things
+that are difficult, if not impossible, to
+do otherwise.
+.SH
+Reserved Words
+.PP
+Some programming languages
+permit the user to
+use words like ``if'', which are normally reserved,
+as label or variable names, provided that such use does not
+conflict with the legal use of these names in the programming language.
+This is extremely hard to do in the framework of Yacc;
+it is difficult to pass information to the lexical analyzer
+telling it ``this instance of `if' is a keyword, and that instance is a variable''.
+The user can make a stab at it, using the
+mechanism described in the last subsection,
+but it is difficult.
+.PP
+A number of ways of making this easier are under advisement.
+Until then, it is better that the keywords be
+.I reserved \|;
+that is, be forbidden for use as variable names.
+There are powerful stylistic reasons for preferring this, anyway.
--- /usr/src/share/doc/psd/15.yacc/ssA	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssA	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,184 @@
+.\"	@(#)ssA	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+10: Advanced Topics
+.PP
+This section discusses a number of advanced features
+of Yacc.
+.SH
+Simulating Error and Accept in Actions
+.PP
+The parsing actions of error and accept can be simulated
+in an action by use of macros YYACCEPT and YYERROR.
+YYACCEPT causes
+.I yyparse
+to return the value 0;
+YYERROR causes
+the parser to behave as if the current input symbol
+had been a syntax error;
+.I yyerror
+is called, and error recovery takes place.
+These mechanisms can be used to simulate parsers
+with multiple endmarkers or context-sensitive syntax checking.
+.SH
+Accessing Values in Enclosing Rules.
+.PP
+An action may refer to values
+returned by actions to the left of the current rule.
+The mechanism is simply the same as with ordinary actions,
+a dollar sign followed by a digit, but in this case the
+digit may be 0 or negative.
+Consider
+.DS
+sent	:	adj  noun  verb  adj  noun
+			{  \fIlook at the sentence\fR . . .  }
+	;
+
+adj	:	THE		{	$$ = THE;  }
+	|	YOUNG	{	$$ = YOUNG;  }
+	. . .
+	;
+
+noun	:	DOG
+			{	$$ = DOG;  }
+	|	CRONE
+			{	if( $0 == YOUNG ){
+					printf( "what?\en" );
+					}
+				$$ = CRONE;
+				}
+	;
+	. . .
+.DE
+In the action following the word CRONE, a check is made that the
+preceding token shifted was not YOUNG.
+Obviously, this is only possible when a great deal is known about
+what might precede the symbol
+.I noun
+in the input.
+There is also a distinctly unstructured flavor about this.
+Nevertheless, at times this mechanism will save a great
+deal of trouble, especially when a few combinations are to
+be excluded from an otherwise regular structure.
+.SH
+Support for Arbitrary Value Types
+.PP
+By default, the values returned by actions and the lexical analyzer are integers.
+Yacc can also support
+values of other types, including structures.
+In addition, Yacc keeps track of the types, and inserts
+appropriate union member names so that the resulting parser will
+be strictly type checked.
+The Yacc value stack (see Section 4)
+is declared to be a
+.I union
+of the various types of values desired.
+The user declares the union, and associates union member names
+to each token and nonterminal symbol having a value.
+When the value is referenced through a $$ or $n construction,
+Yacc will automatically insert the appropriate union name, so that
+no unwanted conversions will take place.
+In addition, type checking commands such as
+.I Lint\|
+.[
+Johnson Lint Checker 1273
+.]
+will be far more silent.
+.PP
+There are three mechanisms used to provide for this typing.
+First, there is a way of defining the union; this must be
+done by the user since other programs, notably the lexical analyzer,
+must know about the union member names.
+Second, there is a way of associating a union member name with tokens
+and nonterminals.
+Finally, there is a mechanism for describing the type of those
+few values where Yacc can not easily determine the type.
+.PP
+To declare the union, the user includes in the declaration section:
+.DS
+%union  {
+	body of union ...
+	}
+.DE
+This declares the Yacc value stack,
+and the external variables
+.I yylval
+and
+.I yyval ,
+to have type equal to this union.
+If Yacc was invoked with the
+.B \-d
+option, the union declaration
+is copied onto the
+.I y.tab.h
+file.
+Alternatively,
+the union may be declared in a header file, and a typedef
+used to define the variable YYSTYPE to represent
+this union.
+Thus, the header file might also have said:
+.DS
+typedef union {
+	body of union ...
+	} YYSTYPE;
+.DE
+The header file must be included in the declarations
+section, by use of %{ and %}.
+.PP
+Once YYSTYPE is defined,
+the union member names must be associated
+with the various terminal and nonterminal names.
+The construction
+.DS
+< name >
+.DE
+is used to indicate a union member name.
+If this follows
+one of the
+keywords %token,
+%left, %right, and %nonassoc,
+the union member name is associated with the tokens listed.
+Thus, saying
+.DS
+%left  <optype>  \'+\'  \'\-\'
+.DE
+will cause any reference to values returned by these two tokens to be
+tagged with
+the union member name
+.I optype .
+Another keyword, %type, is
+used similarly to associate
+union member names with nonterminals.
+Thus, one might say
+.DS
+%type  <nodetype>  expr  stat
+.DE
+.PP
+There remain a couple of cases where these mechanisms are insufficient.
+If there is an action within a rule, the value returned
+by this action has no
+.I "a priori"
+type.
+Similarly, reference to left context values (such as $0 \- see the
+previous subsection ) leaves Yacc with no easy way of knowing the type.
+In this case, a type can be imposed on the reference by inserting
+a union member name, between < and >, immediately after
+the first $.
+An example of this usage is
+.DS
+rule	:	aaa  {  $<intval>$  =  3;  } bbb
+			{	fun( $<intval>2, $<other>0 );  }
+	;
+.DE
+This syntax has little to recommend it, but the situation arises rarely.
+.PP
+A sample specification is given in Appendix C.
+The facilities in this subsection are not triggered until they are used:
+in particular, the use of %type will turn on these mechanisms.
+When they are used, there is a fairly strict level of checking.
+For example, use of $n or $$ to refer to something with no defined type
+is diagnosed.
+If these facilities are not triggered, the Yacc value stack is used to
+hold
+.I int' s,
+as was true historically.
--- /usr/src/share/doc/psd/15.yacc/ssB	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssB	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,26 @@
+.\"	@(#)ssB	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+11: Acknowledgements
+.PP
+Yacc owes much to a
+most stimulating collection of users, who have goaded
+me beyond my inclination, and frequently beyond my
+ability, in their endless search for ``one more feature''.
+Their irritating unwillingness to learn how to
+do things my way has usually led to my doing things their way;
+most of the time, they have been right.
+B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna,
+M. E. Lesk,
+and A. Snyder will recognize some of their ideas in the current version
+of Yacc.
+C. B. Haley contributed to the error recovery algorithm.
+D. M. Ritchie, B. W. Kernighan, and M. O. Harris helped translate this document into English.
+Al Aho also deserves special credit for bringing
+the mountain to Mohammed, and other favors.
+.SG "MH-1273-SCJ-unix"
+.bp
+.[
+$LIST$
+.]
+.bp
--- /usr/src/share/doc/psd/15.yacc/ssa	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssa	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,113 @@
+.\"	@(#)ssa	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+Appendix A:  A Simple Example
+.PP
+This example gives the complete Yacc specification for a small desk calculator;
+the desk calculator has 26 registers, labeled ``a'' through ``z'', and accepts
+arithmetic expressions made up of the operators +, \-, *, /,
+% (mod operator), & (bitwise and), | (bitwise or), and assignment.
+If an expression at the top level is an assignment, the value is not
+printed; otherwise it is.
+As in C, an integer that begins with 0 (zero) is assumed to be octal;
+otherwise, it is assumed to be decimal.
+.PP
+As an example of a Yacc specification, the desk calculator
+does a reasonable job of showing how precedences and ambiguities
+are used, and demonstrating simple error recovery.
+The major oversimplifications are that the
+lexical analysis phase is much simpler than for most applications, and the
+output is produced immediately, line by line.
+Note the way that decimal and octal integers are read in by the grammar rules;
+This job is probably better done by the lexical analyzer.
+.sp
+.nf
+.ta .5i 1i 1.5i 2i 2.5i
+
+%{
+#  include  <stdio.h>
+#  include  <ctype.h>
+
+int  regs[26];
+int  base;
+
+%}
+
+%start  list
+
+%token  DIGIT  LETTER
+
+%left  \'|\'
+%left  \'&\'
+%left  \'+\'  \'\-\'
+%left  \'*\'  \'/\'  \'%\'
+%left  UMINUS      /*  supplies  precedence  for  unary  minus  */
+
+%%      /*  beginning  of  rules  section  */
+
+list	:	/*  empty  */
+	|	list  stat  \'\en\'
+	|	list  error  \'\en\'
+			{	yyerrok;  }
+	;
+
+stat	:	expr
+			{	printf( "%d\en", $1 );  }
+	|	LETTER  \'=\'  expr
+			{	regs[$1]  =  $3;  }
+	;
+
+expr	:	\'(\'  expr  \')\'
+			{	$$  =  $2;  }
+	|	expr  \'+\'  expr
+			{	$$  =  $1  +  $3;  }
+	|	expr  \'\-\'  expr
+			{	$$  =  $1  \-  $3;  }
+	|	expr  \'*\'  expr
+			{	$$  =  $1  *  $3;  }
+	|	expr  \'/\'  expr
+			{	$$  =  $1  /  $3;  }
+	|	expr  \'%\'  expr
+			{	$$  =  $1  %  $3;  }
+	|	expr  \'&\'  expr
+			{	$$  =  $1  &  $3;  }
+	|	expr  \'|\'  expr
+			{	$$  =  $1  |  $3;  }
+	|	\'\-\'  expr        %prec  UMINUS
+			{	$$  =  \-  $2;  }
+	|	LETTER
+			{	$$  =  regs[$1];  }
+	|	number          
+	;
+
+number	:	DIGIT
+			{	$$ = $1;    base  =  ($1==0)  ?  8  :  10;  }
+	|	number  DIGIT
+			{	$$  =  base * $1  +  $2;  }
+	;
+
+%%      /*  start  of  programs  */
+
+yylex() {		/*  lexical  analysis  routine  */
+              /*  returns  LETTER  for  a  lower  case  letter,  yylval = 0  through  25  */
+              /*  return  DIGIT  for  a  digit,  yylval = 0  through  9  */
+              /*  all  other  characters  are  returned  immediately  */
+
+	int  c;
+
+	while(  (c=getchar())  ==  \' \'  )  {	/*  skip  blanks  */  }
+
+	/*  c  is  now  nonblank  */
+
+	if(  islower(  c  )  )  {	
+		yylval  =  c  \-  \'a\';
+		return  (  LETTER  );
+		}
+	if(  isdigit(  c  )  )  {	
+		yylval  =  c  \-  \'0\';
+		return(  DIGIT  );
+		}
+	return(  c  );
+	}
+.fi
+.bp
--- /usr/src/share/doc/psd/15.yacc/ssb	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssb	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,110 @@
+.\"	@(#)ssb	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+Appendix B: Yacc Input Syntax
+.PP
+This Appendix has a description of the Yacc input syntax, as a Yacc specification.
+Context dependencies, etc., are not considered.
+Ironically, the Yacc input specification language
+is most naturally specified as an LR(2) grammar; the sticky
+part comes when an identifier is seen in a rule, immediately
+following an action.
+If this identifier is followed by a colon, it is the start of the
+next rule; otherwise
+it is a continuation of the current rule, which just happens to have
+an action embedded in it.
+As implemented, the lexical analyzer looks
+ahead after seeing an identifier, and
+decide whether the next token (skipping blanks, newlines, comments, etc.)
+is a colon.
+If so, it returns the token C_IDENTIFIER.
+Otherwise, it returns IDENTIFIER.
+Literals (quoted strings) are also returned as IDENTIFIERS,
+but never as part of C_IDENTIFIERs.
+.sp
+.nf
+.ta .6i 1.2i 1.8i 2.4i 3i 3.6i
+
+            /*  grammar  for  the  input  to  Yacc  */
+
+	/*  basic  entities  */
+%token	IDENTIFIER	/*   includes  identifiers   and  literals  */
+%token	C_IDENTIFIER	/*    identifier  (but  not  literal)  followed  by  colon    */
+%token	NUMBER		/*    [0-9]+    */
+
+	/*  reserved  words:    %type  =>  TYPE,  %left  =>  LEFT,  etc.  */
+
+%token	LEFT  RIGHT  NONASSOC  TOKEN  PREC  TYPE  START  UNION
+
+%token	MARK	/*  the  %%  mark  */
+%token	LCURL	/*  the  %{  mark  */
+%token	RCURL	/*  the  %}  mark  */
+
+	/*  ascii  character  literals  stand  for  themselves  */
+
+%start	spec
+
+%%
+
+spec	:	defs  MARK  rules  tail
+	;
+
+tail	:	MARK	{    \fIIn  this  action,  eat  up  the  rest  of  the  file\fR    }
+	|	/*  empty:  the  second  MARK  is  optional  */
+	;
+
+defs	:	/*  empty  */
+	|	defs  def
+	;
+
+def	:	START  IDENTIFIER
+	|	UNION  {  \fICopy union  definition  to  output\fR  }
+	|	LCURL  {  \fICopy  C  code  to  output  file\fR   }  RCURL
+	|	ndefs  rword  tag  nlist
+	;
+
+rword	:	TOKEN
+	|	LEFT
+	|	RIGHT
+	|	NONASSOC
+	|	TYPE
+	;
+
+tag	:	/*  empty:  union  tag  is  optional  */
+	|	\'<\'  IDENTIFIER  \'>\'
+	;
+
+nlist	:	nmno
+	|	nlist  nmno
+	|	nlist  \',\'  nmno
+	;
+
+nmno	:	IDENTIFIER		/*  NOTE:  literal  illegal  with  %type  */
+	|	IDENTIFIER  NUMBER      /*  NOTE:  illegal  with  %type  */
+	;
+
+	/*  rules  section  */
+
+rules	:	C_IDENTIFIER  rbody  prec
+	|	rules  rule
+	;
+
+rule	:	C_IDENTIFIER  rbody  prec
+	|	'|'  rbody  prec
+	;
+
+rbody	:	/*  empty  */
+	|	rbody  IDENTIFIER
+	|	rbody  act
+	;
+
+act	:	\'{\'  {  \fICopy  action,  translate  $$,  etc.\fR  }  \'}\'
+	;
+
+prec	:	/*  empty  */
+	|	PREC  IDENTIFIER
+	|	PREC  IDENTIFIER  act
+	|	prec  \';\'
+	;
+.fi
+.bp
--- /usr/src/share/doc/psd/15.yacc/ssc	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssc	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,311 @@
+.\"	@(#)ssc	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+Appendix C: An Advanced Example
+.PP
+This Appendix gives an example of a grammar using some
+of the advanced features discussed in Section 10.
+The desk calculator example in Appendix A is
+modified to provide a desk calculator that
+does floating point interval arithmetic.
+The calculator understands floating point
+constants, the arithmetic operations +, \-, *, /,
+unary \-, and = (assignment), and has 26 floating
+point variables, ``a'' through ``z''.
+Moreover, it also understands
+.I intervals ,
+written
+.DS
+	( x , y )
+.DE
+where
+.I x
+is less than or equal to
+.I y .
+There are 26 interval valued variables ``A'' through ``Z''
+that may also be used.
+The usage is similar to that in Appendix A; assignments
+return no value, and print nothing, while expressions print
+the (floating or interval) value.
+.PP
+This example explores a number of interesting features
+of Yacc and C.
+Intervals are represented by a structure, consisting of the
+left and right endpoint values, stored as
+.I double 's.
+This structure is given a type name, INTERVAL, by using
+.I typedef .
+The Yacc value stack can also contain floating point scalars, and
+integers (used to index into the arrays holding the variable values).
+Notice that this entire strategy depends strongly on being able to
+assign structures and unions in C.
+In fact, many of the actions call functions that return structures
+as well.
+.PP
+It is also worth noting the use of YYERROR to handle error conditions:
+division by an interval containing 0, and an interval presented in
+the wrong order.
+In effect, the error recovery mechanism of Yacc is used to throw away the
+rest of the offending line.
+.PP
+In addition to the mixing of types on the value stack,
+this grammar also demonstrates an interesting use of syntax to
+keep track of the type (e.g. scalar or interval) of intermediate
+expressions.
+Note that a scalar can be automatically promoted to an interval if
+the context demands an interval value.
+This causes a large number of conflicts when the grammar is run through
+Yacc: 18 Shift/Reduce and 26 Reduce/Reduce.
+The problem can be seen by looking at the two input lines:
+.DS
+	2.5 + ( 3.5 \- 4. )
+.DE
+and
+.DS
+	2.5 + ( 3.5 , 4. )
+.DE
+Notice that the 2.5 is to be used in an interval valued expression
+in the second example, but this fact is not known until
+the ``,'' is read; by this time, 2.5 is finished, and the parser cannot go back
+and change its mind.
+More generally, it might be necessary to look ahead an arbitrary number of
+tokens to decide whether to convert a scalar to an interval.
+This problem is evaded by having two rules for each binary interval
+valued operator: one when the left operand is a scalar, and one when
+the left operand is an interval.
+In the second case, the right operand must be an interval,
+so the conversion will be applied automatically.
+Despite this evasion, there are still many cases where the
+conversion may be applied or not, leading to the above
+conflicts.
+They are resolved by listing the rules that yield scalars first
+in the specification file; in this way, the conflicts will
+be resolved in the direction of keeping scalar
+valued expressions scalar valued until they are forced to become
+intervals.
+.PP
+This way of handling multiple types is very instructive, but not very general.
+If there were many kinds of expression types, instead of just two,
+the number of rules needed would increase dramatically, and the conflicts
+even more dramatically.
+Thus, while this example is instructive, it is better practice in a
+more normal programming language environment to
+keep the type information as part of the value, and not as part
+of the grammar.
+.PP
+Finally, a word about the lexical analysis.
+The only unusual feature is the treatment of floating point constants.
+The C library routine
+.I atof
+is used to do the actual conversion from a character string
+to a double precision value.
+If the lexical analyzer detects an error,
+it responds by returning a token that
+is illegal in the grammar, provoking a syntax error
+in the parser, and thence error recovery.
+.DS L
+
+%{
+
+#  include  <stdio.h>
+#  include  <ctype.h>
+
+typedef  struct  interval  {
+	double  lo,  hi;
+	}  INTERVAL;
+
+INTERVAL  vmul(),  vdiv();
+
+double  atof();
+
+double  dreg[ 26 ];
+INTERVAL  vreg[ 26 ];
+
+%}
+
+%start    lines
+
+%union    {
+	int  ival;
+	double  dval;
+	INTERVAL  vval;
+	}
+
+%token  <ival>  DREG  VREG	/*  indices  into  dreg,  vreg  arrays  */
+
+%token  <dval>  CONST		/*  floating  point  constant  */
+
+%type  <dval>  dexp		/*  expression  */
+
+%type  <vval>  vexp		/*  interval  expression  */
+
+	/*  precedence  information  about  the  operators  */
+
+%left	\'+\'  \'\-\'
+%left	\'*\'  \'/\'
+%left	UMINUS        /*  precedence  for  unary  minus  */
+
+%%
+
+lines	:	/*  empty  */
+	|	lines  line
+	;
+
+line	:	dexp  \'\en\'
+			{	printf(  "%15.8f\en",  $1  );  }
+	|	vexp  \'\en\'
+			{	printf(  "(%15.8f  ,  %15.8f  )\en",  $1.lo,  $1.hi  );  }
+	|	DREG  \'=\'  dexp  \'\en\'
+			{	dreg[$1]  =  $3;  }
+	|	VREG  \'=\'  vexp  \'\en\'
+			{	vreg[$1]  =  $3;  }
+	|	error  \'\en\'
+			{	yyerrok;  }
+	;
+
+dexp	:	CONST
+	|	DREG
+			{	$$  =  dreg[$1];  }
+	|	dexp  \'+\'  dexp
+			{	$$  =  $1  +  $3;  }
+	|	dexp  \'\-\'  dexp
+			{	$$  =  $1  \-  $3;  }
+	|	dexp  \'*\'  dexp
+			{	$$  =  $1  *  $3;  }
+	|	dexp  \'/\'  dexp
+			{	$$  =  $1  /  $3;  }
+	|	\'\-\'  dexp	%prec  UMINUS
+			{	$$  =  \- $2;  }
+	|	\'(\'  dexp  \')\'
+			{	$$  =  $2;  }
+	;
+
+vexp	:	dexp
+			{	$$.hi  =  $$.lo  =  $1;  }
+	|	\'(\'  dexp  \',\'  dexp  \')\'
+			{
+			$$.lo  =  $2;
+			$$.hi  =  $4;
+			if(  $$.lo  >  $$.hi  ){
+				printf(  "interval  out  of  order\en"  );
+				YYERROR;
+				}
+			}
+	|	VREG
+			{	$$  =  vreg[$1];    }
+	|	vexp  \'+\'  vexp
+			{	$$.hi  =  $1.hi  +  $3.hi;
+				$$.lo  =  $1.lo  +  $3.lo;    }
+	|	dexp  \'+\'  vexp
+			{	$$.hi  =  $1  +  $3.hi;
+				$$.lo  =  $1  +  $3.lo;    }
+	|	vexp  \'\-\'  vexp
+			{	$$.hi  =  $1.hi  \-  $3.lo;
+				$$.lo  =  $1.lo  \-  $3.hi;    }
+	|	dexp  \'\-\'  vexp
+			{	$$.hi  =  $1  \-  $3.lo;
+				$$.lo  =  $1  \-  $3.hi;    }
+	|	vexp  \'*\'  vexp
+			{	$$  =  vmul(  $1.lo,  $1.hi,  $3  );  }
+	|	dexp  \'*\'  vexp
+			{	$$  =  vmul(  $1,  $1,  $3  );  }
+	|	vexp  \'/\'  vexp
+			{	if(  dcheck(  $3  )  )  YYERROR;
+				$$  =  vdiv(  $1.lo,  $1.hi,  $3  );  }
+	|	dexp  \'/\'  vexp
+			{	if(  dcheck(  $3  )  )  YYERROR;
+				$$  =  vdiv(  $1,  $1,  $3  );  }
+	|	\'\-\'  vexp	%prec  UMINUS
+			{	$$.hi  =  \-$2.lo;    $$.lo  =  \-$2.hi;    }
+	|	\'(\'  vexp  \')\'
+			{	$$  =  $2;  }
+	;
+
+%%
+
+#  define  BSZ  50        /*  buffer  size  for  floating  point  numbers  */
+
+	/*  lexical  analysis  */
+
+yylex(){
+	register  c;
+
+	while(  (c=getchar())  ==  \' \'  ){  /*  skip  over  blanks  */  }
+
+	if(  isupper(  c  )  ){
+		yylval.ival  =  c  \-  \'A\';
+		return(  VREG  );
+		}
+	if(  islower(  c  )  ){
+		yylval.ival  =  c  \-  \'a\';
+		return(  DREG  );
+		}
+
+	if(  isdigit(  c  )  ||  c==\'.\'  ){
+		/*  gobble  up  digits,  points,  exponents  */
+
+		char  buf[BSZ+1],  *cp  =  buf;
+		int  dot  =  0,  exp  =  0;
+
+		for(  ;  (cp\-buf)<BSZ  ;  ++cp,c=getchar()  ){
+
+			*cp  =  c;
+			if(  isdigit(  c  )  )  continue;
+			if(  c  ==  \'.\'  ){
+				if(  dot++  ||  exp  )  return(  \'.\'  );    /*  will  cause  syntax  error  */
+				continue;
+				}
+
+			if(  c  ==  \'e\'  ){
+				if(  exp++  )  return(  \'e\'  );    /*  will  cause  syntax  error  */
+				continue;
+				}
+
+			/*  end  of  number  */
+			break;
+			}
+		*cp  =  \'\e0\';
+		if(  (cp\-buf)  >=  BSZ  )  printf(  "constant  too  long:  truncated\en"  );
+		else  ungetc(  c,  stdin  );    /*  push  back  last  char  read  */
+		yylval.dval  =  atof(  buf  );
+		return(  CONST  );
+		}
+	return(  c  );
+	}
+
+INTERVAL  hilo(  a,  b,  c,  d  )  double  a,  b,  c,  d;  {
+	/*  returns  the  smallest  interval  containing  a,  b,  c,  and  d  */
+	/*  used  by  *,  /  routines  */
+	INTERVAL  v;
+
+	if(  a>b  )  {  v.hi  =  a;    v.lo  =  b;  }
+	else  {  v.hi  =  b;    v.lo  =  a;  }
+
+	if(  c>d  )  {
+		if(  c>v.hi  )  v.hi  =  c;
+		if(  d<v.lo  )  v.lo  =  d;
+		}
+	else  {
+		if(  d>v.hi  )  v.hi  =  d;
+		if(  c<v.lo  )  v.lo  =  c;
+		}
+	return(  v  );
+	}
+
+INTERVAL  vmul(  a,  b,  v  )  double  a,  b;    INTERVAL  v;  {
+	return(  hilo(  a*v.hi,  a*v.lo,  b*v.hi,  b*v.lo  )  );
+	}
+
+dcheck(  v  )  INTERVAL  v;  {
+	if(  v.hi  >=  0.  &&  v.lo  <=  0.  ){
+		printf(  "divisor  interval  contains  0.\en"  );
+		return(  1  );
+		}
+	return(  0  );
+	}
+
+INTERVAL  vdiv(  a,  b,  v  )  double  a,  b;    INTERVAL  v;  {
+	return(  hilo(  a/v.hi,  a/v.lo,  b/v.hi,  b/v.lo  )  );
+	}
+.DE
+.bp
--- /usr/src/share/doc/psd/15.yacc/ssd	Thu Jan  1 00:00:00 1970
+++ /usr/src/share/doc/psd.new/15.yacc/ssd	Tue Feb 26 13:49:21 2002
@@ -0,0 +1,40 @@
+.\"	@(#)ssd	6.1 (Berkeley) 5/8/86
+.\"
+.SH
+Appendix D: Old Features Supported but not Encouraged
+.PP
+This Appendix mentions synonyms and features which are supported for historical
+continuity, but, for various reasons, are not encouraged.
+.IP 1.
+Literals may also be delimited by double quotes ``"''.
+.IP 2.
+Literals may be more than one character long.
+If all the characters are alphabetic, numeric, or \_, the type number of the literal is defined,
+just as if the literal did not have the quotes around it.
+Otherwise, it is difficult to find the value for such literals.
+.IP
+The use of multi-character literals is likely to mislead those unfamiliar with
+Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer.
+.IP 3.
+Most places where % is legal, backslash ``\e'' may be used.
+In particular, \e\e is the same as %%, \eleft the same as %left, etc.
+.IP 4.
+There are a number of other synonyms:
+.DS
+%< is the same as %left
+%> is the same as %right
+%binary and %2 are the same as %nonassoc
+%0 and %term are the same as %token
+%= is the same as %prec
+.DE
+.IP 5.
+Actions may also have the form
+.DS
+={ . . . }
+.DE
+and the curly braces can be dropped if the action is a
+single C statement.
+.IP 6.
+C code between %{ and %} used to be permitted at the
+head of the rules section, as well as in the
+declaration section.
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E16fiFp-000OEy-00>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation