Here, printable characters are those for which the C library function isprint() returns a nonzero value; they correspond to the flex character class expression [:print:]. Whitespace characters are those for which the C library function isspace() returns a nonzero value; they correspond to the flex character class expression [:space:] (see the flex user manual).
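As a quick illustration of the two character classes, the following C sketch wraps isprint() and isspace(). The helper names is_print_char, is_space_char, and leading_space_run are hypothetical and are not part of the assignment; the cast to unsigned char is there because passing a negative char value to the <ctype.h> functions is undefined behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if c is printable, i.e. in flex's [:print:]. */
int is_print_char(char c) {
    return isprint((unsigned char)c) != 0;
}

/* Hypothetical helper: nonzero if c is whitespace, i.e. in flex's [:space:]. */
int is_space_char(char c) {
    return isspace((unsigned char)c) != 0;
}

/* Hypothetical helper: length of the run of whitespace characters
 * at the start of the string s. */
size_t leading_space_run(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0' && is_space_char(s[n])) {
        n++;
    }
    return n;
}
```

Note that in the default "C" locale the space character belongs to both classes, while a newline is whitespace but not printable.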
DOC_START : <html>
DOC_END : </html>
HEAD_START : <head>
HEAD_END : </head>
BODY_START : <body>
BODY_END : </body>
BF_START : <b>
BF_END : </b>
IT_START : <i>
IT_END : </i>
UL_START : <ul>
UL_END : </ul>
OL_START : <ol>
OL_END : </ol>
LI_START : <li>
LI_END : </li>
TAG : This token matches any character sequence that has the structure of a tag, as described above, and which is not any of the tags listed above.
SPL_ENT : This token matches special entities (see Sec. 1.2 below). In our case there are four special entities: &amp; &lt; &gt; &quot;
Note: Each special entity begins with an ampersand character '&' and ends with a semicolon ';'.
TEXT : This token matches any (single) non-whitespace character that is not within a tag.
SPACE : This token matches any non-empty sequence of whitespace characters.
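The distinction between the fixed tag tokens and the catch-all TAG token can be sketched in C as a table lookup. This is only an illustration under stated assumptions: the enum tag_token, its TOK_ names, and the function classify_tag are hypothetical (a yacc-generated header would normally supply the token codes), the input is assumed to already have the structure of a tag, and the comparison is an exact, case-sensitive match, which the text above does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for tag-shaped input. */
enum tag_token {
    TOK_DOC_START, TOK_DOC_END, TOK_HEAD_START, TOK_HEAD_END,
    TOK_BODY_START, TOK_BODY_END, TOK_BF_START, TOK_BF_END,
    TOK_IT_START, TOK_IT_END, TOK_UL_START, TOK_UL_END,
    TOK_OL_START, TOK_OL_END, TOK_LI_START, TOK_LI_END,
    TOK_TAG
};

/* Maps the text of a tag to its token. Assumes text is already known
 * to be tag-shaped; anything not in the table falls through to TOK_TAG. */
enum tag_token classify_tag(const char *text) {
    static const struct { const char *text; enum tag_token tok; } known[] = {
        { "<html>", TOK_DOC_START },  { "</html>", TOK_DOC_END },
        { "<head>", TOK_HEAD_START }, { "</head>", TOK_HEAD_END },
        { "<body>", TOK_BODY_START }, { "</body>", TOK_BODY_END },
        { "<b>",    TOK_BF_START },   { "</b>",    TOK_BF_END },
        { "<i>",    TOK_IT_START },   { "</i>",    TOK_IT_END },
        { "<ul>",   TOK_UL_START },   { "</ul>",   TOK_UL_END },
        { "<ol>",   TOK_OL_START },   { "</ol>",   TOK_OL_END },
        { "<li>",   TOK_LI_START },   { "</li>",   TOK_LI_END },
    };
    size_t i;
    assert(text != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(text, known[i].text) == 0) {
            return known[i].tok;
        }
    }
    return TOK_TAG; /* tag-shaped, but not one of the listed tags */
}
```

In a flex specification the same effect is usually obtained by listing the specific tag patterns before the general tag pattern, since flex prefers the earlier rule when two rules match the same text.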
Input character sequence    Output
&amp;                       &
&lt;                        <
&gt;                        >
&quot;                      "
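The translation table above amounts to a four-entry lookup. A minimal C sketch follows; the function name decode_special_entity and the choice of returning 0 for an unrecognized input are assumptions made for illustration, not something the assignment prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the output character for one of the four
 * special entities, or 0 if ent is not one of them. */
char decode_special_entity(const char *ent) {
    static const struct { const char *name; char ch; } table[] = {
        { "&amp;",  '&' },
        { "&lt;",   '<' },
        { "&gt;",   '>' },
        { "&quot;", '"' },
    };
    size_t i;
    assert(ent != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(ent, table[i].name) == 0) {
            return table[i].ch;
        }
    }
    return 0; /* not a special entity */
}
```

For example, decode_special_entity("&lt;") yields '<', while any other string, such as "&nbsp;", yields 0.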
Thus, the rule for the nonterminal Html above consists of two alternatives. The first says that one possible structure for Html is to have something with the structure of Item (which is then defined by its own rules), followed by something else which again has the structure of Html; the second says that Html can simply be the empty sequence. (For those of you who have unwound the recursion here in your head, this amounts to saying that Html consists of zero or more Items.) The start symbol for the grammar is Doc.
Doc      : Wspace DOC_START Wspace Head Wspace Body Wspace DOC_END Wspace
Head     : HEAD_START Html HEAD_END
Body     : BODY_START Html BODY_END
Wspace   : SPACE
         | ε
Html     : Item Html
         | ε
Item     : BF_START Html BF_END
         | IT_START Html IT_END
         | List
         | Other
List     : UL_START Wspace ItemList Wspace UL_END
         | OL_START Wspace ItemList Wspace OL_END
ItemList : ItemList Wspace OneItem
         | OneItem
OneItem  : LI_START Html LI_END
Other    : TAG | TEXT | SPL_ENT | SPACE
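To make the shape of these rules concrete, here is a small hand-written recognizer in C over an array of token codes. It is a recursive-descent sketch, not the yacc-generated parser, and it rewrites the recursion in Html and ItemList as loops ("zero or more Items", "one or more OneItems"). The enum tok, struct parser, and all function names are hypothetical. The sketch assumes the token array ends with an END_OF_INPUT sentinel and that the scanner never emits two SPACE tokens in a row, which follows from SPACE matching a maximal run of whitespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a yacc-generated header would normally define these. */
enum tok {
    DOC_START, DOC_END, HEAD_START, HEAD_END, BODY_START, BODY_END,
    BF_START, BF_END, IT_START, IT_END, UL_START, UL_END,
    OL_START, OL_END, LI_START, LI_END, TAG, SPL_ENT, TEXT, SPACE,
    END_OF_INPUT
};

struct parser { const enum tok *toks; size_t pos; };

static enum tok peek(const struct parser *p) { return p->toks[p->pos]; }

/* Consume the next token if it equals t; return 1 on success, 0 otherwise. */
static int eat(struct parser *p, enum tok t) {
    if (peek(p) != t) return 0;
    p->pos++;
    return 1;
}

static int parse_html(struct parser *p);

/* Wspace : SPACE | empty */
static void parse_wspace(struct parser *p) { (void)eat(p, SPACE); }

/* Remainder of List after UL_START or OL_START:
 * Wspace ItemList Wspace <close>, with ItemList unrolled into a loop. */
static int parse_list_rest(struct parser *p, enum tok close) {
    parse_wspace(p);
    do {
        /* OneItem : LI_START Html LI_END */
        if (!eat(p, LI_START) || !parse_html(p) || !eat(p, LI_END)) return 0;
        parse_wspace(p);
    } while (peek(p) == LI_START);
    return eat(p, close);
}

/* Html : Item Html | empty, i.e. zero or more Items. */
static int parse_html(struct parser *p) {
    for (;;) {
        switch (peek(p)) {
        case BF_START:
            p->pos++;
            if (!parse_html(p) || !eat(p, BF_END)) return 0;
            break;
        case IT_START:
            p->pos++;
            if (!parse_html(p) || !eat(p, IT_END)) return 0;
            break;
        case UL_START:
            p->pos++;
            if (!parse_list_rest(p, UL_END)) return 0;
            break;
        case OL_START:
            p->pos++;
            if (!parse_list_rest(p, OL_END)) return 0;
            break;
        case TAG: case TEXT: case SPL_ENT: case SPACE:
            p->pos++; /* Other */
            break;
        default:
            return 1; /* empty alternative: no further Item starts here */
        }
    }
}

/* Doc, the start symbol. Returns 1 if toks matches the grammar, else 0.
 * toks must be terminated by END_OF_INPUT. */
int parse_doc(const enum tok *toks) {
    struct parser p = { toks, 0 };
    assert(toks != NULL);
    parse_wspace(&p);
    if (!eat(&p, DOC_START)) return 0;
    parse_wspace(&p);
    if (!eat(&p, HEAD_START) || !parse_html(&p) || !eat(&p, HEAD_END)) return 0;
    parse_wspace(&p);
    if (!eat(&p, BODY_START) || !parse_html(&p) || !eat(&p, BODY_END)) return 0;
    parse_wspace(&p);
    if (!eat(&p, DOC_END)) return 0;
    parse_wspace(&p);
    return peek(&p) == END_OF_INPUT;
}
```

A token sequence such as DOC_START HEAD_START HEAD_END BODY_START TEXT BODY_END DOC_END END_OF_INPUT is accepted, whereas a body containing BF_START with no matching BF_END is rejected. A yacc grammar can keep the left-recursive ItemList rule as written; the loop form is used here only because recursive descent cannot handle left recursion directly.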