Additionally, the token TAG will match any tag that is not one of the tags listed above; the token TEXT will match any single character that is not within a tag or comment; and the token SPACE will match any non-empty sequence of whitespace characters.
    DOC_START  : <html>
    DOC_END    : </html>
    HEAD_START : <head>
    HEAD_END   : </head>
    BODY_START : <body>
    BODY_END   : </body>
    BF_START   : <b>
    BF_END     : </b>
    IT_START   : <i>
    IT_END     : </i>
    UL_START   : <ul>
    UL_END     : </ul>
    OL_START   : <ol>
    OL_END     : </ol>
    LI_START   : <li>
    LI_END     : </li>
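The token definitions can be tried out with a small hand-written lexer. What follows is only a sketch under several assumptions that the text does not settle: a tag is taken to be "<", any characters other than ">", then ">"; a comment is taken to be "<!-- ... -->" and is discarded; tag names are matched case-sensitively; and whitespace is reported as SPACE rather than as TEXT. The names FIXED_TAGS, TOKEN_RE and tokenize are hypothetical, chosen for this illustration.

```python
import re

# Tags that have their own token, as in the token list.
FIXED_TAGS = {
    "<html>": "DOC_START", "</html>": "DOC_END",
    "<head>": "HEAD_START", "</head>": "HEAD_END",
    "<body>": "BODY_START", "</body>": "BODY_END",
    "<b>": "BF_START", "</b>": "BF_END",
    "<i>": "IT_START", "</i>": "IT_END",
    "<ul>": "UL_START", "</ul>": "UL_END",
    "<ol>": "OL_START", "</ol>": "OL_END",
    "<li>": "LI_START", "</li>": "LI_END",
}

# Alternatives are tried left to right, so comments win over tags,
# and SPACE wins over TEXT for whitespace (both are assumptions).
TOKEN_RE = re.compile(
    r"""
    (?P<COMMENT><!--.*?-->)
  | (?P<TAG><[^>]*>)
  | (?P<SPACE>\s+)
  | (?P<TEXT>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return a list of (token_name, matched_text) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "COMMENT":
            continue  # assumption: comments produce no token
        if kind == "TAG":
            # A tag with its own token gets that name; any other tag is TAG.
            kind = FIXED_TAGS.get(text, "TAG")
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("<b>hi</b> <p>") yields BF_START, two TEXT tokens (one per character), BF_END, SPACE, and finally TAG for the unlisted tag <p>.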
Thus, the rule for the nonterminal Html above consists of two alternatives. The first says that one possible structure for Html is something with the structure of Item (which is then defined by its own rules), followed by something else which again has the structure of Html; the second says that Html can simply be the empty sequence. (For those of you who have unwound the recursion here in your head, this amounts to saying that Html consists of zero or more Items.) The start symbol for the grammar is Doc.
    Doc      : Wspace DOC_START Wspace Head Wspace Body Wspace DOC_END Wspace
    Head     : HEAD_START Html HEAD_END
    Body     : BODY_START Html BODY_END
    Wspace   : SPACE
             |
    Html     : Item Html
             |
    Item     : BF_START Html BF_END
             | IT_START Html IT_END
             | List
             | Other
    List     : UL_START Wspace ItemList Wspace UL_END
             | OL_START Wspace ItemList Wspace OL_END
    ItemList : ItemList Wspace OneItem
             | OneItem
    OneItem  : LI_START Html LI_END
    Other    : TAG
             | TEXT
             | SPACE
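To see how the grammar constrains a token stream, here is a hand-written recursive-descent recognizer, offered as a sketch rather than as the way a parser generator would process these rules. It takes a list of token names (the strings used in the grammar) and only answers yes or no. The names Recognizer, ParseError and recognizes are hypothetical. Two rewrites are assumed: the recursive Html rule is unwound into a loop over Items, and the left-recursive ItemList rule is read as one OneItem followed by zero or more "Wspace OneItem" pairs, using one extra token of lookahead to tell a separating SPACE from the one before the closing tag.

```python
class ParseError(Exception):
    pass


# Tokens that can begin an Item (BF/IT pairs, a List, or Other).
ITEM_FIRST = {"BF_START", "IT_START", "UL_START", "OL_START",
              "TAG", "TEXT", "SPACE"}


class Recognizer:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def expect(self, name):
        if self.peek() != name:
            raise ParseError(f"expected {name} at {self.pos}, found {self.peek()}")
        self.pos += 1

    def wspace(self):                      # Wspace : SPACE | (empty)
        if self.peek() == "SPACE":
            self.pos += 1

    def doc(self):                         # Doc, the start symbol
        self.wspace(); self.expect("DOC_START"); self.wspace()
        self.expect("HEAD_START"); self.html(); self.expect("HEAD_END")  # Head
        self.wspace()
        self.expect("BODY_START"); self.html(); self.expect("BODY_END")  # Body
        self.wspace(); self.expect("DOC_END"); self.wspace()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()} after the document")

    def html(self):                        # Html : zero or more Items
        while self.peek() in ITEM_FIRST:
            self.item()

    def item(self):
        tok = self.peek()
        if tok in ("BF_START", "IT_START"):
            self.pos += 1
            self.html()
            self.expect("BF_END" if tok == "BF_START" else "IT_END")
        elif tok in ("UL_START", "OL_START"):
            self.pos += 1                  # List
            self.wspace(); self.item_list(); self.wspace()
            self.expect("UL_END" if tok == "UL_START" else "OL_END")
        elif tok in ("TAG", "TEXT", "SPACE"):
            self.pos += 1                  # Other
        else:
            raise ParseError(f"unexpected {tok} at {self.pos}")

    def item_list(self):                   # OneItem (Wspace OneItem)*
        self.one_item()
        while True:
            if self.peek() == "SPACE" and self.peek(1) == "LI_START":
                self.pos += 1
            if self.peek() != "LI_START":
                break
            self.one_item()

    def one_item(self):                    # OneItem : LI_START Html LI_END
        self.expect("LI_START"); self.html(); self.expect("LI_END")


def recognizes(tokens):
    """True if the token names form a Doc according to the grammar."""
    try:
        Recognizer(tokens).doc()
        return True
    except ParseError:
        return False
```

Note that an empty head and an empty body are both accepted, because Html may be the empty sequence, whereas an empty list is rejected, because ItemList requires at least one OneItem.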