Parsing

This chapter describes how tags, scopes, entities and their attributes are parsed. Many examples are provided to make parsing in RXML easier to understand.

When RXML was redesigned to be XML compliant, many of its flaws were removed. Security issues have hopefully been removed as well, along with many odd quirks and behaviours.

XML Compliancy

To be XML compliant, a set of requirements has to be fulfilled, and at the moment the XML standard is not fully supported. Roxen can understand and parse all XML-compliant code, but has no support for DTD parsing. The parser is also very forgiving: it does not insist that the code follow correct XML syntax. The table below shows some of the most common differences between old RXML and the new XML-compliant RXML.

RXML 1.3	RXML 2
`<accessed>`	`<accessed/>`
`<p>`	`<p>`content`</p>`
`<tablify nice>`	`<tablify nice="nice"/>` or `<tablify nice=""/>`
`<if supports=cookies>`	`<if supports="cookies"/>`
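
The difference between the two columns can be illustrated with a strict XML parser. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module as a stand-in; it is not Roxen's parser, which is considerably more forgiving. The helper name `is_well_formed` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as one well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Left-hand column of the table: RXML 1.3 style.
old_style = ["<accessed>", "<p>", "<tablify nice>", "<if supports=cookies>"]

# Right-hand column of the table: RXML 2 style.
new_style = [
    "<accessed/>",
    "<p>content</p>",
    '<tablify nice="nice"/>',
    '<tablify nice=""/>',
    '<if supports="cookies"/>',
]

for fragment in old_style + new_style:
    print(f"{fragment!r}: well-formed = {is_well_formed(fragment)}")
```

A strict parser rejects every fragment in the first list and accepts every fragment in the second, which is the distinction the table draws.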

Note that a site with older RXML code can still survive the move to the new system. Once the RXML compatibility module has been added, the parser will understand the old syntax.

RXML Parsing

The Roxen 2 RXML parser uses a top-down parsing approach. This means that the parser now uses preparsing instead of postparsing as in RXML 1.3. Repetitive parsing in output tags used to be a performance hog in RXML 1.3. The new parser makes a single analysis pass when producing output from a tag, instead of rescanning the output tag in every loop iteration. This makes tags like <formoutput> unnecessary.
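
The benefit of a single analysis pass can be sketched in a few lines of Python. This is a toy model only: the `&name;` placeholder syntax, the function names and the row data are assumptions chosen for illustration and do not reproduce how Roxen handles entities or output tags internally.

```python
import re

# Toy placeholder syntax (assumed for this sketch): &name;
PLACEHOLDER = re.compile(r"&(\w+);")


def compile_template(template):
    """Analyse the template once, splitting it into text and variable parts."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        if match.start() > pos:
            parts.append(("text", template[pos:match.start()]))
        parts.append(("var", match.group(1)))
        pos = match.end()
    if pos < len(template):
        parts.append(("text", template[pos:]))
    return parts


def render(parts, row):
    """Produce output for one row from the pre-analysed parts."""
    return "".join(
        value if kind == "text" else str(row[value]) for kind, value in parts
    )


def render_all(template, rows):
    parts = compile_template(template)            # one analysis pass
    return [render(parts, row) for row in rows]   # no rescanning per row


rows = [{"name": "index.html", "hits": 12}, {"name": "about.html", "hits": 3}]
for line in render_all("<li>&name;: &hits;</li>", rows):
    print(line)
```

The template is scanned once, however many rows are rendered; a rescanning design would repeat the regular-expression search for every row.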

The images below show some of the differences in RXML parsing between Roxen WebServer 2.1 and Roxen Challenger 1.3.

Parsing '<!--' comments

The old 1.3 RXML parser did not recognize '<!--' comments, and thus the <if> inside the comment is parsed and the first ignored. RXML parser 2 consistently ignores code within comments. However, if the comment is an RXML comment, problems might arise while the parser is scanning the page for tags.
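
The difference in comment handling can be modelled with two small scanners. This is a simplified sketch in Python, not the actual Roxen implementation; the function names and the sample page are assumptions for illustration.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # an HTML/XML '<!--' comment
OPEN_IF = re.compile(r"<if\b")                  # start of an opening <if> tag


def count_if_ignoring_nothing(page):
    """Model of a scanner that does not recognize '<!--' comments."""
    return len(OPEN_IF.findall(page))


def count_if_skipping_comments(page):
    """Model of a scanner that ignores everything inside '<!--' comments."""
    return len(OPEN_IF.findall(COMMENT.sub("", page)))


page = (
    '<!-- <if supports="cookies">old branch</if> -->\n'
    '<if supports="cookies">new branch</if>\n'
)

print(count_if_ignoring_nothing(page))   # sees both <if> tags
print(count_if_skipping_comments(page))  # sees only the <if> outside the comment
```

The first scanner would go on to parse the commented-out <if>, while the second never sees it.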

Locating the closing tag

When the parser comes upon a container tag, it does a scan pass, without performing any parsing, to locate the closing tag. If the parser, as shown below, finds another <if> inside an RXML comment, it automatically looks for that tag's closing tag as well, and so believes that the closing tag for the first <if> is further down the page. Because RXML parser 2 then cannot find that last closing tag, it decides there are unmatched tags in the code and returns an unbalanced tag error message.
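
The scan pass described above can be approximated by counting nesting depth. The following Python sketch is a toy model under stated assumptions: the `<comment>` container used to stand in for an RXML comment, the regular expression and the error class are all hypothetical and are not taken from Roxen's source.

```python
import re

# Matches opening and closing <if> tags only; a toy pattern, not Roxen's tokenizer.
IF_TAG = re.compile(r"<(/?)if\b[^>]*>")


class UnbalancedTagError(Exception):
    """Raised when the scan reaches the end of the page with <if> tags still open."""


def find_closing_if(page, start):
    """Scan from `start` (just after an opening <if ...>) for its closing tag.

    Nothing is evaluated during the scan. Every further opening <if> raises
    the nesting depth, even one that sits inside a comment container.
    """
    depth = 1
    for match in IF_TAG.finditer(page, start):
        if match.group(1):                       # a closing </if>
            depth -= 1
            if depth == 0:
                return match.start()
        elif not match.group(0).endswith("/>"):  # an opening, non-empty <if ...>
            depth += 1
    raise UnbalancedTagError(f"{depth} <if> tag(s) left open")


balanced = '<if supports="cookies">yes</if>'
print(find_closing_if(balanced, balanced.index(">") + 1))

# The <if> inside the (assumed) comment container has no closing tag of its own.
broken = '<if supports="cookies"><comment><if supports="cookies"></comment>yes</if>'
try:
    find_closing_if(broken, broken.index(">") + 1)
except UnbalancedTagError as error:
    print("unbalanced:", error)
```

In the second page the only </if> is consumed by the <if> inside the comment, so the scan runs off the end of the page while the first <if> is still open.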

Evaluating tags

When the parser scans a page for tags, it automatically evaluates each tag that it recognizes and stores the result for later use. Recognized tags are tags that are registered with the parser, mostly RXML tags. Tags that the parser does not recognize, such as HTML tags, are not parsed, but their attributes are stored. The 1.3 parser, however, is not that smart: it declares all unknown tags as text, and hence the <if> tag will be found and parsed.
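
The distinction between recognized and unrecognized tags can be sketched with a small tag registry. This Python example is a toy model: the registry, the handler for <accessed> (which simply returns a fixed number) and the simplified tag pattern are assumptions for illustration and do not reflect Roxen's internal data structures.

```python
import re

# Simplified pattern: opening or empty tags with double-quoted attributes only.
TAG = re.compile(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*/?>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def evaluate(page, registry):
    """Evaluate registered tags; leave other tags in place and record their attributes."""
    unknown = []

    def replace(match):
        name = match.group(1)
        attrs = dict(ATTR.findall(match.group(2)))
        handler = registry.get(name)
        if handler is None:
            unknown.append((name, attrs))  # not parsed, but attributes are kept
            return match.group(0)
        return handler(attrs)              # recognized tag: evaluate it

    return TAG.sub(replace, page), unknown


# Hypothetical handler: a real <accessed/> tag would look up a counter.
registry = {"accessed": lambda attrs: "42"}

result, unknown = evaluate('<p class="intro">Hits: <accessed/></p>', registry)
print(result)
print(unknown)
```

Only the registered tag is replaced by its result; the HTML <p> tag passes through untouched while its attributes are recorded.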