XISS
With the advent of XML as a standard for data
representation and exchange on the Internet, storing
and querying XML data becomes more and more important.
Several XML query languages have been proposed, and
the common feature of the languages is the use of
regular path expressions to query XML data. This poses
a new challenge concerning indexing and searching XML data,
because conventional approaches based on tree traversals may not
meet the processing requirements under heavy access requests.
In this project, we propose a new system for
indexing and storing XML data based on a numbering
scheme (Extended Preorder Numbering Scheme) for
elements. This numbering scheme quickly determines the
ancestor-descendant relationship between elements in
the hierarchy of XML data. We also propose several
algorithms for processing regular path expressions,
namely,
(1) EE-Join for searching paths from an element to another,
(2) EA-Join for scanning sorted elements and attributes
to find element-attribute pairs, and
(3) KC-Join for finding Kleene-Closure on repeated paths or elements.
The EE-Join algorithm is highly effective particularly
for searching paths that are very long or whose
lengths are unknown. Experimental results from our
prototype system implementation
show that the proposed algorithms can process XML queries with
regular path expressions by up to an order of magnitude faster than
conventional approaches.