XML Data Integration
XML has become the de facto standard for information
exchange in e-commerce and in many workgroup
applications such as Enterprise Resource Planning
(ERP). The availability of large amounts of heterogeneous,
distributed web data necessitates the integration
of XML data from multiple sources.
For example, many e-commerce companies sell similar
products but represent them using different XML
schemas, possibly with different ontologies. When
two such companies merge, or cooperate to serve
customers, they need a uniform schema integration
methodology.
In some applications, such as comparison shopping,
users need the illusion of a single centralized,
homogeneous information system. Each organization creates
its own document structure according to its specific
requirements. These documents may need to be restructured
so that the data can be shared efficiently with other
organizations.
We propose an XML Schema integration methodology
and a querying mechanism. This allows easy integration
and querying of related but heterogeneous schemas.
To achieve our objective, we propose an object-oriented
data model called XSDM (XML Schema Data Model), which
is a graphical representation of XML Schema for the
purpose of schema integration. We propose a three-layered
architecture for XML Schema integration, with each
layer presenting an integrated view of the concepts
that characterize the layer below. The three layers
are pre-integration, comparison, and integration.
During pre-integration, the schema expressed
in XML Schema notation is read and converted into
the XSDM notation. During the comparison phase of
integration, correspondences as well as conflicts
between elements are identified. During the integration
phase, the initial schemas are restructured and merged
to obtain the global schema. We propose
integration policies for integrating element definitions
as well as their datatypes and attributes.
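The three phases described above can be illustrated with a
minimal Python sketch. It is not the project's implementation:
the flat name-to-type dictionary is only a stand-in for XSDM,
and the function names, the synonym table, and the policy of
falling back to xs:string on a datatype conflict are all
assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def pre_integrate(xsd_text):
    """Pre-integration: read XML Schema text into a name -> type map.

    The flat dict is a hypothetical stand-in for the XSDM notation.
    """
    root = ET.fromstring(xsd_text)
    return {el.get("name"): el.get("type")
            for el in root.iter(XS + "element")
            if el.get("name") and el.get("type")}


def compare(a, b, synonyms):
    """Comparison: collect correspondences and datatype conflicts."""
    correspondences, conflicts = [], []
    for name_a, type_a in a.items():
        # Assumed synonym table maps names in schema A to names in B.
        name_b = synonyms.get(name_a, name_a)
        if name_b in b:
            correspondences.append((name_a, name_b))
            if type_a != b[name_b]:
                conflicts.append((name_a, type_a, b[name_b]))
    return correspondences, conflicts


def integrate(a, b, correspondences, conflicts):
    """Integration: merge both schemas into one global schema."""
    conflicting = {name for name, _, _ in conflicts}
    matched_b = {name_b for _, name_b in correspondences}
    merged = {}
    for name, typ in a.items():
        # Hypothetical policy: generalize to xs:string on a conflict.
        merged[name] = "xs:string" if name in conflicting else typ
    for name, typ in b.items():
        if name not in matched_b:  # keep elements only B defines
            merged[name] = typ
    return merged


SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="price" type="xs:string"/>
  <xs:element name="stock" type="xs:integer"/>
</xs:schema>"""

a, b = pre_integrate(SCHEMA_A), pre_integrate(SCHEMA_B)
corr, conf = compare(a, b, synonyms={"title": "name"})
global_schema = integrate(a, b, corr, conf)
```

In this toy run, "title" and "name" are treated as corresponding
elements, "price" is reported as a datatype conflict, and "stock"
is carried into the global schema unchanged.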
In the querying process, user queries posed against
the global schema are processed by the global query
engine, which transforms each global query into a set
of local queries that can be applied to the local
schemas. The local XML documents are queried using
XQuery. The results returned by the local query
engines are then integrated to form a global document.
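The query flow described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not
the project's global query engine: the limited path syntax of
ElementTree stands in for XQuery, and the MAPPINGS table, the
source names, and the functions rewrite and run_global_query
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from global element names to local names.
MAPPINGS = {
    "store_a": {"product": "item", "title": "title"},
    "store_b": {"product": "article", "title": "name"},
}


def rewrite(global_path, mapping):
    """Translate a global path such as 'product/title' into a local one."""
    return "/".join(mapping.get(step, step)
                    for step in global_path.split("/"))


def run_global_query(global_path, sources):
    """Rewrite the query per source, run it, and merge the results."""
    result = ET.Element("result")
    leaf = global_path.split("/")[-1]
    for source_name, xml_text in sources.items():
        local_path = rewrite(global_path, MAPPINGS[source_name])
        root = ET.fromstring(xml_text)
        for hit in root.findall(local_path):
            # Emit each hit under the global element name and
            # record which local source it came from.
            out = ET.SubElement(result, leaf, source=source_name)
            out.text = hit.text
    return result


SOURCES = {
    "store_a": "<catalog><item><title>Lamp</title></item></catalog>",
    "store_b": "<shop><article><name>Desk</name></article></shop>",
}

merged = run_global_query("product/title", SOURCES)
```

Here the global path product/title is rewritten to item/title
for one source and article/name for the other, and the hits
from both are collected under a single result element.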
Researchers
Dr. Sanjay Madria
Sakamuri (MS Thesis), UMR
C. Bipin (MS Thesis), UMR
Dr. K. Passi, Laurentian University, Canada
Dr. Mukesh Mohania, IBM Research Lab, India