Projects

XML Data Integration

XML has become the de facto standard for information exchange in e-commerce and in many workgroup applications such as Enterprise Resource Planning (ERP). The availability of large amounts of heterogeneous, distributed web data makes it necessary to integrate XML data from multiple sources. For example, many e-commerce companies sell similar products but represent them using different XML schemas, possibly with different ontologies. When two such companies merge, or cooperate to serve customers, they need a uniform schema integration methodology. Some applications, such as comparison shopping, need to present the illusion of a centralized, homogeneous information system. Each organization creates its own document structure according to its specific requirements, so these documents and data may need to be restructured before they can be shared efficiently with other organizations.

We propose an XML Schema integration methodology and a querying mechanism that together allow easy integration and querying of related but heterogeneous schemas. To this end, we propose an object-oriented data model called XSDM (XML Schema Data Model), a graphical representation of XML Schema designed for schema integration. We also propose a three-layered architecture for XML Schema integration, in which each layer presents an integrated view of the concepts that characterize the layer below. The three layers are pre-integration, comparison, and integration. During pre-integration, each schema is read in XML Schema notation and converted into XSDM notation. During comparison, correspondences and conflicts between elements are identified. During integration, the initial schemas are restructured and merged to obtain the global schema. We propose integration policies for integrating element definitions as well as their datatypes and attributes.
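
The three phases can be illustrated with a minimal sketch. It is not the XSDM model or the project's integration policies, whose details are not given here. As an assumption for illustration, each schema is reduced to a flat mapping from element name to declared type, and a conflicting type is resolved by falling back to xs:string. The function names and the two sample schemas are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Two hypothetical local schemas describing similar products.
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:string"/>
  <xs:element name="isbn" type="xs:string"/>
</xs:schema>"""


def pre_integrate(xsd_text):
    """Pre-integration: read XML Schema notation into a simplified model.

    The model is a plain {element name: type} dict, a toy stand-in for XSDM.
    """
    root = ET.fromstring(xsd_text)
    return {
        el.get("name"): el.get("type")
        for el in root.iter(XS + "element")
        if el.get("name")
    }


def compare(a, b):
    """Comparison: split shared element names into correspondences and conflicts."""
    shared = sorted(set(a) & set(b))
    correspondences = [name for name in shared if a[name] == b[name]]
    conflicts = [name for name in shared if a[name] != b[name]]
    return correspondences, conflicts


def integrate(a, b, conflicts):
    """Integration: merge both models into one global model.

    Assumed policy (not from the source): a type conflict falls back to xs:string.
    """
    merged = {**a, **b}
    for name in conflicts:
        merged[name] = "xs:string"
    return merged


model_a = pre_integrate(SCHEMA_A)
model_b = pre_integrate(SCHEMA_B)
correspondences, conflicts = compare(model_a, model_b)
global_schema = integrate(model_a, model_b, conflicts)
```

A real comparison phase would also have to recognize elements that correspond under different names, which this name-based matching does not attempt.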

In the querying process, user queries posed against the global schema are processed by the global query engine, which transforms each global query into a set of local queries that can be applied to the local schemas. The local XML documents are queried using XQuery. The results obtained from the local query engine are then integrated to form a global document.
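
The query flow described above can be sketched as follows. This is an illustration under stated assumptions, not the project's query engine. The local queries here use the limited XPath subset in Python's standard library as a stand-in for XQuery. The mapping table, source names, and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: global element name -> path in each local source.
MAPPINGS = {
    "store_a": {"title": "./book/title", "price": "./book/price"},
    "store_b": {"title": "./item/name", "price": "./item/cost"},
}

# Hypothetical local XML documents with different structures.
LOCAL_DOCS = {
    "store_a": "<catalog><book><title>XML Basics</title><price>30</price></book></catalog>",
    "store_b": "<inventory><item><name>Schema Design</name><cost>45</cost></item></inventory>",
}


def rewrite(global_element):
    """Transform a global query into one local path per source that supports it."""
    return {
        source: paths[global_element]
        for source, paths in MAPPINGS.items()
        if global_element in paths
    }


def run_global_query(global_element):
    """Run each local query and integrate the results into one global document."""
    result = ET.Element("result")
    for source, path in rewrite(global_element).items():
        local_root = ET.fromstring(LOCAL_DOCS[source])
        for node in local_root.findall(path):
            # Rename each local match to the global element name and tag its origin.
            out = ET.SubElement(result, global_element, source=source)
            out.text = node.text
    return result


global_doc = run_global_query("title")
```

Calling `ET.tostring(global_doc, encoding="unicode")` serializes the integrated result, in which each entry carries a `source` attribute naming the local document it came from.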

Researchers

Dr. Sanjay Madria

Sakamuri (MS Thesis), UMR

C. Bipin (MS Thesis), UMR

Dr. K. Passi, Laurentian University, Canada

Dr. Mukesh Mohania, IBM Research Lab, India