Managing source schema evolution in web warehouses

Abstract

Web Data Warehouses have been introduced to enable the analysis of integrated Web data. One of the main challenges in these systems is dealing with the volatile and dynamic nature of Web sources. In this work we address the effects on the Data Warehouse (DW) schema of adding, removing, or changing Web sources and data items. By managing source evolution we mean automatically propagating these changes to the DW. The proposed approach is based on a wrapper/mediator architecture, which reduces the impact of Web source changes on the DW schema. This paper presents this architecture and analyses selected evolution cases in the context of Web DW.

Keywords: Web Warehouses; Data Warehouses; Schema Evolution; Interoperability


Full text available only in PDF format

ARTICLES

Adriana Marotta (I); Regina Motz (II); Raul Ruggia (III)

(I) Instituto de Computación - Universidad de la República, Fax: 7110469 - C.P.: 11300, adriana@fing.edu.uy

(II) Instituto de Computación - Universidad de la República, rmotz@fing.edu.uy

(III) Instituto de Computación - Universidad de la República, ruggia@fing.edu.uy



Publication Dates

  • Publication in this collection
    14 Sept 2004
  • Date of issue
    Nov 2002
Sociedade Brasileira de Computação - UFRGS, Av. Bento Gonçalves 9500, B. Agronomia, Caixa Postal 15064, 91501-970 Porto Alegre, RS - Brazil, Tel. / Fax: (55 51) 316.6835 - Campinas - SP - Brazil
E-mail: jbcs@icmc.sc.usp.br