The UIUC Web Integration Repository


Over the past few years, the Web has deepened dramatically: a significant and increasing amount of structured information is hidden on the "deep Web," behind the query interfaces of searchable databases. The goal of this repository is to provide both datasets and tasks to support research on exploring and integrating structured information and searchable databases on the Web. The repository was originally constructed in the MetaQuerier project at the University of Illinois at Urbana-Champaign, and has since been extended with datasets contributed by the research community. We are continuing to build up the repository, and we welcome donations of additional data.

This repository contains both datasets and their related tasks. Each dataset archives a set of data pertinent to Web integration research, covering structured information and searchable databases on the Web. Each task documents a particular function pertinent to such integration (e.g., schema matching) that has been studied and performed over some of the datasets. The datasets and their associated tasks are interlinked for cross-reference.

Acknowledgement: The creation of this repository is partially supported by grants from the National Science Foundation and NCSA.



Using This Repository

This repository is publicly available to facilitate research in the related areas of Web integration. If you publish material based on datasets or tasks in this repository, please cite the source as follows, to help others obtain the same datasets and reproduce your experiments.

The UIUC Web Integration Repository. Computer Science Department, University of Illinois at Urbana-Champaign, 2003.

BibTeX entry:
title = "The {UIUC} Web Integration Repository",
year = "2003",
howpublished = "Computer Science Department, University of Illinois at Urbana-Champaign."


We welcome donations of additional datasets. Please contact binhe[at]

For questions and suggestions, please contact binhe[at]