
MetaQuerier: Exploring and Integrating the Deep Web
This research aims
at enabling effective access to structured information sources on
the Internet. Over the past few years, the Web has deepened dramatically:
a significant and increasing amount of information is hidden on the "deep"
Web, behind the query interfaces of searchable databases. There
are numerous such autonomous and heterogeneous sources, each with a different
schema and native query constraints. Because current crawlers cannot effectively
query databases, such data is invisible to traditional search engines,
and thus remains largely hidden from users.
We propose to build
a metaquery system to help users find and query
online databases effectively and uniformly. Our efforts aim at opening
up the deep Web to users, by building a MetaQuerier; see the architecture
below. On this wild frontier of the deep Web, the MetaQuerier will address
the challenges of both exploration and integration. Our goal
is thus twofold. First, to make the deep Web systematically accessible,
the MetaExplorer will discover sources on the deep Web and build
a searchable repository that helps users find sources useful for
their information needs. Second, to make the deep Web uniformly usable,
the MetaIntegrator will help users interact with online databases
to ask queries.
Projects
- First, the MetaExplorer project focuses on the discovery, modeling, and structuring of databases on the Web, to build a searchable source repository. Essentially, this MetaExplorer project will develop a "search engine" of Web databases: it will develop crawlers for efficiently discovering databases on the Internet, design models for representing these databases, develop wrappers for automatically extracting their model parameters (e.g., schema details on their query interfaces), and structure and index a searchable repository of Web sources.
- Second, the MetaIntegrator project focuses on the integration issues of online sources, i.e., bringing sources coherently together for query answering. Specifically, we will investigate source selection, query mediation, and schema integration for building the MetaIntegrator. In studying large-scale integration, these thrusts will benefit from the source repository of the companion MetaExplorer. We will investigate the key enabling technology of dynamic ad-hoc information integration. In contrast to a traditional static system, our MetaIntegrator is dynamic (new sources may be added at any time as they are discovered) and essentially requires ad-hoc integration, which must dynamically select and bring together different sources to answer a query.
Given the pressing need
for effective access to the deep Web, we believe the synergy between the
exploration and integration focuses of the two sub-projects will together
bring a more complete and timely solution for realizing our MetaQuerier
goal.
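As a rough illustration of how the two sub-projects fit together, the sketch below models a source repository that grows as sources are discovered and an ad-hoc selection step that picks sources per query. It is a minimal sketch, not the project's implementation: the names `Source`, `SourceRepository`, `add`, and `select` are hypothetical, and matching a query by simple attribute coverage is an assumption standing in for the source selection and schema matching techniques the project investigates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """A Web database, described by the attributes on its query interface."""
    name: str
    domain: str
    attributes: frozenset


@dataclass
class SourceRepository:
    """Hypothetical stand-in for the repository built by the MetaExplorer."""
    sources: list = field(default_factory=list)

    def add(self, source):
        # Dynamic: a source can be registered at any time after discovery.
        self.sources.append(source)

    def select(self, query_attributes):
        """Ad-hoc selection: return the sources whose interface covers every
        attribute used in the query (an assumed, simplistic criterion)."""
        wanted = frozenset(query_attributes)
        matches = [s for s in self.sources if wanted <= s.attributes]
        # Prefer sources with the fewest extra attributes, then sort by name.
        return sorted(matches, key=lambda s: (len(s.attributes), s.name))


repo = SourceRepository()
repo.add(Source("books-a", "books", frozenset({"author", "title", "isbn"})))
repo.add(Source("air-b", "airfares", frozenset({"from", "to", "date"})))

# Selection happens per query, so a source added later is picked up at once.
first = repo.select({"author", "title"})
repo.add(Source("books-c", "books", frozenset({"author", "title"})))
second = repo.select({"author", "title"})
```

Because selection runs against the current repository contents on every query, nothing has to be reconfigured when a new source is added, which is the contrast with a static integration system drawn above.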

Publications
- Context-Aware Wrapping: Synchronized Data Extraction. S.-L. Chuang, K. C.-C. Chang, and C. Zhai. To appear in Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, September 23-28, 2007. [PDF]
- Accessing the Deep Web: A Survey. B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Communications of the ACM (CACM), 50(5):94-101, May 2007. [PDF]
- Collaborative Wrapping: A Turbo Framework for Web Data Extraction. S.-L. Chuang, K. C.-C. Chang, and C. Zhai. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey, April 2007. [PDF] (poster)
- Automatic Complex Schema Matching across Web Query Interfaces: A Correlation Mining Approach. B. He and K. C.-C. Chang. ACM Transactions on Database Systems (TODS), 31(1), March 2006. [PDF]
- Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 31st Very Large Data Bases Conference (VLDB 2005), Trondheim, Norway, August 2005. [PDF]
- Making Holistic Schema Matching Robust: An Ensemble Approach. B. He and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGKDD Conference (KDD 2005) (Full Paper), Chicago, Illinois, August 2005. [PDF]
- Query Routing: Finding Ways in the Maze of the Deep Web. G. Kabra, C. Li, and K. C.-C. Chang. In Proceedings of the ICDE International Workshop on Challenges in Web Information Retrieval and Integration (ICDE-WIRI 2005), Tokyo, Japan, April 2005. [PDF]
- Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the Second Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, California, January 2005. [PDF]
- Mining Semantics for Large Scale Integration on the Web: Evidences, Insights and Challenges. K. C.-C. Chang, B. He, and Z. Zhang. SIGKDD Explorations, 6(2):67-76, December 2004. Invited paper. [PDF]
- A Holistic Paradigm for Large Scale Schema Matching. B. He and K. C.-C. Chang. SIGMOD Record, 33(4):20-25, December 2004. Invited paper. [PDF]
- Organizing Structured Web Sources by Query Schemas: A Clustering Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the 13th Conference on Information and Knowledge Management (CIKM 2004) (Full Paper), Washington D.C., November 2004. [PDF]
- Structured Databases on the Web: Observations and Implications. K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. SIGMOD Record, 33(3):61-70, September 2004. [PDF]
- MetaQuerier over the Deep Web: Shallow Integration across Holistic Sources. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]
- On-the-fly Constraint Mapping across Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]
- Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 2004 ACM SIGKDD Conference (KDD 2004) (Full Paper), Seattle, Washington, August 2004. [PDF]
- Mining Complex Matchings across Web Query Interfaces. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD'04) (Full Paper), Paris, France, June 2004. [PDF]
- Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), Paris, France, June 2004. [PDF]
- Clustering Structured Web Sources: A Schema-based, Model-Differentiation Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the EDBT Workshop on Clustering Information over the Web (EDBT-ClustWeb'04), Crete, Greece, March 2004. An expanded version of this paper, invited to be a part of the Current Trends in Database Technology volume, is published in the Springer-Verlag Lecture Notes in Computer Science Series Vol. 3268. [PDF]
- Statistical Schema Matching across Web Query Interfaces. B. He and K. C.-C. Chang. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD 2003), San Diego, California, June 2003. [PDF]
- Approximate Query Translation Across Heterogeneous Information Sources. K. C.-C. Chang and H. Garcia-Molina. In Proceedings of the 26th VLDB Conference (VLDB 2000), pages 566-577, Cairo, Egypt, September 2000. [Extended Version]
Technical Reports
- A Structure-Driven Yield-Aware Web Form Crawler: Building a Database of Online Databases. B. He, C. Li, D. Killian, M. Patel, Y. Tseng, and K. C.-C. Chang. UIUCDCS-R-2006-2752, Department of Computer Science, UIUC, July 2006. [PDF]
Tutorials
- Accessing the Web: From Search to Integration. K. C.-C. Chang and J. Cho. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006. Tutorial description. [PDF] [Part II: Web Integration; Bibliography]
Demos
- Online Demo: Query capability extraction for understanding Web query interfaces
- MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGMOD Conference (SIGMOD 2005), System Demonstration, Baltimore, Maryland, June 2005. [PDF]
- MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Second Midwest Database Research Symposium, Chicago, Illinois, April 2005.
- Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), System Demonstration, Tokyo, Japan, April 2005. [PDF]
- Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2004, Boston, Massachusetts, October 2004.
- Knocking the Door to the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), System Demonstration, Paris, France, June 2004. [PDF]
- Toward a MetaQuerier for the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In First Midwest Database Research Symposium, Chicago, Illinois, April 2004.
- Knocking the Doors to the Deep Web: Understanding Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2003, Seattle, Washington, September 2003.