IWRandom Query Interfaces

http://metaquerier.cs.uiuc.edu/repository/datasets/iwrandom/
November 2003


1. Overview

This dataset collects the query interfaces of 33 deep Web sources randomly sampled from Invisible-Web.net. Its purpose is to provide a diverse set of query interfaces spanning many domains. Because every deep Web source on Invisible-Web.net is assigned a sequential numeric ID, we drew the random samples by source ID; the sampled sources cover 16 of the 18 top-level domains listed on Invisible-Web.net.
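Sampling by sequential ID amounts to drawing distinct integers uniformly at random from the ID range. The sketch below illustrates that idea under stated assumptions: IDs are taken to be contiguous integers starting at 1, and the total number of sources, the seed, and the ID-to-domain mapping used to check coverage are all hypothetical placeholders, not values from Invisible-Web.net. It is not the procedure the authors actually ran.

```python
import random
from typing import Dict, List, Optional, Set


def sample_source_ids(total_sources: int, sample_size: int,
                      seed: Optional[int] = None) -> List[int]:
    """Draw distinct source IDs uniformly at random from 1..total_sources.

    Assumption (hypothetical): IDs are contiguous integers starting at 1.
    """
    if not 0 <= sample_size <= total_sources:
        raise ValueError("sample_size must be between 0 and total_sources")
    rng = random.Random(seed)  # seeded generator so a draw can be repeated
    # random.sample draws without replacement, so no source is picked twice.
    return sorted(rng.sample(range(1, total_sources + 1), sample_size))


def domains_covered(sampled_ids: List[int],
                    domain_of: Dict[int, str]) -> Set[str]:
    """Return the set of top-level domains hit by the sampled IDs.

    domain_of is a hypothetical mapping from source ID to domain name.
    """
    return {domain_of[i] for i in sampled_ids if i in domain_of}
```

For example, `sample_source_ids(1000, 33, seed=7)` returns 33 distinct sorted IDs; here 1000 is only a placeholder, since this description does not state how many sources Invisible-Web.net listed.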

2. Creation

The dataset was created in November 2003. It was collected manually from Invisible-Web.net by drawing random samples from the sequentially numbered deep Web sources. For each source, we archived the query interface page.

Original Owners

Zhen Zhang, Bin He and Kevin Chen-Chuan Chang
Computer Science Department
University of Illinois at Urbana-Champaign
zhang2@uiuc.edu

Date Created: November 2003

3. The Dataset

3.1. Summary

The dataset contains 33 query interfaces from 16 (out of 18) top-level domains listed on Invisible-Web.net. The domains covered include: Art and Architecture, Bibliographies, Business and Investing, Computers and Computing, Education, Entertainment, Government Info and Data, US and World History, Legal and Criminal Info, Search for People, Public Record, Reference, Science, Social Science, and Transportation.

3.2. Browsable Dataset

The dataset is browsable through http://metaquerier.cs.uiuc.edu/repository/datasets/iwrandom/browsable.html. The page lists all 33 query interfaces, with a link to each one.
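To enumerate the interfaces programmatically, one can collect the links from browsable.html. This is a minimal sketch using Python's standard `html.parser`; it assumes (hypothetically) that links to individual interfaces start with `browse/`, matching the directory name in the downloadable archive. The actual link format on the page is not documented here, so the `prefix` parameter may need adjusting.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def interface_links(html_text: str, prefix: str = "browse/") -> List[str]:
    """Return links that point at archived interfaces.

    Assumption (hypothetical): interface links begin with `prefix`.
    """
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return [link for link in parser.links if link.startswith(prefix)]
```

With a local copy, read browsable.html as text and pass it to `interface_links`; if the assumption about the link prefix holds, the result should have one entry per query interface.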

3.3. Downloadable Dataset

The whole dataset can also be downloaded as a gzipped tarball from http://metaquerier.cs.uiuc.edu/repository/datasets/iwrandom/downloadable.tar.gz.
After decompressing and extracting the archive, the resulting directory contains a browsable.html page linking to each query interface in the dataset, and a browse directory containing 33 subdirectories, one per archived query interface.
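The layout described above can be checked after download with Python's standard `tarfile` module. This is a sketch, not a tool shipped with the dataset: the function name is hypothetical, and because the name of the top-level directory inside the archive is not documented, the code searches for the first directory named `browse` rather than assuming a fixed path.

```python
import os
import tarfile
import tempfile
from typing import List


def list_archived_interfaces(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return the sorted names
    of the subdirectories under its 'browse' directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safe "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    # The top-level directory name is undocumented, so walk the tree
    # until a directory named 'browse' is found.
    for root, dirs, _files in os.walk(dest_dir):
        if "browse" in dirs:
            browse = os.path.join(root, "browse")
            return sorted(
                d for d in os.listdir(browse)
                if os.path.isdir(os.path.join(browse, d))
            )
    raise FileNotFoundError("no 'browse' directory found in archive")
```

For example, calling it on a downloaded downloadable.tar.gz inside a `tempfile.TemporaryDirectory()` should, per the description above, yield 33 subdirectory names.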

4. Past Usage

5. Acknowledgement

The creation of this dataset was partially supported by grants from the National Science Foundation and NCSA.