IWRandom Query Interfaces
This dataset collects query interfaces of 33 deep Web sources randomly sampled from Invisible-Web.net.
The purpose of this dataset is to provide a good diversity of query interfaces from various domains.
Because all deep Web sources on Invisible-Web.net are numbered sequentially by ID, we drew random samples from the set by source ID. The resulting sample covers 16 of the 18 top-level domains listed on Invisible-Web.net.
The dataset was created in November 2003. It was collected manually from Invisible-Web.net by drawing random samples from the sequentially numbered deep Web sources. For each source, we archived the query interface page.
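The sampling step described above can be sketched as follows. This is only an illustration, not the procedure the authors actually ran: the total number of sources (`total_sources = 1000`), the assumption that IDs start at 1, the seed, and the function name `sample_source_ids` are all hypothetical, since the page states only that sources are sequentially numbered and that 33 were drawn.

```python
import random
from typing import List, Optional


def sample_source_ids(total_sources: int, sample_size: int,
                      seed: Optional[int] = None) -> List[int]:
    """Draw `sample_size` distinct IDs uniformly at random from 1..total_sources.

    Assumes IDs are contiguous and start at 1 (an assumption; the original
    page does not give the ID range).
    """
    if sample_size > total_sources:
        raise ValueError("sample_size cannot exceed total_sources")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample draws without replacement, so no ID appears twice.
    return sorted(rng.sample(range(1, total_sources + 1), sample_size))


# Hypothetical numbers: 1000 sources in the directory, 33 sampled.
sampled_ids = sample_source_ids(total_sources=1000, sample_size=33, seed=2003)
print(sampled_ids)
```

Sampling without replacement matches the description of 33 distinct sources; each sampled ID would then be looked up manually and its query interface page archived.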
Zhen Zhang, Bin He and Kevin Chen-Chuan Chang
Computer Science Department
University of Illinois at Urbana-Champaign
Date Created: November 2003
3. The Dataset
The dataset contains 33 query interfaces from 16 of the 18 top-level domains listed on Invisible-Web.net. The domains covered in the dataset include: Art and Architecture, Bibliographies, Business and Investing, Computers and Computing, Education, Entertainment, Government Info and Data, US and World History, Legal and Criminal Info, Search for People, Public Record, Reference, Science, Social Science, and Transportation.
3.2. Browsable Dataset
The dataset is browsable through
The page lists all 33 query interfaces, with a pointer to each of them.
3.3. Downloadable Dataset
The whole dataset can also be downloaded as a gzipped tarball from
After decompressing and untarring the archive, the resulting directory contains a browsable.html page with pointers to each query interface in the dataset, and a browse directory containing 33 subdirectories, one for each archived query interface.
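A minimal sketch of unpacking the archive and checking the layout described above. The function names and any paths passed to them are hypothetical; the only details taken from this page are the gzipped tar format, the `browsable.html` page, and the `browse` directory with one subdirectory per query interface. The sketch also assumes these two entries sit directly under the directory passed to `list_interfaces`.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_dataset(archive_path, dest_dir) -> Path:
    """Unpack a gzipped tar archive into dest_dir and return that directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)  # only use on archives from a trusted source
    return dest


def list_interfaces(dataset_root) -> List[str]:
    """Return the names of the per-interface subdirectories under browse/."""
    root = Path(dataset_root)
    if not (root / "browsable.html").is_file():
        raise FileNotFoundError("browsable.html not found in " + str(root))
    browse = root / "browse"
    # Each subdirectory of browse/ corresponds to one archived query interface.
    return sorted(p.name for p in browse.iterdir() if p.is_dir())
```

With the real archive, calling `list_interfaces` on the unpacked directory should return 33 names if the layout matches the description; a different count would suggest an incomplete download or a different top-level directory than assumed here.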
4. Past Usage
5. Acknowledgement
The creation of this dataset is partially supported by grants from the National Science Foundation and NCSA.