November 2002

1. Overview

This dataset is a collection of deep Web sources in four representative domains: Books, Automobiles, Movies, and Music Records (hence the name BAMM). Each domain contains about 50 sources. For each source, only the attribute names in its Web query interfaces are extracted to form its "query schema." BAMM does not contain other attribute information, such as attribute types or values.

2. Creation

The dataset was originally constructed in November 2002 by manual collection from Web directories. A source is considered to be a deep Web source if it provides structured information by accepting queries over the attributes on its query interfaces. For each query interface, only attribute names are extracted. For instance, for an advanced book query interface, we extract attribute names such as author, title, subject, ISBN, publisher, ....

3. The Dataset

3.1. Summary

The domains and the number of sources in each domain are summarized below:

Domain           # of sources
Books            55
Automobiles      55
Movies           52
Music Records    49

3.2. Browsable Dataset

The dataset is browsable online.

The four domains are listed on the top-level page. Following the link to a domain shows the extracted attribute names for each source in that domain.

3.3. Downloadable Dataset

The whole dataset can be downloaded as a gzipped tarball. The tarball contains four domain files, one per domain. Each domain file contains a set of sources. Each source is denoted by <SCHEMA> and followed by a set of attribute names. For instance, below is a sample domain file with two sources.
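As a rough illustration of how such a domain file might be read, here is a minimal Python sketch. It assumes that each source starts with a line containing only <SCHEMA> and that attribute names follow one per line; the exact layout of the real files may differ. The names parse_domain_file, load_dataset, and SAMPLE are hypothetical, and the sample content is made up for illustration, not taken from the dataset.

```python
import tarfile
from typing import Dict, List

SCHEMA_MARKER = "<SCHEMA>"  # source delimiter named in the description above


def parse_domain_file(text: str) -> List[List[str]]:
    """Split a domain file into sources, each a list of attribute names.

    Assumption: one attribute name per line after each <SCHEMA> line.
    """
    sources: List[List[str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line == SCHEMA_MARKER:
            sources.append([])  # start a new source
        elif sources:
            sources[-1].append(line)  # attribute of the current source
        # lines before the first marker are ignored
    return sources


def load_dataset(tar_path: str) -> Dict[str, List[List[str]]]:
    """Parse every regular file in a gzipped tarball (path is hypothetical)."""
    dataset: Dict[str, List[List[str]]] = {}
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            dataset[member.name] = parse_domain_file(text)
    return dataset


# Hypothetical sample with two sources; not taken from the dataset.
SAMPLE = """<SCHEMA>
author
title
ISBN
<SCHEMA>
title
subject
publisher
"""

if __name__ == "__main__":
    for index, attributes in enumerate(parse_domain_file(SAMPLE), start=1):
        print(f"source {index}: {attributes}")
```

Keeping each source as a plain list of strings mirrors the fact that BAMM records attribute names only, with no types or values.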


5. Acknowledgement

The creation of this dataset was partially supported by grants from the National Science Foundation and NCSA.
