#  ---------------------------------------------------------------------
#   DBBrowse - ODL (OAI) Browsing Engine
#    v1.1
#    November 2002
#  ------------------+--------------------+-----------------------------
#   Hussein Suleman  |   hussein@vt.edu   |    www.husseinsspace.com    
#  ------------------+--------------------+-+---------------------------
#   Department of Computer Science          |        www.cs.vt.edu       
#     Digital Library Research Laboratory   |       www.dlib.vt.edu      
#  -----------------------------------------+-------------+-------------
#   Virginia Polytechnic Institute and State University   |  www.vt.edu  
#  -------------------------------------------------------+-------------


Description
-----------

An indexing system to partition a data source by multiple categories
(flat and hierarchical) based on the metadata, where the data source 
is an OAI or ODL archive and the interface to request subsets of the 
data is pseudo-OAI (ODL-Browse). In effect, this provides a mechanism
to browse based on categories in the metadata.


Features
--------

- Works with any OAI or ODL archive
- Strict compliance with ODL-Browse protocol as specified on the ODL
  website (http://oai.dlib.vt.edu/odl)
- No installation or compilation - Perl scripts need only be copied
  (requires a database and DBI database connection module but all 
  other modules are built-in)
- Code layout for separate components or libraries of components
- One installation can easily be used for multiple engines
- ./configure.pl to set all parameters
- Tested with MySQL (uses standard SQL)
- Implements odlbrowse1 query language
- Will index any metadata format
- Supports multiple independent categories
- Supports sorting by controlled or uncontrolled vocabularies in 
  fields
- Hierarchies within a field are understood (e.g., 'Index/Main/Sub1')
- Regular expression transformations to pre-process data
- All extensions, configurations, and containers are specified 
  using XML Schema


Requirements
------------

- MySQL or a similar database, with permission to create tables in a
  database that has already been created
- Perl, with modules DBI and DBD::mysql (or DBD::Pg or ...)
- Ability to run CGI scripts


Instructions
------------

1. Copy all files with default directory structure into a directory
   from which CGI scripts may be run

2. Change to the ODL-DBBrowse/DBBrowse directory

3. Run './configure.pl' with the parameter being the name of an archive
   to index. For example,
      ./configure.pl jcdl

4. Edit config.xml in the directory corresponding to the archive name
   if necessary. It is preferable to simply rerun configure.pl, since
   the script also performs sanity checks.

   *****
     This component will erase all databases whenever it detects that
     config.xml has been changed. You must reharvest after every
     change using the "harvest now all start" command.
   *****
   
5. Test the harvester
   - run harvest.pl from the archive directory
   - check the harvest.log file to see if new items were processed

6. Run the harvest.pl script from a scheduler such as cron as often as
   desired - every 10 minutes is a good starting point. The scheduling
   algorithm used by the Harvester will only trigger a harvest once the
   time specified in the configuration has passed.

7. Test the browse engine
   - run 'testbrowse.pl' from the archive directory with a browse query 
     (e.g., sort(title)) as a parameter

8. Test the ODL-Browse (extended OAI) web interface
   - use the Repository Explorer at http://purl.org/net/oai_explorer
     and point it to the 'browse.pl' script in the archive directory
     
9. Create additional engines for other archives using the same
   procedure - each archive will have its own directory and must use
   a different database


How to issue queries via the extended OAI interface
---------------------------------------------------

Each query must be encapsulated into an ODL request using the set 
parameter to specify the query. The format of the set parameter
should be:
   qlang/query/start/stop
where:
   qlang = language of query (usually odlbrowse1)
   query = actual query (e.g., date(2001-02-02))
   start = index of first result to return
   stop  = index of last result to return
   
Some sample extended OAI queries are shown below:
 verb=ListRecords&metadataPrefix=oai_dc&set=odlbrowse1/sort(title-,date+)/1/10
 verb=ListRecords&metadataPrefix=oai_cstc&set=odlbrowse1/subject(computer)sort(date)/11/20
 verb=ListIdentifiers&set=odlbrowse1/type(image,sound)/41/60
 verb=ListIdentifiers&set=odlbrowse1/type(image/jpeg)/1/5
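
As an illustration of the set parameter format described above, here is
a minimal client-side sketch in Python. It is not part of DBBrowse. The
function names and the example.org base URL are hypothetical. Note that
a query may itself contain '/' (as in type(image/jpeg)), so the sketch
takes qlang from the left and start/stop from the right. The request
builder percent-encodes the set value, which a CGI script would
normally decode back to the plain form shown in the samples.

```python
from urllib.parse import urlencode


def build_set(query, start, stop, qlang="odlbrowse1"):
    """Assemble the set parameter as qlang/query/start/stop."""
    return f"{qlang}/{query}/{start}/{stop}"


def split_set(set_spec):
    """Split a set parameter back into (qlang, query, start, stop).

    The query may contain '/', so only the first '/' and the last two
    '/' characters are treated as separators.
    """
    qlang, rest = set_spec.split("/", 1)
    query, start, stop = rest.rsplit("/", 2)
    return qlang, query, int(start), int(stop)


def build_request(base_url, verb, query, start, stop, metadata_prefix=None):
    """Build a full request URL (base_url is a hypothetical browse.pl location)."""
    params = {"verb": verb}
    if metadata_prefix is not None:
        params["metadataPrefix"] = metadata_prefix
    params["set"] = build_set(query, start, stop)
    # urlencode percent-encodes '/', '(', ')', ',' and '+' in the values
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_set("sort(title-,date+)", 1, 10))
    print(split_set("odlbrowse1/type(image/jpeg)/1/5"))
    print(build_request("http://example.org/cgi-bin/browse.pl",
                        "ListIdentifiers", "type(image,sound)", 41, 60))
```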

The "odlbrowse1" language has the following syntax (linefeeds inserted for readability):
  ( 
    ( category '(' value ( ',' value ) * ')' ) 
    | 
    ( 'sort(' category ( '+' | '-' ) ? ( ',' category ( '+' | '-') ? ) * ')' )
  ) +
where:
 category = name of field as specified during configuration
 value = single value for category as found in metadata (after transformations)

 category(value) means that only those entries which have 'value' in the field
  corresponding to 'category' should be included in the result set
 sort(category) means the results should be sorted according to category

Examples:
 year(2001)
 author(Hussein Suleman)
 author(Hussein)
 sort(year)
 sort(year+,author-)
 year(1997)author(Hussein Suleman)
 author(Hussein Suleman)year(2001,2002)sort(title)
 author(Hussein Suleman,Edward Fox)institution(Virginia Tech)sort(year-,title+)
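
To make the grammar above concrete, the following Python sketch splits
an odlbrowse1 query into filter clauses and sort keys. It is only an
illustration and not the parser used by DBBrowse. The function name is
hypothetical, and the sketch assumes that values contain no commas or
parentheses, which the grammar does not state explicitly. A sort key
without '+' or '-' is reported with direction None, since the default
is not specified here.

```python
import re

# One clause: a name, then a parenthesised comma-separated list.
CLAUSE = re.compile(r"\s*([^()\s,]+)\(([^()]*)\)\s*")


def parse_odlbrowse1(query):
    """Return (filters, sort_keys) for an odlbrowse1 query string.

    filters   -- list of (category, [value, ...])
    sort_keys -- list of (category, direction); direction is '+', '-' or None
    """
    filters = []
    sort_keys = []
    pos = 0
    while pos < len(query):
        match = CLAUSE.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse query at offset {pos}")
        name, body = match.group(1), match.group(2)
        items = [item.strip() for item in body.split(",")]
        if any(item == "" for item in items):
            raise ValueError(f"empty item in clause {name!r}")
        if name == "sort":
            for item in items:
                direction = item[-1] if item[-1] in "+-" else None
                category = item[:-1] if direction else item
                if not category:
                    raise ValueError("sort key without a category")
                sort_keys.append((category, direction))
        else:
            filters.append((name, items))
        pos = match.end()
    if not filters and not sort_keys:
        raise ValueError("query must contain at least one clause")
    return filters, sort_keys


if __name__ == "__main__":
    print(parse_odlbrowse1("author(Hussein Suleman)year(2001,2002)sort(title)"))
    print(parse_odlbrowse1("sort(year-,title+)"))
```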
 

Module Layout
-------------

DBBrowse/template:
 - scripts to interface with the component

lib/Pure:
 - utility modules (in pure Perl)

lib/OAI:
 - OAI template modules
 - OAISP = service provider
 - OAIDP = data provider
 
lib/XOAI:
 - XOAI extensions to the OAI modules
 - XOAISP = service provider
 - XOAIDP = data provider
 - Harvester = scheduled harvester for ODL+OAI archives

lib/ODL/DBBrowse:
 - Browse = browse engine to classify/query XML data
 - BrowseSP = browse engine service provider/harvester
 - BrowseDP = browse engine data provider


Links/Acknowledgements
----------------------

This software is part of the larger project to build componentized
Digital Libraries based on the work of the Open Archives Initiative.
See http://oai.dlib.vt.edu/odl and http://www.openarchives.org for
more information.
