This is a group assignment - you may work in groups of up to 3 students.

Write a software component, with associated test/demo code, that automatically classifies a set of metadata records using a pre-defined classification system.

The data source is the NDLTD ETD Union Catalog, at OAI-PMH baseURL [http://alcme.oclc.org/ndltd/servlet/OAIHandler]. The classification system is that of dmoz.org - downloadable XML versions of all dmoz data are linked from the website.

You need to:
* Obtain the data from the source - DO NOT bombard the data provider with requests from a harvester that is still being tested. Test your harvester thoroughly against a local data provider first.
* Obtain the classification system from dmoz and transform it into a form you can use.
* Apply a classification algorithm (direct searching, machine learning, reverse searching, etc.).
* Create a simple user interface for browsing the data in a dmoz-like manner, but with your metadata.

Your system:
* should be portable, so packaging and reinstallation must be trivial - supply an installation script if necessary. Java or Perl are recommended as programming languages.
* should work in "real-time" - that is, the harvester must be able to handle incremental updates to the source data.
* may assume that the classification system is fixed.
* should have a machine interface to the classification subsystem, so that the user interface is not directly linked into the system - this machine interface should use SOAP. You must support at least a request that returns a list of the metadata entries in a given category.

You will be required to give a short demo of your system (including re-installation in a different location and possibly harvesting from a different baseURL) soon after hand-in. This demo will largely determine your mark for the assignment!

As a hand-in, submit the software component with its associated documentation (README, INSTALL.txt, etc.).
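
The incremental-update requirement can be illustrated with a small sketch. It is written in Python purely for brevity (the assignment itself recommends Java or Perl) and is not a prescribed design. It assumes the harvester stores the date of its last successful harvest and passes it as the OAI-PMH `from` argument, and that it follows `resumptionToken` values until none is returned. The function names, the `http://localhost:8080/oai` test baseURL, and the sample response are all hypothetical. No network access is made here - fetching is left to the caller, and should be pointed at a local data provider while testing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_list_records_url(base_url, last_harvest=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if last_harvest:
            # Ask only for records added or changed since the last harvest.
            params["from"] = last_harvest
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        header = record.find(OAI_NS + "header")
        title = record.find(".//" + DC_NS + "title")
        records.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            # Deleted records carry status="deleted" and no metadata.
            "deleted": header.get("status") == "deleted",
            "title": title.text if title is not None else None,
        })
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    token = token.strip() if token else ""
    # An absent or empty token means the list is complete.
    return records, (token or None)


# Hand-written sample response, for local testing only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2004-03-02</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Study of Digital Libraries</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:etd-2</identifier>
        <datestamp>2004-03-03</datestamp>
      </header>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_list_records_url("http://localhost:8080/oai", last_harvest="2004-03-01"))
    records, token = parse_list_records(SAMPLE_RESPONSE)
    for rec in records:
        print(rec["identifier"], rec["datestamp"], rec["deleted"], rec["title"])
    print("next token:", token)
```

A harvester built along these lines would loop: request, parse, store or delete each record, then repeat with the returned token until it is `None`, and finally record the harvest date for the next run. Keeping the baseURL as a parameter also makes it easy to switch to a different data provider during the demo.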