How to Make a Collection - A Quick Introduction
Cristian Francu
francu@cs.rutgers.edu
Jan 12, 2000

First, go to the directory where you installed GSDL. To make sure that you can run the perl scripts, run either setup.bash or setup.csh, depending on the shell you're using:

    source setup.bash

or

    source setup.csh

These scripts set the variables GSDLHOME, GSDLOS and PATH. Of course, you can put these settings in your .cshrc or .profile so that they are set automatically.

Next, run mkcol.pl to create the collection. This perl script creates the environment the collection needs, such as its directories and the file collect.cfg. mkcol.pl is located in the directory bin/script. That directory contains all the scripts you'll need, so it's a good idea to take a look at it. If you run mkcol.pl without arguments, it tells you how to use it:

    $ mkcol.pl
    usage: mkcol.pl [options] collection-name
    options:
      -creator email       Your email address
      -maintainer email    The current maintainer's email address
      -public true|false   If this collection has anonymous access
      -beta true|false     If this collection is still under development

After running mkcol.pl the collection will reside in collect/. The next thing you should do is edit the file collect//etc/collect.cfg

You should do at least two things there. The first is to add a line like this:

    collectionmeta iconcollection "http://sequence.rutgers.edu/~gsdl/collect/cstr/images/cstr.jpg"

This line sets the icon of the collection (the image that users click to access the collection once it's online). Make sure you put a valid URL for the image between the quotes. Do this now, because if you want to change the icon later you have to rebuild the collection, which is time-consuming. Hey, gurus, is there a simpler way to change the icon once the collection is already built?
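If you set up several collections, the icon line can be added by a small shell function instead of by hand. The sketch below is only an illustration: `set_icon` is a hypothetical helper, not one of the GSDL scripts, and the example path assumes that setup.bash has set GSDLHOME and that the collection (here named "tutorial", as later in this text) lives under $GSDLHOME/collect.

```shell
#!/bin/sh
# Hypothetical helper (not part of GSDL): set_icon CFG URL
# Appends a "collectionmeta iconcollection" line to the given collect.cfg
# unless the file already has one, so running it twice is harmless.
set_icon() {
    cfg=$1
    url=$2
    if [ ! -f "$cfg" ]; then
        echo "set_icon: $cfg not found; run mkcol.pl first" >&2
        return 1
    fi
    if grep -q '^collectionmeta[[:space:]][[:space:]]*iconcollection' "$cfg"; then
        echo "set_icon: $cfg already sets iconcollection; edit that line by hand" >&2
        return 0
    fi
    # If the file does not end with a newline, add one so the new line
    # is not glued onto the last existing line.
    [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
    printf 'collectionmeta iconcollection "%s"\n' "$url" >> "$cfg"
}

# Example call (path is an assumption, see above):
# set_icon "$GSDLHOME/collect/tutorial/etc/collect.cfg" \
#     "http://sequence.rutgers.edu/~gsdl/collect/cstr/images/cstr.jpg"
```

The function deliberately refuses to overwrite an existing icon line; since changing the icon means a rebuild anyway, an explicit hand edit seems safer than a silent replacement.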
Now, the second thing to do in the collect.cfg file is to set the proper plugins on these lines:

    plugin GMLPlug
    plugin TEXTPlug
    plugin ArcPlug
    plugin RecPlug

The plugins you need depend on the format of your documents. If the documents are plain text, or in GSDL's own format called GML, you don't need to change anything. If your documents are in other formats, look for a suitable plugin in the directory perllib/plugins. A very useful plugin is HTMLPlug, which processes files with the .html and .htm extensions. You would normally replace the TEXTPlug plugin with the one you want to use. For example, if your collection is in HTML format, you would change the plugin lines to:

    plugin GMLPlug
    plugin HTMLPlug
    plugin ArcPlug
    plugin RecPlug

You're finally done with collect.cfg.

Suppose you are creating a collection named "tutorial". The next step is to go to the directory collect/tutorial and create two directories, import and archives:

    cd collect/tutorial
    mkdir import
    mkdir archives

The material to be indexed should reside in the 'import' directory. You can either copy it there or create a link to the directory that contains it. The material can contain directories and subdirectories: the building script descends into them recursively and searches for files to index. This is what the RecPlug plugin does. So, the next thing to do is make sure the documents to be indexed are in the import directory.

You are now ready to run the processing scripts. The fastest way to build a collection is in two steps:

  1. process the documents in the 'import' directory and generate their equivalents in .gml format in the 'archives' directory;
  2. process the documents in the 'archives' directory (now in .gml format) and create the necessary indexes in the 'building' directory.

For the first step, just run the script import.pl:

    import.pl tutorial

Depending on the size of your documents, this can take anywhere from minutes to hours.
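The collect.cfg and directory preparation described above can also be scripted. This is a minimal sketch under a few assumptions: `prepare_collection` is a hypothetical helper (not a GSDL script), the collection was already created with mkcol.pl, it lives under $GSDLHOME/collect, and the documents are HTML so that TEXTPlug should become HTMLPlug. The source path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not part of GSDL): prepare_collection COLDIR
# 1. Replaces "plugin TEXTPlug" with "plugin HTMLPlug" in COLDIR/etc/collect.cfg,
#    leaving the other plugin lines and their order untouched.
# 2. Creates the import and archives directories.
prepare_collection() {
    coldir=$1
    cfg="$coldir/etc/collect.cfg"
    if [ ! -f "$cfg" ]; then
        echo "prepare_collection: $cfg not found; run mkcol.pl first" >&2
        return 1
    fi
    # Write to a temporary file and move it back; this works with both
    # GNU and BSD sed, whose in-place (-i) options differ.
    sed 's/^plugin[[:space:]][[:space:]]*TEXTPlug[[:space:]]*$/plugin HTMLPlug/' \
        "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg" || return 1
    mkdir -p "$coldir/import" "$coldir/archives"
}

# Example (paths are assumptions; /path/to/html/docs is a placeholder):
# prepare_collection "$GSDLHOME/collect/tutorial"
# cp -r /path/to/html/docs/* "$GSDLHOME/collect/tutorial/import/"
# import.pl tutorial > import.log 2>&1
```

The last example line also shows one way to capture import.pl's output and errors in a log file, using sh/bash redirection syntax.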
You might also want to redirect stdout and stderr to files to capture any errors (in sh or bash, for example: import.pl tutorial > import.log 2>&1). You can also change the verbosity of the script; run it without arguments to get a complete list of options.

For the second step, run the script buildcol.pl:

    buildcol.pl tutorial

Again, depending on the size of the material to be processed, this may take minutes to hours. Keep in mind that you need enough space on your hard drive for both steps, as the .gml documents take up about as much space as the original documents.

If everything went fine, you should now have a directory named 'building' under collect/tutorial. That directory contains the results of processing your documents. To use it, you have to move the contents of the 'building' directory into a new directory named 'index'. First create it:

    cd collect/tutorial
    mkdir index

Then move the contents:

    mv building/* index

As long as your collect.cfg file contains the line

    public true

and the collection built successfully, the GSDL software should automatically notice your new collection. The collection should now appear on the main page, which can be accessed at:

    http://hostname.domain.edu/cgi-bin/library?a=p&p=home

(replace hostname.domain.edu with the name of your server).

Keep in mind that these instructions are just a jump start to get you up and running quickly. There are more options available, and you can explore more of GSDL by reading the documentation carefully. You can also email the creators for further details.
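The final install step (creating index, moving the contents of building into it, and checking for "public true") can be wrapped in one shell function, so that a failed build is noticed before anything is moved. This is a sketch, not part of GSDL: `install_index` is a hypothetical name, and the usage line assumes the collection lives under $GSDLHOME/collect/tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of GSDL): install_index COLDIR
# Moves everything buildcol.pl left in COLDIR/building into COLDIR/index
# (the "mkdir index; mv building/* index" step) and warns if
# COLDIR/etc/collect.cfg does not contain a "public true" line.
install_index() {
    coldir=$1
    if [ ! -d "$coldir/building" ]; then
        echo "install_index: $coldir/building not found; did buildcol.pl succeed?" >&2
        return 1
    fi
    mkdir -p "$coldir/index" || return 1
    # If building is empty, the glob stays unexpanded, mv fails and we stop.
    mv "$coldir/building"/* "$coldir/index"/ || return 1
    if ! grep -q '^public[[:space:]][[:space:]]*true' "$coldir/etc/collect.cfg" 2>/dev/null; then
        echo "install_index: collect.cfg has no 'public true' line" >&2
    fi
}

# Example (path is an assumption, see above):
# buildcol.pl tutorial > build.log 2>&1 && install_index "$GSDLHOME/collect/tutorial"
```

Like the plain `mv building/* index` in the text, this does not clear out an old index; on a rebuild, mv may refuse to replace an existing non-empty directory of the same name, so you may need to empty index first.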