package projects

#######################################################################
# java images/scripts
#######################################################################

# the _javalinks_ macros are the flashy image links at the top right of
# the page.
_javalinks_ {_imagehome_}
_javalinks_ [v=1] {
_imagehome_
}

#######################################################################
# icons
#######################################################################

## "projects and demonstrations" ## green_title ## demo ##
_httpicondemo_ {_httpimg_/demo.gif}
_widthdemo_ {450}
_heightdemo_ {57}
_icondemo_ {

New Zealand Digital Library Project members have developed a range of practical software packages in the course of their research. Much of this software is available for download.

Digital libraries and indexing

Greenstone is the digital library system that generates every page of this website. It is freely available under the GNU General Public License, and has been adopted by numerous other projects. Humanitarian organisations, including Global Help Projects and United Nations organisations, use it to disseminate information.
- Our website hosts exotic collections, humanitarian collections, and reference collections.
- Other websites mirror these collections, and host many others.
- Greenstone is available for download.
MG is an enhancement of the Managing Gigabytes full-text retrieval system that provides flexible stemming methods, weighting terms, term frequencies, merged indexes, machine-independent indexes, and a port to MS-DOS.
- MG is available for download.
PreScript converts PostScript to plain ASCII or HTML. It detects paragraph boundaries, removes hyphenation, and interprets many ligatures.
- PreScript is available for download.

Extracting data and metadata

Sequitur is a method for inferring compositional hierarchies from strings. It detects repetition and factors it out of the string by forming rules in a grammar. Sequitur is useful for recognizing lexical structure in strings, and excels at very long sequences.
- The Sequitur WWW interface detects structure in text sequences.
- Sequitur is available for download.
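
The idea of factoring repetition out of a string into grammar rules can be illustrated with a small sketch. This is not the Sequitur algorithm itself, which works incrementally as symbols arrive. It is a simpler offline stand-in that repeatedly replaces the most frequent repeated pair of adjacent symbols with a new rule. The function names and the R1, R2, ... rule names are assumptions made for illustration.

```python
from collections import Counter


def infer_grammar(text):
    """Replace the most frequent repeated pair of symbols with a rule, until none repeats."""
    seq = list(text)
    rules = dict()  # rule name -> the pair of symbols it expands to
    while True:
        # Adjacent pairs are counted naively, so overlapping pairs count too.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = "R%d" % (len(rules) + 1)  # hypothetical naming scheme
        rules[name] = pair
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the text it stands for."""
    if symbol not in rules:
        return symbol
    return "".join(expand(part, rules) for part in rules[symbol])


seq, rules = infer_grammar("abcabcabc")
restored = "".join(expand(symbol, rules) for symbol in seq)
print(seq, rules, restored == "abcabcabc")
```

Each rule stands for a repeated fragment, and rules may refer to other rules, which is what gives the hierarchy.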
Kea is a program for automatically extracting keywords and keyphrases from the full text of documents. Candidate keyphrases are identified using rudimentary lexical processing, features are computed for each candidate, and machine learning is used to determine which candidates should be assigned as keyphrases.
- The Kea WWW interface will extract keyphrases from any web page you specify.
- Kea is available for download.
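
The first two stages described above, candidate identification and feature computation, might look roughly like the sketch below. The stopword list, the three-word limit, and the two features (a TF×IDF score and the relative position of the first occurrence) are assumptions chosen for illustration, and the machine learning stage is left out.

```python
import math
import re

# A tiny illustrative stopword list; a real system would use a much longer one.
STOPWORDS = set(["the", "a", "an", "of", "and", "in", "to", "is", "for", "on"])


def candidates(text, maxlen=3):
    """Return (phrase, word position) pairs plus the document length in words.

    A candidate is any run of up to maxlen words that neither starts
    nor ends with a stopword.
    """
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for i in range(len(words)):
        for n in range(1, maxlen + 1):
            gram = words[i:i + n]
            if len(gram) < n:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            found.append((" ".join(gram), i))
    return found, len(words)


def features(text, docfreq, numdocs):
    """Map each candidate phrase to (tf-idf score, relative first occurrence).

    docfreq maps a phrase to the number of documents containing it,
    and numdocs is the size of that reference collection.
    """
    found, total = candidates(text)
    counts = dict()
    first = dict()
    for phrase, pos in found:
        counts[phrase] = counts.get(phrase, 0) + 1
        first.setdefault(phrase, pos)
    feats = dict()
    for phrase, count in counts.items():
        tf = count / total
        idf = math.log((numdocs + 1) / (docfreq.get(phrase, 0) + 1))
        feats[phrase] = (tf * idf, first[phrase] / total)
    return feats


text = "Digital libraries index documents. Digital libraries serve readers."
feats = features(text, dict(), 10)
print(feats["digital libraries"])
```

A trained classifier would then take these feature pairs and decide which candidates to assign as keyphrases.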

Browsing interfaces

Phind is an interface for browsing the phrases that occur in a collection. The phrases form an approximation of the topics covered. They are extracted from the noun phrases occurring in the text, so nonsense phrases and phrases with very little information content are excluded. Each phrase is part of a hierarchy, and at any point the user can browse more specialised topics or retrieve documents that contain the phrase.
- Phind has been applied to the web pages of the UN Food and Agriculture Organisation.
Phrasier is a tool to support information seeking activities in a digital library. Its novel design reflects the fact that reading, writing, browsing and searching activities are rarely carried out independently of each other. They overlap and interleave in ways which have not been effectively supported by conventional information retrieval interfaces. Consequently, Phrasier blurs the distinction between writing a document and finding material related to it; between reading a document and finding others on the same or similar topics; between keyword searching and subject browsing.
- A demonstration version of Phrasier is available for download.
Kniles is a web-based system for inserting topic-based hypertext links into existing, large-scale digital library collections. The links are generated at runtime using keyphrases (provided by the author or extracted by Kea), and let you browse collections of documents that do not already have embedded hypertext links.
- Kniles has been used to insert links in the text of 45,000 Computer Science Technical Reports that were originally in PostScript format.
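
Generating links from keyphrases at runtime can be sketched as follows. This is an illustration only, not the Kniles implementation: the function name, the search URL prefix, and the choice to link each phrase to a search for itself are all assumptions, and the input is assumed to be plain text without existing markup.

```python
import re
from urllib.parse import quote_plus


def link_phrases(text, phrases, base="search?q="):
    """Wrap every occurrence of a known keyphrase in a hypertext link.

    base is a hypothetical URL prefix for a search on the phrase.
    """
    if not phrases:
        return text
    # Try longer phrases first so they win over their own sub-phrases.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )

    def repl(match):
        found = match.group(0)
        return '<a href="%s%s">%s</a>' % (base, quote_plus(found.lower()), found)

    return pattern.sub(repl, text)


print(link_phrases("Digital libraries use text compression.",
                   ["text compression", "compression"]))
# prints: Digital libraries use <a href="search?q=text+compression">text compression</a>.
```

Because the links are computed when a page is requested, the stored documents themselves stay unchanged.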

Word segmentation

Word segmentation finds word boundaries in languages such as Chinese and Japanese which, unlike English, are written without spaces or other word delimiters apart from punctuation marks. It plays a significant role in applications that use the word as the basic unit, because machine-readable Chinese text is invariably stored in unsegmented form.
- We have implemented a WWW interface for segmenting Chinese text.
- If your web browser does not support Chinese text, illustrations of the transformation are available.
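
To make the task concrete, here is a minimal sketch of one common baseline, greedy forward maximum matching against a word list. It is not necessarily the method behind the interface above, and the lexicon, the function name, and the four-character limit are assumptions made for illustration.

```python
def segment(text, lexicon, maxlen=4):
    """Split unsegmented text by taking the longest lexicon word at each position.

    Falls back to a single character when no lexicon word matches.
    """
    words = []
    i = 0
    while i < len(text):
        for n in range(min(maxlen, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


lexicon = set(["我", "喜欢", "图书馆"])  # hypothetical three-word lexicon
print(segment("我喜欢图书馆", lexicon))
# prints: ['我', '喜欢', '图书馆']
```

Greedy matching can pick the wrong boundary when a longer word overlaps the correct one, which is one reason segmentation is a research problem rather than a simple lookup.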

_nzdlpagefooter_
April 2000 }