package projects
#######################################################################
# java images/scripts
#######################################################################
# the _javalinks_ macros are the flashy image links at the top right of
# the page.
_javalinks_ {_imagehome_}
_javalinks_ [v=1] {
_imagehome_
}
#######################################################################
# icons
#######################################################################
## "projects and demonstrations" ## green_title ## demo ##
_httpicondemo_ {_httpimg_/demo.gif}
_widthdemo_ {450}
_heightdemo_ {57}
_icondemo_ {
New Zealand Digital Library Project members have developed a range
of practical software packages in the course of their research.
Much of this software is available for
download.
Digital libraries and indexing
- Greenstone
is the digital library system that generates every page of
this website.
It is freely available under the GNU General Public License,
and has been adopted by numerous other projects.
It is used to disseminate information by humanitarian
organisations including Global Help Projects and
United Nations organisations.
- Our website hosts exotic collections, humanitarian collections, and reference collections.
- Other websites mirror these collections, and host many others.
- Greenstone is available for download.
- MG
is an enhancement of the
Managing Gigabytes
full-text retrieval system that provides flexible stemming methods,
term weighting, term frequencies, merged indexes,
machine-independent indexes, and a port to MS-DOS.
- PreScript
converts PostScript to plain ASCII or HTML.
It detects paragraph boundaries, removes hyphenation,
and interprets many ligatures.
Extracting data and metadata
-
Sequitur
is a method for inferring compositional hierarchies from strings by detecting
repetition and factoring it out of the string by forming rules in a
grammar.
Sequitur is useful for recognizing lexical structure in strings,
and excels at very long sequences.
- Kea
is a program for automatically extracting keywords and keyphrases
from the full text of documents.
Candidate keyphrases are identified using rudimentary lexical processing,
features are computed for each candidate, and machine learning is used to
determine which candidates should be assigned as keyphrases.
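The idea of factoring repetition out of a string into grammar rules can be illustrated with a minimal sketch. This is not the Sequitur algorithm itself but a simpler offline relative that repeatedly replaces the most frequent adjacent pair of symbols with a new rule. The names factorRepeats and expand and the rule labels R1, R2, ... are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def factorRepeats(text):
    """Replace repeated adjacent pairs with rule symbols until no pair repeats."""
    seq = list(text)
    rules = []  # (ruleName, (left, right)) in the order the rules were created
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = "R" + str(len(rules) + 1)
        rules.append((name, pair))
        # Rewrite the sequence left to right, replacing non-overlapping matches.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Rebuild the original string by expanding every rule symbol."""
    table = dict(rules)
    stack = list(reversed(seq))
    chars = []
    while stack:
        sym = stack.pop()
        if sym in table:
            left, right = table[sym]
            stack.append(right)
            stack.append(left)
        else:
            chars.append(sym)
    return "".join(chars)


seq, rules = factorRepeats("abcabcabc")
print(seq)
for name, pair in rules:
    print(name, "->", pair[0], pair[1])
```

The rules form a hierarchy because later rules refer to earlier ones, and expand recovers the input exactly, so nothing is lost by the factoring.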
Browsing interfaces
-
Phind
is an interface for browsing the phrases that occur in a collection.
The phrases form an approximation of the topics covered.
They are extracted from the noun phrases occurring in the text,
so nonsense phrases and phrases with very little information content
are excluded.
Each phrase is part of a hierarchy,
and the user can browse more specialised topics,
or retrieve documents that contain the phrase, at any point.
- Phrasier
is a tool to support information seeking activities in a digital library.
Its novel design reflects the fact that reading, writing, browsing and
searching activities are rarely carried out independently of each other.
They overlap and interleave in ways which have not been effectively supported
by conventional information retrieval interfaces.
Consequently, Phrasier blurs the distinction between
writing a document and finding material related to it;
between reading a document and finding others on the same or similar topics;
between keyword searching and subject browsing.
- A demonstration version of Phrasier is available for download.
- Kniles
is a web-based system for inserting topic-based hypertext links
into existing, large-scale digital library collections.
The links are generated at runtime using keyphrases (provided by the author
or extracted by Kea), and let you browse collections of documents that
do not already have embedded hypertext links.
Word segmentation
-
Word segmentation
is designed to find word boundaries in languages like Chinese and
Japanese, which are (unlike English) written without spaces
or other word delimiters (except for punctuation marks).
It plays a significant role in applications that use the word as the
basic unit, because machine-readable Chinese text
is invariably stored in unsegmented form.
- We have implemented a
WWW interface
for segmenting Chinese text.
- If your web browser does not support Chinese text,
illustrations of the transformation are available.
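To make the task concrete, here is a minimal sketch of one classic baseline, greedy forward maximum matching against a word list. It is offered only as an illustration and is not necessarily the method behind the WWW interface. The function name segment, the maxLen parameter and the toy lexicon are all assumptions made for this example.

```python
def segment(text, lexicon, maxLen=4):
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(maxLen, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
lexicon = frozenset(("我们", "喜欢", "数字", "图书馆", "图书"))
print(" ".join(segment("我们喜欢数字图书馆", lexicon)))
```

A greedy matcher like this depends entirely on its word list and can choose the wrong boundary when words overlap, which is one reason segmentation is treated as a research problem rather than a lookup.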
_nzdlpagefooter_
April 2000
}