Tesseract - An Open Source OCR Engine.

The tesseract extension contains the tesseract program, plus Greenstone
plugins to use it during build.

You can get the Tesseract extension in two ways:

1. Download the tarball and unpack it.

   cd greenstone3/gs2build/ext
   wget https://svn.greenstone.org/gs2-extensions/tesseract/trunk/tesseract-linux-x64.tar.gz
   tar xzvf tesseract-linux-x64.tar.gz

   You will need to open a new terminal and source gs3-setup.sh to have the
   extension's environment variables set.

2. Check out the source and compile it.

   cd greenstone3/gs2build/ext
   svn co https://svn.greenstone.org/gs2-extensions/tesseract/trunk/src tesseract
   cd tesseract
   ./CASCADE-MAKE.sh

******************
TesseractPlugins
******************

The tesseract extension comes with two plugins: TesseractTextExtractor and
TesseractImagePlugin.

TesseractTextExtractor is a helper plugin that runs Tesseract on an image,
producing a text file.

TesseractImagePlugin can replace ImagePlugin, adding Tesseract OCR ability to
it.

TODO: Implement TesseractPagedImagePlugin.
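
To illustrate the image-to-text step that TesseractTextExtractor is described as performing, here is a minimal shell sketch. It is not the plugin's actual code: the function names ocr_output_base and ocr_image are hypothetical, and it assumes the tesseract binary is already on your PATH. It relies only on the standard Tesseract command line form, "tesseract <image> <outputbase>", which writes <outputbase>.txt.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not part of the extension).

# Print the output base for an image path by dropping the final extension,
# e.g. scans/page1.png -> scans/page1. Tesseract appends ".txt" to this base.
ocr_output_base() {
    case "${1##*/}" in
        *.*) printf '%s\n' "${1%.*}" ;;   # file name has an extension: strip it
        *)   printf '%s\n' "$1" ;;        # no extension: use the path unchanged
    esac
}

# Run Tesseract on one image and print the path of the text file it wrote.
# Returns non-zero if tesseract is missing or fails.
ocr_image() {
    image=$1
    base=$(ocr_output_base "$image")
    # Standard CLI usage: tesseract <image> <outputbase>
    tesseract "$image" "$base" >/dev/null 2>&1 || return 1
    printf '%s\n' "$base.txt"
}
```

After sourcing these definitions, running "ocr_image scans/page1.png" would, under the assumptions above, leave the recognised text in scans/page1.txt and print that path.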