IceCite was obtained from https://github.com/ckorzen/icecite
IceCite for Greenstone was built on 19 July 2017 on the research net Linux machine. A later version, checked out from git and compiled successfully on 5 Oct 2017, produced strange sequences of alphanumeric characters interspersed with what could be the regular contents when run over the 24.pdf test file in step 4c. So we've committed the version compiled on 19 July instead, as its conversion output contained fewer of these strange sequences.
LICENSE INFO
- Icecite has an Apache license https://github.com/ckorzen/icecite/blob/master/LICENSE
this is compatible with GPL3, which we use with GS3
- The BouncyCastle jars used by Icecite have an MIT license, which Dr Bainbridge says we had already established is compatible with the license we use for GS(3).
https://www.bouncycastle.org/licence.html
USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT
- Icecite needs Java 8. For compiling you need JDK 8; for running, either JDK 8 or JRE 8 will suffice.
- you will need Maven installed
- you will need to be able to run git commands
1. To compile Icecite, first set up the environment for JDK 8:
export JAVA_HOME=/opt/java8
export PATH=$JAVA_HOME/bin:$PATH
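Before compiling, it can help to confirm that the "java" now first on the PATH really is Java 8. The following is a minimal sketch, assuming a POSIX shell; is_java8 is a hypothetical helper name, and it relies on Java 8 reporting its version as "1.8.x" in the first line of "java -version" output (e.g. java version "1.8.0_144").

```shell
# Hypothetical helper: succeeds (returns 0) if the given version line,
# as printed first by `java -version`, denotes Java 8 ("1.8.x").
is_java8() {
  case "$1" in
    *'"1.8.'*) return 0 ;;   # Java 8 versions look like "1.8.0_144"
    *) return 1 ;;           # e.g. "9.0.4" or "11.0.2"
  esac
}

# `java -version` writes to stderr, so redirect it before reading line 1.
check_java8_on_path() {
  version_line=$(java -version 2>&1 | head -n 1)
  if is_java8 "$version_line"; then
    echo "OK: $version_line"
  else
    echo "Not Java 8: $version_line (check JAVA_HOME and PATH)" >&2
    return 1
  fi
}
```

Running "check_java8_on_path" right after the two exports should print an OK line; if it does not, the exports were probably done in a different terminal.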
2. PROXY STEP WHEN ON MACHINES THAT AREN'T RESEARCH NET:
WARNING: Behind a proxy, it's hard to compile successfully: "mvn install" gets stuck timing out while trying to download files, and it is a different file on each attempt. By contrast, "mvn install" works fine on the research net Linux machine and compiles relatively quickly, taking no more than a couple of minutes.
If you're behind a proxy, make sure you've set the https_proxy environment variable correctly.
The proxy also needs to be set for maven. Refer to http://maven.apache.org/guides/mini/guide-proxies.html and https://stackoverflow.com/questions/12807112/problems-after-maven-installation-mvn-install-tries-to-download-unreachable-fi
If a settings.xml file does not already exist, create one, copy in the example proxy settings from the Maven guide and edit them to match your proxy.
e.g. emacs ~/.m2/settings.xml
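<settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.cms.waikato.ac.nz</host>
      <port>3128</port>
      <username>USERNAME</username>
      <password>PWD</password>
      <nonProxyHosts>www.waikato.ac.nz|*.greenstone.org</nonProxyHosts>
    </proxy>
  </proxies>
</settings>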
(Check the permissions. The mvn install step seems to require that all users have read access to settings.xml, but since the file contains the proxy password, it will need to be made private again.)
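Writing the proxy entry by hand is error-prone, so the settings.xml above can also be generated from a script. The following is a minimal sketch, assuming a POSIX shell; write_proxy_settings and the variables PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS are hypothetical names chosen for this example, with defaults that mirror the example values above. Like the instructions, it only creates the file if one does not already exist.

```shell
# Hypothetical helper: write a minimal Maven settings.xml with one proxy entry.
# Defaults mirror the example above; override via PROXY_HOST, PROXY_PORT,
# PROXY_USER and PROXY_PASS (hypothetical variable names).
write_proxy_settings() {
  target="$1"
  # Don't clobber an existing settings.xml; add the <proxy> entry by hand instead.
  if [ -e "$target" ]; then
    echo "$target already exists; add the <proxy> entry to it manually" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  # Unquoted heredoc so the ${VAR:-default} expansions are applied.
  cat > "$target" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>${PROXY_HOST:-proxy.cms.waikato.ac.nz}</host>
      <port>${PROXY_PORT:-3128}</port>
      <username>${PROXY_USER:-USERNAME}</username>
      <password>${PROXY_PASS:-PWD}</password>
      <nonProxyHosts>www.waikato.ac.nz|*.greenstone.org</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
EOF
}
```

For example, run "PROXY_USER=me PROXY_PASS=secret write_proxy_settings ~/.m2/settings.xml". Assuming the same proxy host and port as in the example, the environment variable from the earlier step would be "export https_proxy=http://proxy.cms.waikato.ac.nz:3128". Following the note on permissions, "chmod 644" may be needed for the build, and "chmod 600" afterwards to keep the password private.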
3. Then get and compile Icecite following the instructions at https://github.com/ckorzen/icecite
git clone https://github.com/ckorzen/icecite.git --recursive
cd icecite
git pull --recurse-submodules
cd pdf-parent/
mvn install
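After "mvn install" finishes, the pdf-cli jar used in step 4 should exist under icecite/pdf-cli/target. The following is a minimal sketch, assuming a POSIX shell and the jar naming pattern from the run command in step 4; find_cli_jar is a hypothetical helper name. It also guards against that wildcard matching more than one jar, which would happen if, for example, an older build was left behind.

```shell
# Hypothetical helper: print the path of the single pdf-cli
# jar-with-dependencies under an icecite checkout, or fail.
# $1: path to the icecite checkout (the directory containing pdf-cli/)
find_cli_jar() {
  # Expand the glob into the function's positional parameters.
  set -- "$1"/pdf-cli/target/pdf-cli-*-jar-with-dependencies.jar
  # No match leaves the literal pattern (not a file); several matches give $# > 1.
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one pdf-cli jar, found: $*" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

For example, run "find_cli_jar ~/icecite" (substituting wherever the repository was cloned) right after the build. It should print exactly one path.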
4. Once compiled, run Icecite. The general instructions for running IceCite are at https://github.com/ckorzen/icecite
Remember, if you're running IceCite in a new terminal, make sure the Java 8 environment is set up. This time, either a JDK 8 or a JRE 8 will do.
export JAVA_HOME=/opt/java8/
export PATH=$JAVA_HOME/bin:$PATH
To use Icecite's PDF-to-text conversion, you will need its "PDF-CLI" (PDF command line interface), which is located in icecite's pdf-cli subfolder. So go there and run the conversion executable:
cd ../../
cd icecite/pdf-cli
java -jar target/pdf-cli-*-jar-with-dependencies.jar [options] [