Class KEAKeyphraseExtractor

java.lang.Object
  |
  +--KEAKeyphraseExtractor
All Implemented Interfaces:
weka.core.OptionHandler

public class KEAKeyphraseExtractor
extends java.lang.Object
implements weka.core.OptionHandler

Extracts keyphrases from the documents in a given directory. Document file names are assumed to end with ".txt". Extracted keyphrases are written to corresponding files ending with ".key", unless such a file is already present. An encoding for the documents and keyphrases can optionally be specified (e.g. for Chinese text). Documents for which a ".key" file already exists are used for evaluation. Valid options are:

-l "directory name"
Specifies name of directory.

-m "model name"
Specifies name of model.

-e "encoding"
Specifies encoding.

-n
Specifies number of phrases to be output (default: 5).

-d
Turns debugging mode on.

-a
Also write stemmed phrase and score into ".key" file.
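
The options above are passed as an array of strings, either to main or to setOptions. The following is a minimal sketch of assembling such an array. KeaOptions and its build method are hypothetical helpers, not part of KEA, and passing the phrase count as the token following -n is an assumption based on the option description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of KEA: assembles the documented flags
// into a String[] of the kind setOptions(String[]) accepts.
final class KeaOptions {
    static String[] build(String dirName, String modelName, String encoding,
                          int numPhrases, boolean debug, boolean additionalInfo) {
        List<String> opts = new ArrayList<>();
        opts.add("-l");
        opts.add(dirName);
        opts.add("-m");
        opts.add(modelName);
        if (encoding != null) {          // -e is only emitted when an encoding is given
            opts.add("-e");
            opts.add(encoding);
        }
        opts.add("-n");                  // assumption: the count follows -n as its own token
        opts.add(Integer.toString(numPhrases));
        if (debug) {
            opts.add("-d");
        }
        if (additionalInfo) {
            opts.add("-a");
        }
        return opts.toArray(new String[0]);
    }
}
```

With KEA on the classpath, the resulting array could then be handed to setOptions on a KEAKeyphraseExtractor instance.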


Constructor Summary
KEAKeyphraseExtractor()
           
 
Method Summary
 java.util.Hashtable collectStems()
          Collects the stems of the file names.
 void extractKeyphrases(java.util.Hashtable stems)
          Extracts keyphrases from the documents identified by the given file-name stems.
 boolean getAdditionalInfo()
          Get the value of AdditionalInfo.
 boolean getDebug()
          Get the value of debug.
 java.lang.String getDirName()
          Get the value of dirName.
 java.lang.String getEncoding()
          Get the value of encoding.
 java.lang.String getModelName()
          Get the value of modelName.
 int getNumPhrases()
          Get the value of numPhrases.
 java.lang.String[] getOptions()
          Gets the current option settings.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 void loadModel()
          Loads the extraction model from the file.
static void main(java.lang.String[] ops)
          The main method.
 void setAdditionalInfo(boolean newAdditionalInfo)
          Set the value of AdditionalInfo.
 void setDebug(boolean newdebug)
          Set the value of debug.
 void setDirName(java.lang.String newdirName)
          Set the value of dirName.
 void setEncoding(java.lang.String newencoding)
          Set the value of encoding.
 void setModelName(java.lang.String newmodelName)
          Set the value of modelName.
 void setNumPhrases(int newnumPhrases)
          Set the value of numPhrases.
 void setOptions(java.lang.String[] options)
          Parses a given list of options controlling the behaviour of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KEAKeyphraseExtractor

public KEAKeyphraseExtractor()

Method Detail

getAdditionalInfo

public boolean getAdditionalInfo()
Get the value of AdditionalInfo.

Returns:
Value of AdditionalInfo.

setAdditionalInfo

public void setAdditionalInfo(boolean newAdditionalInfo)
Set the value of AdditionalInfo.

Parameters:
newAdditionalInfo - Value to assign to AdditionalInfo.

getNumPhrases

public int getNumPhrases()
Get the value of numPhrases.

Returns:
Value of numPhrases.

setNumPhrases

public void setNumPhrases(int newnumPhrases)
Set the value of numPhrases.

Parameters:
newnumPhrases - Value to assign to numPhrases.

getDebug

public boolean getDebug()
Get the value of debug.

Returns:
Value of debug.

setDebug

public void setDebug(boolean newdebug)
Set the value of debug.

Parameters:
newdebug - Value to assign to debug.

getEncoding

public java.lang.String getEncoding()
Get the value of encoding.

Returns:
Value of encoding.

setEncoding

public void setEncoding(java.lang.String newencoding)
Set the value of encoding.

Parameters:
newencoding - Value to assign to encoding.

getModelName

public java.lang.String getModelName()
Get the value of modelName.

Returns:
Value of modelName.

setModelName

public void setModelName(java.lang.String newmodelName)
Set the value of modelName.

Parameters:
newmodelName - Value to assign to modelName.

getDirName

public java.lang.String getDirName()
Get the value of dirName.

Returns:
Value of dirName.

setDirName

public void setDirName(java.lang.String newdirName)
Set the value of dirName.

Parameters:
newdirName - Value to assign to dirName.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options controlling the behaviour of this object. Valid options are:

-l "directory name"
Specifies name of directory.

-m "model name"
Specifies name of model.

-e "encoding"
Specifies encoding.

-n
Specifies number of phrases to be output (default: 5).

-d
Turns debugging mode on.

-a
Also write stemmed phrase and score into ".key" file.

Specified by:
setOptions in interface weka.core.OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current option settings.

Specified by:
getOptions in interface weka.core.OptionHandler
Returns:
an array of strings suitable for passing to setOptions

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Returns:
an enumeration of all the available options

collectStems

public java.util.Hashtable collectStems()
                                 throws java.lang.Exception
Collects the stems of the file names.

Throws:
java.lang.Exception
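
As an illustration of what collecting file-name stems can look like, the sketch below scans a directory for names ending in ".txt" and stores each name without that suffix as a key. It is not KEA's implementation: StemCollector is a hypothetical class, and the Boolean placeholder value is an assumption, since this page does not say what the returned Hashtable maps stems to.

```java
import java.io.File;
import java.util.Hashtable;

// Hypothetical stand-in, not KEA code: gathers the stems of ".txt" file names.
final class StemCollector {
    static Hashtable<String, Boolean> collect(File dir) {
        String[] names = dir.list();
        if (names == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Hashtable<String, Boolean> stems = new Hashtable<>();
        for (String name : names) {
            if (name.endsWith(".txt")) {
                // Strip the 4-character ".txt" suffix to obtain the stem.
                stems.put(name.substring(0, name.length() - 4), Boolean.TRUE);
            }
        }
        return stems;
    }
}
```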

extractKeyphrases

public void extractKeyphrases(java.util.Hashtable stems)
                       throws java.lang.Exception
Extracts keyphrases from the documents identified by the given file-name stems.

Throws:
java.lang.Exception
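
According to the class description, a document whose ".key" file already exists is used for evaluation, and a ".key" file is written only where none is present. The sketch below shows one way to split a set of stems along that line. KeyFilePlan is hypothetical and only mirrors the documented behaviour; it is not taken from KEA's source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;

// Hypothetical helper, not KEA code: separates stems by whether "<stem>.key" exists.
final class KeyFilePlan {
    final List<String> toWrite = new ArrayList<>();     // no ".key" file yet
    final List<String> toEvaluate = new ArrayList<>();  // ".key" file already present

    static KeyFilePlan of(File dir, Hashtable<String, ?> stems) {
        KeyFilePlan plan = new KeyFilePlan();
        for (String stem : stems.keySet()) {
            if (new File(dir, stem + ".key").isFile()) {
                plan.toEvaluate.add(stem);
            } else {
                plan.toWrite.add(stem);
            }
        }
        // Hashtable iteration order is unspecified, so sort for stable output.
        Collections.sort(plan.toWrite);
        Collections.sort(plan.toEvaluate);
        return plan;
    }
}
```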

loadModel

public void loadModel()
               throws java.lang.Exception
Loads the extraction model from the file.

Throws:
java.lang.Exception

main

public static void main(java.lang.String[] ops)
The main method.
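
This page does not describe what main does internally. A plausible programmatic equivalent, given the methods listed here, is to set the options, load the model, collect the stems, and then extract keyphrases. The sketch below encodes that order against a hypothetical interface, KeyphraseExtractorApi, which copies the documented signatures so the example compiles without KEA on the classpath. The call order is an assumption.

```java
import java.util.Hashtable;

// Hypothetical interface mirroring the signatures documented on this page.
@SuppressWarnings("rawtypes")
interface KeyphraseExtractorApi {
    void setOptions(String[] options) throws Exception;
    void loadModel() throws Exception;
    Hashtable collectStems() throws Exception;
    void extractKeyphrases(Hashtable stems) throws Exception;
}

// Hypothetical driver: runs the assumed sequence of calls.
@SuppressWarnings("rawtypes")
final class ExtractionDriver {
    static void run(KeyphraseExtractorApi kea, String[] options) throws Exception {
        kea.setOptions(options);            // e.g. {"-l", "docs", "-m", "model"}
        kea.loadModel();                    // the model named by -m
        Hashtable stems = kea.collectStems();
        kea.extractKeyphrases(stems);
    }
}
```

In real use the same four calls would be made directly on a KEAKeyphraseExtractor instance, with each call able to throw java.lang.Exception as documented above.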