java.lang.Object
  +--KEAModelBuilder
Builds a keyphrase extraction model from the documents in a given directory. Document file names are assumed to end with ".txt", and the files containing the corresponding author-assigned keyphrases are assumed to end with ".key". An encoding for the documents and keyphrases can optionally be specified (e.g. for Chinese text). Valid options are:
-l "directory name"
Specifies name of directory.
-m "model name"
Specifies name of model.
-e "encoding"
Specifies encoding.
-d
Turns debugging mode on.
-k
Use keyphrase frequency statistic.
-p
Disallow internal periods.
-x "length"
Sets maximum phrase length (default: 3).
-y "length"
Sets minimum phrase length (default: 1).
-o "number"
The minimum number of times a phrase needs to occur (default: 2).
-s "name of class implementing list of stop words"
Sets list of stop words to use (default: StopwordsEnglish).
-t "name of class implementing stemmer"
Sets stemmer to use (default: IteratedLovinsStemmer).
-n
Do not check for proper nouns.
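The options above are passed as an array of strings. The following is a minimal sketch of how such an array could be assembled, not code from KEA itself: the class KeaOptionsSketch, its buildOptions helper, and the sample values "docs/train", "my_model" and "UTF-8" are all hypothetical, and only a subset of the documented options is covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of KEA): assembles an option array
// using the flags documented for KEAModelBuilder.
class KeaOptionsSketch {

    static String[] buildOptions(String dirName, String modelName,
                                 String encoding, int maxPhraseLength,
                                 boolean debug) {
        List<String> opts = new ArrayList<>();
        opts.add("-l"); opts.add(dirName);       // directory holding the .txt/.key files
        opts.add("-m"); opts.add(modelName);     // name of the model
        if (encoding != null) {                  // -e is optional
            opts.add("-e"); opts.add(encoding);
        }
        opts.add("-x"); opts.add(Integer.toString(maxPhraseLength));
        if (debug) {
            opts.add("-d");                      // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not taken from the KEA distribution.
        String[] opts = buildOptions("docs/train", "my_model", "UTF-8", 3, true);
        System.out.println(String.join(" ", opts));
        // With KEA on the classpath, the array could then be handed on, e.g.:
        // KEAModelBuilder.main(opts);
    }
}
```

Building the array in one place keeps each flag next to its value, which helps avoid off-by-one mistakes when optional entries such as -e are left out.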
Constructor Summary
KEAModelBuilder()
Method Summary
Return type | Method | Description
void | buildModel(java.util.Hashtable stems) | Builds the model from the files.
java.util.Hashtable | collectStems() | Collects the stems of the file names.
boolean | getCheckForProperNouns() | Get the M_CheckProperNouns value.
boolean | getDebug() | Get the value of debug.
java.lang.String | getDirName() | Get the value of dirName.
boolean | getDisallowIPeriods() | Get the value of disallowIPeriods.
java.lang.String | getEncoding() | Get the value of encoding.
int | getMaxPhraseLength() | Get the value of MaxPhraseLength.
int | getMinNumOccur() | Get the value of MinNumOccur.
int | getMinPhraseLength() | Get the value of MinPhraseLength.
java.lang.String | getModelName() | Get the value of modelName.
java.lang.String[] | getOptions() | Gets the current option settings.
Stemmer | getStemmer() | Get the Stemmer value.
Stopwords | getStopwords() | Get the M_Stopwords value.
boolean | getUseKFrequency() | Get the value of useKFrequency.
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(java.lang.String[] ops) | The main method.
void | saveModel() | Saves the extraction model to the file.
void | setCheckForProperNouns(boolean newM_CheckProperNouns) | Set the M_CheckProperNouns value.
void | setDebug(boolean newdebug) | Set the value of debug.
void | setDirName(java.lang.String newdirName) | Set the value of dirName.
void | setDisallowIPeriods(boolean newdisallowIPeriods) | Set the value of disallowIPeriods.
void | setEncoding(java.lang.String newencoding) | Set the value of encoding.
void | setMaxPhraseLength(int newMaxPhraseLength) | Set the value of MaxPhraseLength.
void | setMinNumOccur(int newMinNumOccur) | Set the value of MinNumOccur.
void | setMinPhraseLength(int newMinPhraseLength) | Set the value of MinPhraseLength.
void | setModelName(java.lang.String newmodelName) | Set the value of modelName.
void | setOptions(java.lang.String[] options) | Parses a given list of options controlling the behaviour of this object.
void | setStemmer(Stemmer newStemmer) | Set the Stemmer value.
void | setStopwords(Stopwords newM_Stopwords) | Set the M_Stopwords value.
void | setUseKFrequency(boolean newuseKFrequency) | Set the value of useKFrequency.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
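The method summary suggests an order of calls: collectStems() returns a java.util.Hashtable, buildModel accepts one, and saveModel() writes the result. The sketch below illustrates that order under the assumption that the three methods are meant to be combined this way, which this page does not state explicitly. ModelBuilderLike, KeaWorkflowSketch and the recording stub are hypothetical stand-ins so that the example compiles without KEA on the classpath.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical stand-in mirroring three signatures from the method summary.
// The generic parameters are an assumption; the documented type is a raw Hashtable.
interface ModelBuilderLike {
    Hashtable<String, Object> collectStems() throws Exception;
    void buildModel(Hashtable<String, Object> stems) throws Exception;
    void saveModel() throws Exception;
}

class KeaWorkflowSketch {

    // Assumed call order: collect the stems, build the model from them, save it.
    static void run(ModelBuilderLike builder) throws Exception {
        Hashtable<String, Object> stems = builder.collectStems();
        builder.buildModel(stems);
        builder.saveModel();
    }

    // Runs the workflow against a stub that only records which method was called.
    static List<String> traceRun() {
        final List<String> calls = new ArrayList<>();
        ModelBuilderLike stub = new ModelBuilderLike() {
            @Override public Hashtable<String, Object> collectStems() {
                calls.add("collectStems");
                return new Hashtable<>();
            }
            @Override public void buildModel(Hashtable<String, Object> stems) {
                calls.add("buildModel");
            }
            @Override public void saveModel() {
                calls.add("saveModel");
            }
        };
        try {
            run(stub);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(traceRun());
        // With KEA available, the equivalent calls on the real class might read:
        //   KEAModelBuilder kmb = new KEAModelBuilder();
        //   kmb.setOptions(options);
        //   kmb.buildModel(kmb.collectStems());
        //   kmb.saveModel();
    }
}
```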
Constructor Detail
public KEAModelBuilder()
Method Detail
public boolean getCheckForProperNouns()
public void setCheckForProperNouns(boolean newM_CheckProperNouns)
Parameters: newM_CheckProperNouns - The new M_CheckProperNouns value.
public Stopwords getStopwords()
public void setStopwords(Stopwords newM_Stopwords)
Parameters: newM_Stopwords - The new M_Stopwords value.
public Stemmer getStemmer()
public void setStemmer(Stemmer newStemmer)
Parameters: newStemmer - The new Stemmer value.
public int getMinNumOccur()
public void setMinNumOccur(int newMinNumOccur)
Parameters: newMinNumOccur - Value to assign to MinNumOccur.
public int getMaxPhraseLength()
public void setMaxPhraseLength(int newMaxPhraseLength)
Parameters: newMaxPhraseLength - Value to assign to MaxPhraseLength.
public int getMinPhraseLength()
public void setMinPhraseLength(int newMinPhraseLength)
Parameters: newMinPhraseLength - Value to assign to MinPhraseLength.
public boolean getDisallowIPeriods()
public void setDisallowIPeriods(boolean newdisallowIPeriods)
Parameters: newdisallowIPeriods - Value to assign to disallowIPeriods.
public boolean getUseKFrequency()
public void setUseKFrequency(boolean newuseKFrequency)
Parameters: newuseKFrequency - Value to assign to useKFrequency.
public boolean getDebug()
public void setDebug(boolean newdebug)
Parameters: newdebug - Value to assign to debug.
public java.lang.String getEncoding()
public void setEncoding(java.lang.String newencoding)
Parameters: newencoding - Value to assign to encoding.
public java.lang.String getModelName()
public void setModelName(java.lang.String newmodelName)
Parameters: newmodelName - Value to assign to modelName.
public java.lang.String getDirName()
public void setDirName(java.lang.String newdirName)
Parameters: newdirName - Value to assign to dirName.
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-l "directory name"
Specifies name of directory.
-m "model name"
Specifies name of model.
-e "encoding"
Specifies encoding.
-d
Turns debugging mode on.
-k
Use keyphrase frequency statistic.
-p
Disallow internal periods.
-x "length"
Sets maximum phrase length (default: 3).
-y "length"
Sets minimum phrase length (default: 1).
-o "number"
The minimum number of times a phrase needs to occur (default: 2).
-s "name of class implementing list of stop words"
Sets list of stop words to use (default: StopwordsEnglish).
-t "name of class implementing stemmer"
Sets stemmer to use (default: IteratedLovinsStemmer).
-n
Do not check for proper nouns.
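To make the role of the defaults concrete, here is a minimal sketch of reading such an option array and falling back to a default when an option is absent. It is not KEA's parsing code, which this page does not show: KeaOptionParseSketch, valueOf and hasFlag are hypothetical names, the sample paths are placeholders, and the minimum phrase length default of 1 follows the class description at the top of this page.

```java
import java.util.Arrays;

// Hypothetical stand-in for option parsing; not taken from KEA or Weka.
class KeaOptionParseSketch {

    // Returns the string following "-<flag>", or the fallback if the flag is absent.
    static String valueOf(String[] options, char flag, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals("-" + flag)) {
                return options[i + 1];
            }
        }
        return fallback;
    }

    // Returns true if the value-less flag "-<flag>" is present.
    static boolean hasFlag(String[] options, char flag) {
        return Arrays.asList(options).contains("-" + flag);
    }

    public static void main(String[] args) {
        String[] options = {"-l", "docs/train", "-m", "my_model", "-x", "4", "-k"};
        int maxPhraseLength = Integer.parseInt(valueOf(options, 'x', "3")); // default 3
        int minPhraseLength = Integer.parseInt(valueOf(options, 'y', "1")); // default 1
        int minNumOccur = Integer.parseInt(valueOf(options, 'o', "2"));     // default 2
        boolean useKFrequency = hasFlag(options, 'k');
        System.out.println(maxPhraseLength + " " + minPhraseLength + " "
                + minNumOccur + " " + useKFrequency);
    }
}
```

Unlike the documented setOptions, this sketch does not throw an exception for unsupported options; it simply ignores them.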
Specified by: setOptions in interface weka.core.OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface weka.core.OptionHandler
public java.util.Enumeration listOptions()
Specified by: listOptions in interface weka.core.OptionHandler
public java.util.Hashtable collectStems() throws java.lang.Exception
Throws: java.lang.Exception
public void buildModel(java.util.Hashtable stems) throws java.lang.Exception
Throws: java.lang.Exception
public void saveModel() throws java.lang.Exception
Throws: java.lang.Exception
public static void main(java.lang.String[] ops)
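collectStems() is summarized on this page as collecting the stems of the file names, and the class description says that document file names end with ".txt". The sketch below shows one way such stems could be gathered, assuming a stem is the file name without its ".txt" suffix. FileStemSketch, stemsOf and stemsIn are hypothetical names, and the Boolean stored per stem is only a placeholder, since this page does not say what KEA keeps in the table.

```java
import java.io.File;
import java.util.Hashtable;

// Hypothetical illustration of collecting file name stems; not KEA's code.
class FileStemSketch {

    // Keeps names ending in ".txt" and stores them without the suffix.
    static Hashtable<String, Boolean> stemsOf(String[] fileNames) {
        Hashtable<String, Boolean> stems = new Hashtable<>();
        for (String name : fileNames) {
            if (name.endsWith(".txt")) {
                String stem = name.substring(0, name.length() - ".txt".length());
                stems.put(stem, Boolean.TRUE);   // placeholder value
            }
        }
        return stems;
    }

    // Lists a directory and collects the stems of its ".txt" files.
    static Hashtable<String, Boolean> stemsIn(File dir) {
        String[] names = dir.list();             // null if dir cannot be listed
        return stemsOf(names == null ? new String[0] : names);
    }

    public static void main(String[] args) {
        String[] names = {"a.txt", "a.key", "b.txt", "notes.md"};
        System.out.println(stemsOf(names).keySet());
    }
}
```

Under this assumption, a document "a.txt" and its keyphrase file "a.key" share the stem "a", which is what lets the two be paired.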