java.lang.Object
  |
  +--weka.filters.Filter
        |
        +--KEAFilter
This filter converts incoming data into a form suitable for keyphrase classification. It assumes that the dataset contains two string attributes: the first should contain the text of a document, and the second should contain the keyphrases associated with that document (if present).
The filter converts every instance (i.e. document) into a set of instances, one for each word-based n-gram in the document. The string attribute representing the document is replaced by some numeric features, the estimated probability of each n-gram being a keyphrase, and the rank of the phrase in the document according to that probability. Each new instance also has a class value: "true" if the n-gram is a true keyphrase, and "false" otherwise. If the input document does not come with author-assigned keyphrases, the class values for that document are missing.
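The expansion of a document into word-based n-grams can be pictured with a small standalone sketch. It is not KEA's implementation: the class NGramSketch and its methods are hypothetical, the tokenization (lower-cased runs of letters and digits) is an assumption, and the sketch lets phrases run across sentence boundaries for simplicity. The length and occurrence parameters mirror the MinPhraseLength, MaxPhraseLength and MinNumOccur settings of the filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this class is not part of KEA or WEKA.
final class NGramSketch {

    // Counts every run of minLen..maxLen consecutive words in the text.
    static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        // Assumption: words are lower-cased runs of letters and digits.
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        // LinkedHashMap keeps the n-grams in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int start = 0; start < words.size(); start++) {
            for (int len = minLen; len <= maxLen && start + len <= words.size(); len++) {
                String gram = String.join(" ", words.subList(start, start + len));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps only the n-grams that occur at least minOccur times in the text.
    static List<String> candidates(String text, int minLen, int maxLen, int minOccur) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countNGrams(text, minLen, maxLen).entrySet()) {
            if (e.getValue() >= minOccur) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String doc = "Data mining tools. Data mining methods.";
        // prints [data, data mining, mining]
        System.out.println(candidates(doc, 1, 3, 2));
    }
}
```

With the documented defaults (phrase length 1 to 3, at least 2 occurrences), only "data", "data mining" and "mining" survive in this toy document. In the filter, each surviving phrase would become one output instance.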
Field Summary
Fields inherited from class weka.filters.Filter
m_NewBatch
Constructor Summary
KEAFilter()
Method Summary
boolean | batchFinished() | Signify that this batch of input to the filter is finished.
boolean | getCheckForProperNouns() | Get the value of CheckForProperNouns.
boolean | getDebug() | Get the value of Debug.
boolean | getDisallowInternalPeriods() | Get whether internal periods are disallowed.
int | getDocumentAtt() | Get the value of DocumentAtt.
int | getKeyphrasesAtt() | Get the value of KeyphrasesAtt.
boolean | getKFused() | Gets whether the keyphrase frequency attribute is used.
int | getMaxPhraseLength() | Get the value of MaxPhraseLength.
int | getMinNumOccur() | Get the value of MinNumOccur.
int | getMinPhraseLength() | Get the value of MinPhraseLength.
java.lang.String[] | getOptions() | Gets the current settings of the filter.
int | getProbabilityIndex() | Returns the index of the phrases' probabilities in the output ARFF file.
int | getRankIndex() | Returns the index of the phrases' ranks in the output ARFF file.
int | getStemmedPhraseIndex() | Returns the index of the stemmed phrases in the output ARFF file.
Stemmer | getStemmer() | Get the value of Stemmer.
Stopwords | getStopwords() | Get the value of Stopwords.
int | getUnstemmedPhraseIndex() | Returns the index of the unstemmed phrases in the output ARFF file.
java.lang.String | globalInfo() | Returns a string describing this filter.
boolean | input(weka.core.Instance instance) | Input an instance for filtering.
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(java.lang.String[] argv) | Main method for testing this class.
void | setCheckForProperNouns(boolean newM_CheckProperNouns) | Set the value of CheckForProperNouns.
void | setDebug(boolean newDebug) | Set the value of Debug.
void | setDisallowInternalPeriods(boolean disallow) | Set whether internal periods are disallowed.
void | setDocumentAtt(int newDocumentAtt) | Set the value of DocumentAtt.
boolean | setInputFormat(weka.core.Instances instanceInfo) | Sets the format of the input instances.
void | setKeyphrasesAtt(int newKeyphrasesAtt) | Set the value of KeyphrasesAtt.
void | setKFused(boolean flag) | Sets whether the keyphrase frequency attribute is used.
void | setMaxPhraseLength(int newMaxPhraseLength) | Set the value of MaxPhraseLength.
void | setMinNumOccur(int newMinNumOccur) | Set the value of MinNumOccur.
void | setMinPhraseLength(int newMinPhraseLength) | Set the value of MinPhraseLength.
void | setOptions(java.lang.String[] options) | Parses a given list of options controlling the behaviour of this object.
void | setStemmer(Stemmer newStemmer) | Set the value of Stemmer.
void | setStopwords(Stopwords newM_Stopwords) | Set the value of Stopwords.
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public KEAFilter()
Method Detail
public boolean getCheckForProperNouns()
public void setCheckForProperNouns(boolean newM_CheckProperNouns)
    Parameters: newM_CheckProperNouns - The new M_CheckProperNouns value.
public Stopwords getStopwords()
public void setStopwords(Stopwords newM_Stopwords)
    Parameters: newM_Stopwords - The new M_Stopwords value.
public Stemmer getStemmer()
public void setStemmer(Stemmer newStemmer)
    Parameters: newStemmer - The new Stemmer value.
public int getMinNumOccur()
public void setMinNumOccur(int newMinNumOccur)
    Parameters: newMinNumOccur - Value to assign to MinNumOccur.
public int getMaxPhraseLength()
public void setMaxPhraseLength(int newMaxPhraseLength)
    Parameters: newMaxPhraseLength - Value to assign to MaxPhraseLength.
public int getMinPhraseLength()
public void setMinPhraseLength(int newMinPhraseLength)
    Parameters: newMinPhraseLength - Value to assign to MinPhraseLength.
public int getStemmedPhraseIndex()
public int getUnstemmedPhraseIndex()
public int getProbabilityIndex()
public int getRankIndex()
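The rank stored at the index returned by getRankIndex() is described in the class overview as the position of a phrase in its document according to the estimated probability. The following is a minimal sketch of such a ranking, not KEA's code: the class RankSketch is hypothetical, and both the convention that rank 1 means the highest probability and the tie-breaking rule are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not KEA's implementation.
final class RankSketch {

    // Assigns rank 1 to the phrase with the highest probability, 2 to the next, and so on.
    static Map<String, Integer> rankByProbability(Map<String, Double> probabilities) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(probabilities.entrySet());
        // Sort by descending probability. The sort is stable, so tied phrases
        // keep the iteration order of the input map (an assumption of this sketch).
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        Map<String, Integer> ranks = new LinkedHashMap<>();
        int rank = 1;
        for (Map.Entry<String, Double> e : entries) {
            ranks.put(e.getKey(), rank++);
        }
        return ranks;
    }
}
```

The probabilities passed in would come from the trained model. Here they are plain numbers supplied by the caller.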
public int getDocumentAtt()
public void setDocumentAtt(int newDocumentAtt)
    Parameters: newDocumentAtt - Value to assign to DocumentAtt.
public int getKeyphrasesAtt()
public void setKeyphrasesAtt(int newKeyphrasesAtt)
    Parameters: newKeyphrasesAtt - Value to assign to KeyphrasesAtt.
public boolean getDebug()
public void setDebug(boolean newDebug)
    Parameters: newDebug - Value to assign to Debug.
public void setKFused(boolean flag)
public boolean getKFused()
public boolean getDisallowInternalPeriods()
public void setDisallowInternalPeriods(boolean disallow)
public void setOptions(java.lang.String[] options) throws java.lang.Exception
    Parses a given list of options controlling the behaviour of this object. Valid options are:
    -K           Specifies whether the keyphrase frequency statistic is used.
    -M length    Sets the maximum phrase length (default: 3).
    -L length    Sets the minimum phrase length (default: 1).
    -D           Turns debugging mode on.
    -I index     Sets the index of the attribute containing the documents (default: 0).
    -J index     Sets the index of the attribute containing the keyphrases (default: 1).
    -P           Disallows internal periods.
    -O number    Sets the minimum number of times a phrase needs to occur (default: 2).
    Specified by: setOptions in interface weka.core.OptionHandler
    Parameters: options - the list of options as an array of strings
    Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
    Specified by: getOptions in interface weka.core.OptionHandler
public java.util.Enumeration listOptions()
    Specified by: listOptions in interface weka.core.OptionHandler
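The flags documented for setOptions can be assembled into the string array that method expects. The sketch below only builds the array, so it compiles without WEKA or KEA on the classpath. The class KeaOptionsSketch and its buildOptions method are hypothetical helpers, not part of the library, and the flag order simply follows the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of KEA: assembles the flags listed for setOptions.
final class KeaOptionsSketch {

    static String[] buildOptions(int maxLen, int minLen, int minOccur,
                                 int documentAtt, int keyphrasesAtt,
                                 boolean useKF, boolean disallowPeriods, boolean debug) {
        List<String> opts = new ArrayList<>();
        if (useKF) {
            opts.add("-K");                       // keyphrase frequency statistic
        }
        opts.add("-M");
        opts.add(Integer.toString(maxLen));       // maximum phrase length
        opts.add("-L");
        opts.add(Integer.toString(minLen));       // minimum phrase length
        if (debug) {
            opts.add("-D");                       // debugging mode
        }
        opts.add("-I");
        opts.add(Integer.toString(documentAtt));  // attribute holding the documents
        opts.add("-J");
        opts.add(Integer.toString(keyphrasesAtt)); // attribute holding the keyphrases
        if (disallowPeriods) {
            opts.add("-P");                       // disallow internal periods
        }
        opts.add("-O");
        opts.add(Integer.toString(minOccur));     // minimum number of occurrences
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The documented defaults: -M 3 -L 1 -I 0 -J 1 -O 2
        System.out.println(String.join(" ", buildOptions(3, 1, 2, 0, 1, false, false, false)));
    }
}
```

In a project that has KEA available, the returned array would be handed to setOptions on a KEAFilter instance. That call is left out here so the example stays self-contained.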
public java.lang.String globalInfo()
public boolean setInputFormat(weka.core.Instances instanceInfo) throws java.lang.Exception
    Overrides: setInputFormat in class weka.filters.Filter
    Parameters: instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
    Throws: java.lang.Exception
public boolean input(weka.core.Instance instance) throws java.lang.Exception
    Overrides: input in class weka.filters.Filter
    Parameters: instance - the input instance
    Throws: java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
public boolean batchFinished() throws java.lang.Exception
    Overrides: batchFinished in class weka.filters.Filter
    Throws: java.lang.Exception - if no input structure has been defined
public static void main(java.lang.String[] argv)
    Parameters: argv - should contain arguments to the filter: use -h for help