I am trying to create a Java implementation of a maxent classifier. I need to classify sentences into n different classes.
I had a look at ColumnDataClassifier in the Stanford classifier, but I am not able to understand how to create the training data. I need training data that includes POS tags for the words of each sentence, so that the classifier can use features such as the previous word, the next word, etc.
I am looking for training data that has POS-tagged sentences with the sentence class mentioned, for example:
My/(POS) name/(POS) is/(POS) XYZ/(POS) CLASS
Any help will be appreciated.
1 Answer
If I understand it correctly, you are trying to treat each sentence as a set of POS tags.
In your example, the sentence "My name is XYZ" would be represented as the set (PRP$, NN, VBZ, NNP). That would mean every sentence is actually a binary vector of length 37 (there are 36 possible POS tags according to this page, plus the CLASS outcome feature for the whole sentence).
This can be encoded for OpenNLP Maxent as follows:
PRP$=1 NN=1 VBZ=1 NNP=1 CLASS=SomeClassOfYours1
or simply:
PRP$ NN VBZ NNP CLASS=SomeClassOfYours1
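A minimal sketch of how you might generate such event lines from a word/TAG tagged sentence follows; the class and method names (PosFeatureEncoder, toFeatureLine) and the word/TAG input convention are just my own illustration, not part of any library:

import java.util.LinkedHashSet;
import java.util.Set;

public class PosFeatureEncoder {

    /**
     * Converts a POS-tagged sentence such as "My/PRP$ name/NN is/VBZ XYZ/NNP"
     * plus a class label into an event line like
     * "PRP$ NN VBZ NNP CLASS=SomeClassOfYours1".
     */
    public static String toFeatureLine(String taggedSentence, String classLabel) {
        Set<String> tags = new LinkedHashSet<>();          // keep each tag only once (set of POS tags)
        for (String token : taggedSentence.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash >= 0 && slash < token.length() - 1) {
                tags.add(token.substring(slash + 1));      // the POS tag after the slash
            }
        }
        return String.join(" ", tags) + " CLASS=" + classLabel;
    }

    public static void main(String[] args) {
        // prints: PRP$ NN VBZ NNP CLASS=SomeClassOfYours1
        System.out.println(toFeatureLine("My/PRP$ name/NN is/VBZ XYZ/NNP", "SomeClassOfYours1"));
    }
}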
(For a working code snippet, see my answer here: Training models using openNLP maxent.)
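In the same spirit (not the exact code from that answer), a minimal training sketch assuming the legacy opennlp-maxent 3.0.x API (GIS, BasicEventStream, PlainTextByLineDataStream) and placeholder file names could look like this; as far as I remember, BasicEventStream treats the last whitespace-separated token on each line as the outcome, which is why CLASS=... comes last:

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import opennlp.maxent.BasicEventStream;
import opennlp.maxent.GIS;
import opennlp.maxent.GISModel;
import opennlp.maxent.PlainTextByLineDataStream;
import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
import opennlp.model.EventStream;

public class TrainSentenceClassifier {

    public static void main(String[] args) throws IOException {
        // train.txt holds one event per line, e.g. "PRP$ NN VBZ NNP CLASS=SomeClassOfYours1"
        EventStream events = new BasicEventStream(
                new PlainTextByLineDataStream(new FileReader("train.txt")));

        // 100 iterations, cutoff 0 (keep all features) -- tune both as needed
        GISModel model = GIS.trainModel(events, 100, 0);

        // persist the model so it can be reloaded later for classification
        new SuffixSensitiveGISModelWriter(model, new File("sentence-class.maxent.gz")).persist();
    }
}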
Some more sample data would be:
- "By 1978, Radio City had lost its glamour, and the owners of Rockefeller Center decided to demolish the aging hall."
- "In time he was entirely forgotten, many of his buildings were demolished, others insensitively altered."
- "As soon as she moved out, the mobile home was demolished, the suit said."
- ...
This would yield samples:
IN CD NNP VBD VBN PRP$ NN CC DT NNS IN TO VB VBG CLASS=SomeClassOfYours2
IN NN PRP VBD RB VBN JJ IN PRP$ NNS CLASS=SomeClassOfYours3
IN RB PRP VBD RP DT JJ NN VBN NN CLASS=SomeClassOfYours2
...
However, I don't expect such a classification to yield good results. It would be better to make use of other structural features of the sentence, such as the parse tree or dependency tree, which can be obtained using e.g. the Stanford parser.
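If you go that route, a rough sketch using a standard Stanford CoreNLP pipeline (tokenize, ssplit, pos and parse annotators; the model jars are assumed to be on the classpath) to obtain POS tags and the constituency parse of a sentence could look like this:

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class ParseFeatures {

    public static void main(String[] args) {
        // tokenize, split sentences, POS-tag and parse
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("My name is XYZ.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // POS tags, usable as the simple features described above
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.print(token.get(CoreAnnotations.PartOfSpeechAnnotation.class) + " ");
            }
            System.out.println();

            // constituency parse tree, from which richer structural features can be derived
            // (the dependency graph is available from the same sentence annotation as well)
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            System.out.println(tree);
        }
    }
}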