In Lucene, I want to apply some processing only to the last token of a TokenStream inside a TokenFilter. For example, given the sentence "hello world", I want to apply the processing to "world" and not to the other tokens.
I can do this by iterating over the entire input TokenStream first in order to get the offset of the last token, and then restarting from the first token. Because I then know the offset of the last token, I can recognize whether the current token is the last one or not.
However, looping twice is surely inefficient, so I want to iterate over the TokenStream only once, and it seems hard to find the right way to do that.
For example, suppose MyFilter looks like this (of course, this is just the basic structure of a TokenFilter):
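    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Lucene expects TokenStream subclasses (or their incrementToken()) to be final.
    public final class MyFilter extends TokenFilter {
        public MyFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (input.incrementToken()) {
                /* if (current token is the last token): this is where I want to apply the processing. */
                return true;
            }
            return false;
        }
    }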
How can I recognize whether the current token is the last one or not?
I may have got the wrong end of the stick here, but I think the idea of a stream is precisely that you may be able to tell when it starts, while it's more tricky to know when it ends... which is why it is called a Token*Stream*.
A TokenFilter can tell when the stream starts: you have to override reset().
There is a method TokenFilter.end(), of course, and you could try overriding that, but the Javadoc says:
This method is called by the consumer after the last token has been consumed, after TokenStream.incrementToken() returned false (using the new TokenStream API).
... which means that the output has already been used by the "consumer" by then.
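To make the ordering concrete, here is a toy model of that consumer workflow in plain Java. It is only a sketch: ToyStream, EndTooLateDemo and consume are made-up names for illustration, not Lucene classes, and the only thing borrowed from Lucene is the call order (incrementToken() until it returns false, then end()).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of the consumer workflow; hypothetical names, not Lucene code. */
final class EndTooLateDemo {

    /** Minimal stand-in for a token stream: hands out one token per call. */
    static final class ToyStream {
        private final Iterator<String> tokens;
        String current;

        ToyStream(List<String> tokens) {
            this.tokens = tokens.iterator();
        }

        /** Advances to the next token; returns false once the input is exhausted. */
        boolean incrementToken() {
            if (tokens.hasNext()) {
                current = tokens.next();
                return true;
            }
            return false;
        }
    }

    /** Plays the consumer: uses every token first, and only then signals end(). */
    static List<String> consume(List<String> tokens) {
        List<String> log = new ArrayList<>();
        ToyStream stream = new ToyStream(tokens);
        while (stream.incrementToken()) {
            log.add("consumed " + stream.current);
        }
        // The point at which a real consumer would call end():
        // the last token has already been used above.
        log.add("end() called");
        return log;
    }

    public static void main(String[] args) {
        // prints [consumed hello, consumed world, end() called]
        System.out.println(consume(List.of("hello", "world")));
    }
}
```

In this model "world" has already been consumed by the time end() is reached, so anything done in end() cannot change that token any more.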
To detect the end, I think you'd have to re-engineer the Tokenizer. Looking at StandardTokenizer, for example, and its "business end", StandardTokenizerImpl, this might be quite involved. No doubt it would be better to make your own simple tokeniser: one that accepts Strings, or whatever, and whose way of proceeding is to tokenise everything before spewing out tokens to the filter(s). You would then know how many tokens are going to be spewed out, and (for example) you'd make this number available to the TokenFilter at one time...
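As a rough sketch of that idea in plain Java (not Lucene code: PreTokenizedDemo, tokenize and processLast are hypothetical names, whitespace splitting stands in for a real tokeniser, and upper-casing stands in for whatever processing you want on the last token):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; none of these names are Lucene API. */
final class PreTokenizedDemo {

    /** Splits the whole input on whitespace before any token is handed on. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Passes tokens through unchanged, except the last one, which is upper-cased. */
    static List<String> processLast(String text) {
        List<String> tokens = tokenize(text);
        int total = tokens.size(); // known up front because tokenising is already done
        List<String> out = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            boolean isLast = (i == total - 1);
            out.add(isLast ? tokens.get(i).toUpperCase(Locale.ROOT) : tokens.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(processLast("hello world")); // prints [hello, WORLD]
    }
}
```

The trade-off is that the whole input is held in memory before the first token is emitted, which gives up the streaming behaviour in exchange for knowing the token count.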