python - How to improve speed with Stanford NLP Tagger and NLTK

Is there any way to use the Stanford Tagger in a more performant fashion?

Each call to NLTK's wrapper starts a new Java instance per analyzed string, which is very slow, especially when a larger foreign-language model is used.

Found the solution. It is possible to run the POS Tagger in servlet mode and then connect to it via HTTP. Perfect.


start server in background

nohup java -mx1000m -cp /var/stanford-postagger-full-2014-01-04/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model /var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger -port 2020 >& /dev/null &

adjust firewall to limit access to port 2020 from localhost only

iptables -A INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -A INPUT -p tcp --dport 2020 -j DROP

test it with wget

wget "http://localhost:2020/?die welt ist schön"

shutdown server

pkill -f stanford

restore iptables settings

iptables -D INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -D INPUT -p tcp --dport 2020 -j DROP
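Once the server from the steps above is running, it can also be queried from Python instead of wget. The following is only a minimal sketch under stated assumptions: it talks to the port over a plain socket, and it assumes the server reads one newline-terminated line per connection, writes the tagged text back, closes the connection, and uses UTF-8 with "_" between word and tag. The function names tag_via_server and parse_tagged are made up for this example.

```python
import socket


def tag_via_server(text, host="localhost", port=2020, timeout=10.0):
    """Send one line of text to a running tagger server and return the raw reply.

    Assumption: the server reads a single newline-terminated line and
    closes the connection after writing the tagged text.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((text.strip() + "\n").encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection, so the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()


def parse_tagged(reply, separator="_"):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in reply.split():
        if separator in token:
            # split on the last separator so words containing "_" stay intact
            word, tag = token.rsplit(separator, 1)
        else:
            word, tag = token, None
        pairs.append((word, tag))
    return pairs
```

A call such as parse_tagged(tag_via_server("die welt ist schön")) reuses the already loaded model, so the per-call cost is a local round trip rather than a Java start-up.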

Use nltk.tag.stanford.POSTagger.tag_sents() to tag multiple sentences in one call.

tag_sents has replaced the old batch_tag function.
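As a rough illustration of the batch approach, the sketch below tags several texts with a single tag_sents() call, which should mean one Java start-up for the whole batch rather than one per string. The helper names load_tagger and tag_texts are hypothetical, the whitespace split is only a stand-in for a real tokenizer, and the import fallback assumes the class is called StanfordPOSTagger in newer NLTK releases and POSTagger in older ones.

```python
def load_tagger(model_path, jar_path):
    """Create NLTK's Stanford tagger wrapper (requires nltk and Java)."""
    try:
        from nltk.tag.stanford import StanfordPOSTagger  # newer NLTK name
    except ImportError:
        from nltk.tag.stanford import POSTagger as StanfordPOSTagger  # older name
    return StanfordPOSTagger(model_path, jar_path)


def tag_texts(tagger, texts):
    """Tag many texts with a single tag_sents() call."""
    # tag_sents() expects a list of token lists, not a list of raw strings;
    # passing raw strings makes it treat every character as a token.
    tokenized = [text.split() for text in texts]
    return tagger.tag_sents(tokenized)
```

With the paths from the server example, usage would look like tag_texts(load_tagger("/var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger", "/var/stanford-postagger-full-2014-01-04/stanford-postagger.jar"), ["die welt ist schön", "das ist ein test"]).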


In older NLTK versions, tag the sentences using batch_tag instead of tag.
