2019-09-09: Introducing sumgram, a tool for generating the most frequent conjoined ngrams
Standard ngram generators split multi-word proper nouns; sumgram avoids this by applying 2 algorithms:
— Alexander C. Nwala (@acnwala) September 9, 2019
- pos_glue_split_ngrams:
uses POS labels from the @stanfordnlp CoreNLP Server (https://t.co/n2nOelViqb) to identify multi-word proper nouns of the form:
NNP [IN|CC]? NNP ([IN|CC]? NNP)*
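To make the pattern concrete, here is a minimal Python sketch of how such a rule could be applied to a sequence of (token, POS tag) pairs. It is not sumgram's actual implementation: the function name `glue_proper_nouns` is hypothetical, and the input is hand-tagged for illustration rather than fetched from a CoreNLP Server.

```python
from typing import List, Tuple

# Tags allowed between two proper nouns: IN (preposition) and CC (conjunction).
GLUE_TAGS = {"IN", "CC"}


def glue_proper_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return phrases matching NNP [IN|CC]? NNP ([IN|CC]? NNP)*.

    Hypothetical helper for illustration; assumes Penn Treebank style tags.
    """
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] != "NNP":
            i += 1
            continue
        end = i  # index of the last NNP accepted into the current phrase
        j = i + 1
        while j < n:
            if tagged[j][1] == "NNP":
                end = j
                j += 1
            elif (tagged[j][1] in GLUE_TAGS
                  and j + 1 < n and tagged[j + 1][1] == "NNP"):
                # A single glue word is kept only when an NNP follows it.
                end = j + 1
                j += 2
            else:
                break
        if end > i:  # the pattern needs at least two NNP tokens
            phrases.append(" ".join(tok for tok, _ in tagged[i:end + 1]))
        i = end + 1
    return phrases


# Hand-tagged example sentence (assumed tags, not CoreNLP output).
tagged = [
    ("Hurricane", "NNP"), ("Harvey", "NNP"), ("hit", "VBD"), ("the", "DT"),
    ("Gulf", "NNP"), ("of", "IN"), ("Mexico", "NNP"), ("and", "CC"),
    ("flooded", "VBD"), ("Houston", "NNP"),
]
print(glue_proper_nouns(tagged))  # ['Hurricane Harvey', 'Gulf of Mexico']
```

In this sketch "Houston" is not returned because a lone NNP does not satisfy the pattern, and the trailing "and" after "Mexico" is dropped because no NNP follows it.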