agatha.construct.ngram_util module

agatha.construct.ngram_util.get_frequent_ngrams(analyzed_sentences, max_ngram_length, min_ngram_support, min_ngram_support_per_partition, ngram_sample_rate, token_field='tokens', ngram_field='ngrams')

Adds a new field containing a list of all mined n-grams. Each n-gram is a tuple of strings in which at least one string is not a stopword. The strings are taken from the lemmas of the sentences. To be counted, an n-gram must occur in at least min_ngram_support sentences.

Return type

Bag
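
The mining rule described above can be illustrated with a small, library-independent sketch. It is not the agatha implementation: it works on plain Python lists rather than a Bag, and it ignores min_ngram_support_per_partition and ngram_sample_rate. The helper names (candidate_ngrams, add_frequent_ngrams), the explicit stopwords argument, the token shape (a dict with a "lemma" key), the choice of n-gram lengths from 2 up to max_ngram_length inclusive, and the choice to store only the n-grams found in each record's own sentence are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Ngram = Tuple[str, ...]


def candidate_ngrams(
    lemmas: Sequence[str], stopwords: Set[str], max_ngram_length: int
) -> Set[Ngram]:
    """Collect the distinct n-grams of one sentence (hypothetical helper).

    Assumption: lengths run from 2 to max_ngram_length inclusive.
    An n-gram is kept only if at least one of its strings is not a stopword.
    """
    found: Set[Ngram] = set()
    for n in range(2, max_ngram_length + 1):
        for start in range(len(lemmas) - n + 1):
            gram = tuple(lemmas[start:start + n])
            if any(tok not in stopwords for tok in gram):
                found.add(gram)
    return found


def add_frequent_ngrams(
    records: List[Dict],
    stopwords: Set[str],
    max_ngram_length: int,
    min_ngram_support: int,
    token_field: str = "tokens",
    ngram_field: str = "ngrams",
) -> List[Dict]:
    """Return copies of the records with an added n-gram field (sketch only).

    Assumption: each token is a dict carrying a "lemma" string.
    """
    per_record = [
        candidate_ngrams(
            [tok["lemma"] for tok in rec[token_field]],
            stopwords,
            max_ngram_length,
        )
        for rec in records
    ]

    # Each sentence contributes a set, so an n-gram is counted at most
    # once per sentence: the counter holds sentence-level support.
    support: Counter = Counter()
    for grams in per_record:
        support.update(grams)

    frequent = {g for g, count in support.items() if count >= min_ngram_support}

    result = []
    for rec, grams in zip(records, per_record):
        updated = dict(rec)
        # Keep only this sentence's n-grams that met the support threshold.
        updated[ngram_field] = sorted(grams & frequent)
        result.append(updated)
    return result
```

With three one-sentence records whose lemmas are "lung cancer of", "lung cancer", and "of the", stopwords {"of", "the"}, max_ngram_length=2, and min_ngram_support=2, only ("lung", "cancer") survives: ("cancer", "of") appears in a single sentence, and ("of", "the") consists entirely of stopwords.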