agatha.ml.abstract_generator.tokenizer module

class agatha.ml.abstract_generator.tokenizer.AbstractGeneratorTokenizer(tokenizer_model_path, extra_data_path, lowercase)

Bases: object

decode_dep(idx)
Return type

str

decode_entity_label(idx)
Return type

str

decode_mesh(idx)
Return type

str

decode_pos(idx)
Return type

str

decode_text(ids)
Return type

str

decode_year(idx)
Return type

int

encode_dep(dep)
Return type

int

encode_entity_label(entity_label)
Return type

int

encode_for_generation(initial_text=None, year=None, mesh_terms=None, allow_unknown_terms=False)

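Given initial text and conditioning data (year and MeSH terms), produce model_in, a dictionary of tensors that unpacks directly into the model's forward call. Intended use:

    model = …
    model.forward(**model.tokenizer.encode_for_generation(
        initial_text, year, terms
    ))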

Return type

Dict[str, LongTensor]
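
The keyword-unpacking pattern in the intended use can be illustrated with a minimal, self-contained sketch. Everything below is a hypothetical stand-in, not the agatha implementation: `StubTokenizer`, `StubModel`, the dictionary keys (`text`, `year`, `mesh`), and the toy encodings are assumptions made only to show how the returned dictionary feeds `forward`. The real method returns `Dict[str, LongTensor]`; plain lists are used here so the sketch runs without torch.

```python
from typing import Dict, List, Optional


class StubTokenizer:
    """Hypothetical stand-in for AbstractGeneratorTokenizer (not the real class)."""

    def encode_for_generation(
        self,
        initial_text: Optional[str] = None,
        year: Optional[int] = None,
        mesh_terms: Optional[List[str]] = None,
        allow_unknown_terms: bool = False,
    ) -> Dict[str, List[int]]:
        # Assumption: key names and encodings are invented for illustration.
        # The documented return type is Dict[str, LongTensor].
        text = initial_text or ""
        return {
            "text": [ord(ch) for ch in text],
            "year": [year if year is not None else 0],
            "mesh": [len(term) for term in (mesh_terms or [])],
        }


class StubModel:
    """Hypothetical model whose forward() parameters match the dict keys."""

    def __init__(self) -> None:
        self.tokenizer = StubTokenizer()

    def forward(
        self, text: List[int], year: List[int], mesh: List[int]
    ) -> Dict[str, int]:
        # A real model would run a network; this only summarizes its inputs.
        return {"n_text": len(text), "year": year[0], "n_mesh": len(mesh)}


model = StubModel()
model_in = model.tokenizer.encode_for_generation("Hi", 2020, ["D000001"])
# ** unpacks each dictionary entry into the forward() parameter of the same name.
result = model.forward(**model_in)
```

The pattern only works when every key in the returned dictionary matches a parameter name of `forward`, which is why the tokenizer and the model are meant to be used together.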

encode_mesh(mesh)
Return type

int

encode_pos(pos)
Return type

int

encode_sentence(sentence, is_first=False, is_last=False)
Return type

Dict[str, List[int]]

encode_year(year)
Return type

int

len_dep()
Return type

int

len_entity_label()
Return type

int

len_mesh()
Return type

int

len_pos()
Return type

int

len_text()
Return type

int

len_year()
Return type

int

simple_encode_text(text)
Return type

List[int]

agatha.ml.abstract_generator.tokenizer.get_current_year()