agatha.construct.document_pipeline module
-
agatha.construct.document_pipeline.get_covid_documents(config)
- Return type
Bag
-
agatha.construct.document_pipeline.get_medline_documents(config)
- Return type
Bag
-
agatha.construct.document_pipeline.perform_document_independent_tasks(config, documents, ckpt_prefix, semrep_work_dir=None)
Performs tasks that don’t require communication between documents.
Performs all of the document processing operations that must happen on each document separately. Keeping the different input text sources separate is important because it allows us to update or invalidate particular sets of checkpoints faster.
- Parameters
config (ConstructConfig) – Construction configuration.
documents (Bag) – Collection of texts to process.
ckpt_prefix (str) – To prevent collisions and to improve caching, each call to this function should use a different prefix indicating the type of the corresponding documents. For instance, a call with MEDLINE documents could use the medline prefix.
semrep_work_dir (Optional[Path]) – The location to store SemRep intermediate files. Only used if SemRep has been installed and configured.
- Return type
None