agatha.util.sqlite3_lookup module
-
class agatha.util.sqlite3_lookup.Sqlite3Bow(db_path, table_name='sentences', key_column_name='id', value_column_name='bow', **kwargs)
Bases: agatha.util.sqlite3_lookup.Sqlite3LookupTable
For backwards compatibility, Sqlite3Bow allows for alternate default table, key, and value names. However, newer tables following the default Sqlite3LookupTable schema will still work.
-
class agatha.util.sqlite3_lookup.Sqlite3Graph(db_path, table_name='graph', key_column_name='node', value_column_name='neighbors', **kwargs)
Bases: agatha.util.sqlite3_lookup.Sqlite3LookupTable
For backwards compatibility, Sqlite3Graph allows for alternate default table, key, and value names. However, newer tables following the default Sqlite3LookupTable schema will still work.
-
class agatha.util.sqlite3_lookup.Sqlite3LookupTable(db_path, table_name='lookup_table', key_column_name='key', value_column_name='value', disable_cache=False)
Bases: object
Dict-like interface for Sqlite3 key-value tables
Assumes that the provided sqlite3 path has a table containing string keys and json-encoded string values. By default, the table name is lookup_table, with columns key and value.
This interface is pickle-able, and provides caching and preloading. Note that instances of this object that are recovered from pickles will _NOT_ retain the preloading or caching information from the original.
- Parameters
db_path (Path) – The file-system location of the Sqlite3 file.
table_name (str) – The sql table name to find within db_path.
key_column_name (str) – The string column of table_name. Performance of the Sqlite3LookupTable will depend on whether an index has been created on key_column_name.
value_column_name (str) – The json-encoded string column of table_name.
disable_cache (bool) – If set, objects resulting from json parsing will not be cached.
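The schema and lookup behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the agatha implementation: the class name TinyLookup and its internals are hypothetical, and only the default table and column names (lookup_table, key, value) come from the documentation.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


class TinyLookup:
    """Hypothetical stand-in for a dict-like sqlite3 lookup (not agatha's code)."""

    def __init__(self, db_path, table_name="lookup_table",
                 key_column_name="key", value_column_name="value"):
        self.db_path = str(db_path)
        self.table_name = table_name
        self.key_column_name = key_column_name
        self.value_column_name = value_column_name
        self._conn = None  # opened lazily on first lookup
        self._cache = {}   # parsed json values, keyed by string key

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]
        if self._conn is None:
            self._conn = sqlite3.connect(self.db_path)
        row = self._conn.execute(
            f"SELECT {self.value_column_name} FROM {self.table_name} "
            f"WHERE {self.key_column_name} = ?",
            (key,),
        ).fetchone()
        if row is None:
            raise KeyError(key)
        value = json.loads(row[0])  # values are stored as json-encoded strings
        self._cache[key] = value
        return value


# Build a small database that follows the default schema.
db_path = Path(tempfile.mkdtemp()) / "example.sqlite3"
conn = sqlite3.connect(str(db_path))
# PRIMARY KEY creates the index on the key column that lookups depend on.
conn.execute("CREATE TABLE lookup_table (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO lookup_table VALUES (?, ?)", ("a", json.dumps([1, 2, 3])))
conn.commit()
conn.close()

table = TinyLookup(db_path)
first = table["a"]
```

Declaring the key column as a primary key is one way to get the index that the key_column_name note refers to; an explicit CREATE INDEX would serve the same purpose.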
-
clear_cache()
Removes the contents of the internal cache.
- Return type
None
-
connected()
True if the database connection has been made.
- Return type
bool
-
disable_cache()
Disables the use of the internal cache.
- Return type
None
-
enable_cache()
Enables the use of the internal cache.
- Return type
None
-
is_preloaded()
True if the database has been loaded into memory.
- Return type
bool
-
iterate(where=None)
Returns an iterator over the underlying database. If where is specified, only rows satisfying that condition are returned. Note that, when writing a where clause, the columns are named key and value.
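To make the where clause concrete, here is a sketch of conditioned iteration written directly against the standard sqlite3 module. The helper iterate_rows is hypothetical, and how the library actually assembles its query is an assumption; the point is only that the condition refers to the columns as key and value.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_table (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO lookup_table VALUES (?, ?)",
    [
        ("s:1", json.dumps(["x"])),
        ("s:2", json.dumps(["y"])),
        ("m:1", json.dumps(["z"])),
    ],
)


def iterate_rows(conn, where=None):
    """Hypothetical helper: yield (key, parsed value) pairs, optionally filtered."""
    query = "SELECT key, value FROM lookup_table"
    if where is not None:
        # Assumption: the where text is appended as a raw SQL condition.
        query += f" WHERE {where}"
    for key, value in conn.execute(query):
        yield key, json.loads(value)


# Only rows whose key starts with "s:" are returned.
rows = sorted(iterate_rows(conn, where="key LIKE 's:%'"))
```

Because the condition is raw SQL in this sketch, it should only ever be built from trusted input.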
-
keys()
Get all keys from the Sqlite3 Table.
Recalls _all_ keys from the connected database. This operation may be slow or even infeasible for larger tables.
- Return type
Set[str]
- Returns
The set of all keys from the connected database.
-
agatha.util.sqlite3_lookup.compile_kv_json_dir_to_sqlite3(json_data_dir, result_database_path, agatha_install_path, merge_duplicates, verbose)
Merges all key/value json entries into an indexed sqlite3 table.
This function assumes that json_data_dir contains many *.json files. Each file should contain one json object per line. Each object should contain a “key” and a “value” field. This function will use the C++ create_lookup_table tool by executing a subprocess.
- Parameters
json_data_dir (Path) – The location containing *.json files.
result_database_path (Path) – The location to store the result sqlite3 db.
agatha_install_path (Path) – The location containing the “tools” directory, where create_lookup_table has been built.
merge_duplicates (bool) – The create_lookup_table utility has two modes. If merge_duplicates is False, then we assume there are no key collisions and each value is stored as-is. If True, then we combine values associated with duplicate keys into arrays of unique elements.
verbose (bool) – If set, print intermediate output of create_lookup_table.
- Return type
None
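The two merge_duplicates modes can be illustrated in pure Python. This sketch does not call the C++ create_lookup_table tool; the function merge_records is hypothetical, and two details are assumptions: list values are flattened before merging, and a colliding key in the non-merging mode simply overwrites the earlier value.

```python
import json


def merge_records(lines, merge_duplicates):
    """Hypothetical model of the two modes, applied to json-lines input."""
    table = {}
    for line in lines:
        record = json.loads(line)
        key, value = record["key"], record["value"]
        if not merge_duplicates:
            # Assumes no key collisions: the value is stored as-is.
            table[key] = value
            continue
        # Assumption: list values contribute their elements; scalars count as one element.
        values = value if isinstance(value, list) else [value]
        merged = table.setdefault(key, [])
        for element in values:
            if element not in merged:
                merged.append(element)
    return table


lines = [
    '{"key": "a", "value": ["x", "y"]}',
    '{"key": "a", "value": ["y", "z"]}',
    '{"key": "b", "value": "w"}',
]
merged = merge_records(lines, merge_duplicates=True)
as_is = merge_records(lines, merge_duplicates=False)
```

The merging mode has to compare each new element against those already stored for the key, which is why it is the more expensive of the two.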
-
agatha.util.sqlite3_lookup.create_lookup_table(key_value_records, result_database_path, intermediate_data_dir, agatha_install_path, merge_duplicates=False, verbose=False)
Creates an Sqlite3 table compatible with Sqlite3LookupTable.
Each element of the key_value_records bag is converted to json and written to disk. Then, one machine calls the create_lookup_table tool in order to index all records into an Sqlite3LookupTable-compatible database. Warning: if used in a distributed setting, the master node will be the one to call the create_lookup_table utility.
- Parameters
key_value_records – A dask bag containing dicts. Each dict should have a “key” and a “value” field.
result_database_path – The location to write the Sqlite3 file.
intermediate_data_dir – The location to write intermediate json text files. Warning: if any json files exist beforehand, they will be erased.
agatha_install_path – The root of Agatha, wherein the tools directory can be located.
merge_duplicates – If set, create_lookup_table will perform the more expensive operation of combining distinct values associated with the same key.
verbose – If set, the create_lookup_table utility will print intermediate output.
- Return type
None
-
agatha.util.sqlite3_lookup.export_key_value_records(key_value_records, export_dir)
Converts a Dask bag of Dicts into a collection of json files.
In order to create a lookup table, we must first export all data as json. This function maps each element of the input bag to a json encoded string and writes one file per partition to the export_dir. WARNING: this function will delete any json files already present in export_dir.
- Parameters
key_value_records (Bag) – A dask bag containing dicts.
export_dir (Path) – The location to write json files. Will erase any if present beforehand.
- Return type
None
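The export step can be sketched without Dask by treating a list of lists as the bag's partitions. The function export_partitions and the output file naming are hypothetical; what the sketch keeps from the description is the behavior: existing json files are deleted, and each partition becomes one file with one json object per line.

```python
import json
import tempfile
from pathlib import Path


def export_partitions(partitions, export_dir):
    """Hypothetical stand-in: write one json-lines file per partition."""
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    # Mirrors the documented warning: json files already present are deleted.
    for stale in export_dir.glob("*.json"):
        stale.unlink()
    for index, partition in enumerate(partitions):
        out_path = export_dir / f"{index}.json"  # naming scheme is an assumption
        with out_path.open("w") as out_file:
            for record in partition:
                out_file.write(json.dumps(record) + "\n")


export_dir = Path(tempfile.mkdtemp())
(export_dir / "old.json").write_text("stale\n")

partitions = [
    [{"key": "a", "value": [1]}],
    [{"key": "b", "value": [2]}, {"key": "c", "value": [3]}],
]
export_partitions(partitions, export_dir)
written = sorted(path.name for path in export_dir.glob("*.json"))
second_file_lines = (export_dir / "1.json").read_text().splitlines()
```

The resulting directory has the layout that compile_kv_json_dir_to_sqlite3 expects: many *.json files, each holding one object with a “key” and a “value” field per line.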