Yet Another Omop Loader¶
Introduction¶
Yet Another OMOP Loader is a tool designed to load data into an OMOP Common Data Model Postgres database.
- The tool is designed to be run at multiple stages of a pipeline, with the following stages.
Create the database schemas and tables
Load the vocabularies
Load the data
Create the primary keys
Create the indicies
Create the foreign keys
Each stage is idempotent so can be re-run to ensure a database is in a known state, the tool will not attempt to recreate objects that already exist or load data into tables which already contain data. Seperate schemas can be specified for data and vocabularies so a single database can hold multiple OMOP datasets which share commmon vocabularies.
- To configure your environment ready to use the tool:
Create an empty database and a user with admin access. Set the database name and user name into
config.DB_CONN_STRDownload the latest DDL files for postgres from https://github.com/OHDSI/CommonDataModel/tree/main/inst/ddl/5.4
Set
config.DDL_FILE,config.CONSTRAINTS_FILE,config.INDICIES_FILEandconfig.KEYS_FILEto point to the relevant files.Prepare your data into CSV files. It should be one file per CDM table. If the filename doesn’t match the table name (with a .csv suffix) you can set
config.DATA_PATTERNto extact the table name from the file name.Set
config.DAT_PATHto point to your CSV files.Download a vocab zip file from Athena and set
config.VOCABS_ZIPto the path of the zip file.Set other environment variables as required.
Installation¶
- Install the script requirements using pip install -r requirements.txt- If you wish to rebuild the documentation also run pip install -r requirements.docs.txt
Usage¶
usage: omoploader.py [-h] [-d] [-dr] [-sc] [--omopschema OMOPSCHEMA]
[--vocabschema VOCABSCHEMA]
{clean,build,vocabs,load,pkeys,index,fkeys,all,reload} ...
Positional Arguments¶
- action
Possible choices: clean, build, vocabs, load, pkeys, index, fkeys, all, reload
Database operation
Named Arguments¶
- -d, --debug
Display debug logging.
Default:
False- -dr, --dryrun
Rollback the transaction on completion.
Default:
False- -sc, --skipcheck
Skips checking the state of the database before running action.
Default:
False- --omopschema
OMOP Schema. Overrides config.DB_OMOP_SCHEMA
- --vocabschema
Vocab Schema. Overrides config.DB_VOCAB_SCHEMA
Sub-commands¶
clean¶
Removes all objects by deleting the schemas
omoploader.py clean [-h]
build¶
Builds the CDM Tables
omoploader.py build [-h]
vocabs¶
Loads the Vocabularies
omoploader.py vocabs [-h]
load¶
Loads the CSV data
omoploader.py load [-h]
pkeys¶
Builds the primary keys
omoploader.py pkeys [-h]
index¶
Builds the indexes
omoploader.py index [-h]
fkeys¶
Builds the foreign keys
omoploader.py fkeys [-h]
all¶
Runs all actions except for clean
omoploader.py all [-h]
reload¶
Reloads the CSV data
omoploader.py reload [-h]
Configuration Settings¶
- config.DDL_FILE¶
Path to the Postgres DDL file as downloaded from the OHDSI Github. Set from the YAOL_DDL_FILE env var.
- config.CONSTRAINTS_FILE¶
Path to the Postgres constraints sql file as downloaded from the OHDSI Github. Set from the YAOL_CONSTRAINTS_FILE env var.
- config.INDICIES_FILE¶
Path to the Postgres indexes sql file as downloaded from the OHDSI Github. Set from the YAOL_INDICIES_FILE env var.
- config.KEYS_FILE¶
Path to the Postgres primary keys sql file as downloaded from the OHDSI Github. Set from the YAOL_KEYS_FILE env var.
- config.DATA_PATH¶
Path to the CSV files containing omop data. One per table. Set from the YAOL_DATA_PATH env var.
- config.DATA_PATTERN¶
Regular expression to extract the OMOP table name from the CSV file name. Set from the YAOL_DATA_PATTERN env var.
- config.DB_CONN_STR¶
Postgres connection string. Set from the YAOL_DB_CONN_STR env var.
- config.DB_OMOP_SCHEMA¶
Name of the Schema to use for the CDM data tables. Set from the YAOL_DB_OMOP_SCHEMA env var.
- config.DB_VOCAB_SCHEMA¶
Name of the Schema to use for the vocab tables. Set from the YAOL_DB_VOCAB_SCHEMA env var. Defaults to DB_OMOP_SCHEMA
- config.DB_RESULTS_SCHEMA¶
Name of the Schema to use for the Achilles results tables. Set from the YAOL_DB_RESULTS_SCHEMA env var.
- config.VOCABS_ZIP¶
Path to a zip file containg OMOP vocabulary files as downlaoded from Athena. Set from the YAOL_VOCAB_ZIP env var.
Functions¶
- omoploader.add_schema(ddl_file, schema_name, vocab_schema_name)¶
- Reads an OMOP DDL file as downloaded from the OHDSI github and replaces the schema template variable
with the specified schema.
- Parameters:
ddl_file (str) – The path to the OMOP DDL file.
schema_name (str) – The schema name to replace the @cdmDatabaseSchema placeholder with.
vocab_schema_name (str) – The schema name to replace the @cdmDatabaseSchema placeholder with for vocab tablees
- Returns:
A string containing the contents of the ddl file with the specifed schema set.
- Return type:
str
- omoploader.build(conn)¶
Calls
build_cdm()with the values ofconfig.DB_OMOP_SCHEMA,config.DDL_FILEandconfig.DB_RESULTS_SCHEMA- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
- Returns:
None
- Return type:
None
- omoploader.build_cdm(conn, schema_name, vocab_schema_name, ddl_file, results_schema_name)¶
Build the OMOP CDM Tables by executing the OMOP DDL file. Does nothing if the already exist (by replacing the CREATE TABLE statements with CREATE TABLE IF NOT EXISTS statements)
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema to create and build the tables in. This replaces the schema template variable in the DDL file.
vocab_schema_name (str) – The name of the CDM schema to create and build the tables in. This replaces the schema template variable in the DDL file for vocab tables.
results_schema_name (str) – The name of the results schema to create and build the results the tables in (not currently used)
- Returns:
None
- Return type:
None
- omoploader.build_fkeys(conn, schema_name, vocab_schema_name, constraints_file)¶
Build the OMOP CDM foreign keys by executing the OMOP Constrains file. Does nothing if they already exist.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file.
vocab_schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file for vocab tables.
constraints_file (str) – The name of the file containing the SQL statements to create the foreign keys.
- Returns:
None
- Return type:
None
- omoploader.build_indicies(conn, schema_name, vocab_schema_name, indices_file)¶
Build the OMOP CDM Indexes by executing the OMOP Indexes file. Does nothing if they already exist.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file.
vocab_schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file for vocab tables.
indices_file (str) – The name of the file containing the SQL statements to create the indexes.
- Returns:
None
- Return type:
None
- omoploader.build_pkeys(conn, schema_name, vocab_schema_name, pkeys_file)¶
Build the OMOP CDM Primary Keys by executing the OMOP Primary Keys file. Does nothing if they already exist.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file.
vocab_schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file for vocab tables.
pkeys_file (str) – The name of the file containing the SQL statements to create the Keys.
- Returns:
None
- Return type:
None
- omoploader.build_table_map(data_pattern, data_path)¶
Reads the files in a folder and uses a regular expression to extract the OMOP table name from the file name.
- Parameters:
conn – A psycopg connection object to the postgres database
data_path (str) – The folder containing data files to be loaded into OMOP tables.
- Returns:
A a list of tuples of (file name,omop table name)
- Return type:
tuple
- omoploader.clean(conn)¶
Calls
drop_cdm()with the values ofconfig.DB_OMOP_SCHEMA,config.DB_VOCAB_SCHEMAandconfig.DB_RESULTS_SCHEMA- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
- Returns:
None
- Return type:
None
- omoploader.drop_cdm(conn, schema_name, vocab_schema_name, results_schema_name)¶
Drops the specifed schemas from the database. Does nothing if they calready exist.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema to remove.
vocab_schema_name (str) – The name of the vocab schema to remove.
results_schema_name (str) – The name of the results schema to remove.
- Returns:
None
- Return type:
None
- omoploader.fkeys(conn, delete_first=False, skip_check=False)¶
Ensures indexes are created by calling
indicies()then callsbuild_fkeys()with the valuesconfig.DB_OMOP_SCHEMA,config.DB_VOCAB_SCHEMA,config.CONSTRAINTS_FILE.- Parameters:
conn (bool) – A psycopg connection object to the postgres database
delete_first (bool) – Delete contraints and then re-create
skip_check – If true, no check is performed on the state of the database first.
- Returns:
None
- Return type:
None
- omoploader.get_args_parser()¶
Builds a parser to handle the command line arguments.
- Returns:
Parser for arguments
- Return type:
argparse.ArgumentParser
- omoploader.handle_args()¶
Parses the command line arguments.
- Returns:
Namespace containing parsed arguments
- Return type:
argparse.Namespace
- omoploader.index(conn, delete_first=False, skip_check=False)¶
Ensures keys are created by calling
keys()then callsbuild_indicies()with the valuesconfig.DB_OMOP_SCHEMA,config.DB_VOCAB_SCHEMA, andconfig.INDICIES_FILE.- Parameters:
conn (bool) – A psycopg connection object to the postgres database
delete_first (bool) – Delete index and then re-create
skip_check – If true, no check is performed on the state of the database first.
- Returns:
None
- Return type:
None
- omoploader.load(conn, delete_first=False, skip_check=False)¶
Ensures vocabs are loaded by calling
vocabs(), builds a table to file map and then callsload_data_csv()with the values of andconfig.DB_OMOP_SCHEMA.- Parameters:
conn (bool) – A psycopg connection object to the postgres database
delete_first (bool) – Delete data from each table before loading
skip_check – If true, no check is performed on the state of the database first.
- Returns:
None
- Return type:
None
- omoploader.load_data_csv(conn, db_schema, table_map, delete_first=False)¶
Loads data from CSV files into OMOP tables. Expects one file per table. N.B. This will not load data into any table which already contains data (i.e if count(*)>0).
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file.
table_map (list(tuple)) – A list of tuples specifying a fully qualified file path and an OMOP table name to load the data into i.e [(file name,table name)].
delete_first (bool) – Delete all rows from table before loading data. Defaults to False. Data will not be loaded to any table contaning data.
- Returns:
None
- Return type:
None
- omoploader.load_vocabs_from_zip(conn, db_schema, zip_file)¶
Loads OMOP vocabluaries from a zip file as downloaded from Athena. N.B. This does not currently handle vocabs which require a license/post processing (e.g. CPT4).
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template variable in the sql file for the vocab tables.
zip_file (str) – The path to the zip file containing vocab files.
- Returns:
None
- Return type:
None
- omoploader.pkeys(conn, delete_first=False, skip_check=False)¶
Ensures data is loaded by calling
load()then callsbuild_keys()with the valuesconfig.DB_OMOP_SCHEMA,config.DB_VOCAB_SCHEMA, andconfig.KEYS_FILE.- Parameters:
conn (bool) – A psycopg connection object to the postgres database
delete_first (bool) – Delete key from each table and then re-create
skip_check – If true, no check is performed on the state of the database first.
- Returns:
None
- Return type:
None
- omoploader.run_sql_template(conn, schema_name, vocab_schema_name, template_file)¶
Executes a file of SQL statements as downloaded from the OHDSI github. The contents of the file are first passed to the add_schema function to replace the schema placeholder.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The schema name to run the statements against.
vocab_schema_name (str) – The schema name to run the statements against for vocab tables.
template_file – The file of sql statements to run.
- Returns:
None
- Return type:
None
- omoploader.setup_logging(debug)¶
Sets the logging level. Sets to DEBUG if debug paramter is true, otherwise sets INFO
- Parameters:
debug (boolean) – Specifes whether debug should be set.
- Returns:
None
- Return type:
None
- omoploader.vocabs(conn, skip_check=False)¶
Ensures tables are built by calling
build()and then Callsload_vocabs_file_zip()with the values ofconfig.DB_OMOP_SCHEMA,config.VOCABS_ZIP.- Parameters:
conn (bool) – A psycopg connection object to the postgres database
skip_check – If true, no check is performed on the state of the database first.
- Returns:
None
- Return type:
None
- dbutils.index_exists(conn, index_name)¶
Checks whether the given index exists.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
index_name (str) – The name of the index to check.
- Returns:
True if the index exists
- Return type:
bool
- dbutils.is_vocab_table(table_name)¶
Checks whether the given table is a vocabulary table.
- Parameters:
table_name (str) – The name of the table to check.
- Returns:
True if the table is a vocabulary table
- Return type:
bool
- dbutils.key_exists(conn, key_name)¶
Checks whether the given primary or foreign key exists.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
key_name (str) – The name of the key to check.
- Returns:
True if the key exists
- Return type:
bool
- dbutils.schema_exists(conn, schema_name)¶
Checks whether the given schema exists.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the schema to check.
- Returns:
True if the schema exists
- Return type:
bool
- dbutils.table_is_empty(conn, schema_name, table_name)¶
Checks whether the given table is empty.
- Parameters:
conn (psycopg.connection) – A psycopg connection object to the postgres database
schema_name (str) – The name of the CDM schema. This replaces the schema template varaible in the sql file.
table_name (str) – The name of the table to check.
- Returns:
True if the table is empty
- Return type:
bool