Commit e212a9b

Merge pull request #17 from Big-Life-Lab/doc
add manual
2 parents 9c426b7 + 3bd9289 commit e212a9b

26 files changed, +2196 -1112 lines changed

INSTALL.md

Lines changed: 0 additions & 33 deletions
This file was deleted.

README.md

Lines changed: 13 additions & 850 deletions
Large diffs are not rendered by default.

docs/manual/.gitignore

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# generated files
/measures-OHRI.csv
search.json
site_libs
debug.txt

# generated dirs
/.quarto/
/_book/
/build/

docs/manual/_quarto.yml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
project:
  type: book
  output-dir: build

book:
  title: 'PHES-ODM Sharing Library Manual'
  author: 'OHRI'
  chapters:
    - index.qmd
    - install.qmd
    - getting-started.qmd
    - cli.qmd
    - api.qmd
  appendices:
    - schemas.qmd
    - python.qmd
    - sqlite.qmd

pdf-engine: pdflatex
toc: true

docs/manual/api.qmd

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# API {#sec-api}

## Reference

<!-- TODO generate this API reference from the source code automatically -->

```python

def parse(schema_path: str, orgs: List[str] = []) -> OrgTableQueries:
    '''returns queries for each org and table, generated from the rules
    specified in `schema_path`

    :raises OSError, ParseError:
    '''


def connect(data_source: str, tables: Set[str] = set()) -> Connection:
    '''returns a connection object that can be used together with a query
    object to retrieve data from `data_source`

    :raises DataSourceError:'''


def get_data(c: Connection, tq: TableQuery) -> pd.DataFrame:
    '''returns the data extracted from running query `tq` on data-source
    connection `c`, as a pandas DataFrame

    :raises DataSourceError:'''


def get_counts(c: Connection, tq: TableQuery) -> Dict[RuleId, int]:
    '''returns the row counts for each rule

    :raises DataSourceError:'''

def get_columns(c: Connection, tq: TableQuery
                ) -> Tuple[RuleId, List[ColumnName]]:
    '''returns the select-rule id together with the column names that would be
    extracted when calling `get_data`

    :raises DataSourceError:'''


def extract(
    schema_path: str,
    data_source: str,
    orgs: List[str] = [],
) -> Dict[OrgName, Dict[TableName, pd.DataFrame]]:
    '''returns a pandas DataFrame per table per org

    :param data_source: a file path or database url (in SQLAlchemy format)
    :param schema_path: rule schema file path
    :param orgs: orgs to share with, or all if empty

    :raises DataSourceError:
    '''
```

## Usage

### Examples

**Common definitions:**

```{python}
#| echo: false
from common import DATA, SCHEMA, load_csv_md, print_file

def my_processing_func(data):
    # what a user-made function may look like
    pass
```

```{python}
import pandas as pd
import odm_sharing.sharing as sh

ORG = 'OHRI'
ORGS = [ORG]
```

**High-level one-shot function:**

```{python}
results = sh.extract(SCHEMA, DATA, ORGS)
for org, table_data in results.items():
    for table_name, data_frame in table_data.items():
        my_processing_func(data_frame)
```

**Low-level sample code:**

```{python}
def describe_table_query(con, table, query):
    print(f'query table: {table}')

    (select_rule_id, columns) = sh.get_columns(con, query)
    print(f'query columns (from rule {select_rule_id}):')
    print(','.join(columns))

    print('query counts per rule:')
    rule_counts = sh.get_counts(con, query)
    for rule_id, count in rule_counts.items():
        print(f'{rule_id} | {count}')

con = sh.connect(DATA)
table_queries = sh.parse(SCHEMA, ORGS)
for table, query in table_queries[ORG].items():
    describe_table_query(con, table, query)
    my_processing_func(sh.get_data(con, query))
```
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
measureRepID,sampleID,measure,value,unit,aggregation
o.08.08.20covN1,o.08.08.20,covN1,0.00036,gcPMMoV,meanNr
o.08.08.20covN2,o.08.08.20,covN1,0.00003,gcPMMoV,sdNr
o.08.08.20covN4,o.08.08.20,covN2,0.00002,gcPMMoV,meanNr
o.08.08.20covN3,o.08.08.20,covN2,0.00004,gcPMMoV,sdNr

docs/manual/assets/minimal/schema.csv

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
ruleID,table,mode,key,operator,value,notes
1,measures,select,NA,NA,all,"select all columns from the measures table"
2,measures,filter,measure,=,covN1,"where measure equals covN1"
3,NA,share,OHRI,NA,1;2,"use rule 1 & 2 for the OHRI organization"
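
The notes column above suggests how the rules relate: the share rule names an organization in `key` and lists the rules it applies in `value`. The sketch below illustrates that reading using only the standard library. It assumes that `;` separates rule IDs, which is inferred from this one sample rather than from a specification, and `rules_per_org` is a hypothetical helper that is not part of `odm_sharing`.

```python
import csv
import io

# The minimal schema from this commit, inlined so the sketch is self-contained.
SCHEMA_CSV = '''ruleID,table,mode,key,operator,value,notes
1,measures,select,NA,NA,all,"select all columns from the measures table"
2,measures,filter,measure,=,covN1,"where measure equals covN1"
3,NA,share,OHRI,NA,1;2,"use rule 1 & 2 for the OHRI organization"
'''


def rules_per_org(schema_text):
    '''maps each org named in a share rule to the rules it references
    (hypothetical helper, not part of odm_sharing)'''
    rows = list(csv.DictReader(io.StringIO(schema_text)))
    by_id = {row['ruleID']: row for row in rows}
    result = {}
    for row in rows:
        if row['mode'] != 'share':
            continue
        # assumption: `key` holds the org and `value` holds ';'-separated ids
        rule_ids = row['value'].split(';')
        result[row['key']] = [by_id[rule_id] for rule_id in rule_ids]
    return result


for org, rules in rules_per_org(SCHEMA_CSV).items():
    for rule in rules:
        print(org, rule['ruleID'], rule['mode'], rule['table'])
```

The real parser is exposed as `parse` in the API reference and reports problems as `ParseError`; this sketch does no validation.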

docs/manual/assets/odm-logo.png

473 KB
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
ruleID,mode,key,value,notes
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
ruleID,table,mode,key,operator,value,notes

docs/manual/cli.qmd

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
```{python}
#| echo: false
from odm_sharing.tools.share import OutFmt, share

from common import DATA, SCHEMA, load_csv_md, print_file
```

# CLI {#sec-cli}

## Reference

```bash
odm-share [OPTION]... SCHEMA INPUT
```

Arguments:

- SCHEMA

    sharing schema file path

- INPUT

    spreadsheet file path or [SQLAlchemy database URL](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls)

Options:

- `--orgs=NAME[,...]`

    comma-separated list of organizations to output data for; defaults to all

- `--outfmt=FORMAT`

    output format (excel or csv); defaults to the input format

- `--outdir=PATH`

    output file directory; defaults to the current directory

- `-d`, `--debug`:

    output debug info to STDOUT (and ./debug.txt) instead of creating sharable
    output files. This shows which tables and columns are selected, and how
    many rows each filter returns.

- `-q`, `--quiet`:

    don't log to STDOUT

One or more sharable output files are created in the chosen output directory,
according to the chosen output format and organization(s). Each output file is
named after the input, followed by the recipient organization's name.

### Errors

Error messages are printed to the terminal (STDERR) when something goes wrong.
Each message begins by stating where the error originated, including the
filename and the line number or rule ID. Here are a few examples:

When headers are missing from the schema:

```{python}
#| echo: false
share('assets/schema-missing-headers.csv', 'assets/measures.csv')
```

When the schema contains no share rules:

```{python}
#| echo: false
share('assets/schema-missing-rules.csv', 'assets/measures.csv')
```

## Usage

### Examples

#### Using a CSV file

To share a single table `measures.csv` using the sharing schema `schema.csv`,
run the following command:

```bash
odm-share schema.csv measures.csv
```

This creates an output file called `measures-<org>.csv` for each organization
specified in the schema, containing filtered data that is ready to share.
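
As an illustration of the `measures-<org>.csv` naming pattern, the following sketch derives the output path for each organization. `csv_output_path` is a hypothetical helper written for this example. It only mirrors the pattern described above and is not the tool's actual implementation.

```python
from pathlib import Path


def csv_output_path(input_path, org, outdir='.'):
    '''hypothetical helper: builds `<input name>-<org>.csv` inside `outdir`'''
    stem = Path(input_path).stem  # 'measures.csv' -> 'measures'
    return Path(outdir) / f'{stem}-{org}.csv'


for org in ['OHRI', 'TOH']:
    print(csv_output_path('measures.csv', org))
```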

#### Using an Excel file

Excel files can be used as input to share multiple tables at once:

```bash
odm-share schema.csv data.xlsx
```

This creates an output file called `<org>.xlsx` for each organization in the
schema.

#### Using a database

To use a MySQL database as input (with the pymysql package):

```bash
odm-share schema.csv mysql+pymysql://user:pass@host/db
```

To use an MS SQL Server database through ODBC instead (with the pyodbc
package):

```bash
odm-share schema.csv mssql+pyodbc://user:pass@mydsn
```
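
Database URLs like the ones above break when the user name or password contains characters such as `@` or `/`. The SQLAlchemy documentation recommends percent-encoding those parts. The sketch below shows one way to build such a URL with the standard library. `database_url` is a hypothetical helper for illustration and is not part of `odm_sharing` or SQLAlchemy.

```python
from urllib.parse import quote_plus


def database_url(driver, user, password, host, database=None):
    '''hypothetical helper: builds `driver://user:pass@host[/db]` with the
    user name and password percent-encoded'''
    url = f'{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}'
    if database:
        url += f'/{database}'
    return url


# prints mysql+pymysql://user:p%40ss%2Fword@host/db
print(database_url('mysql+pymysql', 'user', 'p@ss/word', 'host', 'db'))
```

When passing the resulting URL on the command line, quoting it in the shell avoids surprises with characters the shell treats specially.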

#### Using additional options

- Share CSV files from an Excel file:

    ```bash
    odm-share --outfmt=CSV schema.csv data.xlsx
    ```

- Create a sharable Excel file in the "~/files" directory for the "OHRI" and
  "TOH" organizations, applying the rules from schema.csv to the input from
  data.xlsx:

    ```bash
    odm-share --orgs=OHRI,TOH --outdir=~/files schema.csv data.xlsx
    ```

### Debugging

Debug mode provides information about what would happen when using a specific
schema, without pulling the actual data to be shared. Enable it by passing the
`--debug` flag, or simply `-d`.

Here's an example using the sample files from [getting started](getting-started.qmd):

```bash
odm-share --debug schema.csv data.xlsx
```
```{python}
#| echo: false
share(SCHEMA, DATA, debug=True)
```

Here we can see the columns that would be selected, as well as the number of
rows each rule would produce. Specifically, 4 rows would be selected by rule
#1, but the filter in rule #2 reduces that number to 2, which is the final
count, as confirmed by the count for rule #3.
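
The counts described above can be reproduced by hand. The sketch below applies the three rules of the minimal schema to the five-line sample measures data from this commit, using only the standard library. It assumes that the debug example reads that same sample file, and the rule logic is hard-coded for illustration rather than taken from `odm_sharing`.

```python
import csv
import io

# Sample measures data from this commit, inlined for a self-contained sketch.
MEASURES_CSV = '''measureRepID,sampleID,measure,value,unit,aggregation
o.08.08.20covN1,o.08.08.20,covN1,0.00036,gcPMMoV,meanNr
o.08.08.20covN2,o.08.08.20,covN1,0.00003,gcPMMoV,sdNr
o.08.08.20covN4,o.08.08.20,covN2,0.00002,gcPMMoV,meanNr
o.08.08.20covN3,o.08.08.20,covN2,0.00004,gcPMMoV,sdNr
'''

rows = list(csv.DictReader(io.StringIO(MEASURES_CSV)))

# rule 1: select all columns from the measures table (no rows are removed)
selected = rows
# rule 2: keep only the rows where measure equals covN1
filtered = [row for row in selected if row['measure'] == 'covN1']
# rule 3: share the combined result of rules 1 and 2 with OHRI
shared = filtered

counts = {1: len(selected), 2: len(filtered), 3: len(shared)}
for rule_id, count in counts.items():
    print(f'{rule_id} | {count}')
```

This prints 4 for rule 1 and 2 for rules 2 and 3, matching the counts discussed above.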

docs/manual/common.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
import pandas as pd
from tabulate import tabulate


SCHEMA = 'assets/minimal/schema.csv'
DATA = 'assets/minimal/measures.csv'


def load_csv_md(path):
    '''read csv file and convert it to markdown'''
    df = pd.read_csv(path, keep_default_na=False)
    md = tabulate(df, headers=df.columns.to_list(), showindex=False)
    return md


def print_file(path):
    with open(path, 'r') as f:
        print(f.read())
