
Commit 8124f2d

set default num days to export to all since export start date
Since BigQuery tables are currently not partitioned by date, each query processes and bills for all rows in the table regardless of the filters applied (currently date and country). To take advantage of this, the pipeline now pulls all dates from the specified start date to the current date by default. This default should be changed to a shorter window (~14 days seems reasonable) if and when the tables are converted to a partitioned format.
1 parent 7d2542e commit 8124f2d
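The cost argument in the commit message can be made concrete with a rough model. This is an illustration only. It assumes on-demand pricing, where BigQuery bills for the bytes in the columns a query references. On an unpartitioned table, a date filter does not reduce those bytes. On a date-partitioned table, a filter on the partitioning column lets BigQuery prune partitions. The function name `estimated_bytes_scanned` and all figures are hypothetical and are not taken from the pipeline.

```python
def estimated_bytes_scanned(table_bytes: int, total_days: int,
                            days_queried: int, partitioned: bool) -> int:
    """Rough estimate of bytes a date-filtered query scans (hypothetical model).

    Assumes rows are spread evenly across days. Without partitioning, the
    date filter does not reduce the bytes scanned, so the whole table is
    billed. With daily partitions, only the queried days' partitions are read.
    """
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    if not partitioned:
        return table_bytes
    days_queried = min(days_queried, total_days)
    return table_bytes * days_queried // total_days


# Hypothetical table: 365 days of data, 10 GB in the referenced columns.
TABLE_BYTES = 10 * 10**9
TOTAL_DAYS = 365

for window in (14, TOTAL_DAYS):
    unpart = estimated_bytes_scanned(TABLE_BYTES, TOTAL_DAYS, window, False)
    part = estimated_bytes_scanned(TABLE_BYTES, TOTAL_DAYS, window, True)
    print(f"{window:>3} days: unpartitioned={unpart:,} B, partitioned={part:,} B")
```

Under this model, a 14-day window saves nothing while the table is unpartitioned, which is why exporting the full range costs no more. Once the table is partitioned, the shorter window would scan roughly 14/365 of the data.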


2 files changed, 14 additions, 10 deletions

google_symptoms/delphi_google_symptoms/pull.py

Lines changed: 13 additions & 9 deletions
@@ -94,7 +94,7 @@ def preprocess(df, level):
     return df
 
 
-def get_date_range(export_start_date, retrieve_days_before_now):
+def get_date_range(export_start_date, num_export_days):
     """Produce date range to retrieve data for.
 
     Calculate start of date range as a static offset from the end date
@@ -105,7 +105,7 @@ def get_date_range(export_start_date, retrieve_days_before_now):
     ----------
     export_start_date: date
         first date to retrieve data for
-    retrieve_days_before_now: int
+    num_export_days: int
         number of days before end date ("now") to export
 
     Returns
@@ -115,13 +115,17 @@ def get_date_range(export_start_date, retrieve_days_before_now):
     PAD_DAYS = 7
 
     end_date = date.today()
-    # Don't fetch data before the user-set start date. Convert both
-    # dates/datetimes to date to avoid error from trying to compare
-    # different types.
-    start_date = max(
-        end_date - timedelta(days=retrieve_days_before_now),
-        export_start_date.date()
-    )
+    if num_export_days == "all":
+        # Get all dates since export_start_date.
+        start_date = export_start_date
+    else:
+        # Don't fetch data before the user-set start date. Convert both
+        # dates/datetimes to date to avoid error from trying to compare
+        # different types.
+        start_date = max(
+            end_date - timedelta(days=num_export_days),
+            export_start_date.date()
+        )
 
     retrieve_dates = [
         start_date - timedelta(days=PAD_DAYS - 1),
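The new branching in `get_date_range` can be exercised in isolation. The sketch below is a standalone re-implementation, not the pipeline's function. The names `compute_start_date` and `first_retrieve_date` are hypothetical. `end_date` is passed in rather than read from `date.today()` so the result is deterministic. The remainder of `retrieve_dates`, which is cut off in this diff, is not reproduced.

One difference is deliberate. In the committed code, the `"all"` branch assigns `export_start_date` unchanged. That value is a `datetime`, because run.py builds it with `datetime.strptime`. The other branch yields a `date`. The sketch calls `.date()` in both branches so the return type is always `date`.

```python
from datetime import date, datetime, timedelta
from typing import Union

PAD_DAYS = 7  # same padding value as in get_date_range


def compute_start_date(export_start_date: datetime,
                       num_export_days: Union[int, str],
                       end_date: date) -> date:
    """Mirror the start-date logic added in this commit (illustrative only)."""
    if num_export_days == "all":
        # Export everything since the configured start date. Unlike the
        # committed code, convert to date so both branches return a date.
        return export_start_date.date()
    # Look back num_export_days from end_date, but never before the
    # user-set export start date.
    return max(end_date - timedelta(days=num_export_days),
               export_start_date.date())


def first_retrieve_date(start_date: date, pad_days: int = PAD_DAYS) -> date:
    """First date fetched once the PAD_DAYS padding is applied."""
    return start_date - timedelta(days=pad_days - 1)


if __name__ == "__main__":
    export_start = datetime(2020, 2, 20)  # hypothetical export_start_date
    today = date(2021, 3, 1)              # hypothetical "now"
    for setting in ("all", 14, 1000):
        start = compute_start_date(export_start, setting, today)
        print(setting, start, first_retrieve_date(start))
```

A window longer than the data's history, such as 1000 days here, is clamped to the export start date, so `"all"` and a very large integer give the same start.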

google_symptoms/delphi_google_symptoms/run.py

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ def run_module():
     export_start_date = datetime.strptime(
         params["export_start_date"], "%Y-%m-%d")
     export_dir = params["export_dir"]
-    num_export_days = params.get("num_export_days", 14)
+    num_export_days = params.get("num_export_days", "all")
 
     logger = get_structured_logger(
         __name__, filename=params.get("log_filename"),
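After this change, a params file that omits `num_export_days` exports the full range, while an explicit integer keeps the windowed behaviour. The sketch below shows that lookup. `resolve_num_export_days` and its type check are hypothetical; the committed code does no validation. The check is included because `get_date_range` compares the value against the string `"all"` and otherwise passes it to `timedelta(days=...)`. A misspelled string such as `"All"` would therefore only fail later, with a `TypeError`.

```python
from typing import Any, Dict, Union


def resolve_num_export_days(params: Dict[str, Any]) -> Union[int, str]:
    """Read num_export_days as run_module now does, then sanity-check it.

    The "all" fallback matches this commit; the validation is a
    hypothetical addition and is not part of the pipeline.
    """
    value = params.get("num_export_days", "all")
    if value == "all":
        return value
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(
            f'num_export_days must be "all" or a non-negative int, got {value!r}')
    return value
```

For example, `resolve_num_export_days({})` returns `"all"` and `resolve_num_export_days({"num_export_days": 14})` returns `14`. Reverting to a 14-day window once the tables are partitioned, as the commit message suggests, would then only require setting `"num_export_days": 14` in params.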
