Commit 054a3c9

Merge branch 'dev' into Tina_doc_page_reorganization
2 parents 412a47b + 35d67a7 commit 054a3c9

34 files changed

+181
-363
lines changed

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 4.1.23
2+
current_version = 4.1.24
33
commit = False
44
tag = False
55

.git-blame-ignore-revs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,3 +20,5 @@ b9ceb400d9248c8271e8342275664ac5524e335d
2020
07ed83e5768f717ab0f9a62a9209e4e2cffa058d
2121
# style(black): format wiki acquisition
2222
923852eafa86b8f8b182d499489249ba8f815843
23+
# lint: trailing whitespace changes
24+
81179c5f144b8f25421e799e823e18cde43c84f9

dev/local/setup.cfg

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[metadata]
22
name = Delphi Development
3-
version = 4.1.23
3+
version = 4.1.24
44

55
[options]
66
packages =

docs/api/covidcast-signals/covid-act-now.md

Lines changed: 21 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -15,11 +15,13 @@ grand_parent: COVIDcast Main Endpoint
1515
* **Time type:** day (see [date format docs](../covidcast_times.md))
1616
* **License:** [CC BY-NC](../covidcast_licensing.md#creative-commons-attribution-noncommercial)
1717

18-
The COVID Act Now (CAN) data source provides COVID-19 testing statistics, such as positivity rates and total tests performed.
19-
The county-level positivity rates and test totals are pulled directly from CAN.
20-
While CAN provides this data potentially from multiple sources, we only use data sourced from the
18+
The [COVID Act Now (CAN)](https://covidactnow.org/) data source provides COVID-19 testing statistics, such as positivity rates and total tests performed.
19+
The county-level positivity rates and test totals are pulled directly from CAN using [their API](https://covidactnow.org/data-api).
20+
While CAN provides this data potentially from multiple sources, we only use data that CAN sources from the
2121
[CDC's COVID-19 Integrated County View](https://covid.cdc.gov/covid-data-tracker/#county-view).
2222

23+
Delphi's mirror of the CAN data was deactivated in December 2021 (last issue 2021-12-10) in favor of the [DSEW CPR data](./dsew-cpr.md), which reports the same information under the `covid_naat_pct_positive_7dav` signal.
24+
2325

2426
| Signal | Description |
2527
|--------------------------------|----------------------------------------------------------------|
@@ -34,9 +36,9 @@ While CAN provides this data potentially from multiple sources, we only use data
3436

3537
## Estimation
3638

37-
The quantities received from CAN / CDC are the county-level positivity rate and total tests,
38-
which are based on the counts of PCR specimens tested.
39-
In particular, they are also already smoothed with a 7-day-average.
39+
We receive county-level positivity rate and total tests from CAN, originating from the CDC.
40+
These quantities are based on the counts of PCR specimens tested.
41+
They are also already smoothed with a 7-day-average.
4042

4143
For a fixed location $$i$$ and time $$t$$, let $$Y_{it}$$ denote the number of PCR specimens
4244
tested that have a positive result. Let $$N_{it}$$ denote the total number of PCR specimens tested.
@@ -79,38 +81,41 @@ $$
7981

8082
### Smoothing
8183

82-
No additional smoothing is done to avoid double-smoothing, since the data pulled from CAN / CDC
84+
No additional smoothing is done to avoid double-smoothing, since the CAN data
8385
is already smoothed with a 7-day-average.
8486

8587
## Limitations
8688

87-
Estimates for geographical levels beyond counties may be inaccurate due to how aggregations
88-
are done on smoothed values instead of the raw values. Ideally we would aggregate raw values
89+
Estimates for geographical levels beyond counties may be inaccurate because our aggregations
90+
are performed on smoothed values instead of the raw values.
91+
Ideally we would aggregate raw values
8992
then smooth, but the raw values are not accessible in this case.
9093

91-
The positivity rate here should not be interpreted as the population positivity rate as
94+
The reported test positivity rate should not be interpreted as the population positivity rate as
9295
the tests performed are typically not randomly sampled, especially for early data
9396
with lower testing volumes.
9497

9598
A few counties, most notably in California, are also not covered by this data source.
9699

97-
Entries with zero total tests performed are also suppressed, even if it was actually the case that
100+
Entries with zero total tests performed are suppressed, even if it was actually the case that
98101
no tests were performed for the day.
99102

100103
## Lag and Backfill
101104

102105
The lag for these signals varies depending on the reporting patterns of individual counties.
103106
Most counties have their latest data report with a lag of 2 days, while others can take 9 days
104-
or more in the case of California counties.
107+
or more, as is the case with California counties.
105108

106-
These signals are also backfilled as backlogged test results could get assigned to older 7-day timeframes.
107-
Most recent test positivity rates do not change substantially with backfill (having a median delta of close to 0).
108-
However, most recent total tests performed is expected to increase in later data revisions (having a median increase of 7%).
109+
Revisions are sometimes made to the data. For example, backlogged test results can get assigned to past dates.
110+
The majority of recent test positivity rates do not change substantially with backfill (having a median delta of close to 0).
111+
However, the majority of recent total tests performed is expected to increase in later data revisions (having a median increase of 7%).
109112
Values more than 5 days in the past are expected to remain fairly static (with total tests performed
110113
having a median increase of 1% or less), as most major revisions have already occurred.
111114

112115
## Source and Licensing
113116

114-
County-level testing data is scraped by CAN from the
117+
County-level testing data is scraped by [CAN](https://covidactnow.org/) from the
115118
[CDC's COVID-19 Integrated County View](https://covid.cdc.gov/covid-data-tracker/#county-view),
116119
and made available through [CAN's API](https://covidactnow.org/tools).
120+
121+
The data is made available under a [CC BY-NC](../covidcast_licensing.md#creative-commons-attribution-noncommercial) license.

docs/api/covidcast-signals/hhs.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
2-
title: Department of Health & Human Services (inactive)
3-
parent: Data Sources and Signals
2+
title: Department of Health & Human Services
3+
parent: Inactive Signals
44
grand_parent: COVIDcast Main Endpoint
55
---
66

docs/epidata_development.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ $ [sudo] make test pdb=1
5050
$ [sudo] make test test=repos/delphi/delphi-epidata/integrations/acquisition
5151
```
5252

53-
You can read the commands executed by the Makefile [here](../dev/local/Makefile).
53+
You can read the commands executed by the Makefile [here](https://github.com/cmu-delphi/delphi-epidata/blob/dev/dev/local/Makefile).
5454

5555
## Rapid Iteration and Bind Mounts
5656

@@ -88,8 +88,8 @@ You can test your changes manually by:
8888

8989
What follows is a worked demonstration based on the `fluview` endpoint. Before
9090
starting, make sure that you have the `delphi_database_epidata`,
91-
`delphi_web_epidata`, and `delphi_redis` containers running; if you don't, see
92-
the Makefile instructions above.
91+
`delphi_web_epidata`, and `delphi_redis` containers running (check with `docker ps`);
92+
if you don't, see the Makefile instructions above.
9393

9494
First, let's insert some fake data into the `fluview` table:
9595

docs/index.md

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,17 +4,16 @@ has_children: true
44
nav_order: 1
55
---
66

7-
# Delphi's Epidata API
7+
# The Epidata API
88

99
Delphi's Epidata API provides real-time access to epidemiological surveillance data.
1010
It is built and maintained by the Carnegie Mellon University [Delphi research
1111
group](https://delphi.cmu.edu/). The Epidata API includes:
1212

13-
- [COVIDcast data](api/covidcast.md), providing daily updates about COVID-19
14-
activity across the United States. [API clients](api/covidcast_clients.md) for
15-
quick access to COVID data are available.
16-
- [Data about other diseases](api/README.md), including influenza, dengue, and
17-
other diseases tracked by Delphi through various data streams.
13+
* The [main endpoint (COVIDcast)](api/covidcast.md), providing daily updates about current COVID-19 and influenza activity across the United States.
14+
* A [variety of other endpoints](api/README.md), providing primarily historical data about various diseases including COVID-19, influenza, dengue fever, and norovirus in several countries.
15+
16+
A [full-featured R client](api/client_libraries.md) is available for quick access to all data. While we continue developing a full-featured Python client, the [legacy Python client](api/client_libraries.md#python) remains available. The main endpoint can also be accessed with a [dedicated COVIDcast client](api/covidcast_clients.md).
1817

1918
Anyone may access the Epidata API anonymously without providing any personal
2019
data. Anonymous API access is currently rate-limited and restricted to public

integrations/client/test_delphi_epidata.py

Lines changed: 64 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,8 @@
22

33
# standard library
44
import time
5+
import json
56
from json import JSONDecodeError
6-
from requests.models import Response
77
from unittest.mock import MagicMock, patch
88

99
# first party
@@ -49,6 +49,12 @@ def localSetUp(self):
4949
secrets.db.host = 'delphi_database_epidata'
5050
secrets.db.epi = ('user', 'pass')
5151

52+
@pytest.fixture(autouse=True)
53+
def capsys(self, capsys):
54+
"""Hook capsys (stdout and stderr) into this test class."""
55+
56+
self.capsys = capsys
57+
5258
def test_covidcast(self):
5359
"""Test that the covidcast endpoint returns expected data."""
5460

@@ -238,46 +244,46 @@ def raise_for_status(self): pass
238244

239245
try:
240246
with self.subTest(name='test multiple GET'):
241-
with self.assertLogs('delphi_epidata_client', level='INFO') as logs:
242-
get.reset_mock()
243-
get.return_value = MockResponse(b'{"key": "value"}', 200)
244-
Epidata._request_with_retry("test_endpoint1", params={"key1": "value1"})
245-
Epidata._request_with_retry("test_endpoint2", params={"key2": "value2"})
247+
get.reset_mock()
248+
get.return_value = MockResponse(b'{"key": "value"}', 200)
249+
Epidata._request_with_retry("test_endpoint1", params={"key1": "value1"})
250+
Epidata._request_with_retry("test_endpoint2", params={"key2": "value2"})
246251

247-
output = logs.output
252+
captured = self.capsys.readouterr()
253+
output = captured.err.splitlines()
248254
self.assertEqual(len(output), 4) # [request, response, request, response]
249255
self.assertIn("Sending GET request", output[0])
250-
self.assertIn("\"url\": \"http://delphi_web_epidata/epidata/test_endpoint1/\"", output[0])
251-
self.assertIn("\"params\": {\"key1\": \"value1\"}", output[0])
256+
self.assertIn("\'url\': \'http://delphi_web_epidata/epidata/test_endpoint1/\'", output[0])
257+
self.assertIn("\'params\': {\'key1\': \'value1\'}", output[0])
252258
self.assertIn("Received response", output[1])
253-
self.assertIn("\"status_code\": 200", output[1])
254-
self.assertIn("\"len\": 16", output[1])
259+
self.assertIn("\'status_code\': 200", output[1])
260+
self.assertIn("\'len\': 16", output[1])
255261
self.assertIn("Sending GET request", output[2])
256-
self.assertIn("\"url\": \"http://delphi_web_epidata/epidata/test_endpoint2/\"", output[2])
257-
self.assertIn("\"params\": {\"key2\": \"value2\"}", output[2])
262+
self.assertIn("\'url\': \'http://delphi_web_epidata/epidata/test_endpoint2/\'", output[2])
263+
self.assertIn("\'params\': {\'key2\': \'value2\'}", output[2])
258264
self.assertIn("Received response", output[3])
259-
self.assertIn("\"status_code\": 200", output[3])
260-
self.assertIn("\"len\": 16", output[3])
265+
self.assertIn("\'status_code\': 200", output[3])
266+
self.assertIn("\'len\': 16", output[3])
261267

262268
with self.subTest(name='test GET and POST'):
263-
with self.assertLogs('delphi_epidata_client', level='INFO') as logs:
264-
get.reset_mock()
265-
get.return_value = MockResponse(b'{"key": "value"}', 414)
266-
post.reset_mock()
267-
post.return_value = MockResponse(b'{"key": "value"}', 200)
268-
Epidata._request_with_retry("test_endpoint3", params={"key3": "value3"})
269-
270-
output = logs.output
271-
self.assertEqual(len(output), 3) # [request, response, request, response]
269+
get.reset_mock()
270+
get.return_value = MockResponse(b'{"key": "value"}', 414)
271+
post.reset_mock()
272+
post.return_value = MockResponse(b'{"key": "value"}', 200)
273+
Epidata._request_with_retry("test_endpoint3", params={"key3": "value3"})
274+
275+
captured = self.capsys.readouterr()
276+
output = captured.err.splitlines()
277+
self.assertEqual(len(output), 3) # [request, retry, response]
272278
self.assertIn("Sending GET request", output[0])
273-
self.assertIn("\"url\": \"http://delphi_web_epidata/epidata/test_endpoint3/\"", output[0])
274-
self.assertIn("\"params\": {\"key3\": \"value3\"}", output[0])
279+
self.assertIn("\'url\': \'http://delphi_web_epidata/epidata/test_endpoint3/\'", output[0])
280+
self.assertIn("\'params\': {\'key3\': \'value3\'}", output[0])
275281
self.assertIn("Received 414 response, retrying as POST request", output[1])
276-
self.assertIn("\"url\": \"http://delphi_web_epidata/epidata/test_endpoint3/\"", output[1])
277-
self.assertIn("\"params\": {\"key3\": \"value3\"}", output[1])
282+
self.assertIn("\'url\': \'http://delphi_web_epidata/epidata/test_endpoint3/\'", output[1])
283+
self.assertIn("\'params\': {\'key3\': \'value3\'}", output[1])
278284
self.assertIn("Received response", output[2])
279-
self.assertIn("\"status_code\": 200", output[2])
280-
self.assertIn("\"len\": 16", output[2])
285+
self.assertIn("\'status_code\': 200", output[2])
286+
self.assertIn("\'len\': 16", output[2])
281287
finally: # make sure this global is always reset
282288
Epidata.debug = False
283289

@@ -288,18 +294,42 @@ def test_sandbox(self, get, post):
288294
Epidata.debug = True
289295
Epidata.sandbox = True
290296
try:
291-
with self.assertLogs('delphi_epidata_client', level='INFO') as logs:
292-
Epidata.covidcast('src', 'sig', 'day', 'county', 20200414, '01234')
293-
output = logs.output
297+
Epidata.covidcast('src', 'sig', 'day', 'county', 20200414, '01234')
298+
captured = self.capsys.readouterr()
299+
output = captured.err.splitlines()
294300
self.assertEqual(len(output), 1)
295301
self.assertIn("Sending GET request", output[0])
296-
self.assertIn("\"url\": \"http://delphi_web_epidata/epidata/covidcast/\"", output[0])
302+
self.assertIn("\'url\': \'http://delphi_web_epidata/epidata/covidcast/\'", output[0])
297303
get.assert_not_called()
298304
post.assert_not_called()
299305
finally: # make sure these globals are always reset
300306
Epidata.debug = False
301307
Epidata.sandbox = False
302308

309+
@patch('requests.get')
310+
def test_version_check(self, get):
311+
"""Test that the _version_check() function correctly logs a version discrepancy."""
312+
class MockJson:
313+
def __init__(self, content, status_code):
314+
self.content = content
315+
self.status_code = status_code
316+
def raise_for_status(self): pass
317+
def json(self): return json.loads(self.content)
318+
get.reset_mock()
319+
get.return_value = MockJson(b'{"info": {"version": "0.0.1"}}', 200)
320+
Epidata._version_check()
321+
captured = self.capsys.readouterr()
322+
output = captured.err.splitlines()
323+
self.assertEqual(len(output), 1)
324+
self.assertIn("Client version not up to date", output[0])
325+
self.assertIn("\'latest_version\': \'0.0.1\'", output[0])
326+
327+
@patch('delphi.epidata.client.delphi_epidata.Epidata._version_check')
328+
def test_version_check_once(self, version_check):
329+
"""Test that the _version_check() function is only called once on initial module import."""
330+
from delphi.epidata.client.delphi_epidata import Epidata
331+
version_check.assert_not_called()
332+
303333
def test_geo_value(self):
304334
"""test different variants of geo types: single, *, multi."""
305335
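The test changes above replace `assertLogs` with assertions on captured stderr, and the expected substrings move from JSON-style double quotes to Python-repr-style single quotes. The sketch below illustrates that pattern with the standard library only. It assumes, without confirmation from this diff, that the client writes one `str(dict)` line per event to stderr. The `log`, `capture_stderr_lines`, and `fake_request` helpers and the `example.invalid` URL are hypothetical stand-ins, not part of the Epidata client.

```python
import contextlib
import io
import sys


def log(event, **fields):
    """Hypothetical stand-in for a debug logger: one dict repr per line on stderr."""
    fields["event"] = event
    print(str(fields), file=sys.stderr)


def capture_stderr_lines(func, *args, **kwargs):
    """Run func and return everything it wrote to stderr, split into lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        func(*args, **kwargs)
    return buffer.getvalue().splitlines()


def fake_request():
    """Hypothetical request that logs a 'sent' and a 'received' event."""
    log(
        "Sending GET request",
        url="http://example.invalid/epidata/test_endpoint1/",
        params={"key1": "value1"},
    )
    log("Received response", status_code=200, len=16)


output = capture_stderr_lines(fake_request)

# A dict repr uses single quotes, so the assertions match on 'key': 'value'.
assert len(output) == 2  # [request, response]
assert "Sending GET request" in output[0]
assert "'params': {'key1': 'value1'}" in output[0]
assert "'status_code': 200" in output[1]
```

In the committed tests the same role is played by pytest's `capsys` fixture, hooked into the test class through an autouse fixture. `contextlib.redirect_stderr` is used here only to keep the sketch free of third-party dependencies.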

src/acquisition/covid_hosp/common/database.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111

1212
# first party
1313
import delphi.operations.secrets as secrets
14-
from delphi.epidata.common.logger import get_structured_logger
14+
from delphi_utils import get_structured_logger
1515

1616
Columndef = namedtuple("Columndef", "csv_name sql_name dtype")
1717

src/acquisition/covidcast/csv_importer.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,10 +13,9 @@
1313
import pandas as pd
1414

1515
# first party
16-
from delphi_utils import Nans
16+
from delphi_utils import get_structured_logger, Nans
1717
from delphi.utils.epiweek import delta_epiweeks
1818
from delphi.epidata.common.covidcast_row import CovidcastRow
19-
from delphi.epidata.common.logger import get_structured_logger
2019

2120
DataFrameRow = NamedTuple('DFRow', [
2221
('geo_id', str),

src/acquisition/covidcast/csv_to_database.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
from delphi.epidata.acquisition.covidcast.csv_importer import CsvImporter, PathDetails
1212
from delphi.epidata.acquisition.covidcast.database import Database, DBLoadStateException
1313
from delphi.epidata.acquisition.covidcast.file_archiver import FileArchiver
14-
from delphi.epidata.common.logger import get_structured_logger
14+
from delphi_utils import get_structured_logger
1515

1616

1717
def get_argument_parser():

src/acquisition/covidcast/database.py

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414

1515
# first party
1616
import delphi.operations.secrets as secrets
17-
from delphi.epidata.common.logger import get_structured_logger
17+
from delphi_utils import get_structured_logger
1818
from delphi.epidata.common.covidcast_row import CovidcastRow
1919

2020

@@ -117,28 +117,28 @@ def insert_or_update_batch(self, cc_rows: List[CovidcastRow], batch_size=2**20,
117117
get_structured_logger("insert_or_update_batch").fatal(err_msg)
118118
raise DBLoadStateException(err_msg)
119119

120-
# NOTE: `value_update_timestamp` is hardcoded to "NOW" (which is appropriate) and
120+
# NOTE: `value_update_timestamp` is hardcoded to "NOW" (which is appropriate) and
121121
# `is_latest_issue` is hardcoded to 1 (which is temporary and addressed later in this method)
122122
insert_into_loader_sql = f'''
123123
INSERT INTO `{self.load_table}`
124124
(`source`, `signal`, `time_type`, `geo_type`, `time_value`, `geo_value`,
125-
`value_updated_timestamp`, `value`, `stderr`, `sample_size`, `issue`, `lag`,
125+
`value_updated_timestamp`, `value`, `stderr`, `sample_size`, `issue`, `lag`,
126126
`is_latest_issue`, `missing_value`, `missing_stderr`, `missing_sample_size`)
127127
VALUES
128-
(%s, %s, %s, %s, %s, %s,
129-
UNIX_TIMESTAMP(NOW()), %s, %s, %s, %s, %s,
128+
(%s, %s, %s, %s, %s, %s,
129+
UNIX_TIMESTAMP(NOW()), %s, %s, %s, %s, %s,
130130
1, %s, %s, %s)
131131
'''
132132

133133
# all load table entries are already marked "is_latest_issue".
134134
# if an entry in the load table is NOT in the latest table, it is clearly now the latest value for that key (so we do nothing (thanks to INNER join)).
135135
# if an entry *IS* in both load and latest tables, but latest table issue is newer, unmark is_latest_issue in load.
136136
fix_is_latest_issue_sql = f'''
137-
UPDATE
138-
`{self.load_table}` JOIN `{self.latest_view}`
139-
USING (`source`, `signal`, `geo_type`, `geo_value`, `time_type`, `time_value`)
140-
SET `{self.load_table}`.`is_latest_issue`=0
141-
WHERE `{self.load_table}`.`issue` < `{self.latest_view}`.`issue`
137+
UPDATE
138+
`{self.load_table}` JOIN `{self.latest_view}`
139+
USING (`source`, `signal`, `geo_type`, `geo_value`, `time_type`, `time_value`)
140+
SET `{self.load_table}`.`is_latest_issue`=0
141+
WHERE `{self.load_table}`.`issue` < `{self.latest_view}`.`issue`
142142
'''
143143

144144
# TODO: consider handling cc_rows as a generator instead of a list
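The comments above `fix_is_latest_issue_sql` describe the rule in words: every load-table row starts out marked as latest, and a row is unmarked only when the latest table already holds a newer issue for the same key. A minimal in-memory sketch of that rule follows. It uses plain dictionaries instead of MySQL tables, and the function name, the `KEY_FIELDS` constant, and the sample rows are hypothetical illustrations, not code from this repository.

```python
# Columns that identify one observation (taken from the USING clause in the diff).
KEY_FIELDS = ("source", "signal", "geo_type", "geo_value", "time_type", "time_value")


def fix_is_latest_issue(load_rows, latest_rows):
    """Unmark load rows that are older than the issue already in the latest table.

    Hypothetical in-memory analogue of the UPDATE ... JOIN statement.
    """
    latest_issue_by_key = {
        tuple(row[field] for field in KEY_FIELDS): row["issue"] for row in latest_rows
    }
    for row in load_rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        existing_issue = latest_issue_by_key.get(key)
        # No match in the latest table: the load row stays marked as latest.
        if existing_issue is not None and row["issue"] < existing_issue:
            row["is_latest_issue"] = 0
    return load_rows


# Made-up sample data for illustration only.
base = dict(source="src", signal="sig", geo_type="county", time_type="day", time_value=20200414)
load = [
    dict(base, geo_value="01234", issue=20200415, is_latest_issue=1),  # older than latest
    dict(base, geo_value="01235", issue=20200420, is_latest_issue=1),  # newer than latest
    dict(base, geo_value="01236", issue=20200416, is_latest_issue=1),  # key not in latest
]
latest = [
    dict(base, geo_value="01234", issue=20200418),
    dict(base, geo_value="01235", issue=20200418),
]

result = fix_is_latest_issue(load, latest)
flags = [row["is_latest_issue"] for row in result]
```

Only the first sample row is unmarked, because it is the only one whose key exists in the latest table with a strictly newer issue.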

src/acquisition/covidcast/file_archiver.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
import shutil
77

88
# first party
9-
from delphi.epidata.common.logger import get_structured_logger
9+
from delphi_utils import get_structured_logger
1010

1111
class FileArchiver:
1212
"""Archives files by moving and compressing."""

src/client/delphi_epidata.R

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Epidata <- (function() {
1515
# API base url
1616
BASE_URL <- getOption('epidata.url', default = 'https://api.delphi.cmu.edu/epidata/')
1717

18-
client_version <- '4.1.23'
18+
client_version <- '4.1.24'
1919

2020
auth <- getOption("epidata.auth", default = NA)
2121
