nssp documentation draft #1439

Merged
merged 15 commits into from
Jun 25, 2024
100 changes: 100 additions & 0 deletions docs/api/covidcast-signals/nssp.md
@@ -0,0 +1,100 @@
---
title: NSSP emergency department visits
parent: Data Sources and Signals
grand_parent: COVIDcast Main Endpoint
---
# National Syndromic Surveillance Program (NSSP) Emergency Department (ED) visits
{: .no_toc}

* **Source name:** `nssp`
* **Earliest issue available:** (TODO ask Minh)
* **Number of data revisions since 19 May 2020:** 0
* **Date of last change:** Never
* **Available for:** county, hrr, msa, state, nation (see [geography coding docs](../covidcast_geography.md))
* **Time type:** week (see [date format docs](../covidcast_times.md))
* **License:** [Public Domain US Government](https://www.usa.gov/government-works)

## Overview

[The National Syndromic Surveillance Program (NSSP)](https://www.cdc.gov/nssp/php/about/index.html) is an effort to track epidemiologically relevant conditions.
This dataset in particular tracks emergency department (ED) visits arising from a subset of influenza-like illnesses, specifically influenza, COVID-19, and RSV.
It is derived from the CDC's [Respiratory Virus Response NSSP Emergency Department Visit Trajectories dataset](https://data.cdc.gov/Public-Health-Surveillance/2023-Respiratory-Virus-Response-NSSP-Emergency-Dep/rdmq-nq56/about_data), which started reporting data in late 2022.
As of May 2024, NSSP received data from 78% of US EDs.

| Signal | Description |
|---------------------------------|-------------------------------------------------------------------------|
| `pct_visits_covid` | Percent of ED visits that had a discharge diagnosis code of COVID-19 |
| `pct_visits_influenza` | Percent of ED visits that had a discharge diagnosis code of influenza |
| `pct_visits_rsv` | Percent of ED visits that had a discharge diagnosis code of RSV |
Contributor

suggestion: this is late in the game, so optional change, but signal names would be clearer as pct_er_visits_*

Contributor Author

pct_ed_visits_*? I'm still not sure about the difference tbh. I'd leave it up to Minh, as she'd have the most to change.

Contributor

I was considering "ed", but it's not how laypeople refer to it. As far as I'm aware, only healthcare workers call the ER the "ED". So I think for a more general audience, "er" is clearer, although it does differ from the dataset name.

Will ask Roni

Collaborator

@nmdefries, sorry I'm late here!

Minor suggestion: In the TOC, consider adding a link to the "Overview" to maintain consistency with the TOCs of signals such as "Change Healthcare" and "Doctor Visits".

Nothing else stands out, the headings are great. Thanks!

Contributor Author

thanks for looking! Though I don't really follow what you mean. When I look at the table of contents in doctor-visits:

## Table of Contents
{: .no_toc .text-delta}
1. TOC
{:toc}
## Estimation

it seems just the same as what's here?
## Table of contents
{: .no_toc .text-delta}
1. TOC
{:toc}

Contributor

Waiting for a response from Roni about naming

Collaborator

@dsweber2 sorry for the confusion: you can prob just ignore my comment! I think the "Overview" section I'm pointing to may be contained in the {:toc} . Here's a visual where in doctor's visits the TOC has "Overview" circled in red:
image
Thanks!

| `pct_visits_combined` | Percent of ED visits that had a discharge diagnosis code of COVID-19, influenza, or RSV |
| `smoothed_pct_visits_covid` | 3-week moving average of `pct_visits_covid` |
| `smoothed_pct_visits_influenza` | 3-week moving average of `pct_visits_influenza` |
| `smoothed_pct_visits_rsv` | 3-week moving average of `pct_visits_rsv` |
| `smoothed_pct_visits_combined` | 3-week moving average of `pct_visits_combined` |
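
As a hedged sketch of how one of these signals might be requested, the snippet below only builds a query URL and makes no network call. The base URL, the parameter names, and the example values (`pa`, `202415`) are assumptions about the COVIDcast endpoint rather than something stated on this page; check the API documentation before relying on them. `build_nssp_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of the COVIDcast endpoint; verify against the API docs.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_nssp_query(signal, geo_type, geo_value, time_values):
    """Build a query URL for a weekly NSSP signal (hypothetical helper)."""
    params = {
        "data_source": "nssp",       # source name listed at the top of this page
        "signal": signal,            # e.g. "pct_visits_covid"
        "time_type": "week",         # NSSP signals are weekly
        "geo_type": geo_type,        # county, hrr, msa, state, or nation
        "geo_value": geo_value,      # assumed example: a lowercase state code
        "time_values": time_values,  # assumed format: epiweek as YYYYWW
    }
    return BASE_URL + "?" + urlencode(params)


url = build_nssp_query("pct_visits_covid", "state", "pa", "202415")
print(url)
```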

## Table of contents
{: .no_toc .text-delta}

1. TOC
{:toc}

## Estimation

The percent visits signals are calculated as a fraction of visits at facilities reporting to NSSP, rather than all facilities in the area.
County- and state-level data is reported as-is from NSSP, without modification, while `hrr` and `msa` values are estimated by Delphi.

### Geographic weighting
As the original data is a percentage and raw case counts are not available, `hrr` and `msa` values are computed from county-level data using a weighted mean. Each county is assigned a weight equal to its population in the last census (2020).
This assumes that the number of ED visits is proportional to the overall population of a county, i.e. the per-capita ED visit rate is the same for all counties, which may not be the case (for example, denser counties may have easier access to EDs and thus higher rates of ED visits per capita).
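
The weighted mean described above can be sketched as follows. This is a minimal illustration, not Delphi's actual pipeline: the function name, the dict-based inputs, and the choice to skip counties with missing values are all assumptions.

```python
def population_weighted_mean(county_pct, county_population):
    """Combine county-level percentages into one regional value.

    county_pct: dict mapping county FIPS code -> percent of visits (or None if missing)
    county_population: dict mapping county FIPS code -> 2020 census population
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for fips, pct in county_pct.items():
        if pct is None:
            # Assumption: counties without data contribute nothing.
            continue
        weight = county_population[fips]
        weighted_sum += weight * pct
        total_weight += weight
    if total_weight == 0:
        return None  # no county in the region reported a value
    return weighted_sum / total_weight


# Hypothetical counties making up one MSA.
pct = {"00001": 2.0, "00002": 4.0, "00003": None}
pop = {"00001": 100, "00002": 300, "00003": 50}
msa_value = population_weighted_mean(pct, pop)  # (100*2.0 + 300*4.0) / 400 = 3.5
```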

State-level data is reported separately and is **not** simply an average of the county-level data; it may include facilities that are omitted at finer geographic levels (for example, if small facilities are excluded for privacy reasons).[^1]

### Smoothing

Smoothed signals are calculated using a simple 3-week moving average of the relevant weekly signals. Note that since the unsmoothed `pct_visits_*` signals report weekly data, each smoothed signal value is computed from 3 points rather than 21, as would be used if the unsmoothed data were reported daily.
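
A minimal sketch of a 3-point moving average over weekly values is shown below. The text above does not say whether the window is trailing or centered, or how the first weeks are handled, so the trailing window and the `None` padding here are assumptions.

```python
def moving_average_3wk(weekly_values):
    """Trailing 3-week mean; weeks without two prior weeks get None (assumption)."""
    smoothed = []
    for i in range(len(weekly_values)):
        if i < 2:
            smoothed.append(None)
            continue
        window = weekly_values[i - 2 : i + 1]  # current week plus the two before it
        smoothed.append(sum(window) / 3)
    return smoothed


# Hypothetical weekly pct_visits_covid values for one location.
weekly = [1.0, 2.0, 3.0, 4.0]
smoothed = moving_average_3wk(weekly)  # [None, None, 2.0, 3.0]
```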


## Limitations

There is substantial missingness at the county level. This tends to impact more rural and lower population locations. See the [missingness section below](#missingness) for more information.

NSSP notes that not every patient entering an ED is tested for the conditions of interest, so the data may undercount total cases and as a result underreport percent visits.

Our [geographic weighting approach](#geographic-weighting) assumes that the number of ED visits is proportional to the overall population of a county. However, in reality, there are various factors that could affect the accuracy of this assumption.

For example, we might expect denser, more urban counties to have 1) more and larger EDs and 2) easier access to EDs. The first factor may mean that residents of rural counties are more likely to go to EDs in urban counties. The second factor may increase the total number of ED visits that someone living in an urban county will make, that is, the average urban resident may make more ED visits than the average rural resident.

As a result, total ED visits per capita in rural counties may be lower than total ED visits per capita in urban counties. Since our weighting approach uses population as the weighting factor, rural counties would tend to be overrepresented in estimated values.
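
To make the direction of this bias concrete, the sketch below compares population weighting with weighting by actual ED visit counts. All numbers are hypothetical, and visit counts are not available in the real data; the example only illustrates the reasoning above.

```python
def weighted_mean(values, weights):
    """Plain weighted mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical region with one urban and one rural county.
pct_covid = [2.0, 6.0]          # percent of ED visits: urban, rural
population = [90_000, 10_000]   # weights actually used
ed_visits = [9_000, 500]        # unavailable in practice; rural visits per capita are lower

population_weighted = weighted_mean(pct_covid, population)  # 2.4
visit_weighted = weighted_mean(pct_covid, ed_visits)        # roughly 2.21

print(population_weighted, visit_weighted)
```

In this made-up case the rural county gets 10% of the weight under population weighting but accounts for only about 5% of visits, so its higher value pulls the estimate up.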

Some low-population counties occasionally report outliers, e.g. 33.33%, 50%, or 100% of ED visits being COVID-related. As of May 2024, an analysis shows around 10 unusually high values across the full history of all signals, so they are rare. We expect that these high rates occur by chance, due to a small total number of ED visits in a given week.
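
The arithmetic behind such values is simple: with very few total visits, a single diagnosis produces a large percentage. The visit counts below are hypothetical, since the source publishes percentages only.

```python
def pct_visits(condition_visits, total_visits):
    """Percent of ED visits with the condition, rounded to two decimals."""
    return round(100 * condition_visits / total_visits, 2)


# One COVID-related visit out of 3, 2, or 1 total visits in a week.
examples = [(1, 3), (1, 2), (1, 1)]
rates = [pct_visits(c, t) for c, t in examples]  # [33.33, 50.0, 100.0]
```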


## Missingness

As of May 2024, NSSP received data from 78% of US EDs.
Contributor

Do we have a link for this number?

Contributor Author

it's on the very bottom of the main NSSP index. Felt a little redundant linking to it again, but could definitely include

The most noticeable gaps in [county coverage](https://www.cdc.gov/nssp/media/images/2024/04/Participation-with-date.png) are in California, Colorado, Missouri, Oklahoma, and Virginia, though most states have some counties that are absent.

The following states have no county-level data at all: CA, WA, AK, AZ, AL, CO, SD, ND, MO, AR, FL, OH, NH, CT, NJ.
Counties which do not have data are listed, but with an `NA` value.

At the state level, South Dakota, Missouri, and territories are not reported.


## Lag and Backfill

The weekly signal is reported on Friday mornings, adding data from the prior week.
For example, on Friday, 2024-04-19, the source added new data from the week ending 2024-04-13.
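
The reporting lag in this example can be sketched with date arithmetic. This assumes that reporting weeks end on Saturday, which is inferred from the single example above (2024-04-13 was a Saturday) rather than stated by the source; `latest_week_ending` is a hypothetical helper.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday(): Monday is 0, Saturday is 5


def latest_week_ending(issue_date):
    """Return the most recent Saturday strictly before `issue_date`."""
    offset = (issue_date.weekday() - SATURDAY) % 7
    if offset == 0:
        offset = 7  # an issue on a Saturday refers back to the previous Saturday
    return issue_date - timedelta(days=offset)


issue = date(2024, 4, 19)                # a Friday release
week_end = latest_week_ending(issue)     # date(2024, 4, 13)
lag_days = (issue - week_end).days       # 6
```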

This data source has frequent backfill, primarily arising from newly included EDs. When a new facility joins the reporting network, its historical data is added to the dataset, resulting in changes to historical values for every geographic level that ED is part of (county through nation). Because of this, the broadest geographic levels are more likely to be revised.
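
One way to see the effect of backfill is to compare two snapshots (issues) of the same signal and list the entries whose values changed. The sketch below uses made-up snapshots keyed by `(geo_value, week_ending)`; the data layout and the `find_revisions` helper are assumptions for illustration.

```python
def find_revisions(old_issue, new_issue, tol=1e-9):
    """Return {key: (old, new)} for entries present in both issues whose values differ."""
    revised = {}
    for key, old_value in old_issue.items():
        new_value = new_issue.get(key)
        if old_value is None or new_value is None:
            continue  # skip entries missing from either snapshot
        if abs(new_value - old_value) > tol:
            revised[key] = (old_value, new_value)
    return revised


# Hypothetical values before and after a newly added ED's history is backfilled.
old_issue = {("pa", "2024-03-30"): 1.20, ("pa", "2024-04-06"): 1.10}
new_issue = {
    ("pa", "2024-03-30"): 1.25,
    ("pa", "2024-04-06"): 1.10,
    ("pa", "2024-04-13"): 1.05,
}

revisions = find_revisions(old_issue, new_issue)
# Only the 2024-03-30 entry changed; 2024-04-13 is new data, not a revision.
```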

In previous revisions, we have noted changes to values dating back about 2 years.
Contributor

suggestion: ideally, we'd have more detail here. For example,

  • how often do revisions happen?
  • how often are new facilities added?
  • are there any patterns to where new facilities are added?
  • how much do values change as a result?

@minhkhul

Contributor Author

Do we have a similar revision analysis posted on the other sources? I don't ever remember seeing them. I agree it would be useful, but that sounds more like an overall doc overhaul thing

Contributor

We have this type of info for some sources. It's something that Roni has requested more detail about recently and that we've wanted info about for various signals at various points in time.

I don't think this is critical to add now, but it will be easiest, since we've done the statistical analysis recently. I was hoping that @minhkhul or you had come across info about this during the analysis.

The questions above are some things that came to mind, but they are more guiding questions for the type of detail I'd be looking for. They don't all need to be answered exactly, but any additional info would be helpful.

Ultimately, this is up to y'all.



## Source and Licensing

This source is derived from the CDC's [Respiratory Virus Response NSSP Emergency Department Visit Trajectories dataset](https://data.cdc.gov/Public-Health-Surveillance/2023-Respiratory-Virus-Response-NSSP-Emergency-Dep/rdmq-nq56/about_data).
There is another version of the dataset that includes [state data only](https://data.cdc.gov/Public-Health-Surveillance/2023-Respiratory-Virus-Response-NSSP-Emergency-Dep/7mra-9cq9/about_data).

This data was originally published by the National Center for Health Statistics, and is made available here as a convenience to the forecasting community under the terms of the original license, which is [U.S. Government Public Domain](https://www.usa.gov/government-copyright).

[^1]: (TODO should probably confirm this in some way)