This repository was archived by the owner on Jun 18, 2024. It is now read-only.
Section 2 of the implementation guide:
http://project-open-data.github.io/implementation-guide/
...lays out procedures for creating the public index of agency datasets. As it is currently written, the agencies "are only required to list datasets with an “Access Level” value of “public,”".
The schema defines "public" and "restricted" in the following manner:
http://project-open-data.github.io/schema/
"Choices: Public (is or could be made publicly available), Restricted (available under certain conditions),"
These definitions do not adequately define "public" or "restricted."
If restricted data can be made available under certain conditions, it should be able to be listed publicly. It may be that "restricted" means "able to be released to subsets of the public, like a qualified research community, under certain conditions." If this is the intent behind the "restricted" category, that should be made clear (it's unclear what "under certain conditions" means). Even if that's the intent of the "restricted" definition, that's a form of being made public (the "could" from the public category), and should result in those datasets being listed in the public data index.
In other words, "public" and "restricted" should be better defined, and the requirement that agencies list all of their data that "could" be made public should be applied to both "public" and "restricted" datasets, as they've been defined.
The guidance should provide details in the following form:
"Datasets are considered datasets that "could" be made publicly available if:
* certain information would need to be removed from the dataset before release
* significant resources would need to be allocated to digitize or prepare the information for release
* the data can only be released to a limited community due to privacy concerns
* an extraction process can create a new dataset on top of the current dataset to provide public value
* etc."
Additionally, data that is affirmatively marked as "private" should not be automatically withheld from public listing; even if an agency determines that a dataset cannot be released publicly, that is a different determination from deciding whether to publicly acknowledge the dataset's existence.
Ultimately, the labels that determine whether datasets get publicly listed should be designed based on whether it's possible to acknowledge the dataset's existence publicly. That is a different decision from whether it's possible to release the dataset, regardless of how much extraction, transformation, digitization, or anonymization is necessary to do so. Since the current OMB directive says that the public data index should include all data that "could" be made public, that "could" should be defined clearly, and should empower public oversight of agency information policy decisions.
Instead of the current approach of describing public/restricted, why not describe the conditions for use (‘use constraints’) or access (‘access constraints’) as defined by FGDC?
To pull out what I think is a main point of @johnwonderlich's - the OMB directive says the public inventory should explicitly include data that may not yet be marked as "public", but Project Open Data doesn't reflect this as explicitly as it probably should.
I was pausing on responding to this comment until the official new implementation guidance came out. It provides clearer guidance on the accessLevel field. It also updates the terminology slightly, making the three choices "public," "restricted public," and "non-public." There will also be an accessLevelComment field where you can (for "restricted public") explain how to gain access or (for "non-public") log the reason why the dataset cannot be released.
It's important to note that since the public inventory only requires an agency to include "public" (not "restricted public" or "non-public") datasets, this field can be used for internal purposes only or shared publicly as an agency sees fit.
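To make the difference concrete, here is a rough sketch of the two listing policies discussed in this thread. It is only an illustration: the sample entries, the `title` key, and the function names are hypothetical, and only the `accessLevel` and `accessLevelComment` field names and the three access-level values come from the comments above.

```python
import json

# Hypothetical inventory entries, made up for illustration.
# Only accessLevel / accessLevelComment are names taken from this thread.
INVENTORY = [
    {"title": "Example dataset A", "accessLevel": "public"},
    {
        "title": "Example dataset B",
        "accessLevel": "restricted public",
        "accessLevelComment": "Available to qualified researchers on request.",
    },
    {
        "title": "Example dataset C",
        "accessLevel": "non-public",
        "accessLevelComment": "Contains personally identifiable information.",
    },
]


def required_public_listing(entries):
    """Entries an agency must list under the current reading: 'public' only."""
    return [e for e in entries if e.get("accessLevel") == "public"]


def full_listing(entries):
    """Alternative raised in this issue: acknowledge every dataset's
    existence, carrying along its access level and any comment."""
    return [
        {
            "title": e["title"],
            "accessLevel": e.get("accessLevel"),
            "accessLevelComment": e.get("accessLevelComment", ""),
        }
        for e in entries
    ]


if __name__ == "__main__":
    print(json.dumps(required_public_listing(INVENTORY), indent=2))
    print(json.dumps(full_listing(INVENTORY), indent=2))
```

Under the first function, datasets B and C never appear in the public inventory; under the second, their existence is acknowledged and the comment explains how to gain access or why release is not possible.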