Add reasonForNonRelease to schema #93

MarinaNitze · 2013-07-23T01:19:53Z

For datasets with accessLevel = private, an agency has to document with its Office of General Counsel (or other designated entity) why it can't be released.

While this field would not be surfaced in the public data inventory, it should be captured in an additional metadata field for the Enterprise Data Inventory, and required for datasets where accessLevel = private.
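To make the conditional requirement concrete, here is a rough sketch of the check an inventory validator might run. It assumes each inventory entry is a plain dictionary keyed by field name. The function name `find_missing_reasons` and the use of an `identifier` key in the error message are hypothetical, chosen only for illustration, and are not part of the schema as proposed here.

```python
from typing import Any, Dict, List


def find_missing_reasons(entries: List[Dict[str, Any]]) -> List[str]:
    """Return one error message per private entry that lacks a reason.

    Assumption: entries are dicts using the field names discussed in this
    issue (accessLevel, reasonForNonRelease). The "identifier" key is a
    hypothetical label used only to make the messages readable.
    """
    errors = []
    for entry in entries:
        # The field is only required when accessLevel is private.
        if entry.get("accessLevel") != "private":
            continue
        reason = entry.get("reasonForNonRelease")
        # Treat a missing, non-string, or whitespace-only value as absent.
        if not isinstance(reason, str) or not reason.strip():
            label = entry.get("identifier", "<no identifier>")
            errors.append(
                f"{label}: reasonForNonRelease is required "
                "when accessLevel is private"
            )
    return errors
```

Because the field is free text, the sketch only checks that something non-empty was entered; it makes no attempt to judge whether the reason is adequate.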

The rationale is that simply documenting a reason for not releasing a dataset with OGC does not guarantee that any identifier or other information is collected alongside that reason. This information rightly belongs in the (private) Enterprise Data Inventory.

Agencies could selectively use the recordAccessLevel metadata field proposed earlier to surface private datasets and their reason for not being released.

Not thrilled with my suggested name, feel free to propose a better one!

waldoj · 2013-07-29T20:13:14Z

Marina, what do you envision going in this field? Free-form descriptive text? I don't know anything about the processes within government that will track this documentation—is the most logical way to relate a dataset to its private-only rationale to do so within a field like this?

BernHyland · 2013-07-29T20:35:34Z

Hi Marina,
Apparently I replied to the wrong address. Resending.

Begin forwarded message:

From: Bernadette Hyland [email protected]
Subject: Re: [project-open-data.github.io] Add operatingUnit Field (#89)
Date: July 29, 2013 4:31:50 PM EDT
To: "project-open-data/project-open-data.github.io" [email protected]
Cc: Fadi Maali [email protected], John Erickson [email protected]

Hi Marina,
A lot is known about the problem you describe & some really smart people have already cracked this nut. Agencies collect, curate and publish data in all sorts of ways with all sorts of contact information, often office emails & telephone numbers. Contact info comes in a wide variety of formats containing no name, first name, last name, salutation, title, agency, organization, and addresses -- the bane of many data administrators' existence. There are policy & technology issues at play here.

IMO, the Open Data Project could do a great service to get behind, through providing input, a standard vocabulary for describing government published datasets. One such effort that has benefited from some really smart people dedicated to Web standards & government transparency is the RDF vocabulary called DCAT.[1] I'm sure there are others too, but I'm familiar with this project.

DCAT is nearing publication as an open Web standard and has been produced in a transparent, peer-reviewed manner. I encourage you to post your questions & feedback to [email protected] so we can work cooperatively to advance open government publication efforts. If you're facing some things that haven't been contemplated by DCAT, now would be a great time to address this.

Cheers,

Bernadette Hyland
CEO, 3 Round Stones, Inc
co-chair W3C Government Linked Data WG

[1] http://www.w3.org/TR/vocab-dcat/

On Jul 22, 2013, at 8:51 PM, MarinaMartin [email protected] wrote:

While datasets are ultimately owned by an agency, they are really collected and maintained on an operating unit basis. While contact names and emails may change, a dataset's associated operating unit probably will not. Making this a new, required field makes it clearer where to go with questions for the public consuming the data, the agency officials responsible for updating the metadata, and other agencies looking to access the data. It can also help agencies assess internal compliance with publishing data, and is likely to be part of an agency internal data management system for workflow purposes.

Different agencies call their sub-units different things: departments, POCs, bureaus, etc. In asking around, "operating unit" was most generic, but I'm open to an even more generic term.

What do you all think?


seanherron · 2013-07-31T02:11:02Z

Ideally, there would be something like a FOIA-type system, where if data doesn't meet one of a number of criteria for nonrelease it would be required to be released, and thus this field would need to be one of the predefined criteria. Logistically, however, this may be too ambitious. It may be good for us to create a set of "acceptable" criteria that we could give to agencies as suggested guidance for why a dataset may not be releasable (and the reverse).

What about NonReleaseJustification or RestrictionJustification?

MarinaNitze · 2013-08-01T02:54:37Z

@waldoj Yes I envision it as being a free-text field. The agencies already have to collect this information for each new dataset created/collected that's not going to be released, going forward. So isn't it logical to store this reason in the Enterprise data inventory (which, remember, is private -- not the public inventory)? They're storing it anyway -- but without a field they will, if I were to guess, store them separately and in a harder-to-find-internally spot.

@seanherron I think the list of options here is way too broad and will be defined by agencies' general counsels. I would suggest leaving this as a free text field and not providing criteria.

MarinaNitze · 2013-08-01T02:57:01Z

P.S. I have no problem with changing the name of this suggested field.

MarinaNitze · 2013-08-01T02:59:13Z

@BernHyland We made great efforts to match DCAT in this schema wherever possible -- the only two existing fields that do not match DCAT are accessLevel and systemOfRecord. This issue is specifically about giving agencies a place to document the reason for NOT releasing a particular dataset, in their internal-only enterprise data asset inventories. I'm not so sure that is widely applicable enough to warrant inclusion in a standard like DCAT, but I appreciate the reminder to stay involved in those conversations!

gbinal · 2013-08-07T16:28:39Z

Does the benefit of encouraging better behavior outweigh the complexity that adding this brings? I don't think it's an overly large addition to the agency workload but it is an added lift. In general, I always worry about the Christmas tree effect when it comes to adding further to what each agency is required to do.

seanherron · 2013-08-07T16:52:17Z

@gbinal I think if some sort of rationale isn't included people will either a) assume that the intent is nefarious and that we are hiding the data for no good reason, b) email the POC and ask for clarification/release, or c) forget about it entirely. For high-volume and frequently desired datasets (maybe some of the HHS data that has potential for PII, etc) putting a reasonable statement out there as to why it's private is good for transparency and will reduce the number of queries to the POC and angry tweets if they assume it's private for a questionable reason.

My concern would be that agencies wouldn't provide this information for legal reasons or would provide obtuse legalese that is difficult to parse and understand. If it's not going to be used, then there's not a lot of value in adding it.

konklone · 2013-08-07T17:10:17Z

I understand the Christmas tree argument, but in this case it seems merited. It'll only add work for non-released datasets (which are already saving a lot of work by not being released!), and agencies should have an on-the-record reason for not releasing a dataset anyway.

I also support keeping this a free text field, rather than selecting a preset exemption, to encourage a descriptive rationale. The field won't mean much if it doesn't communicate more than a category.

MarinaNitze · 2013-08-14T20:48:01Z

The discussion has moved towards combining the intent of this proposed field with the accessDetails field in #90. I'm closing this discussion -- please chat over at #90.

seanherron mentioned this issue Jul 31, 2013

Updated license field to contain additional information #98

Closed

This was referenced Aug 7, 2013

Add operatingUnit Field #89

Closed

Add recordAccessLevel #92

Closed

MarinaNitze closed this as completed Aug 14, 2013
