This repository was archived by the owner on Jun 18, 2024. It is now read-only.

Changing publisher cardinality to allow multiple values #96

Closed
wants to merge 1 commit into from


seanherron
Contributor

It would be useful to change the publisher field to allow contributions from multiple agencies. This would help with datasets that result from joint agency collaboration (e.g. many NASA/NOAA datasets) or that come from sub-agencies (such as HHS/FDA, Commerce/NOAA, etc.), where there is a benefit to listing both agencies as publishers.

Multiple publishers can simply be listed as comma-delimited values. This may also relate to #89.
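A minimal sketch of how a consumer might split such a comma-delimited publisher string. The function name `split_publishers` is hypothetical, and it assumes commas only ever separate publishers; a publisher name that itself contains a comma would be split incorrectly.

```python
def split_publishers(publisher: str) -> list[str]:
    """Split a comma-delimited publisher string into individual names.

    Assumption: commas only ever separate publishers; a name that itself
    contains a comma would be split into several entries.
    """
    # Strip whitespace around each entry and drop empty entries (e.g. "NASA, ,NOAA").
    return [name.strip() for name in publisher.split(",") if name.strip()]


print(split_publishers("NASA, NOAA"))  # ['NASA', 'NOAA']
```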

@MarinaNitze
Contributor

Agreed in theory. I know we don't want to make metadata decisions based on CKAN, but how would this roll up to an organization? CKAN doesn't support 1:n organizations. Do we default to treating the first entry in the list as the organization for search-filtering purposes? And which agency is responsible for maintaining the dataset and providing a point of contact?

@seanherron
Contributor Author

Is this field intended to map to CKAN's organization authorization (http://docs.ckan.org/en/latest/authorization.html)? I was viewing it as distinct from the organization from an access-control standpoint.

For search filtering, is there a reason the dataset couldn't appear in searches for every linked organization (apart from the technical challenge of modifying CKAN to support that notion)? As for POC and maintenance, those collaborations often already have an established point person who can be responsible for maintaining the dataset. One could argue that the organization should simply be whichever agency that person works for, but I don't think we'd be representing the data accurately if we imposed that limitation.
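To illustrate the search-filtering idea, here is a rough sketch that indexes each dataset under every publisher it lists, so it appears when filtering by any of them. The record layout, the dataset ids, and `build_publisher_index` are hypothetical; this is not how CKAN's one-organization-per-dataset model works.

```python
from collections import defaultdict

# Hypothetical dataset records; "publishers" is assumed to already be a list
# (e.g. produced by splitting a comma-delimited publisher field).
datasets = [
    {"id": "sea-surface-temp", "publishers": ["NASA", "NOAA"]},
    {"id": "drug-recalls", "publishers": ["HHS", "FDA"]},
]


def build_publisher_index(records):
    """Map each publisher name to the ids of every dataset that lists it."""
    index = defaultdict(list)
    for record in records:
        for publisher in record["publishers"]:
            index[publisher].append(record["id"])
    return index


index = build_publisher_index(datasets)
print(index["NOAA"])  # ['sea-surface-temp']
print(index["NASA"])  # ['sea-surface-temp']
```

The same dataset id is stored under each publisher, so no single agency has to be chosen as the "primary" one for filtering.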

@mhogeweg
Contributor

mhogeweg commented Aug 1, 2013

Don't want to beat a dead horse, but the ISO 19115 metadata specification has an elegant solution for the different contacts/organizations involved in metadata. It is a very common construct for separating the data custodian from the distributor and from the metadata creator. These are often the same, but especially when organizational approval processes are at play, it is worthwhile to record the different roles involved with the released data.
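For reference, a rough sketch of the ISO 19115 idea: each responsible party carries a role, so the custodian, distributor, and publisher of a dataset can differ. The enum covers only a subset of the CI_RoleCode values, and the class names and the "Agency A"/"Agency B" role assignments are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """A subset of the ISO 19115 CI_RoleCode values."""
    CUSTODIAN = "custodian"
    DISTRIBUTOR = "distributor"
    ORIGINATOR = "originator"
    POINT_OF_CONTACT = "pointOfContact"
    PUBLISHER = "publisher"


@dataclass(frozen=True)
class ResponsibleParty:
    organisation: str
    role: Role


def organisations_with_role(parties, role):
    """Return the organisations holding the given role, in listed order."""
    return [p.organisation for p in parties if p.role is role]


# Hypothetical joint dataset: one agency maintains the data, both publish it.
parties = [
    ResponsibleParty("Agency A", Role.ORIGINATOR),
    ResponsibleParty("Agency B", Role.CUSTODIAN),
    ResponsibleParty("Agency B", Role.DISTRIBUTOR),
    ResponsibleParty("Agency A", Role.PUBLISHER),
    ResponsibleParty("Agency B", Role.PUBLISHER),
]

print(organisations_with_role(parties, Role.PUBLISHER))  # ['Agency A', 'Agency B']
print(organisations_with_role(parties, Role.CUSTODIAN))  # ['Agency B']
```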

I have included a mapping between ISO, FGDC, and DCAT in pull request #74.

@seanherron
Contributor Author

Thanks, @mhogeweg. I'm envious of your knowledge of these metadata specifications. I think there needs to be a careful balance between having too many metadata fields to fill out (which raises the barrier to entry, since people will need to read a lot of documentation and understand what each field does) and capturing enough information to paint an accurate picture of the dataset. I'm not sure where that balance lies.

@mhogeweg
Contributor

mhogeweg commented Aug 1, 2013

What's interesting is that many of the federal, state, and local governments already produce this more elaborate metadata for their data holdings. Check out a small sample:

These agencies have already contributed their metadata and shared their (geospatial) data through programs like Geospatial One-Stop and, later, geo.data.gov. For them, preparing the DCAT structure is actually extra work.

As part of developing the Geoportal Server to create these catalogs, we have enabled exposing the catalogs in DCAT format based on their existing FGDC/ISO metadata. But this all hinges on the mapping between their current FGDC/ISO metadata and DCAT (which I proposed in pull request #74). All of those existing implementers would benefit greatly from an agreed-upon mapping.

@MarinaNitze
Contributor

We can treat the publisher field as a simple string field that accepts comma-separated 1:n entries, and reserve CKAN's "organization" field for the actual publisher of the respective data.json file. But that requires our anticipated API of agency.gov/.mil domains to be a little more robust, because it will need a standard name for each organization instead of just [agency].gov. Make sense?

@mhogeweg
Contributor

mhogeweg commented Aug 1, 2013

@MarinaMartin yes, it does. NOAA, for example, has a fairly structured way of naming the specific group/department within NOAA that is related to the data in some way.

An example is this record of a dataset on Lake Level Reconstructions from the National Climatic Data Center. The full organization name is included as:

DOC/NOAA/NESDIS/NCDC > National Climatic Data Center, NESDIS, NOAA, U.S. Department of Commerce

which shows the full organizational place of NCDC within Department of Commerce.
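A hedged sketch of how a catalog could use this naming convention for roll-up: split the acronym path from the full name, then emit every ancestor prefix so a dataset could be filtered at the department, agency, or office level. It assumes the "<path> > <full name>" form seen in the NCDC record holds generally, which may not be true for other agencies; `parse_org_name` and `ancestors` are hypothetical helpers.

```python
def parse_org_name(value: str) -> tuple[list[str], str]:
    """Split a NOAA-style organization name into its acronym path and full name.

    Assumes the form "<A>/<B>/.../<Z> > <full name>", as in the NCDC example.
    """
    path_part, sep, full_name = value.partition(" > ")
    if not sep:
        raise ValueError(f"no ' > ' separator in {value!r}")
    return path_part.split("/"), full_name.strip()


def ancestors(path: list[str]) -> list[str]:
    """Return every prefix of the path, from the top-level department down."""
    return ["/".join(path[: i + 1]) for i in range(len(path))]


path, name = parse_org_name(
    "DOC/NOAA/NESDIS/NCDC > National Climatic Data Center, NESDIS, NOAA, "
    "U.S. Department of Commerce"
)
print(path)             # ['DOC', 'NOAA', 'NESDIS', 'NCDC']
print(ancestors(path))  # ['DOC', 'DOC/NOAA', 'DOC/NOAA/NESDIS', 'DOC/NOAA/NESDIS/NCDC']
```

Indexing a dataset under each ancestor would let a search for "DOC" or "DOC/NOAA" still find an NCDC dataset.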

I also suggest reusing the OMB agency/bureau and Treasury codes that are included as an appendix to OMB Circular A-11.

@MarinaNitze
Contributor

After extensive discussion, the consensus is for the "publisher" field to remain free text, with two new fields, bureauCode and programOffice, added to provide a formal taxonomy. The bureauCode field is based on OMB Circular A-11, Appendix C, which assigns a unique code to each agency and bureau across the entire federal government. The programOffice field is suggested to align with the Federal Programs Inventory at performance.gov, which agencies are encouraged to update as needed so that it is fully comprehensive. See #44.
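A minimal sketch of validating and rolling up bureauCode values. It assumes the codes are written as a three-digit agency code, a colon, and a two-digit bureau code (the NNN:NN form is an assumption, not stated in this thread), and that a dataset may list more than one code. The sample codes are placeholders, not values looked up in Appendix C.

```python
import re

# Assumed format: three-digit agency code, a colon, two-digit bureau code.
BUREAU_CODE = re.compile(r"([0-9]{3}):([0-9]{2})")


def split_bureau_code(code: str) -> tuple[str, str]:
    """Return (agency_code, bureau_code), or raise ValueError if malformed."""
    match = BUREAU_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not an NNN:NN bureau code: {code!r}")
    return match.group(1), match.group(2)


def agencies(bureau_codes: list[str]) -> set[str]:
    """Roll a list of bureau codes up to the set of distinct agency codes."""
    return {split_bureau_code(code)[0] for code in bureau_codes}


# Placeholder values only; real codes come from OMB Circular A-11, Appendix C.
print(split_bureau_code("123:45"))                       # ('123', '45')
print(sorted(agencies(["123:45", "123:01", "456:10"])))  # ['123', '456']
```

Because the agency portion is a prefix of every bureau code, rolling a joint dataset up to each parent agency is a simple projection rather than a lookup.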


3 participants