This repository was archived by the owner on Jun 18, 2024. It is now read-only.

Proposed Final Metadata Schema 1.0 #44

Merged
merged 23 commits into from
Sep 20, 2013

Conversation

MarinaNitze
Contributor

Compiles all proposed changes with general consensus into this commit, for final approval.

The Category field guidance incorrectly referred to the metadata name as "category" when it should be "theme."
@jpmckinney
Contributor

Note: also fixes #18

This was referenced Aug 25, 2013
@jpmckinney
Contributor

keyword was renamed to keywords in #113. However, the RDF term is dcat:keyword. In order to follow the DCAT specification, the singular dcat:keyword must be used. Will the new documentation point out this difference between the RDF serialization and other serializations? It seems simpler to just use the same term for all serializations.

@MarinaNitze
Contributor Author

@jpmckinney In RDF, each keyword is listed individually, but here, we have an array. Is it still appropriate to keep it singular? That seems confusing and was why consensus was to rename it to the plural.

@jpmckinney
Contributor

In RDF, terms for properties are usually singular, even though you can state a property multiple times (unless it has cardinality restrictions). In some RDF serializations, instead of making multiple statements (which you must do in, for example, RDF/XML) you can instead use an array-like syntax, like in Turtle, e.g.:

ex:dataset dcat:keyword "foo", "bar", "baz" .

which is equivalent to:

ex:dataset dcat:keyword "foo" .
ex:dataset dcat:keyword "bar" .
ex:dataset dcat:keyword "baz" .

If this project is implementing DCAT, shouldn't it use the same terms? Is the confusion really so severe that it makes sense to deviate from a standard adopted by multiple organizations?
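To make the equivalence concrete, here is a minimal Python sketch that expands a JSON-style `keyword` array into one statement per value. The `ex:dataset` subject and the `expand_keywords` helper are hypothetical names used only for illustration; they are not part of the schema or of any RDF library.

```python
def expand_keywords(subject, keywords):
    """Return one Turtle-style statement per keyword value.

    Assumption: keyword values contain no double quotes or backslashes,
    so no literal escaping is performed here.
    """
    return [f'{subject} dcat:keyword "{kw}" .' for kw in keywords]


# A JSON record with a singular property name holding an array of values.
record = {"keyword": ["foo", "bar", "baz"]}

statements = expand_keywords("ex:dataset", record["keyword"])
for statement in statements:
    print(statement)
```

The property name stays singular in both forms; only the number of statements changes.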

@mhogeweg
Contributor

I agree with @jpmckinney that keeping the property name 'keyword' instead of 'keywords' is preferred. Similar to the above, various XML-based metadata encodings also use a single keyword/subject/themekey/... element that is repeated as needed.

@MarinaNitze MarinaNitze mentioned this pull request Aug 26, 2013
@MarinaNitze
Contributor Author

I think good arguments were made to keep "keyword" singular. Taking these into account in #44.

MarinaNitze added a commit that referenced this pull request Aug 26, 2013
In response to discussion in #44
@seanherron
Contributor

Pinging this for action - we should either merge these in soon or wait until after November.

@haleyvandyck
Contributor

Thanks for the ping. Because changes to the metadata schema have policy implications, they must go through an internal review process (see Project Open Data Governance for more detail on review processes).

We are expecting to have final changes merged to the metadata next week, which will formally be V1.0 of the schema. Following those changes, and consistent with what's outlined in the governance, additional changes to the metadata schema will be evaluated in 6-month increments going forward to enable stability and version control.

Thanks for everyone's help and patience improving the first round of the schema.

@seanherron
Contributor

Hey @MarinaMartin, maybe I missed this, but why does the cardinality of accessURL change from 0,n to 0,1? According to #16, distribution should simply be a collection of accessURLs and some metadata like filetype. If that's the case, shouldn't there be an N cardinality for that field? If not, what purpose does distribution now have since we're associating a one-to-one relationship between dataset access files and entries in the schema?

Sorry if this is addressed elsewhere; I haven't been able to find it.

@MarinaNitze
Contributor Author

@seanherron The accessURL field itself should only contain 1 URL. You could include multiple accessURL fields in a distribution array. So isn't that 0,1? Maybe I confused it.

@seanherron
Contributor

@MarinaMartin I guess I'm confused about the use case then - why would you include multiple accessURLs in distribution and only one in the accessURL field?

@MarinaNitze
Contributor Author

@seanherron Because you don't have to use distribution. If your dataset just has 1 download location, just use the accessURL field. But if you have multiple download URLs, then distribution should be a concatenation of multiple accessURL + format pairs.
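As a rough illustration of the two cases described above, the following Python sketch collects every download location from a dataset record, whether it uses the top-level accessURL field or a distribution array. The record layout (a top-level `format` next to `accessURL`, and `accessURL`/`format` keys inside each distribution entry) is an assumption based on this comment, the `download_locations` helper is hypothetical, and the example.gov URLs are placeholders.

```python
def download_locations(dataset):
    """Return a list of (accessURL, format) pairs for one dataset record.

    Assumed layout: an optional top-level "accessURL" (with optional
    "format"), plus an optional "distribution" list of objects that
    each carry their own "accessURL" and "format".
    """
    pairs = []
    if "accessURL" in dataset:
        pairs.append((dataset["accessURL"], dataset.get("format")))
    for entry in dataset.get("distribution", []):
        pairs.append((entry["accessURL"], entry.get("format")))
    return pairs


# One download location: the top-level field is enough.
single = {"accessURL": "https://example.gov/data.csv", "format": "text/csv"}

# Several download locations: use distribution instead.
multi = {
    "distribution": [
        {"accessURL": "https://example.gov/data.csv", "format": "text/csv"},
        {"accessURL": "https://example.gov/data.json", "format": "application/json"},
    ]
}

print(download_locations(single))
print(download_locations(multi))
```

Under this reading, each individual accessURL holds exactly one URL, and multiplicity comes only from the number of distribution entries.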

@mhogeweg
Contributor

perhaps a clarification of the cardinality is in order. accessURL has a cardinality of (1,n), but is only required when the file is available for public download. that means there will be situations when there is no accessURL. hence cardinality could be (0,n).

distribution has cardinality of (0,n) and is set to not be required. slightly different condition from accessURL. but why would distribution be occurring multiple times if it is represented as a (one) array containing multiple pairs of accessURL/format?

it appears the cardinality is used both to indicate whether the field is optional/mandatory and whether there's one occurrence or multiple. but this doesn't address a field like keyword that occurs once but is a comma-separated list of terms, arrays, etc. Perhaps include an object model (UML) of the dcat json structure?
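The double role of cardinality noted above can be sketched in a few lines of Python: the minimum says whether a field is required, and the maximum says whether it may occur more than once. The `check_cardinality` helper is hypothetical, and counting the elements of a JSON array as "occurrences" is an assumption; it is exactly the ambiguity this comment raises for array-valued fields.

```python
def check_cardinality(value, minimum, maximum=None):
    """Check a field value against a (minimum, maximum) cardinality.

    maximum=None stands for "n" (unbounded). A missing field is passed
    as None; a list counts as len(list) occurrences (an assumption);
    any other value counts as one occurrence.
    """
    if value is None:
        count = 0
    elif isinstance(value, list):
        count = len(value)
    else:
        count = 1
    if count < minimum:
        return False
    return maximum is None or count <= maximum


# (0,1): optional, at most one value.
print(check_cardinality(None, 0, 1))
print(check_cardinality(["a", "b"], 0, 1))

# (1,n): required, any number of values.
print(check_cardinality(["a", "b"], 1))
print(check_cardinality(None, 1))
```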

@seanherron
Contributor

@mhogeweg the cardinality of accessURL changes in this commit from (1,n) to (0,1) https://github.com/project-open-data/project-open-data.github.io/pull/44/files

@JoshData
Contributor

Should programOffice have cardinality (1, n) instead of (0, n)?

Why is format an array? Typically a URL will respond consistently with a single MIME type. (Or is this to support HTTP Accept?)

@JoshData
Contributor

accrualPeriodicity's example should be in title case

language's example needs to be updated to be an array of strings rather than a comma-separated string.

references's example should be an array but it looks a bit messed up.

PrimaryITInvestmentUII is the only field with an uppercase initial letter for JSON. (Not objecting, just flagging in case it is unintentional.)

systemOfRecords lost its details table (the table with cardinality etc.)

@mhogeweg
Contributor

@JoshData yes, programOffice should have cardinality (1,n) if it's mandatory (as seems to be the intention per the changes).

suggest modifying the example for language to be an array: change {"language":"es-MX, wo, nv, en-US"} to: { "language": [ "es-MX", "wo", "nv", "en-US" ] }. In general: test JSON examples with http://jsonlint.com/.
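For anyone migrating existing records, the suggested change can be sketched with Python's standard `json` module, which also serves as a local alternative to an online linter. The `language_to_array` helper is a hypothetical name, and splitting on commas assumes that no language tag itself contains a comma.

```python
import json


def language_to_array(value):
    """Convert a comma-separated language string into a list of tags.

    Values that are already lists are returned unchanged.
    """
    if isinstance(value, list):
        return value
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# json.loads raises ValueError if the example is not valid JSON.
old = json.loads('{"language":"es-MX, wo, nv, en-US"}')
new = {"language": language_to_array(old["language"])}

print(json.dumps(new))
```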

@seanherron
Contributor

@mhogeweg agree re: language

@haleyvandyck haleyvandyck merged commit e09f2fd into project-open-data:master Sep 20, 2013
@haleyvandyck
Contributor

Thank you all for your incredible contributions and discussion to help improve the first version of the metadata schema.

At long last, we have finally cleared the changes represented in #44 through the White House processes.

In addition to the changes represented in this request we will be making some additional updates on the treatment of BureauCode and the addition of a ProgramCode as well. Details coming soon.

This pull request will constitute v1.0 of the schema. Per the project open data governance, we will continue to evaluate changes to the schema over time on regular 6-month intervals. These changes and future ones will be tracked at /metadata-changelog.

Thank you all for taking part in this exciting, precedent-setting project -- we look forward to continuing to work with you all.

@jpmckinney
Contributor

Great work! It's been a pleasure participating in the schema's development. Will the examples be updated to match the schema, or should an issue be opened for that?

@waldoj

waldoj commented Sep 20, 2013

👍 x 💯

@haleyvandyck
Contributor

@jpmckinney yes--please feel free to open an issue or a pull request with specific changes

7 participants