Normalize language tag component of i18n datatype IRI? #337

kasei · 2020-01-17T23:57:26Z

Object to RDF Conversion step 13.2 constructs a datatype IRI based on language and direction:

If rdfDirection is i18n-datatype, set datatype to the result of appending language and the value of @direction in item separated by an underscore ("_") to https://www.w3.org/ns/i18n#. Initialize literal as an RDF literal using value and datatype.

While there is "MAY" normative text about normalizing language tags, there doesn't seem to be any text about normalizing the language tag values when included in a datatype IRI like this. Test tdi10 seems to assume the value is normalized in lowercase. Moreover, I would think that not normalizing could cause lots of trouble with unexpected data (e.g. two literals that differ only in the case of the language tag component of this datatype IRI; shouldn't such literals have the same value?).
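
A minimal sketch of the concern, assuming the datatype IRI is built by plain string concatenation as step 13.2 describes. The function name `i18n_datatype` and the `normalize` flag are hypothetical and only illustrate the two behaviours:

```python
I18N_NS = "https://www.w3.org/ns/i18n#"


def i18n_datatype(language: str, direction: str, normalize: bool = False) -> str:
    """Build the datatype IRI: namespace + language + "_" + direction.

    `normalize` is a hypothetical switch for this sketch; it lowercases
    the language tag before it is appended.
    """
    if normalize:
        language = language.lower()
    return f"{I18N_NS}{language}_{direction}"


# Without normalization, two processors can emit different datatype IRIs
# for language tags that differ only in case.
raw_a = i18n_datatype("en-US", "ltr")
raw_b = i18n_datatype("en-us", "ltr")

# With lowercase normalization, both inputs yield the same IRI.
norm_a = i18n_datatype("en-US", "ltr", normalize=True)
norm_b = i18n_datatype("en-us", "ltr", normalize=True)
```

Here `raw_a` and `raw_b` are distinct IRIs, so an RDF store that compares datatypes character by character would treat the two literals as unequal, while `norm_a` and `norm_b` are identical.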

gkellogg · 2020-01-18T00:31:35Z

This was introduced in #167, but I don't think we discussed anything specifically about continuing to do the lower-case normalization when creating the i18n datatype, which makes sense. @iherman do you remember discussing this?

iherman · 2020-01-24T06:05:43Z

@gkellogg right, I do not think we discussed this. That being said, normalizing for the purpose of the URI handling does make sense imho. For the sake of consistency, we should follow the same rules as in #167.

We have to realize, however, that the language tags have a tradition that is more complex than that. AFAIK, they usually write 'en-US' instead of 'en-us'. The rules in #167 leave that intact, they just say that the tag must be valid, and we should probably say the same thing for this datatype.

Which does mean that the datatype comparison cannot be character by character but, instead, based on a regex. Ugly, but that is the world we live in:-)
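
As a rough illustration of what a regex-based comparison could look like if language tags were left un-normalized: the pattern and the helper `same_i18n_datatype` below are assumptions made for this sketch, not something defined by the specification.

```python
import re

# Assumed shape of an i18n datatype IRI: namespace, an optional language tag
# (ASCII letters, digits, hyphens), an underscore, then the direction.
I18N_PATTERN = re.compile(
    r"^https://www\.w3\.org/ns/i18n#(?P<lang>[A-Za-z0-9-]*)_(?P<dir>ltr|rtl)$"
)


def same_i18n_datatype(a: str, b: str) -> bool:
    """Compare two datatype IRIs, ignoring case in the language part only."""
    match_a = I18N_PATTERN.match(a)
    match_b = I18N_PATTERN.match(b)
    if match_a is None or match_b is None:
        # Not both i18n datatypes: fall back to exact IRI comparison.
        return a == b
    return (
        match_a["lang"].lower() == match_b["lang"].lower()
        and match_a["dir"] == match_b["dir"]
    )
```

Every consumer of the data would need equivalent logic, which is the cost being weighed against simply lowercasing the tag when the IRI is created.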

iherman · 2020-01-24T18:11:23Z

This issue was discussed in a meeting.

RESOLVED: In toRDF recommend normalization of language tag based URIs
RESOLVED: … also compound literal form

3.1. Normalize language tags
link: #337
Rob Sanderson: This is about our workaround language tags in i18n namespace.
Gregg Kellogg: We removed requirements to normalize language tags to lowercase, because it is problematic for many people in i18n community. When creating RDF, we have possibility that 2 processors create different data types.
… The question is if that is what we intended. To allow 2 diff datatypes by 2 different processors.
Ivan Herman: That would be wrong in terms of RDF.
… If you have 2 datatypes with different cases, RDF sees those as not equal.
… Maybe pchampin_ will say I’m wrong…
Pierre-Antoine Champin: I slightly disagree. 2 different URLs may denote different things, but also the same, but this depends on implementation.
… It’s hard to require all impls to support these different i18n datatypes and consider them equal.
… We could say that they are semantically equivalent.
Ivan Herman: Yes, we can do that. I don’t know if we are discussing something that is insignificant.
… If we do that, and have an implementation that does datatype reasoning, then that impl will likely fall on its face.
… Datatype reasoning is quite a challenge. Many implementations just check char-by-char.
… We could say that if you use these datatypes, that you are supposed to lowercase language tags.
… It’s ugly, but I don’t see a better choice.
Rob Sanderson: Is there some i18n requirement?
Ivan Herman: No, it’s a habit that there is mixing of cases.
… Usual way is:
Ivan Herman: the usual way is: en-US and not en-us
Ivan Herman: We should not require normalization when using lang tags the old way, but we should when using i18n datatypes.
Rob Sanderson: Is the set of characters that is permissible in URIs and language tags compatible?
Ivan Herman: Just ASCII characters.
Pierre-Antoine Champin: There will be 2 kinds of RDF impls: ones not recognizing our custom IRIs, and those that do.
… Those that will take into account our custom datatypes, can interpret them as lang tags and do smart things.
… The roundtripping would be lost when direction is used. I’m still in favour of not normalizing them.
Gregg Kellogg: I’m neutral on normalization. We should add a non-normative note in any case.
Dave Longley: Was it a mistake to not normalize language tags when they were invented?
Ivan Herman: Invented by whom?
Dave Longley: Not JSON-LD, but the group that came up with it 30 years ago…
Ivan Herman: We can not change it because it’s out there already.
Rob Sanderson: We can fix it for reduced datatype IRI.
Dave Longley: It looks like this grew organically, so the spec was built around it.
… What we introduce is new, so we can enforce normalization.
… So we simplify part of the space.
Ivan Herman: I don’t disagree.
… How important is it to roundtrip on such a detail?
… Because that is why we are discussing this.
… I don’t think it’s important.
… So I would normalize it.
Gregg Kellogg: From RDF Concepts: “A literal is a language-tagged string if the third element is present. Lexical representations of language tags may be converted to lower case. The value space of language tags is always in lower case.”
Gregg Kellogg: We did change the language of JSON-LD, which always normalized language tags, which was over-strict.
… RDF spec says that language tags may be lowercased.
… We are talking here about special case: roundtripping.
… It’s a minor thing what we are going to do.
… I would support that we change the language in toRdf, that language tags be normalized in compound literals and i18n.
Rob Sanderson: We ran into this in practice when having to do case-insensitive language tag comparison.
Proposed resolution: In toRDF recommend normalization of language tag based URIs (Rob Sanderson)
Pierre-Antoine Champin: +1
Rob Sanderson: +1
Ivan Herman: +1
Benjamin Young: +0
Gregg Kellogg: +1
Dave Longley: +1
Ruben Taelman: +1
Harold Solbrig: +0
David I. Lehn: +1
Adam Soroka: +1
Resolution #2: In toRDF recommend normalization of language tag based URIs
Proposed resolution: … also compound literal form (Rob Sanderson)
Rob Sanderson: +1
Pierre-Antoine Champin: +1
Ruben Taelman: +1
Dave Longley: +1
Gregg Kellogg: +1
Adam Soroka: +1
Resolution #3: … also compound literal form
Ivan Herman: +1
David I. Lehn: +1
Benjamin Young: +1

gkellogg · 2020-01-28T23:48:58Z

@kasei (and @himorin), the algorithm was updated in PR #363 to normalize language tags when creating i18n literals or compound objects. Note that these operations are non-normative.

kasei · 2020-01-29T00:05:19Z

@gkellogg Looks good, though I'm not sure what you mean by "these operations are non-normative." Nothing in #363 suggests non-normative operations, does it? If it is non-normative, are there any tests that should be marked as requiring this normalization to pass?

gkellogg · 2020-01-29T00:24:19Z

The syntax document describes both i18n datatypes and compound literals as experimental and non-normative. The algorithms condition this behavior on the rdfDirection option, which defaults to null. The tests which use this are marked using the rdfDirection option. We don't really have a way of describing tests as being non-normative themselves.
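
The following is a simplified sketch of how the behaviour discussed in this thread fits together, not the normative algorithm. The function `string_value_to_rdf`, the tuple representation of literals, and the fixed blank node label are assumptions made for illustration; only the `rdfDirection` values, the i18n namespace, and the `rdf:value`, `rdf:language` and `rdf:direction` properties come from the specifications.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
I18N = "https://www.w3.org/ns/i18n#"


def string_value_to_rdf(item, rdf_direction=None, bnode="_:b0"):
    """Convert a string value object to (object_term, extra_triples).

    Literals are modelled as ("literal", value, datatype, language) tuples.
    """
    value = item["@value"]
    language = item.get("@language")
    direction = item.get("@direction")

    # Default path (rdfDirection is null or no @direction): the language tag
    # is passed through unchanged and the direction is not represented.
    if direction is None or rdf_direction is None:
        if language is not None:
            return ("literal", value, RDF + "langString", language), []
        return ("literal", value, XSD_STRING, None), []

    # Both experimental forms lowercase the language tag.
    lowered = (language or "").lower()

    if rdf_direction == "i18n-datatype":
        return ("literal", value, f"{I18N}{lowered}_{direction}", None), []

    if rdf_direction == "compound-literal":
        triples = [(bnode, RDF + "value", ("literal", value, XSD_STRING, None))]
        if language is not None:
            triples.append(
                (bnode, RDF + "language", ("literal", lowered, XSD_STRING, None))
            )
        triples.append(
            (bnode, RDF + "direction", ("literal", direction, XSD_STRING, None))
        )
        return bnode, triples

    raise ValueError(f"unsupported rdfDirection: {rdf_direction!r}")


item = {"@value": "example", "@language": "ar-EG", "@direction": "rtl"}
default_term, _ = string_value_to_rdf(item)
i18n_term, _ = string_value_to_rdf(item, "i18n-datatype")
compound_node, compound_triples = string_value_to_rdf(item, "compound-literal")
```

With the default option the tag stays `ar-EG`; with either experimental option it appears as `ar-eg`, which is why tests exercising this behaviour have to set `rdfDirection` explicitly.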

gkellogg · 2020-01-29T00:27:27Z

Note that toRdf/di10 and toRdf/di12 require this normalization.

kasei · 2020-01-29T00:29:42Z

FWIW, tests like this in SPARQL use new values asserted for the mf:requires predicate. That was enough to indicate that you could be conformant without passing those tests if you didn't support the named feature.

gkellogg · 2020-01-29T00:31:14Z

Yes, adding something like mf:requires might be a good idea for this and some other cases such as HTML content extraction.

gkellogg added the wr:open label Jan 18, 2020

himorin added the i18n-tracker label Jan 19, 2020

himorin mentioned this issue Jan 21, 2020

Normalize language tag component of i18n datatype IRI? w3c/i18n-activity#840

Closed

gkellogg added a commit to w3c/json-ld-syntax that referenced this issue Jan 28, 2020

Update description of i18n datatype and Compound Literal to lower case language tags. For w3c/json-ld-api#337.

dd4105e

gkellogg mentioned this issue Jan 28, 2020

Update description of i18n datatype w3c/json-ld-syntax#327

Merged

gkellogg added a commit that referenced this issue Jan 28, 2020

Normalize language tags to lower case when creating i18n datatype or compound literal. For #337.

7771d14

gkellogg mentioned this issue Jan 28, 2020

Normalize language tags to lower case #363

Merged

gkellogg added wr:pending wr:spec-updated-partial and removed wr:open labels Jan 28, 2020

gkellogg added wr:commenter-agreed and removed wr:pending labels Jan 29, 2020

gkellogg mentioned this issue Jan 29, 2020

Use mf:requires for optional test behavior #365

Closed

gkellogg added a commit to w3c/json-ld-syntax that referenced this issue Jan 29, 2020

Update description of i18n datatype and Compound Literal to lower case language tags. For w3c/json-ld-api#337.

43bc5e2

gkellogg added a commit that referenced this issue Jan 29, 2020

Normalize language tags to lower case when creating i18n datatype or compound literal. For #337.

c25e18c

gkellogg added wr:spec-updated and removed wr:spec-updated-partial labels Jan 29, 2020

gkellogg closed this as completed Jan 29, 2020
