RDF Deserialization and @index #65

Closed
vcharpenay opened this issue Feb 22, 2019 · 19 comments

Comments

@vcharpenay

vcharpenay commented Feb 22, 2019

The use case I am going to present comes from the Web of Things W3C WG. Its main deliverable is the Thing Description model, for which we would like to define a JSON-LD serialization.

Currently, we have a pre-processing algorithm that adds @id, @type and @context keys to model elements. The model includes a number of indexed definitions, for which @index works just fine. However, it happens that @index keys are also used as identifiers for the definition they map to and we would like to be able to query them among a collection of TD documents stored in an RDF store. For instance, assuming the following TD document:

{
  "@context": "http://www.w3.org/ns/td",
  "@type": "Thing",
  "properties": {
    "temperature": { "type": "number" },
    "onOff": { "type": "boolean" }
  }
}

we'd like the following query to return something like "temperature" instead of a blank node:

select ?index where {
  ?thing a td:Thing ;
      td:properties ?index .
  ?index a jsonschema:NumberDataSchema .
}

In other words, we'd like to have a way to keep @index keys in the RDF graph after JSON-LD deserialization. As far as I know, it is currently not possible. We thought of two possible approaches: ID indexing or extended RDF deserialization.

ID indexing

Properties in the example above could be indexed by @id and not by @index. However, this solution comes with two issues: first, we would need to define a @base for these identifiers (ideally the root node's @id) and, second, we have stumbled many times upon the problem that identifiers collide because the same index key is used several times in the same TD document. Our current solution is to assign a unique JSON pointer to all indexed definitions. Can this approach be standardized?
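
To make the JSON pointer workaround concrete, here is a minimal sketch of what such a pre-processing step might look like. It is not the Thing Description algorithm itself: the function names, the base IRI http://example.org/lamp and the "actions" map are assumptions made up for illustration.

```python
def json_pointer_escape(key: str) -> str:
    # RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1".
    return key.replace("~", "~0").replace("/", "~1")


def assign_pointer_ids(doc: dict, base: str, indexed_terms: tuple) -> dict:
    """Return a copy of doc in which every entry of an indexed map gets an
    @id built from the base IRI and the entry's JSON pointer (hypothetical helper)."""
    out = dict(doc)
    for term in indexed_terms:
        entries = doc.get(term)
        if not isinstance(entries, dict):
            continue
        out[term] = {
            # An @id already present on the entry wins over the generated one.
            key: {"@id": f"{base}#/{term}/{json_pointer_escape(key)}", **value}
            for key, value in entries.items()
        }
    return out


# The same key "onOff" is used twice, once per indexed map.
td = {
    "@type": "Thing",
    "properties": {"onOff": {"type": "boolean"}},
    "actions": {"onOff": {}},
}
result = assign_pointer_ids(td, "http://example.org/lamp", ("properties", "actions"))
```

Because the pointer includes the name of the enclosing map, the two "onOff" entries end up with different identifiers, which is the collision the plain @id indexing runs into.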

Extended RDF deserialization

Index values could be included in the RDF graph, using some RDF term defined under the jsonld namespace. Something like:

_:thing a td:Thing ;
    td:properties _:p1, _:p2 .
_:p1 a jsonschema:NumberDataSchema ;
    jsonld:indexKey "temperature"^^xsd:string .
_:p2 a jsonschema:BooleanDataSchema ;
    jsonld:indexKey "onOff"^^xsd:string .

Is that something that is within the scope of JSON-LD 1.1? It would make the full transformation to RDF fully bidirectional, as far as the TD model is concerned.
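
As a rough sketch of what this second option could produce, the following Python turns an index map into triples and adds one extra triple per entry for the key. The term jsonld:indexKey is only the name proposed above, not an existing vocabulary term, and the function name and the string-based triple representation are assumptions for illustration. Type triples are left out to keep the sketch short.

```python
# Hypothetical property IRI: only proposed in this issue, not defined anywhere.
JSONLD_INDEX_KEY = "http://www.w3.org/ns/json-ld#indexKey"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def index_map_to_triples(subject: str, predicate: str, index_map: dict) -> list:
    """Emit (subject, predicate, object) triples for an index map, keeping
    each index key as an extra literal-valued triple."""
    triples = []
    for n, (key, node) in enumerate(index_map.items(), start=1):
        # Entries without an @id become blank nodes _:p1, _:p2, ...
        obj = node.get("@id", f"_:p{n}")
        triples.append((subject, predicate, obj))
        triples.append((obj, JSONLD_INDEX_KEY, f'"{key}"^^<{XSD_STRING}>'))
    return triples


triples = index_map_to_triples(
    "_:thing",
    "td:properties",
    {"temperature": {"type": "number"}, "onOff": {"type": "boolean"}},
)
```

With the key materialized this way, a query can select the literal attached to each blank node instead of the blank node itself.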

(See also the related issue w3c/wot-thing-description#444.)

@pchampin
Contributor

ID Indexing

How is it different from https://w3c.github.io/json-ld-syntax/#node-identifier-indexing ?

Extended RDF deserialization

This looks like a nasty can of worms. What if some bnodes miss the jsonld:indexKey? What if some have multiple values?

@vcharpenay
Author

vcharpenay commented Feb 22, 2019

Thanks for your comment!

How is it different from https://w3c.github.io/json-ld-syntax/#node-identifier-indexing ?

Yes, that is essentially it. I raised that issue to introduce @container: @id some time ago, but it turned out that it did not completely solve our use case.

This looks like a nasty can of worms. What if some bnodes miss the jsonld:indexKey? What if some have multiple values?

I agree it looks a bit esoteric, but the same rules would apply as when compacting nodes without an @index member: if the triple is not present, the node gets indexed by @none, and if there is more than one node under the same index key, an array is instantiated. By the way, I was glad to see that this corner case was already addressed in JSON-LD 1.1 for pure compaction.
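
The two fallback rules can be sketched as follows. This is an illustration only, not the compaction algorithm from the API document; the function name and the dictionary that stands in for the index-key triples are assumptions.

```python
def rebuild_index_map(nodes: list, index_keys: dict) -> dict:
    """Rebuild an index map from node identifiers.

    nodes: node identifiers in document order.
    index_keys: node identifier -> index key, for nodes that have one.
    """
    index_map = {}
    for node in nodes:
        # Rule 1: a node without an index-key triple is filed under @none.
        key = index_keys.get(node, "@none")
        entry = {"@id": node}
        if key not in index_map:
            index_map[key] = entry
        elif isinstance(index_map[key], list):
            index_map[key].append(entry)
        else:
            # Rule 2: a second node under the same key turns the value into an array.
            index_map[key] = [index_map[key], entry]
    return index_map


rebuilt = rebuild_index_map(
    ["_:p1", "_:p2", "_:p3", "_:p4"],
    {"_:p1": "temperature", "_:p2": "onOff", "_:p3": "onOff"},
)
```

Here _:p2 and _:p3 share the key "onOff" and are collected in an array, while _:p4 has no key and lands under @none.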

@gkellogg
Member

I could see a toRdf option which would serialize @index as something like jsonld:index with a string value, which would require a reciprocal option to fromRdf to reverse it. As it is, @index is the only syntactic element which is not described by the RDF model.

As it's simply an extra triple, we could conceivably do it without an option (or with an option defaulting to true), as generating more triples is typically not an issue for serializers.

@gkellogg
Member

if the triple is not present, the node gets indexed by @none and if there is more than one node under the same index key, an array is instantiated.

Note that this won't be the case for expansion or serialization to RDF, as the @none key is used only in the index-map case to index nodes without an @index.

As an alternative, we could reconsider the notion of indexing on an arbitrary property. There were no good use cases, and it would add a fair bit of complexity, so it was not followed up on.

You might also consider @type indexing, which allows you to use the vocabulary space for values. It would require assigning an appropriate type to each entity, but you can use scoped contexts to help manage that. See Note Type Indexing.

@iherman
Member

iherman commented Mar 1, 2019

As it's simply an extra triple, we could conceivably do it without an option (or with an option defaulting to true), as generating more triples is typically not an issue for serializers.

I'd prefer it this way. An extra flag may become a drag, easy to forget and leading to unexpected errors. An extra triple isn't a big deal.

The only caveat: as far as I know, jsonld:index would be the only RDF resource appearing in the generated graph in a jsonld namespace. That may open the floodgates for other terms...

@azaroth42
Contributor

I would prefer an option defaulting to True to reflect the index term into the graph. This way it can be turned off if needed, but to Ivan's point you don't need to remember to turn it on.

@pchampin
Contributor

pchampin commented Mar 5, 2019

@iherman wrote:

The only caveat: as far as I know, jsonld:index would be the only RDF resource appearing in the generated graph in a jsonld namespace. That may open the floodgates for other terms...

I'm not a big fan of this either. Ideally, I would much rather have "a notion of indexing on an arbitrary property" as suggested by @gkellogg. Although this looks like a can of worms we may not want to open...

@gkellogg
Member

gkellogg commented Mar 9, 2019

I'd say keep it simple, and issue such an extra triple. There are plenty of precedents for specs using their own namespace for creating such types or properties; of course an option could default to the jsonld:index IRI, and allow null or some other IRI, but that would seem to be overly flexible.

@davidlehn
Contributor

Take into consideration the effect on canonicalization since triples are being added.

@iherman
Member

iherman commented Mar 13, 2019

Take into consideration the effect on canonicalization since triples are being added.

which probably means that this should not be an optional feature.

@pchampin
Contributor

The more I think about it, the more I think that the "indexing by arbitrary property" approach is better.

It could work like this:

{
  "@context": {
    "ex": "http://ex.co/",
    "foo": {
      "@id": "ex:foo",
      "@container": "@index",
      "@index": "ex:myIndex"  // <--- this is the proposed new feature
    }
  },
  "foo": {
    "bar": {
      "@id": "#BAR"
    },
    "baz": {
      "@id": "#BAZ"
    }
  }
}

This would expand to

[
  {
    "http://ex.co/foo": [
      {
        "@id": "https://json-ld.org/playground/#BAR",
        "http://ex.co/myIndex": "bar"  // <--- instead of "@index": "bar"
      },
      {
        "@id": "https://json-ld.org/playground/#BAZ",
        "http://ex.co/myIndex": "baz"   // <--- instead of "@index": "baz"
      }
    ]
  }
]

and so additional triples would be materialized, holding the index values.

This proposal does not add any keyword (we just reuse @index in a different place). It extends the current spec in a rather natural way, in my opinion. What I might underestimate is the amount of complexity it adds to the standard algorithms in the API document...
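
To give a feel for the expansion side of this idea, here is a minimal sketch under stated assumptions. It is not the expansion algorithm of the API document: the function name is made up, node identifiers are assumed to be absolute IRIs already, and the key is wrapped in a value object and placed first among any existing values, which the proposal does not specify.

```python
def expand_index_map(index_map: dict, index_property: str) -> list:
    """Expand an index map into a list of nodes, recording each key as a
    value of index_property instead of as @index (illustrative only)."""
    expanded = []
    for key, value in index_map.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            node = dict(item)  # shallow copy, the input is left untouched
            if key != "@none":
                # Assumption: the key goes first, before any existing values.
                existing = list(node.get(index_property, []))
                node[index_property] = [{"@value": key}] + existing
            expanded.append(node)
    return expanded


nodes = expand_index_map(
    {
        "bar": {"@id": "http://example.org/#BAR"},
        "baz": {"@id": "http://example.org/#BAZ"},
    },
    "http://ex.co/myIndex",
)
```

Each resulting node carries an ordinary property holding its key, so the key survives as a regular triple when the document is converted to RDF.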

@iherman
Member

iherman commented Mar 13, 2019

I like this. What this means is that the author would have to make one step further indicating (a) whether he/she wants to have the index appear in the generated RDF and (b) using what name.

@vcharpenay would that be a viable option for you?

@vcharpenay
Author

Thanks for the discussion, I like the various suggestions without any particular preference. I must say, though, that @pchampin's proposal turns out to be very close to something I suggested some time ago: see json-ld/json-ld.org#430 (case 2).

However, @gkellogg already spent some time trying to implement it and it indeed comes with significant complexity, in particular w.r.t. compaction. See his comment.

@gkellogg
Member

I agree that it would be better to allow indexing on arbitrary properties. When looking at it, I thought that it had added a fair amount of complexity to the expansion/compaction algorithms, but it may end up being simpler than that. Ultimately, we'll need to decide if the added complexity is worth the value.

Looking at @pchampin's proposal, it looks to me that specifying a value for @index in the term definition would be interpreted to mean use the value of that property (if any) instead of the value of @index when expanding and compacting, which also leaves some consideration if there are multiple such values.

@iherman
Member

iherman commented Mar 14, 2019

Looking at @pchampin's proposal, it looks to me that specifying a value for @index in the term definition would be interpreted to mean use the value of that property (if any) instead of the value of @index when expanding and compacting

I think you are right that this may create a different perception. Maybe what we have to do is to define a new keyword for this. Re-using the same keyword for multiple different purposes has been a problem (for me at least) in JSON-LD, and we indeed should not add to the confusion.

But I do not think that is a fundamental problem with @pchampin's proposal, that is only the syntax aspect of it.

@pchampin
Contributor

To be more precise, here's how I see things. If the term definition contains "@index": "ex:myIndex",

  • during expansion, instead of adding "@index": "index value", we add "http://example.org/myIndex": "index value";
  • during compaction, instead of looking for @index to determine the key to use, we look for http://example.org/myIndex.

(...) which also leaves some consideration if there are multiple such values.

We already deal with that issue with "@container": "@type". I suggest we do the same (take the first one).

I foresee another issue though: in expanded form, the values of a property are node objects. I suggest we only consider the @value of these node objects (and ignore those that don't have a @value) for the purpose of indexing.
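
The key selection suggested here can be sketched as below. The function name is an assumption, and falling back to @none when no usable value exists is borrowed from the earlier discussion of index maps rather than stated in this comment.

```python
def index_key_for(node: dict, index_property: str) -> str:
    """Pick the map key for a node in expanded form: the first value of
    index_property that carries an @value; other objects are ignored."""
    for value in node.get(index_property, []):
        if isinstance(value, dict) and "@value" in value:
            return value["@value"]
    # Assumption: nodes without a usable value are filed under @none.
    return "@none"


node = {
    "@id": "http://example.org/#BAR",
    "http://ex.co/myIndex": [
        {"@id": "http://example.org/other"},  # no @value, so it is skipped
        {"@value": "bar"},                    # first usable value wins
        {"@value": "ignored"},
    ],
}
key = index_key_for(node, "http://ex.co/myIndex")
```

Taking only the first usable value mirrors the "take the first one" rule mentioned for "@container": "@type".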

@iherman
Member

iherman commented Mar 15, 2019

This issue was discussed in a meeting.

  • RESOLVED: Add a new keyword to be used with @container:@index to indicate the property to use instead of @index to solve WoT request
Transcript: 4.2. RDF Deserialization and @index
Benjamin Young: #65
Victor Charpenay: not much to add to the ticket
… we are working on a Thing Description which is intended to be JSON-LD
… especially with respect to transformation into RDF
… in the TD document, we make heavy use of @index
Kazuyuki Ashimura: -> https://w3c.github.io/wot-thing-description/#note-jsonld10-processing WoT Thing Description - Transformation to JSON-LD&RDF
Victor Charpenay: we want to be able to use SPARQL against the data, or roundtrip it
… the intention of the issue is to keep the indexed values in some way
… the activity on this issue is appreciated
Rob Sanderson: to summarize the proposals:
… indexing by id is one potential solution
… but it’s possible for a term to be used in different ways within the same document
Victor Charpenay: and that’s not a theoretical problem. we had that real problem as we implemented
… oracle ran into this problem in building an API
Rob Sanderson: another proposal; use a property within the graph like “indexkey” to maintain the info
… which would afford some degree for roundtripping
… and then a lot of discussion about which way to go
Gregg Kellogg: I originally thought we might allow an arbitrary IRI to be used in @container, which would have allowed arbitrary indexing
… but pchampin suggested we reuse @index
Dave Longley: +1 for “light touch” solutions
Pierre-Antoine Champin: my concern is that compaction would be more complex, but I now realize that gkellogg could speak to this
… would it make a big change if we just looked for a specific property for indexing, and not an arbitrary one
Gregg Kellogg: I think it’s similar enough to what we’re already doing with types to not be, but we’d have to try to be sure. E.g. what if it’s a node object vs. a value object?
… so you could lose other aspects (language, etc.)
Victor Charpenay: isn’t that dealt with in the compaction algorithm?
Gregg Kellogg: literals become value objects, everything with an id becomes a node object
Benjamin Young: https://w3c.github.io/json-ld-syntax/#dfn-node-object vs. https://w3c.github.io/json-ld-syntax/#dfn-value-object
Dave Longley: error if there is no “simple value”
Ivan Herman: I come to this as a user
… I am in favor of pchampin’s proposal because the other one would make this automatic
… whether or not I want it
… but pchampin’s idea makes the creator make this thing explicit
… in this case, that’s better and cleaner
… provided that the extra work is okay by WoT
Dave Longley: +1 to pchampin’s approach as well (more natural, no extra artifacts), and doesn’t break RDF canonicalization/signature/hashing issues.
Victor Charpenay: yes, to use pchampin’s approach we just create a new predicate
Rob Sanderson: do properties of this kind have semantics?
… or can we encode them using @nest?
… which gives us JSON structure but adds no RDF/graph structure
Victor Charpenay: a constant effort in the WG has been to make everything linkable
… so the property should be representable in something (RDF?)
Victor Charpenay: a Thing Description should be serializable in RDF
… everything should be kept, nothing should be “pure JSON”
Jeff Mixter: +1 to Rob’s concern about what the semantics of the properties in relation to the Thing
Ivan Herman: so @nest is not usable?
Victor Charpenay: correct
Rob Sanderson: in the ontology, what is the range of “properties”
Victor Charpenay: a singular “PropertyAffordance”
… this is NOT a representation of the properties of the thing described
Rob Sanderson: so a Thing Desc could have separate properties/predicates/values, each of which has the same key, and each could have diff meanings?
Victor Charpenay: structurally that’s not possible
… because each key should point to an object
… but in RDF, yes, that would be possible
… and we want to keep the relationship with RDF
Benjamin Young: https://w3c.github.io/wot-thing-description/#property-serialization-json
Pierre-Antoine Champin: following on azaroth’s idea, would it make sense to say that temperature and on/off resolves to predicates that are subclasses of “property”?
… so that some kind of inference would provide the affordance?
… because my gut tells me that @nest is a good candidate here
Benjamin Young: the WoT is based in part on JSON Schema, so if we solve for this, we solve for the growing list of JSON Schema-based specs
… including OpenAPI, which brings a flock of JSON APIs specified in that way
… lots of collateral value
Victor Charpenay: in fact we intend to publish a vocab for describing JSON Schemas in RDF
Rob Sanderson: are there properties of the blank node (the value of “properties”) that would conflict with properties of the thing itself?
Victor Charpenay: you are suggesting to take the intermediate object as an individual
Rob Sanderson: yes. I’m trying to explore if there’s a reason that “properties” itself needs to be in there
Victor Charpenay: there is more to the Thing Desc. there are actions, there are events
… they cannot be merged
… into the root object
… conflicting keys would result
Rob Sanderson: e.g. an on/off action could be distinguished
Victor Charpenay: yes
Pierre-Antoine Champin: which is also a good reason not to use it as an @id map :)
Rob Sanderson: so we can’t use @nest? should we run with pchampin’s suggestion?
… thoughts?
Benjamin Young: the return trip from RDF is inhibited
Ivan Herman: +1 to bigbluehat
Benjamin Young: @nest would mean that you’ve lost that properties space
… you’d have to sort that out outside the compaction algo
Gregg Kellogg: @nest, because it has no semantics, can’t be used here
… so we’re back to indexing arbitrary properties, and pchampin’s idea is the cleanest
Ivan Herman: all in favor of pchampin’s idea
… but let me throw in some bikeshedding
… should we use the syntax as proposed or not?
… we may have to look at whether we need a fresh keyword
Pierre-Antoine Champin: I really don’t see how it is misleading… but we can discuss this later, definitely :)
Gregg Kellogg: I don’t think it’s misleading, either.
Rob Sanderson: pchampin, will you make a proposal?
Pierre-Antoine Champin: I propose adding a new keyword in the term dfns to be used with @container : @index to be used to specify an arbitrary property on which to index
Proposed resolution: Add a new keyword to be used with @container:@index to indicate the property to use instead of @index to solve WoT request (Rob Sanderson)
Rob Sanderson: +1
Dave Longley: +1
Adam Soroka: +1
Jeff Mixter: +1
Gregg Kellogg: +1
Harold Solbrig: +1
Ivan Herman: +1
Ruben Taelman: +1
Pierre-Antoine Champin: +1
Benjamin Young: +1
David I. Lehn: +1
Resolution #1: Add a new keyword to be used with @container:@index to indicate the property to use instead of @index to solve WoT request

@pchampin
Contributor

FTR: this issue is addressed by w3c/json-ld-syntax#145 and #74

@gkellogg
Member

Closing as a duplicate of #74 and w3c/json-ld-syntax#145.
