Skip to content

JSON-LD in HTML #68

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Nov 16, 2018
Merged

JSON-LD in HTML #68

merged 12 commits into from
Nov 16, 2018

Conversation

gkellogg
Copy link
Member

@gkellogg gkellogg commented Sep 23, 2018

Update the JSON-LD in HTML section to be normative, describe dataset extraction, how to deal with multiple script elements and script element targeting using fragments.

I took some license to describe targeting specific script element using fragments and on the treatment of embedded escape sequences and comments.

Fixes #57.

This should not be merged before appropriate changes made to json-ld-api.


Preview | Diff

@gkellogg gkellogg requested a review from iherman September 23, 2018 23:30
@gkellogg gkellogg changed the title JSON LD in HTML JSON-LD in HTML Sep 23, 2018
@gkellogg gkellogg requested a review from BigBlueHat September 24, 2018 16:05
@gkellogg gkellogg force-pushed the json-ld-in-html branch 5 times, most recently from 087b6ef to 64fe47a Compare September 26, 2018 19:08
Copy link
Member

@BigBlueHat BigBlueHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this seems great, but I raised some concerns/questions/tweaks inline. Thanks for tackling this, @gkellogg!

@gkellogg
Copy link
Member Author

Rebasing, so inline comments may be lost.

gkellogg and others added 4 commits November 5, 2018 12:36
…extraction, how to deal with multiple script elements and script element targeting using fragments.

Fixes #23 and fixes #57.
…e about prospect of dynamically changing base. This reflects discussion from #23 (comment) and resulting TAG advice w3ctag/design-reviews#312 (comment).
Copy link
Member

@BigBlueHat BigBlueHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should wait on some of this until we've further discussed the potential opportunity of the TAG's review alongside the risk of prematurely closing #23 without revisiting it in light of the TAG's review and suggested explorations.

Copy link
Member

@BigBlueHat BigBlueHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to merging this work. Hopefully with this merged, we can encourage implementation--which should help more discussions around actual usage scenarios beyond just the Schema.org one(s).

@BigBlueHat
Copy link
Member

Actually...before we merge this work, are we certain that the 4.12.1.3. Restrictions for contents of script elements apply to data blocks?

That "restrictions" section only seems to pertain to writing <script> content intended to be parsed by the browser's JS engine. For example, there's no reference from the "data block" examples to this "restrictions" section (and vice versa).

Thoughts?

@gkellogg
Copy link
Member Author

The restriction is on the contents of a script element, data blocks are a subset of all script elements. I don't follow how you infer that the section doesn't apply to data blocks. It would seem self-evident that the contents of data blocks need to have provisions for making sure that the content of the data block could be properly escaped.

In any case, these rules are necessary for properly processing JSON-LD scripts.

@BigBlueHat
Copy link
Member

BigBlueHat commented Nov 12, 2018

@gkellogg apologies...you're right of course... I'd forgotten I'd tested all that earlier.

Toss this into http://jsbin.com/ or similar to see how it breaks:

  <script type="application/ld+json" id="stuff">
  {
    "@context": {"@vocab": "http://example.com/"},
    "widget": "var example = 'Consider this string: <!-- <script>';console.log(example);"
  }
  </script>
  <script>
    <!-- whatever's below here won't run because what's above here is busted... T_T -->
    console.log(stuff.textContent);
  </script>

Escaping things as you've defined does fix it....sad...but true. 😩

@BigBlueHat
Copy link
Member

...hence all the red highlighting added by the Markdown parser... 🤕

@BigBlueHat
Copy link
Member

In any case, these rules are necessary for properly processing JSON-LD scripts.

They're actually needed to not break the rest of the HTML and/or DOM tree afaict. So, it's an HTML parsing requirement that's "infecting" the JSON(-LD) (or JS or YAML or...) when embedded...

@gkellogg gkellogg mentioned this pull request Nov 15, 2018
@gkellogg
Copy link
Member Author

This issue was discussed in a meeting.

  • Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized
View the transcript What is ‘base’ for embedded json-ld?

Ivan Herman: link: #23

Ivan Herman: Link of the PR: #93

Benjamin Young: #68

Gregg Kellogg: w3c/json-ld-api#50

Benjamin Young: we discussed that one at tpac
… we sent it in for TAG review, and they basically widened the scope

Gregg Kellogg: there are 2 open PRs
… 1) basic support for json-ld in html
… 2) PR-93 adds text to specifically add text to add html as base
… in the API spec, it’s PR-50

Adam Soroka: quick question, what are we expected to do with their comments?
… shall we respond?

Ivan Herman: what they propose is interesting but beyond our charter
… this would elevate json-ld
… but yeah beyond our charter
… I would say this is something the CG has to pick up
… and we can cross the bridge at some time, but if this is realistic from a manpower perspective I dont know

Benjamin Young: #68 (comment)

Ivan Herman: regarding the PR-93, there is some stuff about having XML

Benjamin Young: the thing I just linked shows how script tags affect html parsing
… it’s syntactically correct json-ld

Gregg Kellogg: what I did in the PR-68 I call out specifics on how to handle those blocks if the media type is application/json
… I think I’ve taken in the specifics on how content of script tags has to be handled and adjusted in for our needs
… we asked specific questions to TAG, got an answer but they kinda got a bit over enthusiastic
… out of this needs to come something that improves web platform

Benjamin Young: the HTML comments stuff as really bothered me since I’ve read it
… but it seems to primarily affect only HTML parsing
… question is how much of this we need to have in the spec
… json-ld in script tags vs “raw” json-ld
… both have totally different escaping rules and what not.. and none of that has something to do with html base

Ivan Herman: for the comment storing, the whole section is a normative thing
… I have the impression this is an HTML problem
… which we should certainly mention, but maybe not as part of a normative section

Benjamin Young: HTML5 spec level text about parsing <script> tags https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Ivan Herman: we should officially answer to the TAG and will officially add to the standard what they said about base
… and that we try to get the CG involved

Adam Soroka: +1

Gregg Kellogg: comments in html and escaping.. it depends on the encoding
… we don’t need to give guidance on how to handle this

Ivan Herman: https://pr-preview.s3.amazonaws.com/w3c/json-ld-syntax/pull/68.html#ex-103-embedding-json-ld-in-html-with-comments

Ivan Herman: it has to be valid json-ld
… that’s invalid json

Gregg Kellogg: that’s something you see quite often

Benjamin Young: ivan: because https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Gregg Kellogg: comments are often used just to make sure there are no other issues embedded in the script elements that would cause any issues

Benjamin Young: I did quite some digging on that issue
… the DOM parsing is only concerned with what’s inside the tags
… it’s just treated as raw string
… if there’s something inside the json-ld an html parser would choke on
… the json-ld would need to be treated in such a way such that an html parser wouldn’t choke on it

Pierre-Antoine Champin: one crazy idea by looking at the json-ld embedded in html comments: you could add a js comment in front of the html comment, making it valid javascript
… it could become technically correct

Benjamin Young: sadly it wouldn’t
… it would continue being parsed
… we could make it our own parsing space
… the question is how far the parser gets before it finds the ending script tag

Gregg Kellogg: the json-ld would not be allowed to contain anything that could be interpreted as html and/or html comments
… not really feasible and also not helping our mission
… I’ve outlined multiple approaches to tackle this
… there are only a few cases where json-ld would contain things that resemble comments

Harold Solbrig: why is this an json-ld issue but not a javascript issue?

Gregg Kellogg: [explains why it isn’t]

Gregg Kellogg: it did some test cases for this, exploring corner cases we know of
… I don’t know how to move forward unless addressing at least some of the stuff from the PR

Gregg Kellogg: it describes script tags and data blocks are a subset

Benjamin Young: what’s breaking it, is the potential of one to too early close the script tag
… so this would need somehow being taken care of
… the risk is, the json-ld could contain content that jacks up the html it’s contained in

Ivan Herman: is it so horrible to say, if I put json-ld in a script tag I’m supposed to escape anything that html would need to have escaped
… thus a json-ld parser would have to do the unescaping
… but you are in HTML regardless.. so

Gregg Kellogg: for someone who’s actually looking at the source, those entities become rather annoying

Ivan Herman: realistically, I don’t know how often this would happen

Benjamin Young: the escaping issue is very similar of putting json-ld inside a text env.

Ivan Herman: I think it’s perfectly reasonable to accept both PRs, close the issue
… and open a new issue on the specific problem

Gregg Kellogg: it’s a editor’s draft not a working draft

Ivan Herman: we would open a issue right away

Benjamin Young: I would only +1 this, if we add a big red AT RISK disclaimer

Ivan Herman: a lot of very important things are pending for now
… I think it’s an edge case

Adam Soroka: I don’t think we should use a phrase like “AT RISK” but more something along the lines of “will be part of the final spec but might undergo some changes”

Ivan Herman: we cannot commit ourselves to having always consistent editor’s drafts

Benjamin Young: I’m not sure we have reached consensus on all the things contained

Gregg Kellogg: I cannot work on other open issues

Pierre-Antoine Champin: what about a parameter on the media type hinting at having to do unescaping? (like application/ld+json;escaped=html)

Proposed resolution: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized (Benjamin Young)

Benjamin Young: +1

Adam Soroka: +1

Ivan Herman: what does “that” mean?

Gregg Kellogg: +1

Benjamin Young: I don’t want to have stuff merged without reaching consensus
… but I could live with having it marked as being “at risk” or similar

Ivan Herman: putting things that are already done “at risk” would be going backwards
… opening a new issue that highlights things that are still being discussed would be ok
… having the feeling that ~90% are done

Adam Soroka: I have to generally agree with ivan

Ivan Herman: +1

Adam Soroka: it seems for me very unlikely that we would stop talking about it

David Newbury: +1

Harold Solbrig: +1

Benjamin Young: I’m fine with merging those

Simon Steyskal: +1

Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized

Pierre-Antoine Champin: +1

@gkellogg gkellogg merged commit 155b409 into master Nov 16, 2018
@gkellogg gkellogg deleted the json-ld-in-html branch November 16, 2018 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants