Update documentLoader to return parsed document #85

Closed
gkellogg opened this issue May 2, 2019 · 1 comment

gkellogg commented May 2, 2019

The documentation for documentLoader is a bit lax, but it seems to provide for retrieving processed documents.

In the API, the documentLoader option is of type DocumentLoaderCallback, which is defined as Promise<USVString> (USVString url);. Adding an optional options parameter would let callers provide more information, such as for use in processing contexts, or when extractAllScripts is specified. However, the definition indicates that the promise resolves to a USVString.

We also define a RemoteDocument type, which is really what the documentLoader promise should resolve to. RemoteDocument has a document attribute of type any, which is stated to hold either the raw payload or the parsed document. It would be useful for us to limit this to the parsed document, which would allow us to encapsulate more of the HTML processing within this definition (see w3c/json-ld-syntax#167 (comment)). But this might be considered breaking the 1.0 API contract.
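
To make the idea of limiting document to the parsed form concrete, here is a minimal TypeScript sketch. It is not the spec's WebIDL: the names Json, LoaderOptions, ParsedRemoteDocument, ParsedDocumentLoader, parseRemoteDocument and makeStaticLoader are hypothetical, the RemoteDocument shape is reduced to two members, and only extractAllScripts is taken from the text above. The in-memory store stands in for a real network fetch.

```typescript
// A parsed JSON value, i.e. what `document` would be narrowed to.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical option bag; only extractAllScripts is named in the issue.
interface LoaderOptions {
  extractAllScripts?: boolean;
}

// Reduced RemoteDocument: `document` is parsed JSON rather than `any`.
interface ParsedRemoteDocument {
  documentUrl: string;
  document: Json;
}

// Callback shape with the optional second parameter discussed above.
type ParsedDocumentLoader = (
  url: string,
  options?: LoaderOptions
) => Promise<ParsedRemoteDocument>;

// Parse a raw JSON payload into the narrowed RemoteDocument shape.
function parseRemoteDocument(url: string, raw: string): ParsedRemoteDocument {
  return { documentUrl: url, document: JSON.parse(raw) as Json };
}

// Toy loader backed by an in-memory map instead of HTTP.
// The options are accepted but ignored in this sketch.
function makeStaticLoader(store: Record<string, string>): ParsedDocumentLoader {
  return async (url, _options) => {
    const raw = store[url];
    if (raw === undefined) {
      throw new Error(`loading document failed: ${url}`);
    }
    return parseRemoteDocument(url, raw);
  };
}
```

With this shape, a caller never sees the raw payload, so any HTML-specific extraction would have to happen inside the loader before the promise resolves.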

The Context Processing Algorithm doesn't explicitly describe using the documentLoader promise to load remote contexts, although it is implicit in the description of the documentLoader. This is somewhat complicated by the attempt to keep all WebIDL information, such as promises, out of the algorithms themselves, which remains a challenging aspect.

With suitable wording, I propose that we move all of the HTML-specific processing rules into the definition of RemoteDocument and suitably parameterize calls to documentLoader (including within the Context Processing Algorithm) to handle the various cases and result in the internal JSON representation. We can handle the 1.0 contract, which allows an external implementation to simply return the raw document, by wrapping the callback in code which detects this and provides its own processing of the results.
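
The wrapping described here could look roughly like the following TypeScript sketch. Everything in it is an assumption for illustration: the names LegacyLoader, ParsedLoader, normalizeLoaderResult and wrapLegacyLoader are hypothetical, the RemoteDocument shape is reduced, and the detection rule is simply "a string is a raw payload". It only handles raw JSON; HTML extraction is left out.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Reduced 1.0-style RemoteDocument: `document` may be raw or parsed.
interface RemoteDocument {
  documentUrl: string;
  document: unknown;
}

interface ParsedRemoteDocument {
  documentUrl: string;
  document: Json;
}

// A 1.0-style callback may resolve to a raw string or a RemoteDocument.
type LegacyLoader = (url: string) => Promise<string | RemoteDocument>;
type ParsedLoader = (url: string) => Promise<ParsedRemoteDocument>;

// Detect a raw payload and parse it; pass parsed documents through.
// Assumption: a string payload is always raw JSON text.
function normalizeLoaderResult(
  url: string,
  result: string | RemoteDocument
): ParsedRemoteDocument {
  const documentUrl = typeof result === "string" ? url : result.documentUrl;
  const payload: unknown = typeof result === "string" ? result : result.document;
  const document =
    typeof payload === "string" ? (JSON.parse(payload) as Json) : (payload as Json);
  return { documentUrl, document };
}

// Wrap a legacy callback so callers always receive a parsed document.
function wrapLegacyLoader(loader: LegacyLoader): ParsedLoader {
  return async (url) => normalizeLoaderResult(url, await loader(url));
}
```

An algorithm that calls the wrapped loader can then treat the result as the internal representation without caring which kind of callback the application supplied.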

@gkellogg

PR #87 reviewed by @dlongley. Closing.
