This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Dump sphinx domains docstrings #74

Closed

Conversation

dojutsu-user
Member

@dojutsu-user dojutsu-user commented Jul 21, 2019

This is working correctly locally.
Tested on projects:

  • docs
  • kuma (no file was generated because there were no sphinx domain objects except std:label and std:doc)
  • sphinx docs
  • notfound
  • requests docs

This JSON file will contain the id and content of the Sphinx domains.
We can then load the file in our tasks.py and get the content of each Sphinx domain from its anchor property (which comes from the objects.inv file).

Ref docs: https://www.sphinx-doc.org/en/master/extdev/nodes.html#nodes-for-domain-specific-object-descriptions

Related PR: readthedocs/readthedocs.org#5979
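The flow described above could be sketched roughly like this (the JSON shape and names here are illustrative assumptions, not the actual schema produced by this PR):

```python
import json

# Hypothetical shape of the dumped JSON: each entry pairs a domain
# object's id -- which matches the anchor coming from objects.inv --
# with its docstring content. Entries below are made-up examples.
dump = """
[
    {"id": "requests.get", "content": "Sends a GET request."},
    {"id": "requests.post", "content": "Sends a POST request."}
]
"""

# In tasks.py we could then load the file and index content by anchor,
# so a domain object's docstring is a dict lookup away.
by_anchor = {entry["id"]: entry["content"] for entry in json.loads(dump)}
```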

@dojutsu-user
Member Author

dojutsu-user commented Jul 21, 2019

docutils released a new version, 0.15, and it is giving an import error on Python 2.7.

@ericholscher
Member

ericholscher commented Jul 22, 2019

Hrm, were you not able to get the text content out of the domain objects in Sphinx themselves? If we're just parsing the HTML directly, we could likely be doing that just from the HTML output and the pyquery approach we are already taking in search indexing.

Currently, this will only let us parse content for docs that are rebuilt. If we are doing HTML parsing, we can do it in the existing search code with existing HTML, which will work for all built docs.
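The HTML-parsing alternative mentioned here could look something like the sketch below. It uses stdlib xml.etree as a stand-in for pyquery (which selects with CSS instead), and assumes the markup Sphinx typically renders for a domain object: a dl whose dt carries the anchor id and whose dd holds the description. The HTML snippet is a made-up example.

```python
import xml.etree.ElementTree as ET

# Assumed Sphinx output for one domain object (illustrative only):
# <dl><dt id="anchor">signature</dt><dd>docstring</dd></dl>
html = """
<dl class="py function">
  <dt id="requests.get">requests.get(url, **kwargs)</dt>
  <dd><p>Sends a GET request.</p></dd>
</dl>
"""

root = ET.fromstring(html)
domain_objects = {}
# Pair each <dt> (anchor + signature) with its following <dd> (content).
for dt, dd in zip(root.findall("dt"), root.findall("dd")):
    anchor = dt.get("id")
    content = "".join(dd.itertext()).strip()
    domain_objects[anchor] = content
```

The appeal of this route, as noted above, is that it works against already-built HTML, so it covers docs that were never rebuilt.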

@dojutsu-user
Member Author

@ericholscher
No -- unfortunately, I couldn't find the data in the domain objects in Sphinx.

@dojutsu-user
Member Author

dojutsu-user commented Jul 22, 2019

@ericholscher

Currently, this will only let us parse content for docs that are rebuilt. If we are doing HTML parsing, we can do it in the existing search code with existing HTML, which will work for all built docs.

But we need to parse the content before creating Sphinx objects. I am thinking of storing the docstrings at this step -- https://github.com/readthedocs/readthedocs.org/blob/70494250385978e72f788ea7e62225e0aaaa5186/readthedocs/projects/tasks.py#L1449

We may have trouble figuring out which docstring belongs to which domain if we parse the content with the parse_json.py logic.

Also -- a project needs to be rebuilt to index domain objects properly. So, I think we can go in this direction.

@ericholscher
Member

ericholscher commented Jul 22, 2019

Is there an approach where we can dump the data we need here, mapping the names to their domain and anything else we need? Then we can parse it in parse_json.py, and add the additional data if we have it. That is the approach we're taking w/ other data, and seems like the best outcome, since we can support all built docs, but then make rebuilds better 👍
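The hybrid approach suggested here could be sketched as follows. Everything in this snippet is hypothetical (the helper name, the dump schema, the section shape): parse the HTML for all built docs, and only enrich the results with domain data when a per-build dump exists.

```python
import json
from pathlib import Path

def enrich_sections(sections, dump_path):
    """Hypothetical helper: add domain metadata (parsed from a per-build
    JSON dump) to HTML-parsed sections, falling back gracefully when the
    dump does not exist so older builds keep working."""
    path = Path(dump_path)
    if not path.exists():
        return sections  # old builds: HTML-only data still works
    domains = {d["id"]: d for d in json.loads(path.read_text())}
    for section in sections:
        extra = domains.get(section["id"])
        if extra is not None:
            section["domain"] = extra.get("domain")
    return sections

# With no dump present, sections pass through unchanged.
sections = [{"id": "requests.get", "content": "Sends a GET request."}]
result = enrich_sections(sections, "/nonexistent/dump.json")
```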

@dojutsu-user
Member Author

@ericholscher
I think this can be achieved by readthedocs/readthedocs.org#5979

@dojutsu-user
Member Author

I am thinking that this PR might not be needed here.
I will update the status.

@dojutsu-user
Member Author

I think this is not needed anymore.
Closing in favor of readthedocs/readthedocs.org#5979.

@dojutsu-user dojutsu-user deleted the dump-more-sphinx-data branch July 24, 2019 15:49