This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Dump sphinx domains docstrings #74

Closed

Conversation

dojutsu-user
Member

@dojutsu-user dojutsu-user commented Jul 21, 2019

This is working correctly locally.
Tested on projects:

  • docs
  • kuma (no file was generated because there were no sphinx domain objects except std:label and std:doc)
  • sphinx docs
  • notfound
  • requests docs

This JSON file will contain the id and content of the Sphinx domains.
We can then load the file in our tasks.py and get the content of each Sphinx domain from its anchor property (which comes from the objects.inv file).

Ref docs: https://www.sphinx-doc.org/en/master/extdev/nodes.html#nodes-for-domain-specific-object-descriptions

Related PR: readthedocs/readthedocs.org#5979
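The flow described above could be sketched roughly like this (the JSON shape and names here are illustrative assumptions, not the actual schema produced by this PR):

```python
import json

# Hypothetical shape of the dumped JSON: each entry pairs a domain
# object's id -- which matches the anchor coming from objects.inv --
# with its docstring content. Entries below are made-up examples.
dump = """
[
    {"id": "requests.get", "content": "Sends a GET request."},
    {"id": "requests.post", "content": "Sends a POST request."}
]
"""

# In tasks.py we could then load the file and index content by anchor,
# so a domain object's docstring is a dict lookup away.
by_anchor = {entry["id"]: entry["content"] for entry in json.loads(dump)}
```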

@dojutsu-user
Member Author

dojutsu-user commented Jul 21, 2019

docutils released a new version, 0.15, and it is giving an import error on Python 2.7.

@ericholscher
Member

ericholscher commented Jul 22, 2019

Hrm, were you not able to get the text content out of the domain objects in Sphinx themselves? If we're just parsing the HTML directly, we could likely be doing that just from the HTML output and the pyquery approach we are already taking in search indexing.

Currently, this will only let us parse content for docs that are rebuilt. If we are doing HTML parsing, we can do it in the existing search code with existing HTML, which will work for all built docs.
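The HTML-parsing alternative mentioned here could look something like the sketch below. It uses stdlib xml.etree as a stand-in for pyquery (which selects with CSS instead), and assumes the markup Sphinx typically renders for a domain object: a dl whose dt carries the anchor id and whose dd holds the description. The HTML snippet is a made-up example.

```python
import xml.etree.ElementTree as ET

# Assumed Sphinx output for one domain object (illustrative only):
# <dl><dt id="anchor">signature</dt><dd>docstring</dd></dl>
html = """
<dl class="py function">
  <dt id="requests.get">requests.get(url, **kwargs)</dt>
  <dd><p>Sends a GET request.</p></dd>
</dl>
"""

root = ET.fromstring(html)
domain_objects = {}
# Pair each <dt> (anchor + signature) with its following <dd> (content).
for dt, dd in zip(root.findall("dt"), root.findall("dd")):
    anchor = dt.get("id")
    content = "".join(dd.itertext()).strip()
    domain_objects[anchor] = content
```

The appeal of this route, as noted above, is that it works against already-built HTML, so it covers docs that were never rebuilt.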

@dojutsu-user
Member Author

@ericholscher
No -- unfortunately, I couldn't find the data in the domain objects in Sphinx.

@dojutsu-user
Member Author

dojutsu-user commented Jul 22, 2019

@ericholscher

Currently, this will only let us parse content for docs that are rebuilt. If we are doing HTML parsing, we can do it in the existing search code with existing HTML, which will work for all built docs.

But we need to parse the content before creating Sphinx objects. I am thinking of storing the docstrings at this step -- https://github.com/readthedocs/readthedocs.org/blob/70494250385978e72f788ea7e62225e0aaaa5186/readthedocs/projects/tasks.py#L1449

We may have trouble figuring out which docstring belongs to which domain if we parse the content with the parse_json.py logic.

Also -- a project needs to be rebuilt to index domain objects properly. So, I think we can go in this direction.

@ericholscher
Member

ericholscher commented Jul 22, 2019

Is there an approach where we can dump the data we need here, mapping the names to their domain and anything else we need? Then we can parse it in parse_json.py, and add the additional data if we have it. That is the approach we're taking w/ other data, and seems like the best outcome, since we can support all built docs, but then make rebuilds better 👍
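The hybrid approach suggested here could be sketched as follows. Everything in this snippet is hypothetical (the helper name, the dump schema, the section shape): parse the HTML for all built docs, and only enrich the results with domain data when a per-build dump exists.

```python
import json
from pathlib import Path

def enrich_sections(sections, dump_path):
    """Hypothetical helper: add domain metadata (parsed from a per-build
    JSON dump) to HTML-parsed sections, falling back gracefully when the
    dump does not exist so older builds keep working."""
    path = Path(dump_path)
    if not path.exists():
        return sections  # old builds: HTML-only data still works
    domains = {d["id"]: d for d in json.loads(path.read_text())}
    for section in sections:
        extra = domains.get(section["id"])
        if extra is not None:
            section["domain"] = extra.get("domain")
    return sections

# With no dump present, sections pass through unchanged.
sections = [{"id": "requests.get", "content": "Sends a GET request."}]
result = enrich_sections(sections, "/nonexistent/dump.json")
```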

@dojutsu-user
Member Author

@ericholscher
I think this can be achieved by readthedocs/readthedocs.org#5979

@dojutsu-user
Member Author

I am thinking that this PR might not be needed here.
I will update the status.

@dojutsu-user
Member Author

I think this is not needed anymore.
Closing in favor of readthedocs/readthedocs.org#5979.

@dojutsu-user dojutsu-user deleted the dump-more-sphinx-data branch July 24, 2019 15:49