-
Notifications
You must be signed in to change notification settings - Fork 35
Increase the amount of data we're saving during build #62
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this will be used and where. In the end we are generating a .json
file here with paths
, types
and titles
but I can't give good feedback here because I'm lacking context.
readthedocs_ext/readthedocs.py
Outdated
build_json = os.path.abspath( | ||
os.path.join(app.outdir, '..', 'json') | ||
) | ||
outjson = os.path.join(build_json, 'readthedocs-sphinx-domain-names' + '.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +
can be removed.
It's used in readthedocs/readthedocs.org#5290 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks correct for me. We should write a test like this one at least https://github.com/rtfd/readthedocs-sphinx-ext/blob/f1a01c51c675d36ac365162ea06814544c2aa410/tests/test_integration.py#L65-L73
] | ||
# Only run JSON output once during HTML build | ||
# This saves resources and keeps filepaths correct, | ||
# because singlehtml filepaths are different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, search isn't supported in singlhtml builders :)
readthedocs_ext/readthedocs.py
Outdated
|
||
paths = {} | ||
for page, title in app.env.titles.items(): | ||
paths[app.builder.get_target_uri(page)] = app.env.doc2path(page, base=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge this with the other for loop, but I guess this is more readable, and not so much difference in the performance with our big servers p:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to merge the loops as well. We can also calculate the get_target_uri
once only.
Is this PR still relevant? If so, what's missing? |
Yes, I just need to finish testing it. |
Alright, this should be ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good to me!
To be able to give better feedback, I'd like to know where this will be used (and how) so I can check that we have the data and the structure we need.
Finally, the code is hard to follow because there are some Sphinx internals that are not very explicit. So, I'd like to see some comments next to functions like app.env.doc2path
or objects like app.env.domains.name
/app.env.domains.type
with an example of the expected result.
readthedocs_ext/readthedocs.py
Outdated
|
||
paths = {} | ||
for page, title in app.env.titles.items(): | ||
paths[app.builder.get_target_uri(page)] = app.env.doc2path(page, base=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to merge the loops as well. We can also calculate the get_target_uri
once only.
'_build/json/readthedocs-sphinx-domain-names.json', | ||
[ | ||
'py:exception', 'js:class', | ||
'"index.html": "Welcome to pyexample', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can go more strict here and compare against the full JSON file if it's not incredibly big, or at least a portion of it.
Reading the test it's hard to realize what we are expecting to be the content of that file. From the Python function it seems there is a specific structure containing types
, titles
and paths
.
build_json = os.path.abspath( | ||
os.path.join(app.outdir, '..', 'json') | ||
) | ||
outjson = os.path.join(build_json, 'readthedocs-sphinx-domain-names.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to get this path could be moved to a function (on utils or similar) since we are repeating it from generate_json_artifacts
.
readthedocs_ext/readthedocs.py
Outdated
log.exception( | ||
'Failure in JSON search dump for page {page}'.format(page=outjson) | ||
) | ||
|
||
|
||
def dump_sphinx_data(app, exception): | ||
""" | ||
Dump a bunch of additional Sphinx data that is useful during search indexing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring could be expanded with the output structure expected of the dict mentioning what is each field and some examples about how are the values (paths --to see if they are relative/absolute, etc) for them.
log.exception( | ||
'Failure in JSON search dump for page {page}'.format(page=outjson) | ||
) | ||
|
||
|
||
def dump_sphinx_data(app, exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should handle the case where exception
is not None
and just return
.
Nevermind. I found the exact place: |
As a general question, why don't we save everything that we need in our own JSON created by this extension instead of mixing some data from here and some data parsed from It seems that we are processing the data in two different places when we can do it only in this extension and then only use it from the Django App to create the Also, I didn't find where the |
I'd like to get |
Agreed. This is probably a better path forward. I will think about if we should do it now, or later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enough to be merged and start using it :)
We can improve it later if we want. In particular the processing of the data in multiple places. Also, there are some refactor that can be done.
The main thing w/ parsing the data multiple places is all the old repos that we need to support. This will only work for indexing going forward, but objects.inv will always exist. |
This is used for better search indexing