-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Add modeling for intersphinx data #5289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This removes the second entry point to site search, and unifies all our searching to use fancy facets and be nice.
I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think? |
readthedocs/domaindata/models.py
Outdated
) | ||
version = models.ForeignKey(Version, verbose_name=_('Version'), | ||
related_name='domain_data') | ||
modified_date = models.DateTimeField(_('Publication date'), auto_now=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could start using this standard #4864
readthedocs/domaindata/models.py
Outdated
version = models.ForeignKey(Version, verbose_name=_('Version'), | ||
related_name='domain_data') | ||
modified_date = models.DateTimeField(_('Publication date'), auto_now=True) | ||
commit = models.CharField(_('Commit'), max_length=255) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this? I think we already have this from the version 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a file can be removed from a specific commit, and we need to track this to delete it.
readthedocs/domaindata/models.py
Outdated
''' | ||
|
||
@property | ||
def doc_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the correct name, I wasn't able to find something that could be used instead. I mean, this is just the full object not, the type of document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function::
or py:function:'spam'
.
Maybe "role_name" is a better name.
for name, einfo in sorted(invdata[key].items()): | ||
url = einfo[2] | ||
if '#' in url: | ||
doc_name, anchor = url.split('#') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use urlparse https://docs.python.org/3/library/urllib.parse.html, but the logic here is very simple, so not really sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, feels unnecessary. It's also not a full URL, so it might be confused by it.
@safwanrahman Can you say a bit more about this? They aren't representing a file that we have on disk, so I don't think it's a good match. It could be useful to optionally have them point to an HTMLFile on the model, but I'm not sure what the value would be? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and I think this has potential. Although, I'm not 100% sure how or where you are thinking to use this.
I left some Python style comments that I think can be considered.
As I understand this PR, we are converting the objects.inv
into data on our db. If I'm correct, won't this be a huge amount of data going to our db that could potentially cause an issue?
readthedocs/domaindata/admin.py
Outdated
from .models import DomainData | ||
|
||
|
||
class DomainDataAdmin(admin.ModelAdmin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DomainData
name conflicts with our Domain
model to me.
Considering that we are talking about Sphinx Domains here, I'd name all of these ones (including the Django App) as SphinxDomain
. I think it will be better to get the meaning immediately when re reading about this.
readthedocs/domaindata/models.py
Outdated
|
||
project = models.ForeignKey( | ||
Project, | ||
related_name='domain_data', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we will have project.domains
and project.domain_data
. I think it will be clearer as project.domains
(current relationship) and project.sphinx_domains
(or project.sphinxdomains
), IMO.
readthedocs/domaindata/models.py
Outdated
def __str__(self): | ||
return f''' | ||
DomainData [{self.project.slug}:{self.version.slug}] | ||
[{self.domain}:{self.type}] {self.name} -> {self.doc_name}#{self.anchor} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be wrapped.
readthedocs/domaindata/models.py
Outdated
''' | ||
|
||
@property | ||
def doc_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function::
or py:function:'spam'
.
Maybe "role_name" is a better name.
readthedocs/domaindata/models.py
Outdated
return f'{self.domain}:{self.type}' | ||
|
||
@property | ||
def doc_url(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd name it as docs_url
to follow the same pattern that we have in other places.
readthedocs/projects/tasks.py
Outdated
invdata = intersphinx.fetch_inventory(MockApp(), '', object_file) | ||
for key in sorted(invdata or {}): | ||
domain, _type = key.split(':') | ||
for name, einfo in sorted(invdata[key].items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it clearer to use .items()
in the first loop here?
for key, value in invdata.items():
domain, _type = key.split(':')
for name, einfo in value:
readthedocs/projects/tasks.py
Outdated
log.warning('Sphinx MockApp: %s', msg) | ||
|
||
invdata = intersphinx.fetch_inventory(MockApp(), '', object_file) | ||
for key in sorted(invdata or {}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these two loops need to be sorted
?
for key in sorted(invdata or {}): | ||
domain, _type = key.split(':') | ||
for name, einfo in sorted(invdata[key].items()): | ||
url = einfo[2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a small comment with previous to this line, with the expected structure of this einfo
object. Like
# (key, name, url, doc type, etc)
# ('py', 'function', '/some-nice-url/python-function/', 'func', ...)
or... a link to the docs.
I found debugging this complicated later.
readthedocs/projects/tasks.py
Outdated
else: | ||
doc_name, anchor = url, '' | ||
display_name = einfo[3] | ||
obj, _ = DomainData.objects.get_or_create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fail the first time because commit
is not passed here and it's not marked as null=True
.
readthedocs/domaindata/api.py
Outdated
'name', | ||
'display_name', | ||
'doc_type', | ||
'doc_url', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is the most important field we want for the use case you mentioned, as we will want to embed the section when hovering with the mouse or something like that, right?
@ericholscher If each of the domain data could pointed to a HTMLFile object, we can index the |
@@ -364,7 +364,7 @@ def setUp(self): | |||
'api_webhook_gitlab': {'status_code': 405}, | |||
'api_webhook_bitbucket': {'status_code': 405}, | |||
'api_webhook_generic': {'status_code': 403}, | |||
'domaindata-detail': {'status_code': 404}, | |||
'sphinx_domains-detail': {'status_code': 404}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it doesn't matter too much, but to keep consistency here with the other endpoints, I'd say that it should be called sphinxdomains-detail
(without the _
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is autogenerated by REST framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes... but it comes from the register
: https://github.com/rtfd/readthedocs.org/pull/5289/files/021e12d9472f3e4cb1f7f69413e0af0e25808c86..ab7a30cf1cb40012ce71704824723aa5c1597b67#diff-48d1599aaf2a6be451a9f14bdf0cd81aR38
I should have commented that other line instead :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. It also shouldn't be plural. Fixed.
Would love to get this merged for deploy this week as well, if folks have time to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ready to merge to me.
Just wanted to re-raise my comment about consistency on the DRF URLs. I'm not blocking on that, anyway but I think it worth to change it if possible.
@@ -364,7 +364,7 @@ def setUp(self): | |||
'api_webhook_gitlab': {'status_code': 405}, | |||
'api_webhook_bitbucket': {'status_code': 405}, | |||
'api_webhook_generic': {'status_code': 403}, | |||
'domaindata-detail': {'status_code': 404}, | |||
'sphinx_domains-detail': {'status_code': 404}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes... but it comes from the register
: https://github.com/rtfd/readthedocs.org/pull/5289/files/021e12d9472f3e4cb1f7f69413e0af0e25808c86..ab7a30cf1cb40012ce71704824723aa5c1597b67#diff-48d1599aaf2a6be451a9f14bdf0cd81aR38
I should have commented that other line instead :)
Co-Authored-By: ericholscher <[email protected]>
This adds modeling for Intersphinx data. It will be automatically updated on build, so that we have it indexed. The goal here will be to add this to search results, but I broke up the PR's to make them easier to review and merge.
Closes #4767