Skip to content

Add modeling for intersphinx data #5289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Feb 27, 2019
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
83c6f40
Remove old test references to intersphinx
ericholscher Oct 23, 2018
db5fbd8
Initial commit of domains app
ericholscher Oct 23, 2018
c2cd164
Merge branch 'allow-local-files-in-dev' into intersphinx-modeling
ericholscher Oct 24, 2018
41b2857
Add initial domaindata modeling and API integration
ericholscher Oct 30, 2018
806d3c2
Merge remote-tracking branch 'origin/master' into intersphinx-modeling
ericholscher Oct 30, 2018
0f82c26
Add indexing of DomainData to the builds.
ericholscher Oct 30, 2018
639b219
Merge remote-tracking branch 'origin/master' into intersphinx-modeling
ericholscher Jan 31, 2019
67d909a
Merge branch 'readd-search-signals' into intersphinx-modeling
ericholscher Jan 31, 2019
537c66a
Cleanup of some domain stuff
ericholscher Jan 31, 2019
647ae3f
Add search to Domains :tada:
ericholscher Jan 31, 2019
5b7d599
Merge branch 'readd-search-signals' into intersphinx-modeling
ericholscher Jan 31, 2019
1dfabe8
Add initial search implementation to DomainData
ericholscher Jan 31, 2019
08296ae
Move project search to a subset of site search
ericholscher Feb 1, 2019
1e087fb
Small cleanup around faceting
ericholscher Feb 1, 2019
12bdf04
Limit resuts by DomainData type
ericholscher Feb 4, 2019
8e015c9
Merge remote-tracking branch 'origin/master' into intersphinx-modeling
ericholscher Feb 13, 2019
444e739
Remove old domains app
ericholscher Feb 13, 2019
1672421
Remove search changes
ericholscher Feb 13, 2019
462545e
Missed a few
ericholscher Feb 13, 2019
f53e589
Missed a bit more
ericholscher Feb 13, 2019
98aae46
remove random bits that got merged in
ericholscher Feb 13, 2019
c934046
Undo url fix
ericholscher Feb 14, 2019
96fa45a
Fix linting issues
ericholscher Feb 14, 2019
d8d75d5
Merge remote-tracking branch 'origin/master' into intersphinx-modeling
ericholscher Feb 20, 2019
09b3c80
Fix tests and linting
ericholscher Feb 20, 2019
0da9283
Review feedback
ericholscher Feb 20, 2019
1bc31d0
Fix lint
ericholscher Feb 20, 2019
97851c4
Move sphinx to a runtime requirement.
ericholscher Feb 20, 2019
021e12d
Use timestamped model for standardized class names
ericholscher Feb 20, 2019
ab7a30c
Address review feedback
ericholscher Feb 22, 2019
fa72754
Merge remote-tracking branch 'origin/master' into intersphinx-modeling
ericholscher Feb 27, 2019
4ea9121
Fix API name
ericholscher Feb 27, 2019
ec71993
Fix test too
ericholscher Feb 27, 2019
18877c1
Update readthedocs/projects/tasks.py
stsewd Feb 27, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Empty file.
12 changes: 12 additions & 0 deletions readthedocs/domaindata/admin.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
"""Domain Admin classes"""
from django.contrib import admin
from .models import DomainData


class DomainDataAdmin(admin.ModelAdmin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DomainData name conflicts with our Domain model to me.

Considering that we are talking about Sphinx Domains here, I'd name all of these ones (including the Django App) as SphinxDomain. I think it will be better to get the meaning immediately when re reading about this.

list_filter = ('type', 'project')
raw_id_fields = ('project', 'version')
search_fields = ('doc_name', 'name')


admin.site.register(DomainData, DomainDataAdmin)
28 changes: 28 additions & 0 deletions readthedocs/domaindata/api.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
"""Domain API classes"""

from rest_framework import serializers

from readthedocs.restapi.views.model_views import UserSelectViewSet
from .models import DomainData


class DomainDataSerializer(serializers.ModelSerializer):
project = serializers.SlugRelatedField(slug_field='slug', read_only=True)
version = serializers.SlugRelatedField(slug_field='slug', read_only=True)

class Meta:
model = DomainData
fields = ('project', 'version', 'name', 'display_name', 'doc_type', 'doc_url')


class DomainDataAdminSerializer(DomainDataSerializer):

class Meta(DomainDataSerializer.Meta):
fields = '__all__'


class DomainDataAPIView(UserSelectViewSet): # pylint: disable=too-many-ancestors
model = DomainData
serializer_class = DomainDataSerializer
admin_serializer_class = DomainDataAdminSerializer
filter_fields = ('project__slug', 'version__slug', 'domain', 'type', 'doc_name', 'name')
78 changes: 78 additions & 0 deletions readthedocs/domaindata/models.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
"""
Sphinx Domain modeling.

http://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html
"""

from django.db import models
from django.utils.encoding import python_2_unicode_compatible
from django.utils.translation import ugettext_lazy as _

from readthedocs.builds.models import Version
from readthedocs.core.resolver import resolve
from readthedocs.projects.models import Project
from readthedocs.projects.querysets import RelatedProjectQuerySet


@python_2_unicode_compatible
class DomainData(models.Model):

"""
Information from a project about it's Sphinx domains.

This captures data about API objects that exist in that codebase.
"""

project = models.ForeignKey(
Project,
related_name='domain_data',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we will have project.domains and project.domain_data. I think it will be clearer as project.domains (current relationship) and project.sphinx_domains (or project.sphinxdomains), IMO.

)
version = models.ForeignKey(Version, verbose_name=_('Version'),
related_name='domain_data')
modified_date = models.DateTimeField(_('Publication date'), auto_now=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could start using this standard #4864

commit = models.CharField(_('Commit'), max_length=255)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? I think we already have this from the version 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a file can be removed from a specific commit, and we need to track this to delete it.


domain = models.CharField(
_('Domain'),
max_length=255,
)
name = models.CharField(
_('Name'),
max_length=255,
)
display_name = models.CharField(
_('Display Name'),
max_length=255,
)
type = models.CharField(
_('Type'),
max_length=255,
)
doc_name = models.CharField(
_('Doc Name'),
max_length=255,
)
anchor = models.CharField(
_('Anchor'),
max_length=255,
)
objects = RelatedProjectQuerySet.as_manager()

def __str__(self):
return f'''
DomainData [{self.project.slug}:{self.version.slug}]
[{self.domain}:{self.type}] {self.name} -> {self.doc_name}#{self.anchor}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be wrapped.

'''

@property
def doc_type(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the correct name, I wasn't able to find something that could be used instead. I mean, this is just the full object not, the type of document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function:: or py:function:'spam'.

Maybe "role_name" is a better name.

return f'{self.domain}:{self.type}'

@property
def doc_url(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd name it as docs_url to follow the same pattern that we have in other places.

path = self.doc_name
if self.anchor:
path += f'#{self.anchor}'
full_url = resolve(
project=self.project, version_slug=self.version.slug, filename=path)
return full_url
55 changes: 55 additions & 0 deletions readthedocs/projects/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
import json
import logging
import os
import sys
import shutil
import socket
from collections import Counter, defaultdict
Expand All @@ -25,6 +26,8 @@
from django.utils import timezone
from django.utils.translation import ugettext_lazy as _
from slumber.exceptions import HttpClientError
from sphinx.ext import intersphinx


from readthedocs.builds.constants import (
BUILD_STATE_BUILDING,
Expand Down Expand Up @@ -58,6 +61,7 @@
)
from readthedocs.doc_builder.loader import get_builder_class
from readthedocs.doc_builder.python_environments import Conda, Virtualenv
from readthedocs.domaindata.models import DomainData
from readthedocs.projects.models import APIProject
from readthedocs.restapi.client import api as api_v2
from readthedocs.vcs_support import utils as vcs_support_utils
Expand Down Expand Up @@ -1136,6 +1140,7 @@ def fileify(version_pk, commit):
),
)
_manage_imported_files(version, path, commit)
_update_intersphinx_data(version, path, commit)
else:
log.info(
LOG_TEMPLATE.format(
Expand All @@ -1146,6 +1151,56 @@ def fileify(version_pk, commit):
)


def _update_intersphinx_data(version, path, commit):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to run this function in a celery tasks instead of calling syncly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already run inside of a celery task.

"""
Update intersphinx data for this version

:param version: Version instance
:param path: Path to search
:param commit: Commit that updated path
"""
object_file = os.path.join(path, 'objects.inv')

class MockConfig:
intersphinx_timeout = None # type: int
tls_verify = False

class MockApp:
srcdir = ''
config = MockConfig()

def warn(self, msg):
# type: (unicode) -> None
print(msg, file=sys.stderr)

invdata = intersphinx.fetch_inventory(MockApp(), '', object_file)
for key in sorted(invdata or {}):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these two loops need to be sorted?

domain, _type = key.split(':')
for name, einfo in sorted(invdata[key].items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it clearer to use .items() in the first loop here?

for key, value in invdata.items():
    domain, _type = key.split(':')
    for name, einfo in value:

url = einfo[2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a small comment with previous to this line, with the expected structure of this einfo object. Like

# (key, name, url, doc type, etc)
# ('py', 'function', '/some-nice-url/python-function/', 'func', ...)

or... a link to the docs.

I found debugging this complicated later.

if '#' in url:
doc_name, anchor = url.split('#')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use urlparse https://docs.python.org/3/library/urllib.parse.html, but the logic here is very simple, so not really sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, feels unnecessary. It's also not a full URL, so it might be confused by it.

else:
doc_name, anchor = url, ''
display_name = einfo[3]
obj, _ = DomainData.objects.get_or_create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail the first time because commit is not passed here and it's not marked as null=True.

project=version.project,
version=version,
domain=domain,
name=name,
display_name=display_name,
type=_type,
doc_name=doc_name,
anchor=anchor,
)
if obj.commit != commit:
obj.commit = commit
obj.save()
DomainData.objects.filter(project=version.project,
version=version
).exclude(commit=commit).delete()


def _manage_imported_files(version, path, commit):
"""
Update imported files for version.
Expand Down
2 changes: 2 additions & 0 deletions readthedocs/restapi/urls.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
SocialAccountViewSet,
VersionViewSet,
)
from readthedocs.domaindata.api import DomainDataAPIView


router = routers.DefaultRouter()
Expand All @@ -34,6 +35,7 @@
router.register(r'project', ProjectViewSet, basename='project')
router.register(r'notification', NotificationViewSet, basename='emailhook')
router.register(r'domain', DomainViewSet, basename='domain')
router.register(r'domaindata', DomainDataAPIView, base_name='domaindata')
router.register(
r'remote/org',
RemoteOrganizationViewSet,
Expand Down
1 change: 1 addition & 0 deletions readthedocs/settings/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ def INSTALLED_APPS(self): # noqa
'readthedocs.notifications',
'readthedocs.integrations',
'readthedocs.analytics',
'readthedocs.domaindata',
'readthedocs.search',


Expand Down