Embed APIv3: initial implementation #8319

Merged: 49 commits, Sep 21, 2021
Changes from 46 commits
Commits
61554ac
Move `clean_links` to embed/utils.py
humitos Jul 5, 2021
311ada4
`clean_links` response HTML in raw instead of a PyQuery object
humitos Jul 6, 2021
76d0bf5
Implement the minimal version of the contract
humitos Jul 6, 2021
e841b98
Add docs.sympy.org to the allowed domains
humitos Jul 6, 2021
78f6656
Implement doctool= usage for Sphinx `dt` known cases
humitos Jul 6, 2021
fe3a507
Use timeout= when requesting external pages
humitos Jul 6, 2021
28e4e83
Handle requests TooManyRedirects and other errors
humitos Jul 6, 2021
71b9e2d
Use cache-keys and Django's settings for timeouts
humitos Jul 7, 2021
1aa70e3
More logs and comments
humitos Jul 7, 2021
5bab780
Remove one unneeded conditional
humitos Jul 7, 2021
59fe0f1
Return fragment=null if there is no fragment
humitos Jul 7, 2021
517d3e8
Use setting for requests.get `timeout=` argument
humitos Jul 12, 2021
562c260
Log exception to track wrong URLs in Sentry
humitos Jul 12, 2021
73e2ee3
Sanitize the URL inside `_download_page_content`
humitos Jul 12, 2021
854a44b
Handle malformed URLs (not netloc or scheme)
humitos Jul 12, 2021
49e02d5
Use for/else syntax sugar instead of a temporary variable
humitos Jul 12, 2021
be32bd3
Call `clean_links` before creating the response
humitos Jul 12, 2021
cc2227c
Do not depend on implicit state: pass the required arguments
humitos Jul 12, 2021
1a72c07
Don't return metadata (project, version, language, path)
humitos Jul 12, 2021
7b6b493
Update readthedocs/embed/v3/views.py
humitos Jul 15, 2021
418d9ac
Improve the response http status codes
humitos Jul 19, 2021
8a77043
Sanitize URL before passing it to `clean_links`
humitos Jul 19, 2021
d9ca50e
Comment to sanitize `cache_key` by URL
humitos Jul 19, 2021
3be9b8e
Update import for `clean_links` in tests
humitos Jul 19, 2021
a27122a
Do not call selectolax if there is no content
humitos Jul 19, 2021
84c3d18
Check if the domain is valid before calling `unresolver`
humitos Jul 19, 2021
7adccb2
Remove tedious warnings from pytest
humitos Jul 19, 2021
a1cf45a
Initial test suite for EmbedAPI v3
humitos Jul 19, 2021
ec2fd5f
Add `doctoolwriter` to allow `html4` and `html5` on Sphinx
humitos Aug 16, 2021
f05da3f
Run EmbedAPIv3 test on a different tox environment
humitos Aug 16, 2021
7f3e3a9
Fix tests with proper error message
humitos Aug 16, 2021
bcfba37
Run tests-embedapi in CircleCI
humitos Aug 16, 2021
905cbcf
Consider docutils 0.16 and 0.17 when checking HTML output
humitos Aug 16, 2021
b0dc81f
Revert "Fix tests with proper error message"
humitos Aug 16, 2021
9810f89
Revert "Add `doctoolwriter` to allow `html4` and `html5` on Sphinx"
humitos Aug 16, 2021
05936d3
Lint
humitos Aug 16, 2021
333c892
Disable unused-argument for now
humitos Aug 16, 2021
c4751c7
Make test for sphinxcontrib-bibtex to pass
humitos Aug 18, 2021
7dd4b4f
Auto-delete _build directory after test run
humitos Aug 18, 2021
d134ab9
Checks that depend on Sphinx version (3.5)
humitos Aug 18, 2021
4b5f2d6
Don't make doctoolversion= attribute mandatory when passing doctool=
humitos Aug 18, 2021
0056e90
Sphinx 3.5 seems to be different on its HTML
humitos Aug 18, 2021
cf3da8a
Lint
humitos Aug 18, 2021
aeaad76
Fragment case changed on 3.0.0
humitos Aug 18, 2021
ef5a14c
Don't run EmbedAPIv3 tests by default
humitos Aug 18, 2021
160e51b
Sphinx 3.5 adds a <span> on <dl>
humitos Aug 18, 2021
cc76893
Update readthedocs/embed/v3/tests/test_external_pages.py
humitos Sep 9, 2021
5e25c47
Log url= together with fragment=
humitos Sep 21, 2021
7edb272
Merge branch 'master' of github.com:readthedocs/readthedocs.org into …
humitos Sep 21, 2021
11 changes: 11 additions & 0 deletions .circleci/config.yml
@@ -17,6 +17,16 @@ jobs:
      - run: pip install --user tox
      - run: tox -e py36,codecov

  tests-embedapi:
    docker:
      - image: 'cimg/python:3.6'
    steps:
      - checkout
      - run: git submodule sync
      - run: git submodule update --init
      - run: pip install --user tox
      - run: tox -c tox.embedapi.ini

  checks:
    docker:
      - image: 'cimg/python:3.6'
@@ -45,3 +55,4 @@ workflows:
    jobs:
      - checks
      - tests
      - tests-embedapi
10 changes: 9 additions & 1 deletion pytest.ini
@@ -1,9 +1,11 @@
[pytest]
-addopts = --reuse-db --strict-markers
+addopts = --strict-markers
markers =
    search
    serve
    proxito
    embed_api
    sphinx
python_files = tests.py test_*.py *_tests.py
filterwarnings =
    # Ignore external dependencies warning deprecations
@@ -13,3 +15,9 @@ filterwarnings =
    ignore:Pagination may yield inconsistent results with an unordered object_list.*:django.core.paginator.UnorderedObjectListWarning
    # docutils
    ignore:'U' mode is deprecated:DeprecationWarning
    # slumber
    ignore:Using 'method_whitelist' with Retry is deprecated and will be removed in v2.0.*:DeprecationWarning
    # kombu
    ignore:SelectableGroups dict interface is deprecated.*:DeprecationWarning
    # django
    ignore:Remove the context parameter from JSONField.*:django.utils.deprecation.RemovedInDjango30Warning
6 changes: 6 additions & 0 deletions readthedocs/conftest.py
@@ -1,6 +1,12 @@
import pytest
from rest_framework.test import APIClient


pytest_plugins = (
    'sphinx.testing.fixtures',
)


@pytest.fixture
def api_client():
    return APIClient()
2 changes: 1 addition & 1 deletion readthedocs/embed/tests/test_links.py
@@ -3,7 +3,7 @@
import pytest
from pyquery import PyQuery

-from readthedocs.embed.views import clean_links
+from readthedocs.embed.utils import clean_links

URLData = namedtuple('URLData', ['docurl', 'href', 'expected'])

55 changes: 55 additions & 0 deletions readthedocs/embed/utils.py
@@ -1,5 +1,8 @@
"""Embed utils."""

from urllib.parse import urlparse
from pyquery import PyQuery as PQ # noqa


def recurse_while_none(element):
    """Recursively find the leaf node with the ``href`` attribute."""
@@ -10,3 +13,55 @@ def recurse_while_none(element):
    if not href:
        href = element.attrib.get('id')
    return {element.text: href}


def clean_links(obj, url, html_raw_response=False):
    """
    Rewrite (internal) links to make them absolute.

    1. external links are not changed
    2. prepend URL to links that are just fragments (e.g. #section)
    3. prepend URL (without filename) to internal relative links
    """

    # TODO: do not depend on PyQuery
    obj = PQ(obj)

    if url is None:
        return obj

    for link in obj.find('a'):
        base_url = urlparse(url)
        # We need to make all internal links, to be absolute
        href = link.attrib['href']
        parsed_href = urlparse(href)
        if parsed_href.scheme or parsed_href.path.startswith('/'):
            # don't change external links
            continue

        if not parsed_href.path and parsed_href.fragment:
            # href="#section-link"
            new_href = base_url.geturl() + href
            link.attrib['href'] = new_href
            continue

        if not base_url.path.endswith('/'):
            # internal relative link
            # href="../../another.html" and ``base_url`` is not HTMLDir
            # (e.g. /en/latest/deep/internal/section/page.html)
            # we want to remove the trailing filename (page.html) and use the rest as base URL
            # The resulting absolute link should be
            # https://slug.readthedocs.io/en/latest/deep/internal/section/../../another.html

            # remove the filename (page.html) from the original document URL (base_url) and,
            path, _ = base_url.path.rsplit('/', 1)
            # append the value of href (../../another.html) to the base URL.
            base_url = base_url._replace(path=path + '/')

        new_href = base_url.geturl() + href
        link.attrib['href'] = new_href

    if html_raw_response:
        return obj.outerHtml()

    return obj
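
The three rules in the `clean_links` docstring can be sketched for a single `href`, without PyQuery. This is a minimal illustration under stated assumptions, not the code from this PR: `absolutize_href` is a hypothetical name, and it uses `rpartition` rather than `rsplit` so that a document URL with an empty path is also handled.

```python
from urllib.parse import urlparse


def absolutize_href(doc_url, href):
    """Rewrite ``href`` against ``doc_url`` (hypothetical helper, illustration only)."""
    base_url = urlparse(doc_url)
    parsed_href = urlparse(href)

    # Rule 1: links with a scheme or a root-relative path are returned unchanged.
    if parsed_href.scheme or parsed_href.path.startswith('/'):
        return href

    # Rule 2: fragment-only links get the full document URL prepended.
    if not parsed_href.path and parsed_href.fragment:
        return base_url.geturl() + href

    # Rule 3: relative links get the document URL, minus its filename, prepended.
    if not base_url.path.endswith('/'):
        # rpartition also copes with an empty path (assumption of this sketch).
        path, _, _ = base_url.path.rpartition('/')
        base_url = base_url._replace(path=path + '/')
    return base_url.geturl() + href


doc = 'https://slug.readthedocs.io/en/latest/deep/internal/section/page.html'
print(absolutize_href(doc, '#section-link'))
print(absolutize_href(doc, '../../another.html'))
```

As in the diff, the relative path is concatenated rather than resolved, so `..` segments are left in the resulting URL for the browser to normalize.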
Empty file.
Empty file.
14 changes: 14 additions & 0 deletions readthedocs/embed/v3/tests/conftest.py
@@ -0,0 +1,14 @@
import os
import shutil
import pytest

from .utils import srcdir


@pytest.fixture(autouse=True, scope='module')
def remove_sphinx_build_output():
    """Remove _build/ folder, if exist."""
    for path in (srcdir,):
        build_path = os.path.join(path, '_build')
        if os.path.exists(build_path):
            shutil.rmtree(build_path)
9 changes: 9 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/bibtex-cite.rst
@@ -0,0 +1,9 @@
sphinxcontrib-bibtex
====================

See https://sphinxcontrib-bibtex.readthedocs.io/en/latest/ for more information about how to use ``sphinxcontrib-bibtex``.

See :cite:t:`1987:nelson` for an introduction to non-standard analysis.
Non-standard analysis is fun :cite:p:`1987:nelson`.

.. bibliography::
11 changes: 11 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/chapter-i.rst
@@ -0,0 +1,11 @@
:orphan:

Chapter I
=========

This is Chapter I.

Section I
---------

This the Section I inside Chapter I.
17 changes: 17 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/conf.py
@@ -0,0 +1,17 @@
# conf.py to run tests
import sphinxcontrib.bibtex

master_doc = 'index'
extensions = [
    'sphinx.ext.autosectionlabel',
    'sphinxcontrib.bibtex',
]

bibtex_bibfiles = ['refs.bib']

def setup(app):
    app.add_object_type(
        'confval',  # directivename
        'confval',  # rolename
        'pair: %s; configuration value',  # indextemplate
    )
12 changes: 12 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/configuration.rst
@@ -0,0 +1,12 @@
Configuration
=============

Examples of configurations.

.. confval:: config1

   Description: This the description for config1

   Default: ``'Default value for config'``

   Type: bool
9 changes: 9 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/glossary.rst
@@ -0,0 +1,9 @@
Glossary
--------

Example using a ``:term:`` role :term:`Read the Docs`.

.. glossary::

   Read the Docs
      Best company ever.
9 changes: 9 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/index.rst
@@ -0,0 +1,9 @@
Title
=====

This is an example page used to test EmbedAPI parsing features.

Sub-title
---------

This is a reference to :ref:`sub-title`.
6 changes: 6 additions & 0 deletions readthedocs/embed/v3/tests/examples/default/refs.bib
@@ -0,0 +1,6 @@
@Book{1987:nelson,
author = {Edward Nelson},
title = {Radically Elementary Probability Theory},
publisher = {Princeton University Press},
year = {1987}
}
71 changes: 71 additions & 0 deletions readthedocs/embed/v3/tests/test_basics.py
@@ -0,0 +1,71 @@
import pytest

from django.conf import settings
from django.core.cache import cache
from django.urls import reverse

from .utils import srcdir


@pytest.mark.django_db
@pytest.mark.embed_api
class TestEmbedAPIv3Basics:

    @pytest.fixture(autouse=True)
    def setup_method(self, settings):
        settings.USE_SUBDOMAIN = True
        settings.PUBLIC_DOMAIN = 'readthedocs.io'
        settings.RTD_EMBED_API_EXTERNAL_DOMAINS = ['docs.project.com']

        self.api_url = reverse('embed_api_v3')

        yield
        cache.clear()

    def test_not_url_query_argument(self, client):
        params = {}
        response = client.get(self.api_url, params)
        assert response.status_code == 400
        assert response.json() == {'error': 'Invalid arguments. Please provide "url".'}

    def test_not_allowed_domain(self, client):
        params = {
            'url': 'https://docs.notalloweddomain.com#title',
        }
        response = client.get(self.api_url, params)
        assert response.status_code == 400
        assert response.json() == {'error': 'External domain not allowed. domain=docs.notalloweddomain.com'}

    def test_malformed_url(self, client):
        params = {
            'url': 'https:///page.html#title',
        }
        response = client.get(self.api_url, params)
        assert response.status_code == 400
        assert response.json() == {'error': f'The URL requested is malformed. url={params["url"]}'}

    def test_rate_limit_domain(self, client):
        params = {
            'url': 'https://docs.project.com#title',
        }
        cache_key = 'embed-api-docs.project.com'
        cache.set(cache_key, settings.RTD_EMBED_API_DOMAIN_RATE_LIMIT)

        response = client.get(self.api_url, params)
        assert response.status_code == 429
        assert response.json() == {'error': 'Too many requests for this domain. domain=docs.project.com'}

    def test_infinite_redirect(self, client, requests_mock):
        requests_mock.get(
            'https://docs.project.com',
            status_code=302,
            headers={
                'Location': 'https://docs.project.com',
            },
        )
        params = {
            'url': 'https://docs.project.com#title',
        }
        response = client.get(self.api_url, params)
        assert response.status_code == 400
        assert response.json() == {'error': f'The URL requested generates too many redirects. url={params["url"]}'}
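
The error cases exercised by these tests (missing `url`, malformed URL, disallowed domain, per-domain rate limit) can be sketched as a plain function. This is an illustration only, not the view from this PR: `check_embed_request`, `ALLOWED_DOMAINS`, `DOMAIN_RATE_LIMIT` and the in-memory `request_counts` dict are hypothetical stand-ins for the Django settings and cache used in the tests, and the exact-match domain check is a simplifying assumption.

```python
from urllib.parse import urlparse

# Hypothetical values; the tests use Django settings for both.
ALLOWED_DOMAINS = ['docs.project.com']
DOMAIN_RATE_LIMIT = 50


def check_embed_request(url, request_counts,
                        allowed_domains=ALLOWED_DOMAINS,
                        rate_limit=DOMAIN_RATE_LIMIT):
    """Return ``(status_code, error)``; ``error`` is None when the request may proceed."""
    if not url:
        return 400, 'Invalid arguments. Please provide "url".'

    parsed = urlparse(url)
    # A URL such as "https:///page.html" parses with an empty netloc.
    if not parsed.scheme or not parsed.netloc:
        return 400, f'The URL requested is malformed. url={url}'

    domain = parsed.netloc
    if domain not in allowed_domains:
        return 400, f'External domain not allowed. domain={domain}'

    # Per-domain counter, standing in for the cache key "embed-api-<domain>".
    if request_counts.get(domain, 0) >= rate_limit:
        return 429, f'Too many requests for this domain. domain={domain}'
    request_counts[domain] = request_counts.get(domain, 0) + 1
    return 200, None


counts = {}
print(check_embed_request('https:///page.html#title', counts))
print(check_embed_request('https://docs.project.com#title', counts))
```

The malformed-URL check runs before the domain check because a URL without a netloc has no domain to compare against the allowed list.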