Skip to content

.inv files are over-cached? #7590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nedbat opened this issue Oct 22, 2020 · 7 comments · Fixed by #7593
Closed

.inv files are over-cached? #7590

nedbat opened this issue Oct 22, 2020 · 7 comments · Fixed by #7593
Labels
Accepted Accepted issue on our roadmap Bug A bug

Comments

@nedbat
Copy link
Contributor

nedbat commented Oct 22, 2020

Details

Expected Result

We have a multi-book docset, with references between the books. We use an intersphinx_mapping to resolve these references:

intersphinx_mapping = {
    'opencoursestaff' : ('https://edx.readthedocs.io/projects/open-edx-building-and-running-a-course/en/latest/', None),
    'data' : ('https://edx.readthedocs.io/projects/devdata/en/latest/', None),
    'partnercoursestaff': ('https://edx.readthedocs.io/projects/edx-partner-course-staff/en/latest/', None),
    'insights' : ('https://edx.readthedocs.io/projects/edx-insights/en/latest/', None),
    'xblockapi' : ('https://edx.readthedocs.io/projects/xblock/en/latest/', None),
    'xblocktutorial' : ('https://edx.readthedocs.io/projects/xblock-tutorial/en/latest/', None),
    'installation' : ('https://edx.readthedocs.io/projects/edx-installing-configuring-and-running/en/latest/', None),
    'olx' : ('https://edx.readthedocs.io/projects/edx-open-learning-xml/en/latest/', None),
    'learners' : ('https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/', None),
    'openlearners' : ('https://edx.readthedocs.io/projects/open-edx-learner-guide/en/latest/', None),
    'opendevelopers' : ('https://edx.readthedocs.io/projects/edx-developer-guide/en/latest/', None),
    'opendataapi' : ('https://edx.readthedocs.io/projects/edx-data-analytics-api/en/latest/', None),
    'openreleasenotes' : ('https://edx.readthedocs.io/projects/open-edx-release-notes/en/latest/', None),
    'partnerreleasenotes': ('https://edx.readthedocs.io/projects/edx-release-notes/en/latest/', None),
    '2014releasenotes' : ('https://edx.readthedocs.io/projects/edx-2013-2014-release-notes/en/latest/', None),
    'retirement' : ('https://user-retirement-guide.readthedocs.io/en/latest/', None),
}

(The "learners" reference is to a particular branch so that my new references will be found as I work on the pull request.)

When I update one book, I find the next local build still can't resolve the new references. Using wget to pull the .inv file, it seems that the old content is still being served, but if I add a nonsense query parameter, I get the new content:

$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv | wc -c
    2923
$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv\?bust | wc -c
    3821
$ wget -q -O - https://edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv | wc -c
    2923

I saw the problem yesterday, and then new references seemed to resolve this morning, so it seems that waiting long enough will bring the new content, as if a cache expires. I don't know how long that time is.

@nedbat
Copy link
Contributor Author

nedbat commented Oct 22, 2020

It looks like the cache expired after 90 minutes.

@humitos
Copy link
Member

humitos commented Oct 22, 2020

Thanks for the report. The cache for a particular version should be purged immediately (couple of minutes, not 90 as you mentioned) after the build for that version succeed.

We will need to have more debugging information here to be able to find out the reason. Anything else that you can provide? Can you do some other tests here and check the HTTP headers: CF-Cache-Status and CF-RAY before and after a rebuild?

We have an alarm in our system that checks the latency since the build was triggered until the new content is published and online, and we are not noticing this 90 minutes cached delay that you are mentioning.

@nedbat
Copy link
Contributor Author

nedbat commented Oct 22, 2020

Here is header information:

Requesting objects.inv after 90 minutes have passed since the build:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 14:54:23 GMT
Content-Type: application/octet-stream
Content-Length: 3821
Connection: keep-alive
Set-Cookie: __cfduid=dcb621f5f1a9b8070f8b445100fce9c521603378463; expires=Sat, 21-Nov-20 14:54:23 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 11:40:29 GMT
ETag: 0x8D8767F4725A1C7
x-ms-request-id: 045bc86b-f01e-00a0-737d-a8925d000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: HIT
Age: 2665
Expires: Thu, 22 Oct 2020 15:54:23 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f266da130000ec00f9804000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e640da3593fec00-BOS

Requesting objects.inv right after a build, the old content is served:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 15:06:06 GMT
Content-Type: application/octet-stream
Content-Length: 3821
Connection: keep-alive
Set-Cookie: __cfduid=d6c8d5fcfb5043867daaccb0bc9bd9bed1603379166; expires=Sat, 21-Nov-20 15:06:06 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 11:40:29 GMT
ETag: 0x8D8767F4725A1C7
x-ms-request-id: 045bc86b-f01e-00a0-737d-a8925d000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: HIT
Age: 3368
Expires: Thu, 22 Oct 2020 16:06:06 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f271968f0000ebf8d485c000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e641ed0e8b6ebf8-BOS

Requesting objects.inv with a query param to bust the cache:

HTTP/1.1 200 OK
Date: Thu, 22 Oct 2020 15:03:59 GMT
Content-Type: application/octet-stream
Content-Length: 4055
Connection: keep-alive
Set-Cookie: __cfduid=d7fea1b0997ac260f34bb8b325d9b83751603379039; expires=Sat, 21-Nov-20 15:03:59 GMT; path=/; domain=.readthedocs.io; HttpOnly; SameSite=Lax
Last-Modified: Thu, 22 Oct 2020 15:00:42 GMT
ETag: 0x8D8769B3F4E8D34
x-ms-request-id: 09f67383-601e-00c8-5a84-a8cc0c000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Access-Control-Allow-Origin: *
X-Served: Nginx-Proxito-Sendfile
X-Backend: web00020d
X-RTD-Project: edx
X-RTD-Version: nedbat-fix-learner-links
X-RTD-Path: /proxito/media/html/edx-guide-for-students/nedbat-fix-learner-links/objects.inv
X-RTD-Domain: edx.readthedocs.io
X-RTD-Version-Method: path
X-RTD-Project-Method: subdomain
Referrer-Policy: no-referrer-when-downgrade
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
CF-Cache-Status: MISS
Expires: Thu, 22 Oct 2020 16:03:59 GMT
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
cf-request-id: 05f26fa5590000ebf04829a000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 5e641bb55fa1ebf0-BOS

@nedbat
Copy link
Contributor Author

nedbat commented Oct 22, 2020

A more careful test showed that wget got updated content an hour after the build. That meshes well will the "Cache-Control: public, max-age=3600" header.

@humitos
Copy link
Member

humitos commented Oct 22, 2020

I think I found the issue and I'd say that's a bug in our task that purge the cache. It gets confused when generating tags cache for subprojects.

For your particular case, when hitting edx.readthedocs.io/projects/edx-guide-for-students/en/nedbat-fix-learner-links/objects.inv we generate the HTTP header Cache-Tag: edx,edx-nedbat-fix-learner-links (source code), but then when the build finishes and we call the purge task, we are purging edx-guide-for-students-nedbat-fix-learner-links (source code in private repo).

The Cache-Tag that we are generating is wrong, because it uses the slug for the main project instead of the subproject.

@humitos
Copy link
Member

humitos commented Oct 27, 2020

@nedbat we just deployed the fix for this, could you confirm us that this is working as you expected now and the cache is being invalidated "immediately" (minutes) after the build is done? Thanks!

@nedbat
Copy link
Contributor Author

nedbat commented Oct 27, 2020

I tested it, and it works! My branch got an updated objects.inv right away. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Bug A bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants