expose server side analytics #5141

Closed
jab opened this issue Jan 21, 2019 · 8 comments
Labels
Feature (New feature), Needed: design decision (A core team decision is required)

Comments

@jab

jab commented Jan 21, 2019

As of #4131, it looks like Read the Docs is tracking some analytics server side. Awesome: This offers advantages to documentation readers (e.g. their IPs are anonymized before being sent to Google Analytics), but also an advantage to documentation publishers that is not yet realized: Currently publishers need to supply their own Google Analytics tracking ID to be able to track views of their docs, which (1) not all publishers bother to do (it's buried under Advanced Settings), and (2) won't count visitors who block requests to GA (e.g. using a browser extension). If Read the Docs exposed to projects the server side analytics that it's already tracking, it would address both these issues. Surfacing even just one or two metrics such as visitors per month and pageviews per month would be really useful. Any interest?

Thanks for your consideration and for all your work on Read the Docs!

stsewd added the labels "Feature" (New feature) and "Needed: design decision" (A core team decision is required) on Jan 21, 2019
@davidfischer
Contributor

Right now the analytics sent server side are exclusively advertising related. Longer term I'd like to completely remove client side GA and switch to entirely server side GA. I outlined my thoughts here. I even built a separate module for it.

One reason I'm hesitating slightly is that right now we're sending ~1-2k events/day to GA (ad clicks) and that's fine. If we made every pageview on RTD send to GA server side, we'd be looking at closer to 1-2M events/day. Perhaps using some serverless tech is a better fit.

Regardless, I'm glad somebody else is interested in this! This is on my list of stuff I want to do but it hasn't yet bubbled to the top.
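
For readers unfamiliar with server-side GA, here is a minimal sketch of what an anonymized server-side pageview hit could look like. It assumes the Universal Analytics Measurement Protocol (`/collect` with `v`, `tid`, `cid`, `t`, `dp`, `uip`, `ua`, `aip`); the thread does not say which mechanism Read the Docs actually uses. The helper names and the choice to zero the low bits of the IP are hypothetical, and the sketch only builds the payload without sending it.

```python
import ipaddress
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint (assumed; not stated in the thread).
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def anonymize_ip(ip: str) -> str:
    """Zero the low bits of an address: keep /24 for IPv4, /48 for IPv6.

    The prefix lengths are an illustrative assumption.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def build_pageview_payload(tracking_id: str, client_id: str, path: str,
                           ip: str, user_agent: str) -> str:
    """Return a URL-encoded pageview hit (hypothetical helper)."""
    return urlencode({
        "v": "1",                  # protocol version
        "tid": tracking_id,        # GA property ID
        "cid": client_id,          # anonymous client ID chosen by the server
        "t": "pageview",           # hit type
        "dp": path,                # document path
        "uip": anonymize_ip(ip),   # IP override, already truncated
        "ua": user_agent,          # user agent override
        "aip": "1",                # ask GA to anonymize the IP as well
    })
```

The resulting string would be POSTed to `GA_COLLECT_URL` as the request body. At 1-2M hits/day that POST is the part that would likely need batching or a queue, which is where the volume concern above comes in.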

@jab
Author

jab commented Jan 21, 2019

Thanks for the quick reply @davidfischer, and glad to hear this is already on your radar! One quick followup thought: I know Cloudflare is able to do this for its users and makes the data available via its API as well as its browser UI. Here's a screenshot I just took for one of my sites:

[Image: screen shot 2019-01-21 at 17 34 28]

It looks like Read the Docs is using Azure CDN, which I've no experience with, but maybe they provide something similar that could save you some work?

@davidfischer
Contributor

Currently we are only using Azure CDN for static files and not for dynamic content so I don't think it would work in its current form.

Secondly, we attach a lot of data to pageviews and events so we can understand the site better. For example, I look at pageviews by programming language of the docs or pageviews by Sphinx theme pretty frequently. Ideally I'd like to still get that.

Interestingly, the really privacy conscious stuff in GA is the stuff I don't want or need at all. I don't need any demographics info and I don't need to know that a user who visited our site 6 months ago is "returning".

@davidfischer
Contributor

[Image: screen shot 2019-01-21 at 2 55 11 pm]

Just to show off, here's a small dashboard of custom dimension breakdowns. It's a week's worth of data. I removed stuff that would identify single projects or small groups.

@jab
Author

jab commented Jan 21, 2019

Interesting, thanks!

This would be a more significant change to your current architecture, but could save hosting costs and improve page load times, so just in case it's worth considering:

You could still serve mutable responses (like all <projectid>.rtfd.io pages) with a cache-control: public header and a low (e.g. 5-minute) max-age, such that a CDN can still serve them. You'd then move the server-side metrics from the endpoint that serves top-level pages to some dedicated analytics endpoint that would be accessed from a subrequest of every page (e.g. via XHR), whose response would not have any cache headers (so clients would always re-request it and the CDN would never cache it).

This has worked well for me with Cloudflare, so just thought I'd share in case it's helpful.
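
A minimal sketch of the split described above, written as a stdlib-only WSGI app. The endpoint path `/_analytics/pageview`, the `page` query parameter, and the in-memory counter are all hypothetical. One deliberate difference from the description: the analytics response sends an explicit `Cache-Control: no-store` rather than omitting cache headers, since some caches apply heuristics when no header is present.

```python
from urllib.parse import parse_qs

ANALYTICS_PATH = "/_analytics/pageview"  # hypothetical endpoint name
PAGE_MAX_AGE = 300                       # 5 minutes, as suggested above

# Stand-in for real metrics storage; maps page path -> hit count.
pageview_counts = {}


def app(environ, start_response):
    """Serve docs pages as CDN-cacheable and the analytics beacon as uncacheable."""
    path = environ.get("PATH_INFO", "/")

    if path == ANALYTICS_PATH:
        # Called from every page via a subrequest (e.g. XHR); must reach the origin.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        page = query.get("page", [""])[0]
        pageview_counts[page] = pageview_counts.get(page, 0) + 1
        start_response("204 No Content", [("Cache-Control", "no-store")])
        return [b""]

    # Mutable docs page: a shared cache may serve it for a short time.
    body = b"<html><body>docs page</body></html>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Cache-Control", f"public, max-age={PAGE_MAX_AGE}"),
    ])
    return [body]
```

With this split, the CDN absorbs most page traffic while the origin still sees one small request per pageview, so server-side counts stay complete for any client that runs the subrequest.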

@davidfischer
Contributor

Thanks for the tip!

@stsewd
Member

stsewd commented Jun 1, 2020

@davidfischer was this solved by #6121?

@davidfischer
Contributor

It isn't exactly the same, but it is partially solved. It's probably good enough to call it done.
