
Order search results by most viewed pages #5968


Closed
dojutsu-user opened this issue Jul 20, 2019 · 7 comments
Labels
Feature New feature Needed: design decision A core team decision is required


@dojutsu-user
Member

Currently, the order of the search results doesn't consider the number of views of a page.
It would be good if search results were ordered by the number of page views.

@dojutsu-user dojutsu-user added Feature New feature Needed: design decision A core team decision is required labels Jul 20, 2019
@dojutsu-user dojutsu-user changed the title Order search results by most viewed pages [Feature] Order search results by most viewed pages Jul 20, 2019
@humitos
Member

humitos commented Jul 22, 2019

Do you have an idea about how to implement this?

I have some questions:

  • Where will we store this data? Maybe in the HTMLFile object?
  • This data needs to be considered when doing a full re-index.
  • Where will this data come from, and how? Is it a Celery task querying the data source every X minutes?
  • Once we have this data, how complicated is it to add it to Elasticsearch so that it is considered when sorting results?

@dojutsu-user
Member Author

dojutsu-user commented Jul 22, 2019

@humitos
I don't have a definite answer to most of the points yet.

where will we store this data? Maybe in the HTMLFile object?

The HTMLFile object seems to be the right place, but these objects get deleted and recreated after every build, so we would lose all the data.

this data needs to be considered when doing a full re-index

We want the data in Elasticsearch, so yes.

where will this data come from? how? is it a Celery task querying the data source every X minutes?

We can use Google Analytics; I don't know whether they have an API for this or something else.
Or we can just count it ourselves: increase a page's count by 1 every time the page loads.
Once we have the data, we can have Celery run every 7 days to update the data in Elasticsearch.

once we have this data, how complicated is it to add it to Elasticsearch to be considered when sorting results?

Once we have the data, I don't think it should be very complicated. I will research this point.
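For illustration only, here is a minimal sketch of how a view count could influence ranking once it is in the index. It uses Elasticsearch's `function_score` query with `field_value_factor`. The field names `page_views`, `title`, and `content` are assumptions, not names confirmed anywhere in this thread, and the function only builds the request body as a plain dict.

```python
def build_search_query(text, views_field="page_views"):
    """Build a request body that adds a popularity term to the text score.

    `views_field` is a hypothetical numeric field holding the page's view
    count; the index discussed in this thread may not have one.
    """
    return {
        "query": {
            "function_score": {
                # The regular full-text match still decides base relevance.
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title", "content"],  # assumed field names
                    }
                },
                # log1p dampens very popular pages; `missing` covers
                # documents that have no recorded views yet.
                "field_value_factor": {
                    "field": views_field,
                    "modifier": "log1p",
                    "missing": 0,
                },
                # Add the popularity term instead of multiplying, so a page
                # with zero views keeps its text score.
                "boost_mode": "sum",
            }
        }
    }
```

Using `"boost_mode": "sum"` rather than the default `multiply` is a design choice in this sketch: with `log1p` and zero views, multiplication would reduce the score of an unviewed page to zero.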

@davidfischer Can we somehow use Google Analytics here?

@agjohnson agjohnson changed the title [Feature] Order search results by most viewed pages Order search results by most viewed pages Aug 5, 2019
@dojutsu-user
Member Author

dojutsu-user commented Aug 12, 2019

Some simple thoughts on this:

Storing the data

We can store the data in a separate model. We can't store it in the HTMLFile model because those objects get deleted and recreated after each build. A ForeignKey is also of no use, because we don't want the relationships to become null when imported file objects are deleted and recreated.

Updating the data

I believe we can have an API endpoint for it: send an API request as soon as the page loads, which increases the page's count by one.

Syncing the data with Elasticsearch

Just a query to our new "count model" should be enough to get the data.
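The storage idea above can be sketched roughly as follows, using `sqlite3` in place of Django models so the example runs on its own. The table names `html_file` and `page_view_count` and all function names are hypothetical. The point is only that the counter is keyed by project, version, and path rather than by a foreign key, so it survives a rebuild.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # Stand-in for HTMLFile rows, which are deleted and recreated per build.
    db.execute(
        "CREATE TABLE html_file ("
        "id INTEGER PRIMARY KEY, project TEXT, version TEXT, path TEXT)"
    )
    # Hypothetical "count model": no foreign key to html_file.
    db.execute(
        "CREATE TABLE page_view_count ("
        "project TEXT NOT NULL, version TEXT NOT NULL, path TEXT NOT NULL, "
        "views INTEGER NOT NULL DEFAULT 0, "
        "PRIMARY KEY (project, version, path))"
    )
    return db


def rebuild(db, project, version, paths):
    """Simulate a build: drop and recreate the file rows for one version."""
    db.execute(
        "DELETE FROM html_file WHERE project = ? AND version = ?",
        (project, version),
    )
    db.executemany(
        "INSERT INTO html_file (project, version, path) VALUES (?, ?, ?)",
        [(project, version, p) for p in paths],
    )


def record_view(db, project, version, path):
    """Increase the page's count by one, creating the row if needed."""
    key = (project, version, path)
    db.execute(
        "INSERT OR IGNORE INTO page_view_count (project, version, path) "
        "VALUES (?, ?, ?)",
        key,
    )
    db.execute(
        "UPDATE page_view_count SET views = views + 1 "
        "WHERE project = ? AND version = ? AND path = ?",
        key,
    )


def get_views(db, project, version, path):
    row = db.execute(
        "SELECT views FROM page_view_count "
        "WHERE project = ? AND version = ? AND path = ?",
        (project, version, path),
    ).fetchone()
    return row[0] if row else 0
```

A rebuild between two reads does not reset the counter: calling `record_view` twice, then `rebuild`, then `get_views` still yields the two recorded views.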

cc: @ericholscher

@ericholscher
Member

I think we should likely update it via the footer API, not with JS. I think it should basically work exactly the same as the Search Analytics, except that we aggregate the counts by day for each (project/version/page) grouping. We should likely do the same for Search Analytics, where we aggregate the count on the model for each day by project, or at least project.

I think the main goals here are how we want to query/display the data. I'm imagining similar to search analytics:

  • List of top viewed pages
  • Each page can have a detail view with a graph of pageviews per day

I'm imagining something similar to this: https://github.com/readthedocs/readthedocs.org/graphs/traffic -- but we likely won't track referrer to start.
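As a rough sketch of the per-day aggregation and the two views described above, again using `sqlite3` so it runs standalone: the table name `page_view` and the function names are hypothetical, and wiring `record_daily_view` into the footer API is assumed rather than shown.

```python
import sqlite3
from datetime import date


def make_daily_db():
    db = sqlite3.connect(":memory:")
    # One row per (project, version, page, day) grouping.
    db.execute(
        "CREATE TABLE page_view ("
        "project TEXT NOT NULL, version TEXT NOT NULL, path TEXT NOT NULL, "
        "day TEXT NOT NULL, views INTEGER NOT NULL DEFAULT 0, "
        "PRIMARY KEY (project, version, path, day))"
    )
    return db


def record_daily_view(db, project, version, path, day):
    """Add one view to the row for this page on this day."""
    key = (project, version, path, day.isoformat())
    db.execute(
        "INSERT OR IGNORE INTO page_view (project, version, path, day) "
        "VALUES (?, ?, ?, ?)",
        key,
    )
    db.execute(
        "UPDATE page_view SET views = views + 1 "
        "WHERE project = ? AND version = ? AND path = ? AND day = ?",
        key,
    )


def top_pages(db, project, limit=10):
    """List of top viewed pages for a project, summed over all days."""
    return db.execute(
        "SELECT version, path, SUM(views) AS total FROM page_view "
        "WHERE project = ? GROUP BY version, path "
        "ORDER BY total DESC, path ASC LIMIT ?",
        (project, limit),
    ).fetchall()


def views_per_day(db, project, version, path):
    """Per-day series for one page, suitable for plotting as a graph."""
    return db.execute(
        "SELECT day, views FROM page_view "
        "WHERE project = ? AND version = ? AND path = ? ORDER BY day",
        (project, version, path),
    ).fetchall()


if __name__ == "__main__":
    db = make_daily_db()
    record_daily_view(db, "docs", "latest", "index.html", date(2019, 8, 1))
    record_daily_view(db, "docs", "latest", "index.html", date(2019, 8, 2))
    print(top_pages(db, "docs"))
    print(views_per_day(db, "docs", "latest", "index.html"))
```

Storing the day as an ISO 8601 string keeps the sketch simple, since those strings sort chronologically.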

@dojutsu-user
Member Author

@ericholscher
This idea looks better, footer_api probably is the best place for it.
I will start working on it right away.

@stsewd
Member

stsewd commented Jun 1, 2020

I'm proposing another solution for this in #7082. The problem here is that new pages (like docs for a new API) are going to be ranked lower in this case, but we actually want those pages to have a higher rank.

@ericholscher
Member

I think we're not moving forward with this idea because of the complexity (ref #7297 (comment)). Closing this issue.
