Skip to content

Record search queries smartly #6088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 5, 2019
Merged

Record search queries smartly #6088

merged 8 commits into from
Sep 5, 2019

Conversation

dojutsu-user
Copy link
Member

This PR refactor the recording mechanism for better support of "search as you type" feature.
Also, It makes all the queries lowercase so as to have no difference between words like Build and build

@dojutsu-user dojutsu-user changed the title Record queries smartly Record search queries smartly Aug 21, 2019
@dojutsu-user
Copy link
Member Author

dojutsu-user commented Aug 21, 2019

Currently, it reduces the total partial queries -- not all but better than storing every partial query.

project=project,
version=version,
query=query,
)
obj.created = time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an auto field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... but the actual time is when the query was made.
Celery task are not executed immediately so the automated time will be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that's probably OK to use the auto field for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be querying the modifed field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created date.

@stsewd
Copy link
Member

stsewd commented Aug 21, 2019

Also, what about just saving the query if it is greater than 3 characters?

@dojutsu-user
Copy link
Member Author

@stsewd

Also, what about just saving the query if it is greater than 3 characters?

This will fail in many cases, like: readt, readthe, readthedo (for readthedocs)

@dojutsu-user dojutsu-user added the PR: work in progress Pull request is not ready for full review label Aug 21, 2019
@stsewd
Copy link
Member

stsewd commented Aug 21, 2019

Why would that fail?

We are start executing the same logic from readt. I'm not saying to remove the logic from this PR

@dojutsu-user
Copy link
Member Author

@stsewd
Ohhh... Okay
I get it.
Yeahh... that's a nice improvement.
I will update the PR.

@dojutsu-user
Copy link
Member Author

This is not ready for a review. It is too far from being OK.
Results are not good currently.

version__slug=version_slug,
query__startswith=query,
created__gte=before_10_sec,
).order_by('-created')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about the performance characteristics of this. I don't believe created is indexed although project and version are. This might be worth testing on the live system before merging. This might be ok but I'm not 100% confident without just timing a similar query in prod.


# record the search query with a celery task
tasks.record_search_query.delay(
project_slug,
version_slug,
query,
total_results,
time,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how tasks are serialized, this may not work exactly as planned. I know celery supports Pickle and JSON and I'm not sure which one we are using. Datetimes are not JSON serializable by default although Django does have a way to do it. However, I'm not sure they come back as timezone aware datetimes after they do. Did you verify this?

Copy link
Member Author

@dojutsu-user dojutsu-user Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During local development, celery uses json serializer.

CELERY_RESULT_SERIALIZER = 'json'

(Docs: https://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERY_RESULT_SERIALIZER#new-lowercase-settings)

they come back as timezone aware datetimes after they do

I am not sure how to verify this.
I did print the timezone.now() and the SQL query -- that looks pretty similar.

timezone.now(): 2019-08-23 08:45:42.841161+00:00
SQL Query:

SELECT "search_searchquery"."id", "search_searchquery"."created", "search_searchquery"."modified", "search_searchquery"."project_id", "search_searchquery"."version_id", "search_searchquery"."query"
FROM "search_searchquery"
INNER JOIN "projects_project"
ON ("search_searchquery"."project_id" = "projects_project"."id")
INNER JOIN "builds_version"
ON ("search_searchquery"."version_id" = "builds_version"."id")
WHERE ("projects_project"."slug" = rtd AND "builds_version"."slug" = latest AND "search_searchquery"."query" LIKE read% ESCAPE '\' AND "search_searchquery"."created" >= 2019-08-23 08:45:32.841161)
ORDER BY "search_searchquery"."created" DESC

return

# don't record query with zero results.
if not total_results:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this block isn't at the top of the function? I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.

Copy link
Member Author

@dojutsu-user dojutsu-user Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this block isn't at the top of the function?

Yes. I wanted to update the timing and query even if no results are there. For example: readthed will return no results but I wanted to update the timing and query if there is a previous query of readth.

I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.

I am not sure if I understood.

@ericholscher
Copy link
Member

@dojutsu-user what is the status here? Is it working locally, or still needs more work?

@dojutsu-user
Copy link
Member Author

@ericholscher
I would really like to have some help here.
The logic and everything seems correct but I it is not working as expected.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this looks good, but we should definitely be updating the modified time, not the created time :)

project=project,
version=version,
query=query,
)
obj.created = time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be querying the modifed field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created date.

@ericholscher
Copy link
Member

Gonna merge this so it goes out on the next deploy, but would like to do some small updates to this over time.

@ericholscher ericholscher merged commit a1f3b05 into readthedocs:master Sep 5, 2019
@dojutsu-user dojutsu-user deleted the recording-query-smartly branch September 6, 2019 04:31
@humitos
Copy link
Member

humitos commented Sep 9, 2019

Think this looks good, but we should definitely be updating the modified time, not the created time :)

@dojutsu-user can you create an issue to track this?

@dojutsu-user
Copy link
Member Author

@humitos
Sure 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: work in progress Pull request is not ready for full review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants