-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Record search queries smartly #6088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record search queries smartly #6088
Conversation
Currently, it reduces the total partial queries -- not all but better than storing every partial query. |
project=project, | ||
version=version, | ||
query=query, | ||
) | ||
obj.created = time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be an auto field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... but the actual time is when the query was made.
Celery task are not executed immediately so the automated time will be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that's probably OK to use the auto field for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this data for query also, so I think if we use automated data, results will be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be querying the modifed
field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created
date.
Also, what about just saving the query if it is greater than 3 characters? |
This will fail in many cases, like: |
Why would that fail? We are start executing the same logic from |
@stsewd |
This is not ready for a review. It is too far from being OK. |
version__slug=version_slug, | ||
query__startswith=query, | ||
created__gte=before_10_sec, | ||
).order_by('-created') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about the performance characteristics of this. I don't believe created
is indexed although project
and version
are. This might be worth testing on the live system before merging. This might be ok but I'm not 100% confident without just timing a similar query in prod.
|
||
# record the search query with a celery task | ||
tasks.record_search_query.delay( | ||
project_slug, | ||
version_slug, | ||
query, | ||
total_results, | ||
time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on how tasks are serialized, this may not work exactly as planned. I know celery supports Pickle and JSON and I'm not sure which one we are using. Datetimes are not JSON serializable by default although Django does have a way to do it. However, I'm not sure they come back as timezone aware datetimes after they do. Did you verify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During local development, celery uses json
serializer.
CELERY_RESULT_SERIALIZER = 'json' |
(Docs: https://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERY_RESULT_SERIALIZER#new-lowercase-settings)
they come back as timezone aware datetimes after they do
I am not sure how to verify this.
I did print the timezone.now()
and the SQL query -- that looks pretty similar.
timezone.now(): 2019-08-23 08:45:42.841161+00:00
SQL Query:
SELECT "search_searchquery"."id", "search_searchquery"."created", "search_searchquery"."modified", "search_searchquery"."project_id", "search_searchquery"."version_id", "search_searchquery"."query"
FROM "search_searchquery"
INNER JOIN "projects_project"
ON ("search_searchquery"."project_id" = "projects_project"."id")
INNER JOIN "builds_version"
ON ("search_searchquery"."version_id" = "builds_version"."id")
WHERE ("projects_project"."slug" = rtd AND "builds_version"."slug" = latest AND "search_searchquery"."query" LIKE read% ESCAPE '\' AND "search_searchquery"."created" >= 2019-08-23 08:45:32.841161)
ORDER BY "search_searchquery"."created" DESC
return | ||
|
||
# don't record query with zero results. | ||
if not total_results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this block isn't at the top of the function? I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this block isn't at the top of the function?
Yes. I wanted to update the timing and query even if no results are there. For example: readthed
will return no results but I wanted to update the timing and query if there is a previous query of readth
.
I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.
I am not sure if I understood.
@dojutsu-user what is the status here? Is it working locally, or still needs more work? |
@ericholscher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this looks good, but we should definitely be updating the modified time, not the created time :)
project=project, | ||
version=version, | ||
query=query, | ||
) | ||
obj.created = time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be querying the modifed
field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created
date.
Gonna merge this so it goes out on the next deploy, but would like to do some small updates to this over time. |
@dojutsu-user can you create an issue to track this? |
@humitos |
This PR refactor the recording mechanism for better support of "search as you type" feature.
Also, It makes all the queries lowercase so as to have no difference between words like
Build
andbuild