-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Record search queries smartly #6088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
0603ed4
81b1f5d
703f510
d1090f0
2001be4
9200121
58ecb39
294f6bd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -142,19 +142,45 @@ def delete_old_search_queries_from_db(): | |
|
||
|
||
@app.task(queue='web') | ||
def record_search_query(project_slug, version_slug, query, total_results): | ||
"""Record search query in database.""" | ||
if not project_slug or not version_slug or not query or not total_results: | ||
def record_search_query(project_slug, version_slug, query, total_results, time): | ||
"""Record/update search query in database.""" | ||
if not project_slug or not version_slug or not query: | ||
log.debug( | ||
'Not recording the search query. Passed arguments: ' | ||
'project_slug: %s, version_slug: %s, query: %s, total_results: %s' % ( | ||
project_slug, version_slug, query, total_results | ||
'project_slug: %s, version_slug: %s, query: %s, total_results: %s, time: %s' % ( | ||
project_slug, version_slug, query, total_results, time | ||
) | ||
) | ||
return | ||
|
||
project_qs = Project.objects.filter(slug=project_slug) | ||
before_10_sec = time - timezone.timedelta(seconds=10) | ||
partial_query_qs = SearchQuery.objects.filter( | ||
project__slug=project_slug, | ||
version__slug=version_slug, | ||
query__startswith=query, | ||
created__gte=before_10_sec, | ||
).order_by('-created') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder about the performance characteristics of this. I don't believe |
||
|
||
# if partial query exists, | ||
# just update it instead of creating new SearchQuery object. | ||
if partial_query_qs.exists(): | ||
obj = partial_query_qs.first() | ||
dojutsu-user marked this conversation as resolved.
Show resolved
Hide resolved
|
||
obj.created = time | ||
obj.query = query | ||
obj.save() | ||
return | ||
|
||
# don't record query with zero results. | ||
if not total_results: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason this block isn't at the top of the function? I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes. I wanted to update the timing and query even if no results are there. For example:
I am not sure if I understood. |
||
log.debug( | ||
'Not recording search query because of zero results. Passed arguments: ' | ||
'project_slug: %s, version_slug: %s, query: %s, total_results: %s, time: %s' % ( | ||
project_slug, version_slug, query, total_results, time | ||
) | ||
) | ||
return | ||
|
||
project_qs = Project.objects.filter(slug=project_slug) | ||
dojutsu-user marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if not project_qs.exists(): | ||
log.debug( | ||
'Not recording the search query because project does not exist. ' | ||
|
@@ -168,11 +194,21 @@ def record_search_query(project_slug, version_slug, query, total_results): | |
version_qs = Version.objects.filter(project=project, slug=version_slug) | ||
|
||
if not version_qs.exists(): | ||
log.debug( | ||
'Not recording the search query because version does not exist. ' | ||
'project_slug: %s, version_slug: %s' % ( | ||
project_slug, version_slug | ||
) | ||
) | ||
return | ||
|
||
version = version_qs.first() | ||
SearchQuery.objects.create( | ||
|
||
# make a new SearchQuery object. | ||
obj = SearchQuery.objects.create( | ||
project=project, | ||
version=version, | ||
query=query, | ||
) | ||
obj.created = time | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this be an auto field? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes... but the actual time is when the query was made. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd say that's probably OK to use the auto field for this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I used this data for query also, so I think if we use automated data, results will be wrong. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should be querying the |
||
obj.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on how tasks are serialized, this may not work exactly as planned. I know celery supports Pickle and JSON and I'm not sure which one we are using. Datetimes are not JSON serializable by default although Django does have a way to do it. However, I'm not sure they come back as timezone aware datetimes after they do. Did you verify this?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During local development, celery uses
json
serializer.readthedocs.org/readthedocs/settings/dev.py
Line 39 in 3c04a96
(Docs: https://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERY_RESULT_SERIALIZER#new-lowercase-settings)
I am not sure how to verify this.
I did print the
timezone.now()
and the SQL query -- that looks pretty similar.timezone.now():
2019-08-23 08:45:42.841161+00:00
SQL Query: