DB: do not fetch data and others when deleting rows #10446

Merged
merged 4 commits into from
Jun 28, 2023

Conversation

humitos
Member

@humitos humitos commented Jun 19, 2023

This task was cancelled again. The query shows a SELECT first that fetches the
whole rows. I think we can reduce this time/memory by only fetching the ids.
@humitos humitos requested a review from a team as a code owner June 19, 2023 08:54
@humitos humitos requested a review from benjaoming June 19, 2023 08:54
Contributor

@benjaoming benjaoming left a comment

Would be great to do something about this, but I think we need a different approach, judging from the docs?

@humitos
Member Author

humitos commented Jun 26, 2023

We need to find a way to avoid doing a SELECT and only run a DELETE FROM. I'm not sure how to do that with Django 😞 -- we may need to write a raw SQL query for this cleanup?

@ericholscher
Member

I think raw SQL makes sense for tasks like these generally, as it explicitly won't do any signal processing, and makes it really clear what we're doing. 👍

We are facing an issue with this query because it takes too long to
execute (more than 30s), causing our DB to kill the query. This is because Django
performs a `SELECT` first to be able to trigger the `pre_delete` and `post_delete` signals on
each object deletion.

We don't really need this here, so we are using raw SQL to bypass this and make
the query execute faster. This is not ideal, but we didn't find a better approach.
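
A minimal sketch of the raw-SQL idea described above, not the code merged in this PR. It uses the standard-library `sqlite3` module so it runs standalone; the table name `telemetry_data`, its columns, and the helper `delete_older_than` are hypothetical. In a Django project the cursor would instead come from `django.db.connection.cursor()`, and the placeholder would be `%s` rather than `?`.

```python
import sqlite3


def delete_older_than(conn, cutoff):
    """Delete old rows with a single DELETE statement and no prior SELECT."""
    cur = conn.cursor()
    # Parameterized query: the cutoff is passed separately, never interpolated.
    cur.execute("DELETE FROM telemetry_data WHERE created < ?", (cutoff,))
    conn.commit()
    # rowcount reports the affected rows; a DELETE needs no fetchall().
    return cur.rowcount


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry_data (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO telemetry_data (created) VALUES (?)",
    [("2023-01-01",), ("2023-02-01",), ("2023-06-01",)],
)

deleted = delete_older_than(conn, "2023-03-01")
remaining = conn.execute("SELECT COUNT(*) FROM telemetry_data").fetchone()[0]
```

Because the statement goes straight to the database, no model instances are loaded and no delete signals fire, which is the trade-off discussed in this thread.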
@humitos
Member Author

humitos commented Jun 27, 2023

I updated this PR to use the DB cursor directly. Please take another look and let me know. If you approve this PR and think the SQL is fine, I can manually run this code in production before merging to be 100% sure it will work fine. I tested the SELECT COUNT(*) that I put in the comments for now.

Member

@ericholscher ericholscher left a comment

Looks good, 👍 on running the query manually as a test to start.

@stsewd
Member

stsewd commented Jun 27, 2023

What about running these tasks more often and with a limit in the querysets? [:500] or similar. Not sure that I like the idea of running raw queries, this may be skipping some other django features/validations for related models.
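
A rough sketch of what the batched alternative could look like, under stated assumptions rather than as a tested proposal for this codebase. Django raises a `TypeError` when `.delete()` is called on a sliced queryset, so a `[:500]` limit would have to go through an id subquery. The example below shows that shape in plain SQL with the standard-library `sqlite3` module; the table `telemetry_data`, its columns, and the helper `delete_in_batches` are hypothetical.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=500):
    """Delete matching rows in chunks so each statement stays short."""
    total = 0
    while True:
        # The subquery selects only ids, capped at batch_size per round.
        cur = conn.execute(
            "DELETE FROM telemetry_data WHERE id IN ("
            "SELECT id FROM telemetry_data WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
    return total


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry_data (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO telemetry_data (created) VALUES (?)",
    [("2023-01-01",)] * 1200 + [("2023-06-01",)] * 10,
)

total_deleted = delete_in_batches(conn, "2023-03-01")
remaining = conn.execute("SELECT COUNT(*) FROM telemetry_data").fetchone()[0]
```

Each round commits on its own, so a single slow statement is less likely to hit the database timeout, at the cost of more round trips.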

The query is executed without requiring `.fetchall()`
@humitos
Member Author

humitos commented Jun 28, 2023

@stsewd yeah, I'm not super happy with this solution either -- but I don't care too much about this data since it's just old data we need to clean up with some frequency anyways.

I tried other nicer approaches but I wasn't able to make them work without doing an auto SELECT and I got tired of trying. I'm fine testing the proposed solution for now. We can come back to this if we want to improve it and find a nicer way, but for now, as a first step, I want them working 😅

@humitos humitos enabled auto-merge (squash) June 28, 2023 13:13
@humitos humitos merged commit cd4535e into main Jun 28, 2023
@humitos humitos deleted the humitos/telemetry-cleanup branch June 28, 2023 13:26
4 participants