Skip to content

Added UNIQUE KEYS to fluview_clinical to avoid duplicates #1127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dmytrotsko
Copy link
Contributor

closes

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Summary

Added UNIQUE KEY constraints to epidata.fluview_clinical to avoid duplicate rows.

Before that, we need to delete duplicates:
DELETE FROM fluview_clinical WHERE id in ( SELECT dup.id FROM ( SELECT * FROM ( SELECT fc.id, fc.release_date, fc.issue, fc.epiweek, fc.region, fc.lag, fc.total_specimens, fc.total_a, fc.total_b, fc.percent_positive, fc.percent_a, fc.percent_b, ROW_NUMBER() OVER ( PARTITION BY fc.release_date,fc.issue,fc.epiweek,fc.region,fc.lag,fc.total_specimens,fc.total_a,fc.total_b,fc.percent_positive,fc.percent_a,fc.percent_b ORDER BY fc.release_date ) as row_num FROM epidata.fluview_clinical fc ORDER BY fc.release_date DESC ) s WHERE s.row_num > 1 ) dup );

@dmytrotsko dmytrotsko requested review from melange396 and krivard April 7, 2023 14:49
@dmytrotsko dmytrotsko linked an issue Apr 7, 2023 that may be closed by this pull request
melange396
melange396 previously approved these changes Apr 7, 2023
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! you might want to consider adding a new .sql file in src/ddl/migrations/ that has an ALTER TABLE statement to add the new index (and another to delete the old one).

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escalating George's request for a migrations file; for this it's required

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@dmytrotsko dmytrotsko requested review from krivard and melange396 and removed request for melange396 June 9, 2023 15:12
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dmytro and i talked about some other ways to approach this, he will probably be updating the migration file sometime soon.

…alculation correct. Changed issue to release_date in ON DUPLICATE KEY UPDATE secion for table.
…pulate it with only unique rows from original `fluview_clinical` table.

Added statement to remove duplicates from `fluview_clinical` table.
Added `issue` index drop on `fluview_clinical` table with with the following creation of new one that is based on (`issue`, `epiweek`, `region`).
@dmytrotsko dmytrotsko requested a review from melange396 June 21, 2023 11:58
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! just a few things for the paste/typo, adding 2 comments, and doing the table drop+rename!

Comment on lines 115 to 118
-- Before creating new unique constraint we need to remove rows that do not satisfy it.
DELETE FROM
fluview_clinical
WHERE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a reminder that we can remove the fluview_clinical table, and then rename fluview_clinical_v2 back to fluview_clinical instead of this

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 solid!

@dmytrotsko dmytrotsko closed this Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fluview_clinical queries return duplicate results
3 participants