Skip to content

schema refinement, dbjobs speed increase, and added migration routine #922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 19, 2022

Conversation

krivard
Copy link
Contributor

@krivard krivard commented Jun 1, 2022

This is the branch with the "constraintz" variant of v4.

  • schema refinement
  • dbjobs speed increase
  • added migration routine

Base automatically changed from v4-schema-revisions-release-prep-prep to v4-schema-revisions-release-prep June 15, 2022 20:14
@krivard krivard requested a review from nolangormley August 18, 2022 14:08
they come from a pending changeset to incorporate (into this branch, 'v4-srrpp-migrations') improvements made in a parallel 'mergedkey' branch...
i fear i may just be kicking more merge conflicts down the road, but hopefully this helps me to be less confused about some current diffs in the meantime.
changes are:
- signal_latest table definition simplified to automatically inherit structure from signal_history table
- removal of unused row-counting methods
- docstring update / correction
@melange396 melange396 marked this pull request as ready for review August 18, 2022 19:08
time_q.append(time.time())
logger.debug('signal_load_delete_processed', rows=self._cursor.rowcount, elapsed=time_q[-1]-time_q[-2])
except Exception as e:
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this exception be added to the logger so it's in the log file?

Copy link
Collaborator

@melange396 melange396 Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well...

This method is only really called by insert_or_update_batch which in turn will again catch and re-raise this exception, and it states "rollback is handled in csv_to_database; if you're calling this yourself, handle your own rollback".

Also, instantiating the structured_logger registers a handler by default that should log any uncaught exceptions.

Looking at this now (unless im missing something), its really just a no-op to catch a nonspecific exception and just simply re-raise it. The same situation effectively exists in the above mentioned insert_or_update_batch, but its presence coupled with the comment hints to a maintainer that exception handling is not an afterthought and is being dealt with elsewhere.

And to be fair, i cribbed the pattern in this method from @krivard 's version here (because im currently in the middle of incorporating lots of her testing fixes from that branch into this one). She probably had a good reason for this that got lost along the way; perhaps other code was removed, like an Exception subclass case that was handled differently, or a finally clause.

SO i think the correct thing to do here is to remove these no-op catch+reraise block boundaries, BUT i am going to leave them for now because they are harmless and we can discuss their efficacy later (or preferably sooner!). I made a note to address this that i will act on in the very near future, either by fixing it or turning it into a proper github issue.

im not sure why i wrote such a novel for this. ¯\_(ツ)_/¯

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the correct thing to do here is to remove these no-op catch+reraise block boundaries, BUT i am going to leave them for now because they are harmless and we can discuss their efficacy later (or preferably sooner!)

+1, agree on both counts

Copy link
Contributor

@nolangormley nolangormley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Big question I have is whether or not we should use nolock in the Acquisition database.py file or check in some way that there are no current locks on the tables in question.

Some other general questions about logging.

Copy link
Contributor

@nolangormley nolangormley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

George answered my questions on some of the code. All looks good!

@melange396
Copy link
Collaborator

Whoops, sorry if i assaulted anyones inboxes with edits i made to comments above!!! I forgot that i was responding to individual review comments and not submitting a single unified review myself (i thought i had to hit submit for them to be sent all at once).

@melange396
Copy link
Collaborator

This changeset slightly outgrew the scope of what this branch was supposed to be, so i want to summarize the totality of the delta up to this point:

  • simplifed schema definitions (and added placeholder stratification dimension table)
  • sped up various db operations by adding comprehensive indexes and by removing: bad index hints in the views, process_status, and compressed keys
  • integrated dbjobs() call into the end of insert_or_update_batch
  • created script to perform migration of data from v3 schema to v4 schema in batches
  • added method to fix autoincrement counter if it gets reset

@krivard krivard merged commit dc02a64 into v4-schema-revisions-release-prep Aug 19, 2022
@krivard krivard deleted the v4-srrpp-migrations branch August 19, 2022 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants