Decoupling table mapping to allow replaying all binlog events #163


Closed

Conversation

dmariassy

As discussed in #56, starting a stream at a point in time when the schema of the source database differed from what it is now will always break the program.

We can address this issue by allowing users of the library to set up a secondary MySQL database - a so-called Schema DB - and:

  1. Catching all those QueryEvents that modify the underlying schema
  2. Executing them on the Schema DB
  3. Pointing the BinlogStreamReader's __get_table_information method to this DB's information schema.
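The three steps above could be sketched roughly as follows. This is an illustrative outline only: the function and connection names are assumptions, the DDL check is a deliberately naive prefix match, and the `pymysqlreplication` import is deferred so the helper can be inspected without the library installed.

```python
# Naive classification of schema-changing statements (illustrative only).
DDL_PREFIXES = ("ALTER TABLE", "CREATE TABLE", "DROP TABLE", "RENAME TABLE")

def is_schema_change(query):
    """Return True if the statement looks like table-altering DDL."""
    return query.strip().upper().startswith(DDL_PREFIXES)

def replay_schema_events(stream, schema_db_conn):
    """Steps 1-2 of the proposal: catch schema-changing QueryEvents
    from the binlog stream and execute them against the Schema DB,
    so its INFORMATION_SCHEMA tracks the current binlog position."""
    from pymysqlreplication.event import QueryEvent

    for event in stream:
        if isinstance(event, QueryEvent) and is_schema_change(event.query):
            with schema_db_conn.cursor() as cur:
                cur.execute(event.query)
```

Step 3 would then point `__get_table_information` at `schema_db_conn` instead of the source database.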

@julien-duponchelle
Owner

First, this is very interesting.

I'm not really happy to add an external dependency like sqlparse because it complicates licensing (I know at least one org where this lib is used and where they need to ask the legal department about every dependency ...).

Do you think we can find a way to move this system outside the lib? I think we could provide a way for users to replace __get_table_information by letting them pass a function at init, offering the guarantee that the next version of pymysqlreplication will not break your code.
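The extension point being suggested might look something like this minimal sketch. The class and parameter names here are hypothetical, not the real pymysqlreplication API: the idea is just that a caller-supplied lookup function replaces the built-in one at construction time.

```python
class BinLogStreamReaderSketch:
    """Hypothetical sketch: inject a table-information lookup at init."""

    def __init__(self, connection_settings=None, table_info_fn=None):
        # table_info_fn(schema, table) -> list of column descriptions.
        # Falls back to the built-in lookup when not supplied.
        self._table_info_fn = table_info_fn or self._default_table_info

    def _default_table_info(self, schema, table):
        # Placeholder for the usual INFORMATION_SCHEMA query.
        return []

    def get_table_information(self, schema, table):
        return self._table_info_fn(schema, table)
```

A user could then point `table_info_fn` at their own Schema DB without the library itself depending on sqlparse.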

@dmariassy
Author

Hi and thanks a lot for your feedback!

I understand your concern about external dependencies. However, simply allowing the user to pass in their own __get_table_information function wouldn't be sufficient to achieve the same result. That's because the parsing of the DDL statements must proceed at the same pace as the processing of other binlog events; only then can we ensure that the schema used for table mapping and the Row Events that are validated against the table map stay in sync. That's why I added the Query Event parsing and execution step to the main binlog streaming loop.

The solution I'm proposing consists of the three steps listed in my original commit message, and the __get_table_information method only comes into play at the third stage of that process. I don't currently see how we could package all the steps of the logic into a customisable __get_table_information function, since such a function cannot be used to catch schema-changing Query Events.

So as I currently see it, if we really want to move away from sqlparse, our best bet would probably be to mimic its capabilities and write our own (dumb) SQL parser that is only trained to catch DDL statements. But I'm not very keen on implementing that :)
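For what it's worth, such a "dumb" parser could be little more than a regular expression, since only the statement kind and the affected table need to be recognised. This is a sketch of the idea, not a proposed implementation; it deliberately ignores everything sqlparse handles beyond this narrow case.

```python
import re

# Deliberately dumb DDL recogniser: only the shapes needed for
# table-map maintenance are matched; everything else is ignored.
DDL_RE = re.compile(
    r"^\s*(?P<kind>ALTER|CREATE|DROP)\s+TABLE\s+"
    r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?"
    r"`?(?P<table>\w+)`?",
    re.IGNORECASE,
)

def parse_ddl(query):
    """Return (kind, table) for a schema-changing statement, else None."""
    m = DDL_RE.match(query)
    if not m:
        return None
    return m.group("kind").upper(), m.group("table")
```

Statements like `RENAME TABLE` or multi-table DDL would need extra cases, which hints at why leaning on a real parser was attractive in the first place.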

WDYT?

@baloo
Collaborator

baloo commented Nov 30, 2016

I believe it has been implemented and merged in #176. Feel free to reopen if I'm wrong.
