Decoupling table mapping to allow replaying all binlog events #163


Closed

Conversation

dmariassy

As discussed in #56, starting a stream at a point in time when the schema of the source database differed from what it is now will always break the program.

We can address this issue by allowing users of the library to set up a secondary MySQL database - a so-called Schema DB - and:

  1. Catching all those QueryEvents that modify the underlying schema
  2. Executing them on the Schema DB
  3. Pointing the BinlogStreamReader's __get_table_information method to this DB's information schema.
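The three steps above could be sketched roughly as follows. This is an illustrative outline only: the function and connection names are assumptions, the DDL check is a deliberately naive prefix match, and the `pymysqlreplication` import is deferred so the helper can be inspected without the library installed.

```python
# Naive classification of schema-changing statements (illustrative only).
DDL_PREFIXES = ("ALTER TABLE", "CREATE TABLE", "DROP TABLE", "RENAME TABLE")

def is_schema_change(query):
    """Return True if the statement looks like table-altering DDL."""
    return query.strip().upper().startswith(DDL_PREFIXES)

def replay_schema_events(stream, schema_db_conn):
    """Steps 1-2 of the proposal: catch schema-changing QueryEvents
    from the binlog stream and execute them against the Schema DB,
    so its INFORMATION_SCHEMA tracks the current binlog position."""
    from pymysqlreplication.event import QueryEvent

    for event in stream:
        if isinstance(event, QueryEvent) and is_schema_change(event.query):
            with schema_db_conn.cursor() as cur:
                cur.execute(event.query)
```

Step 3 would then point `__get_table_information` at `schema_db_conn` instead of the source database.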

@julien-duponchelle
Owner

First, this is very interesting.

I'm not really happy to add an external dependency like sqlparse because it complicates licensing (I know at least one org where this lib is used and where they need to ask the legal department about every dependency ...).

Do you think we can find a way to move this system outside the lib? I think we could provide a way for users to replace __get_table_information by letting them pass a function at init, offering the guarantee that the next version of pymysqlreplication will not break your code.
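The extension point being suggested might look something like this minimal sketch. The class and parameter names here are hypothetical, not the real pymysqlreplication API: the idea is just that a caller-supplied lookup function replaces the built-in one at construction time.

```python
class BinLogStreamReaderSketch:
    """Hypothetical sketch: inject a table-information lookup at init."""

    def __init__(self, connection_settings=None, table_info_fn=None):
        # table_info_fn(schema, table) -> list of column descriptions.
        # Falls back to the built-in lookup when not supplied.
        self._table_info_fn = table_info_fn or self._default_table_info

    def _default_table_info(self, schema, table):
        # Placeholder for the usual INFORMATION_SCHEMA query.
        return []

    def get_table_information(self, schema, table):
        return self._table_info_fn(schema, table)
```

A user could then point `table_info_fn` at their own Schema DB without the library itself depending on sqlparse.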

@dmariassy
Author

Hi and thanks a lot for your feedback!

I understand your concern about external dependencies. However, simply allowing the user to pass in their own __get_table_information function wouldn't be sufficient to achieve the same result. That's because the parsing of the DDL statements must proceed at the same pace as the processing of other binlog events; only then can we ensure that the schema used for table mapping and the Row Events that are validated against the table map stay in sync. That's why I added the Query Event parsing and execution step to the main binlog streaming loop.

The solution I'm proposing consists of the three steps listed in my original commit message, and the __get_table_information method only comes into play at the third stage of that process. I don't currently see how we could package all the steps of the logic into a customisable __get_table_information function, since such a function cannot be used to catch schema-changing Query Events.

So as I currently see it, if we really want to move away from sqlparse, our best bet would probably be to mimic its capabilities and write our own (dumb) SQL parser that is only trained to catch DDL statements. But I'm not very keen on implementing that :)
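For what it's worth, such a "dumb" parser could be little more than a regular expression, since only the statement kind and the affected table need to be recognised. This is a sketch of the idea, not a proposed implementation; it deliberately ignores everything sqlparse handles beyond this narrow case.

```python
import re

# Deliberately dumb DDL recogniser: only the shapes needed for
# table-map maintenance are matched; everything else is ignored.
DDL_RE = re.compile(
    r"^\s*(?P<kind>ALTER|CREATE|DROP)\s+TABLE\s+"
    r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?"
    r"`?(?P<table>\w+)`?",
    re.IGNORECASE,
)

def parse_ddl(query):
    """Return (kind, table) for a schema-changing statement, else None."""
    m = DDL_RE.match(query)
    if not m:
        return None
    return m.group("kind").upper(), m.group("table")
```

Statements like `RENAME TABLE` or multi-table DDL would need extra cases, which hints at why leaning on a real parser was attractive in the first place.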

WDYT?

@baloo
Collaborator

baloo commented Nov 30, 2016

I believe it has been implemented and merged in #176. Feel free to reopen if I'm wrong.
