Skip to content

Alter statements that add columns incorrectly assign columns before the alter #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gmillsap opened this issue Sep 28, 2017 · 1 comment

Comments

@gmillsap
Copy link

gmillsap commented Sep 28, 2017

If an alter statement has been run that adds a column (possible other scenarios regarding alters could cause this too) the column references before the alter statement will be incorrect. Ex:

=== WriteRowsEvent ===
Date: 2017-09-28T08:38:20
Log position: 1049
Event size: 25
Read bytes: 12
Table: test_db.test_table
Affected columns: 3
Changed rows: 1
Values:

('*', u'first_name', ':', u'first_name')
('*', u'uid', ':', 5)
('*', u'middle_name', ':', u'last_name') <------------- Do to the schema change, this is incorrect
()
=== XidEvent ===
Date: 2017-09-28T08:38:20
Log position: 1080
Event size: 8
Read bytes: 8
Transaction ID: 73
()
=== QueryEvent ===
Date: 2017-09-28T08:39:05
Log position: 1263
Event size: 160
Read bytes: 160
Schema: test_db
Execution time: 0
Query: alter table test_table add column middle_name varchar(255) default null after first_name
()

I understand that there is no direct way to fix this, due to the fact that the schema is queried in order to determine the column names, as the binary log doesn't output them as column names, rather column order (or some such). My suggestion is to allow for an additional configuration option of a second MySql connection, one to check against for schema.

We are trying to use this for an ETL process for data warehousing where we are essentially performing a replication process with transformation before loading into another MySql database.

I would fork the project and add this additional configuration option, but I'm not a python developer by trade, and I figure it would be a task better suited for you guys. Though if you're not interested in implementing it, I may have to try my hand.

@baloo
Copy link
Collaborator

baloo commented Sep 28, 2017

there is already a configuration option to specify a second mysql connection. See ctl_connection_settings in binlogstream. Or #176 that added it.

@baloo baloo closed this as completed Sep 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants