Skip to content

Differentiate between BINARY/CHAR #723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kwoodhouse93 opened this issue Dec 19, 2017 · 1 comment
Closed

Differentiate between BINARY/CHAR #723

kwoodhouse93 opened this issue Dec 19, 2017 · 1 comment

Comments

@kwoodhouse93
Copy link
Contributor

MySQL provides BINARY and VARBINARY data types. These are treated as strings with the binary character set. As far as MySQL is concerned, the only real difference between binary and text fields is the character set (although this is a bit of a simplification).

This is why, when using ColumnTypes(), the mysql driver returns CHAR or VARCHAR; there's no notion of a BINARY field type internally.

Motivation

The use case I have is for an application that ingests data from multiple sources and writes it to a common format. This requires identifying column types in the input database (which is MySQL, in this case) to ensure they can be supported in the common format.

This common format supports UTF-8 strings, but not arbitrary binary strings. Therefore, I'd like to be able to identify whether a column that the driver says is CHAR is actually a text field, and not a byte slice.

Possible Solutions

I can see two possible approaches that would benefit my use case here:

  1. Provide access to a column's character set
  2. Switch over character set to return BINARY or VARBINARY when appropriate

Option 1: Provide access to a column's character set

The benefit of this option is that it opens the way for users of the driver to do more than just distinguish between BINARY and CHAR. One could potentially handle different character sets in different ways, which I can imagine could be useful with unusually-configured databases.

The major disadvantage is that I'm not aware of an existing interface in the stdlib 'sql' package for returning this sort of information. Doing so would require providing some new MySQL specific interface for getting this information out of the driver. Not impossible, but could be tricky to get right.

Option 2: Switch over character set to return BINARY or VARBINARY when appropriate

This option should be relatively straightforward to implement. I believe the packet received by the driver already contains information about the character set for string types. We could store this information when reading field information from a packet, and later switch over it when we need to know the column type.

The character set for binary fields is 63, as evidenced here: https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_basic_character_set.html.

This option would require some modest changes to fields.go and packets.go to provide the extra logic required. I believe this sort of approach was discussed in #667, with little opposition, but the decision was to wait for some feedback from users before implementing.

Conclusion

I suggest option 2 be implemented, as it requires the smallest change and solves the use case I presented above. Option 1 could still be implemented later if it was found to be useful.

Subject to any feedback on what I've proposed above, I'll try implementing option 2 with the aim of producing a suitable PR.

Let me know if you have any questions or comments or if you think I've missed anything.

Note, there is also a similar issue with the TEXT and BLOB data types (and their variants). The difference here is that these fields are read as BLOBs by default (whether they are actually TEXT or BLOB. The fix should be more or less the same in this case.

@julienschmidt
Copy link
Member

Fixed by #724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants