You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
MySQL provides BINARY and VARBINARY data types. These are treated as strings with the binary character set. As far as MySQL is concerned, the only real difference between binary and text fields is the character set (although this is a bit of a simplification).
This is why, when using ColumnTypes(), the mysql driver returns CHAR or VARCHAR; there's no notion of a BINARY field type internally.
Motivation
The use case I have is for an application that ingests data from multiple sources and writes it to a common format. This requires identifying column types in the input database (which is MySQL, in this case) to ensure they can be supported in the common format.
This common format supports UTF-8 strings, but not arbitrary binary strings. Therefore, I'd like to be able to identify whether a column that the driver says is CHAR is actually a text field, and not a byte slice.
Possible Solutions
I can see two possible approaches that would benefit my use case here:
Provide access to a column's character set
Switch over character set to return BINARY or VARBINARY when appropriate
Option 1: Provide access to a column's character set
The benefit of this option is that it opens the way for users of the driver to do more than just distinguish between BINARY and CHAR. One could potentially handle different character sets in different ways, which I can imagine could be useful with unusually-configured databases.
The major disadvantage is that I'm not aware of an existing interface in the stdlib 'sql' package for returning this sort of information. Doing so would require providing some new MySQL specific interface for getting this information out of the driver. Not impossible, but could be tricky to get right.
Option 2: Switch over character set to return BINARY or VARBINARY when appropriate
This option should be relatively straightforward to implement. I believe the packet received by the driver already contains information about the character set for string types. We could store this information when reading field information from a packet, and later switch over it when we need to know the column type.
This option would require some modest changes to fields.go and packets.go to provide the extra logic required. I believe this sort of approach was discussed in #667, with little opposition, but the decision was to wait for some feedback from users before implementing.
Conclusion
I suggest option 2 be implemented, as it requires the smallest change and solves the use case I presented above. Option 1 could still be implemented later if it was found to be useful.
Subject to any feedback on what I've proposed above, I'll try implementing option 2 with the aim of producing a suitable PR.
Let me know if you have any questions or comments or if you think I've missed anything.
Note, there is also a similar issue with the TEXT and BLOB data types (and their variants). The difference here is that these fields are read as BLOBs by default (whether they are actually TEXT or BLOB. The fix should be more or less the same in this case.
The text was updated successfully, but these errors were encountered:
MySQL provides
BINARY
andVARBINARY
data types. These are treated as strings with thebinary
character set. As far as MySQL is concerned, the only real difference between binary and text fields is the character set (although this is a bit of a simplification).This is why, when using
ColumnTypes()
, themysql
driver returnsCHAR
orVARCHAR
; there's no notion of aBINARY
field type internally.Motivation
The use case I have is for an application that ingests data from multiple sources and writes it to a common format. This requires identifying column types in the input database (which is MySQL, in this case) to ensure they can be supported in the common format.
This common format supports UTF-8 strings, but not arbitrary binary strings. Therefore, I'd like to be able to identify whether a column that the driver says is
CHAR
is actually a text field, and not a byte slice.Possible Solutions
I can see two possible approaches that would benefit my use case here:
BINARY
orVARBINARY
when appropriateOption 1: Provide access to a column's character set
The benefit of this option is that it opens the way for users of the driver to do more than just distinguish between
BINARY
andCHAR
. One could potentially handle different character sets in different ways, which I can imagine could be useful with unusually-configured databases.The major disadvantage is that I'm not aware of an existing interface in the stdlib 'sql' package for returning this sort of information. Doing so would require providing some new MySQL specific interface for getting this information out of the driver. Not impossible, but could be tricky to get right.
Option 2: Switch over character set to return
BINARY
orVARBINARY
when appropriateThis option should be relatively straightforward to implement. I believe the packet received by the driver already contains information about the character set for string types. We could store this information when reading field information from a packet, and later switch over it when we need to know the column type.
The character set for binary fields is
63
, as evidenced here: https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_basic_character_set.html.This option would require some modest changes to
fields.go
andpackets.go
to provide the extra logic required. I believe this sort of approach was discussed in #667, with little opposition, but the decision was to wait for some feedback from users before implementing.Conclusion
I suggest option 2 be implemented, as it requires the smallest change and solves the use case I presented above. Option 1 could still be implemented later if it was found to be useful.
Subject to any feedback on what I've proposed above, I'll try implementing option 2 with the aim of producing a suitable PR.
Let me know if you have any questions or comments or if you think I've missed anything.
Note, there is also a similar issue with the
TEXT
andBLOB
data types (and their variants). The difference here is that these fields are read asBLOB
s by default (whether they are actuallyTEXT
orBLOB
. The fix should be more or less the same in this case.The text was updated successfully, but these errors were encountered: