Skip to content

ENH: change to_sql default from Text to UnicodeText #35627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gordthompson opened this issue Aug 8, 2020 · 3 comments
Open

ENH: change to_sql default from Text to UnicodeText #35627

gordthompson opened this issue Aug 8, 2020 · 3 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@gordthompson
Copy link

Is your feature request related to a problem?

Yes. Databases that differentiate between Unicode and SBCS columns (notably Microsoft SQL Server nvarchar vs. varchar) can fail simple round-trips when using to_sql to write columns with Unicode strings.

Describe the solution you'd like

Change the default column dtype from Text ...

return Text

... to UnicodeText.

API breaking implications

None that I can see.

Describe alternatives you've considered

An alternative would be to modify the behaviour of Text in SQLAlchemy's mssql dialect, but SQLA already has UnicodeText and it solves the problem.

Additional context

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("mssql+pyodbc://@mssqlLocal")

df = pd.DataFrame(
    [("English", "CHEERS!"), ("Greek", "ΟΠΑ!")], columns=["language", "toast"]
)
df.to_sql("pd_test", engine, index=False, if_exists="replace")
print(pd.read_sql_query("SELECT * FROM pd_test", engine))
"""console output:
  language    toast
0  English  CHEERS!
1    Greek     ???!
"""
@gordthompson gordthompson added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 8, 2020
@mathurk1
Copy link
Contributor

mathurk1 commented Aug 9, 2020

able to create this in the master branch as well,

def main():

    df = pd.DataFrame(
    [("English", "CHEERS!"), ("Greek", "ΟΠΑ!")], columns=["language", "toast"]
    )

    df.to_sql("pd_test", db_engine(), index=False, if_exists="replace")

    print(pd.__version__)
    print(pd.read_sql_query("SELECT * FROM pd_test", db_engine()))


if __name__ == "__main__":
    main()

produces

1.1.0.dev0+2073.g280efbfcc
  language    toast
0  English  CHEERS!
1    Greek     ???!

@mathurk1
Copy link
Contributor

mathurk1 commented Aug 9, 2020

it works fine when the database is a postgres -

1.1.0.dev0+2073.g280efbfcc
  language    toast
0  English  CHEERS!
1    Greek     ΟΠΑ!

the table created in postgres is below:

create table pd_test
(
	language text,
	toast text
);

@gordthompson
Copy link
Author

Yes, and this works with both MSSQL and PostgreSQL:

df = pd.DataFrame(
    [("English", "CHEERS!"), ("Greek", "ΟΠΑ!")], columns=["language", "toast"]
)
df.to_sql(
    "pd_test",
    engine,
    index=False,
    if_exists="replace",
    dtype={"toast": sa.UnicodeText},
)
print(pd.read_sql_query("SELECT * FROM pd_test", engine))
"""console output:
  language    toast
0  English  CHEERS!
1    Greek     ΟΠΑ!
"""

My request is to make that the default so we don't have to specify it every time.

@jbrockmendel jbrockmendel added IO SQL to_sql, read_sql, read_sql_query and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

3 participants