ENH: The need to re-design to_sql() method #49246

redreamality · 2022-10-22T03:17:42Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

There are too many open issues about this functionality, some of which have remained open for 5 years. The original design of to_sql did not fully take different database dialects into consideration. The overly high-level design of this function also makes it hard to test and to change; one PR, #48331, was even blocked by unrelated test failures, which largely discourages the community.

To this end, I would like to suggest breaking it down into more specific, lower-level functions, like _to_sql_oracle, etc
making this less.
Another suggestion: it is acceptable if only partial functionality is implemented, say, for only a certain kind of database. If some functionality is not officially supported, presenting example gists or code snippets in the documentation is also a good choice.

Referenced issues:
#15988
#48331
#40647
#35347
#41335

Feature Description

Break to_sql() down into lower-level, database-specific functions, selected according to the connection type.
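A minimal sketch of what such a dispatch could look like. Everything here is hypothetical and not part of pandas: the registry, the `register` decorator, `build_insert`, and `_to_sql_sqlite` are made-up names for illustration (only `_to_sql_oracle` is taken from the suggestion above). Each function only builds an INSERT statement with the placeholder style assumed for that dialect.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping a dialect name to a dialect-specific builder.
_BUILDERS: Dict[str, Callable[[str, Sequence[str]], str]] = {}


def register(dialect: str):
    """Register a builder function for one dialect (illustrative only)."""
    def decorator(func):
        _BUILDERS[dialect] = func
        return func
    return decorator


@register("sqlite")
def _to_sql_sqlite(table: str, columns: Sequence[str]) -> str:
    # sqlite3 uses "?" (qmark) placeholders.
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"


@register("oracle")
def _to_sql_oracle(table: str, columns: Sequence[str]) -> str:
    # Assumption: numbered bind variables (:1, :2, ...) for the Oracle driver.
    marks = ", ".join(f":{i}" for i in range(1, len(columns) + 1))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"


def build_insert(dialect: str, table: str, columns: Sequence[str]) -> str:
    """Dispatch to the builder registered for the given dialect name."""
    try:
        builder = _BUILDERS[dialect]
    except KeyError:
        raise NotImplementedError(
            f"no writer registered for dialect {dialect!r}"
        ) from None
    return builder(table, columns)
```

With this layout, an unsupported database fails loudly with NotImplementedError instead of silently going through a generic code path.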

Alternative Solutions

see description.

Additional Context

No response

jreback · 2022-10-22T03:21:35Z

-1. This is not pandas' responsibility; most of this is already deferred to SQLAlchemy.

redreamality · 2022-10-22T03:33:27Z

Some operations are very different across databases, for example the upsert operation.
If the pandas community regards these as unrelated issues, I suggest removing this functionality altogether.
After all, it is not difficult to iterate over all the rows and insert them into a database, and it is better to tell users that the functionality is not available than to paint a nice picture that is followed by the awful experience of searching around and finding out that the functionality is not ready for production at all!
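For illustration, a rough sketch of the row-by-row approach using only the standard-library sqlite3 module. The `users` table and the sample rows are made up; with a DataFrame, the tuples could come from `df.itertuples(index=False, name=None)`.

```python
import sqlite3

# Made-up rows standing in for DataFrame records.
rows = [(1, "alice"), (2, "bob"), (3, "carol")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Iterate over the rows and insert each one with a parameterized statement.
with conn:  # commits on success, rolls back if an insert raises
    for row in rows:
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", row)

inserted = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

For larger inputs, `conn.executemany` with the same statement avoids the explicit Python loop.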

ParfaitG · 2022-10-24T15:52:57Z

While I am not an author of this particular method, the philosophy of most pandas IO methods (to_csv, to_excel, to_json, to_xml, etc.) emphasizes general use cases: migrating data from tabular, two-dimensional DataFrames (rows by columns) to external formats with similarly flat, two-dimensional structures. However, these formats can have many nested qualities, hierarchies, and dimensions, of which pandas' to_* methods usually serve one type.

Likewise, DataFrame.to_sql is meant to be database-agnostic and generalizable to any SQLAlchemy-connected backend. For the base case of migrating data from a pandas DataFrame to a database table, this IO method does work and works well across RDBMSs. However, database tables can have many constraints like primary keys, special data types, validation rules, and other properties.

The issues you link are nuanced needs for particular schemas. The to_sql method cannot solve every particular need of every RDBMS connection. Users need to develop tailored solutions with their DB-APIs to handle their specific use cases, such as duplicate keys and temp table staging, and work with their dialects that support non-standard SQL such as UPSERT, INSERT IGNORE, or CREATE TEMP TABLE. Again, pandas to_* methods are really for general use cases of tabular data migration.
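As one possible shape for such a tailored solution, here is a small sketch of temp table staging with duplicate keys skipped, written against SQLite through the standard-library sqlite3 module. The table names and rows are made up, and `INSERT OR IGNORE` is SQLite's spelling; other databases use different syntax for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# Incoming rows; id 1 collides with an existing primary key.
new_rows = [(1, "alice-duplicate"), (2, "bob")]

with conn:
    # Stage the incoming rows in a temp table that has no constraints.
    conn.execute("CREATE TEMP TABLE users_staging (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users_staging VALUES (?, ?)", new_rows)
    # SQLite-specific: skip rows whose primary key already exists.
    conn.execute("INSERT OR IGNORE INTO users SELECT id, name FROM users_staging")
    conn.execute("DROP TABLE users_staging")

result = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

In a pandas workflow, the staging table could be filled by to_sql itself, with only the final dialect-specific statement run through the DB-API.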

mroeschke · 2022-10-25T18:19:19Z

Thanks for the suggestion, but I would be -1 on refactoring or removing to_sql in its current state given the reasoning above. Closing as unlikely to be pursued.

redreamality added the Enhancement and Needs Triage labels Oct 22, 2022

mroeschke closed this as completed Oct 25, 2022

redreamality mentioned this issue Feb 16, 2023

When using to_sql(), continue if duplicate primary keys are detected? #15988

Closed
