Skip to content

ENH: df.to_sql lacks a parameter if_not_exist with options [create, fail] #40647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
olippuner opened this issue Mar 26, 2021 · 3 comments
Open
Assignees
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas IO SQL to_sql, read_sql, read_sql_query

Comments

@olippuner
Copy link

olippuner commented Mar 26, 2021

feature request and related problem

If a table does not exist pd.to_sql is unconditionnally creating it. There are situations, where this is not acceptable.

There are application strategies where this has to be avoided. Eg. in case failed name matching the ISERT might have to fail and signal, that programcode has to be adapted. Such condition arise because of possible changes in the DB model - the outside of python's data environment.

requested option feature for pd.to_sql:
If the table expected by python is not there, it should not be created but the attempted INSERT should fail.

Presently a a certain amount of boilerplate is needed to do such fail-safe handling.

tableName = '...'
schemaName = '...'
dbEngine = ....
# dataframe df is holding the data for target table named in [tableName]
# currently quote some boilerplate is needed to fail safe, if database table does not exist to avoid creating one
if tableExists(tableName):
    df.to_sql(tableName,  schema=schemaName, con=dbEngine,  index=True, if_exists='append')**
    print(f"INSERT-I-done ")
else
    print(f"INSERT-I-failed: table with expected name [{schemaName}].[{tableName}] does not yet or nomore exist")

This missing feature leads to another level of encapsulation in a wrapper class. It would be better to integrate this as a option directly to PANDAS.

Describe the proposed solution

Integrate this checking of the outside DB environment into Dataframe.to_sql

Suggested new parameter:

if_not_exists{‘fail’, ‘create’}, default 'fail'

API breaking implications

Dataframe.to_sql fails if the table specified by name does not exist in the schema. This should be the default, thus avoiding that dead phantom tables are created, resulting in dead ends off the track of the DB model, once the nomenklatura of the DB model has moved on. Coevolutive and agile development of applications python code and the datamodel and its naming thus should be supported, by failsafe handling of naming assumptions.

present processing still is possible with parameter if_no_exist='create'

Possible the default 'create' might be preferred, as thus it does not break existing code, even though in a greenfield approach I consider 'fail' to be the better choice.

Describe alternatives you've considered

Handling outside of Pandas is possible, as shown above. Anyway this results in widespread boilder-plate code and violates the idea of code centralisation and functional encapsulation.

Additional context

...

@olippuner olippuner added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 26, 2021
@jreback jreback added Error Reporting Incorrect or improved errors from pandas IO SQL to_sql, read_sql, read_sql_query and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 26, 2021
@dsaxton
Copy link
Member

dsaxton commented Mar 26, 2021

Seems like a reasonable idea. Would also suggest having the argument be a boolean since there are only two possible values, and name something like fail_if_not_exists. A better way to introduce would probably be to set the default to False so behavior isn't changed, and then possibly switch the default to True later (although I'm not sure it'd be desirable to do that).

@lithomas1 lithomas1 added this to the Contributions Welcome milestone Mar 31, 2021
@mschmookler
Copy link
Contributor

take

@mschmookler
Copy link
Contributor

As another possibility we could add an additional option for if_exists. Such as append_existing. And then just make it clear that this fails if table does not exist.

If we were to add the extra boolean param for if_no_exist then we get three additional possibilities when taken together with if_exists, and two of them make little sense. I can't see any use for if_exists='fail', if_no_exist='fail', and same for if_exists='replace', if_no_exist='fail'. So for those reasons, I'm against adding this as a new param.

I'd love to hear thoughts on this, or if anyone can think of a better name than append_existing.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

6 participants