Adding (Insert or update if key exists) option to .to_sql #14553 #29636
…t 88a5a481b git-subtree-dir: vendor/github.com/V0RT3X4/python_utils git-subtree-split: 88a5a481b5dbec610e762df862fd69918c1b77d4
git-vendor-name: python_utils git-vendor-dir: vendor/github.com/V0RT3X4/python_utils git-vendor-repository: [email protected]:V0RT3X4/python_utils.git git-vendor-ref: master
…tch function written down for deleting pkeys
…hed upsert ignore method
you are changing so much that this will need extensive review and will likely take an extended time to merge, if that even happens.
I would suggest starting with a really simple change and not try to do everything. you are likely to have better results.
on_conflict : {None, 'do_nothing', 'do_update'}, optional
Determine insertion behaviour in case of a primary key clash.
- None: Do nothing to handle primary key clashes, will raise an Error.
- 'do_nothing': Ignore incoming rows with primary key clashes, and
this is completely different than the other arguments above.
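For readers following the thread, the three `on_conflict` values quoted in the docstring excerpt can be illustrated with plain SQL. The sketch below is not code from this PR: it uses the standard-library `sqlite3` module, a hypothetical table `t` with columns `id` and `val`, and a hypothetical helper `insert_rows`. It also assumes that 'do_update' means overwriting the clashing row with the incoming values (the excerpt is cut off before that option is described), and an SQLite version of 3.24 or newer for the `ON CONFLICT` clause.

```python
import sqlite3

# Hypothetical mapping from the proposed on_conflict values to SQLite statements.
SQL_BY_MODE = {
    # None: plain INSERT, a primary key clash raises sqlite3.IntegrityError.
    None: "INSERT INTO t (id, val) VALUES (?, ?)",
    # 'do_nothing': incoming rows that clash are silently skipped.
    "do_nothing": (
        "INSERT INTO t (id, val) VALUES (?, ?) "
        "ON CONFLICT(id) DO NOTHING"
    ),
    # 'do_update' (assumed meaning): clashing rows take the incoming values.
    "do_update": (
        "INSERT INTO t (id, val) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET val = excluded.val"
    ),
}


def insert_rows(conn, rows, on_conflict=None):
    """Insert (id, val) tuples using the statement chosen by on_conflict."""
    conn.executemany(SQL_BY_MODE[on_conflict], rows)


def fresh_table():
    """Return an in-memory database holding a single existing row (1, 'old')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("INSERT INTO t VALUES (1, 'old')")
    return conn


def table_contents(conn):
    """Return all rows ordered by primary key."""
    return conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()
```

Calling `insert_rows(fresh_table(), [(1, "new")])` raises `sqlite3.IntegrityError`, while the other two modes either keep or overwrite the existing row with `id` 1.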
@jreback I appreciate that a change of this size is difficult to review properly. Elaborating on your suggestion of starting simple - do you mean breaking this functionality down into smaller components which can be added to master one at a time e.g. a first PR to add a method which checks if there will be primary key clashes before calling
@cvonsteg well, if you are refactoring at all, then push this as a separate PR that can be merged first.
Any chance this will be shipped in the coming 1.4 or 1.5 release?
this PR has lots of comments which need to be addressed
@jreback - taking your feedback around smaller changes into consideration, I propose splitting this into the following as separate PRs:
If the above sounds good for you, I will raise these as new PRs instead of trying to untangle what has been done on this branch.
@cvonsteg sounds ok. key is well tested & documented changes.
Hi Folks, is there any update on this?
This needs to be merged and shipped with the latest update! Why is it kept on hold even after passing all of the tests?
The branch cannot be merged as there are conflicts, and some changes have been requested.
I'd like to add that I am really rooting for this change. I would immediately make use of the new functionality in my projects.
Hi folks, really looking forward to this change! I know this change hasn't been merged yet, but any chance we will see this functionality being also added to the GeoPandas (.to_postgis)?
Hi,
i'll repeat the same statement as above: a simple, well tested and documented change that implements this would be accepted. this PR does way too much and is not mergeable. if someone in the community wants to read the comments and implement the requested changes, the core team will review and merge.
Could some of the capacity of the pandas core devs be allocated to the implementation of these changes? I believe this is one of the most wanted features in pandas.
pandas is all volunteer. you can certainly ask, but it is up to individuals to work on things.
I wanted to add that I fully understand @jreback's comments, and I agree that this first attempt was doing too much. Unfortunately, due to a change in circumstances over the past few months, I simply don't have time to work on this at the moment. If anybody wants to pick this feature up, please do! I will be more than happy to give feedback and input on what's been done so far, or to clarify any code that is unclear.
Would sponsoring for this specific issue work? And what do you think is a fair donation amount for the test-fixes?
i don't think anyone has capacity. code contributions are the most helpful.
@cvonsteg If someone else would want to pick up this feature, should they create separate PRs for the
Is this still in limbo? This is an important enough topic for me that I'd like to help get it out, if possible. Not sure how to go about it, though.
I can confirm that most of the known external solutions are not working, due to bugs in relation to new pandas and especially sqlalchemy.
Yes, sorry about that @ManPython, I made a new release for pangres which supports
which version of pandas contains this implemented and ready to use?
@sarathsairam This was never implemented as this PR was closed.
This PR adds SQL `upsert` functionality to the `to_sql` method. As outlined in the issue discussion, there will be 2 types of upsert:
- `upsert_delete` - prioritizes new data from the incoming dataframe. Deletes clashing records in the database and then inserts the new dataframe.
- `upsert_ignore` - prioritizes what is already in the database over what is coming in from the new dataframe. Removes records which already exist in the database from the incoming dataframe before inserting the rest into the db.
Both upserts currently only check for primary key duplicates.
Please note 1 important change over the design proposal made in #14553 - `upsert_delete` and `upsert_ignore` have now been made options for the `if_exists` arg, rather than the `method` argument. This was revised because, on second consideration, `upsert` was deemed mutually exclusive to `append` and `replace`, but not to the `method` argument, so making it part of `if_exists` seemed the more sound design.
- closes #14553
- passes `black pandas`
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
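The two upsert modes described above can be sketched outside pandas. The following is a minimal illustration of the described behaviour, not the code from this PR (which was never merged): it uses the standard-library `sqlite3` module, represents the incoming dataframe as a list of tuples, and the function signatures, the table `t`, and the `demo` helper are all hypothetical. Like the PR, it only checks for primary key clashes.

```python
import sqlite3


def _insert(conn, table, columns, rows):
    """Plain INSERT of all rows; a primary key clash would raise."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )


def upsert_delete(conn, table, pk, columns, rows):
    """Incoming data wins: delete clashing DB rows, then insert every incoming row."""
    pk_idx = columns.index(pk)
    keys = [(row[pk_idx],) for row in rows]
    conn.executemany(f"DELETE FROM {table} WHERE {pk} = ?", keys)
    _insert(conn, table, columns, rows)


def upsert_ignore(conn, table, pk, columns, rows):
    """DB data wins: drop incoming rows whose key already exists, insert the rest."""
    pk_idx = columns.index(pk)
    existing = {r[0] for r in conn.execute(f"SELECT {pk} FROM {table}")}
    new_rows = [row for row in rows if row[pk_idx] not in existing]
    _insert(conn, table, columns, new_rows)


def demo():
    """Run both modes against a small table and return the table after each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "old"), (2, "old")])
    incoming = [(2, "new"), (3, "new")]
    query = "SELECT id, val FROM t ORDER BY id"

    upsert_ignore(conn, "t", "id", ["id", "val"], incoming)
    after_ignore = conn.execute(query).fetchall()

    upsert_delete(conn, "t", "id", ["id", "val"], incoming)
    after_delete = conn.execute(query).fetchall()
    return after_ignore, after_delete
```

In `demo`, `upsert_ignore` leaves the existing row with `id` 2 untouched and only adds `id` 3, whereas `upsert_delete` replaces rows 2 and 3 with the incoming values. The table and column names are interpolated into the SQL directly, which is acceptable for a sketch but would need proper identifier quoting in real code.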