Faster SQL implementation in d6tstack library #28817
Comments
What's the actual next step here? You'll need to flesh out your proposal before we can decide whether it's a good idea / worth doing.
The important piece of information is also what they do differently to achieve this better performance, so we know whether it is something we can do in pandas.
From looking at the code, it uses the bulk insert operations for the respective database types. So it writes a CSV file and then loads that into the database using the database's native bulk load command (e.g. COPY for PostgreSQL, LOAD DATA INFILE for MySQL).
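For illustration, a minimal sketch of that CSV-plus-bulk-load approach, assuming a PostgreSQL database reachable via psycopg2; the connection string and table name are placeholders, and the target table is assumed to already exist:

```python
import io

import pandas as pd
import psycopg2

df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# Placeholder connection string; the target table "my_table" must already exist.
conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")

# Serialize the frame to an in-memory CSV buffer.
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
buf.seek(0)

with conn.cursor() as cur:
    # COPY ... FROM STDIN streams the CSV straight into the table,
    # bypassing row-by-row INSERT statements.
    cur.copy_expert("COPY my_table (a, b) FROM STDIN WITH CSV", buf)
conn.commit()
```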
I think this issue is too open ended, so we should refine the ask or close it out. You can already write your own bulk load command as shown in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method, though that has type-preservation downsides.
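A custom insertion method along the lines of the linked documentation example might look like the following sketch, assuming a PostgreSQL backend accessed through SQLAlchemy with psycopg2; the engine URL and table name are placeholders:

```python
import csv
import io

import pandas as pd
from sqlalchemy import create_engine


def psql_copy_insert(table, conn, keys, data_iter):
    """Custom to_sql insertion method that uses PostgreSQL COPY instead of INSERTs."""
    dbapi_conn = conn.connection  # raw DBAPI connection behind the SQLAlchemy connection
    with dbapi_conn.cursor() as cur:
        buf = io.StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ", ".join(f'"{k}"' for k in keys)
        table_name = f'"{table.schema}"."{table.name}"' if table.schema else f'"{table.name}"'
        cur.copy_expert(f"COPY {table_name} ({columns}) FROM STDIN WITH CSV", buf)


# Placeholder engine URL and table name.
engine = create_engine("postgresql://user:pass@localhost/dbname")
df = pd.DataFrame({"a": range(1000), "b": range(1000)})
df.to_sql("my_table", engine, if_exists="replace", index=False, method=psql_copy_insert)
```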
There is already a parameter called method to switch between normal and multi insert. I'm assuming that "multi" is not an implementation of bulk insert? Then I would propose a new value, "bulk", that you can choose instead. The if-statement for the different methods can be found here: Lines 724 to 734 in bee17d5
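To make the proposal concrete, this is how the existing method parameter is used today; the "bulk" value is only the idea from this comment and does not exist in pandas:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
df = pd.DataFrame({"a": [1, 2, 3]})

# Default: rows are inserted via executemany, one parameter set per row.
df.to_sql("t", engine, if_exists="replace", index=False, method=None)

# "multi": a single INSERT statement with multiple VALUES clauses per chunk.
df.to_sql("t", engine, if_exists="replace", index=False, method="multi")

# Hypothetical new value proposed above: dispatch to a DB-specific bulk load.
# df.to_sql("t", engine, if_exists="replace", index=False, method="bulk")
```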
Implementing this should be done within pandas itself, because the d6tstack library is a big dependency, and the implementation shouldn't lead to too many duplicate lines of code. The implementation of the different methods can be found here: Lines 660 to 682 in bee17d5
d6tstack implements database-specific functions (link). I don't think it is worth implementing these for the performance gain. The issue can be closed if the pandas maintainers agree.
Looks like the project is fairly inactive and there is not much interest from the core team. Closing as it seems not worth pursuing.
Code Sample
https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb
Problem description
In the example it currently takes 28s to upload data to a SQL database. With the d6tstack implementation it only takes 4.7s with PostgreSQL and 7.1s with MySQL, which is 4 to 6 times faster than standard pandas.
Expected Output
Improved speed uploading data to an SQL database right out of the box.