REF: sql insert_data operate column-wise to avoid internals #33229

jbrockmendel · 2020-04-02T01:11:32Z

No description provided.

WillAyd · 2020-04-02T19:32:34Z

pandas/io/sql.py

-                #  object array of Timedeltas
-                d = b.values.astype(object)
+
+        for i in range(len(temp.columns)):


Is the issue with using .items that it can give back a frame in the case of duplicate labels? If so, I wonder whether we should just update .items to do what you have done here.

You mean BlockManager.items? I don't see how that's relevant here.

temp is a DataFrame, no?

Yes. This just operates column-wise instead of block-wise to avoid accessing ._data.

jorisvandenbossche · 2020-04-02T13:54:23Z

pandas/io/sql.py

+
+        for i in range(len(temp.columns)):
+            ser = temp.iloc[:, i]
+            vals = ser._values


I think we should start by not using iloc for this pattern (it adds a lot of overhead).
Not that it will matter very much in this case, I suppose, since we are converting to object dtype below, which will be more costly.
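For readers who want to measure the overhead mentioned above, here is a minimal timing sketch. The frame shape, the repeat count, and the helper names `via_iloc` and `via_items` are assumptions made for illustration; they are not taken from the PR, and the numbers will vary by pandas version and machine.

```python
import timeit

import numpy as np
import pandas as pd

# A small, wide frame: 2 rows, 100 integer columns (illustrative only).
df = pd.DataFrame(np.arange(200).reshape(2, 100))


def via_iloc(frame):
    # Positional access, one Series per column, through the .iloc indexer.
    return [frame.iloc[:, i] for i in range(len(frame.columns))]


def via_items(frame):
    # The public iterator, which yields (label, Series) pairs in column order.
    return [ser for _, ser in frame.items()]


t_iloc = timeit.timeit(lambda: via_iloc(df), number=20)
t_items = timeit.timeit(lambda: via_items(df), number=20)
print(f"iloc: {t_iloc:.4f}s  items: {t_items:.4f}s")
```

Both helpers build a Series per column, so this only compares the access paths; it does not measure the cost of skipping Series construction altogether.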

See eg the helper function you asked me to remove in 07372e3

sure, _ixs works here

Although _ixs helps, the big win actually comes from avoiding creating a series if all you need are the values (as I did in the linked snippet)

OK. Until #33252 goes through, any objection to _ixs here?

I agree with Jeff on this one.

It's exactly the goal of the helper function I am adding in #33252 to be used in situations where only the array values are needed, as is the case here.

So if we are adding such a helper method, why not use it? (The reason I did the PR is that I want to see it used in several places, and many of those places will be outside of frame.py, e.g. in the ops code.)

If the privateness of the method is a problem, let's discuss making it public. Although I personally think that is not needed right now (and we are using plenty of semi-private methods on DataFrame/Series outside of frame.py already)

Let's revisit that once #33252 is merged.

This PR certainly does not need to wait on #33252 being merged, but I still first want to have the discussion whether we find this an appropriate place to use that function here, once it exists.

I don't think we should be using internal functions except in the internals. This is not internal by any stretch (I would say that pandas.io is off-limits entirely).

jreback · 2020-04-03T17:44:54Z

pandas/io/sql.py

+
+        for i in range(len(temp.columns)):
+            ser = temp.iloc[:, i]
+            vals = ser._values


I really don't like using private methods like this. What is wrong with .iloc here?

jorisvandenbossche · 2020-04-06T17:44:43Z

pandas/io/sql.py

+
+        for i in range(len(temp.columns)):
+            ser = temp.iloc[:, i]
+            vals = ser._values


This PR certainly does not need to wait on #33252 being merged, but I still first want to have the discussion whether we find this an appropriate place to use that function here, once it exists.

jreback · 2020-04-06T22:24:59Z

pandas/io/sql.py

-                d = b.values.astype(object)
+
+        for i in range(len(temp.columns)):
+            ser = temp.iloc[:, i]


You should just use .items() here, no? It yields the name and the column.

we're going to need i below

Hinted at this in my first comment, but I do also agree it would be nice to use items. We seem to special-case it a lot (I've done so in groupby), I think just to work around the fact that it can return a DataFrame for duplicate labels.

If we didn't do that, maybe we could just enumerate over df.items() and have one way of doing things.

> Hinted at this in my first comment

Oh! You meant DataFrame.items! I thought you were referring to BlockManager.items, which matches DataFrame.columns.

yes (with an enumerate on top if you need the i)

updated to use enumerate and .items
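The enumerate-over-items pattern discussed in this thread can be sketched with public API only. This is an illustration, not the code that was merged: the function name `columns_as_object_arrays` is hypothetical, and replacing missing values with None is an assumption about what a SQL insert path would want.

```python
import numpy as np
import pandas as pd


def columns_as_object_arrays(frame):
    """Return one object-dtype ndarray per column, in column order.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = [None] * len(frame.columns)
    # enumerate supplies the position i; items() supplies (label, Series).
    for i, (_, ser) in enumerate(frame.items()):
        # copy=True so the frame's own data is never modified below.
        vals = ser.to_numpy(dtype=object, copy=True)
        # Assumption: missing values should become None.
        vals[ser.isna().to_numpy()] = None
        out[i] = vals
    return out


df = pd.DataFrame({"a": [1, 2], "b": [1.5, np.nan]})
arrays = columns_as_object_arrays(df)
```

Storing results by position `i` rather than by label is what keeps the output aligned with the column order even if labels repeat.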

This looks good. Only comment I have is that this probably will fail with duplicate column labels if not already tested.

DataFrame.items has a check for columns.is_unique; is that check not enough?

Ah thanks; misunderstanding on my end
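The duplicate-label question can be checked directly. A small sketch, with an illustrative frame that is not from the PR: label-based selection on a repeated label returns a DataFrame, while iterating items() yields one Series per column position.

```python
import pandas as pd

# Two columns sharing the label "x" (illustrative data).
df = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "x"])

# Selecting by a duplicated label gives back a DataFrame with both columns.
by_label = df["x"]

# Iterating items() gives one (label, Series) pair per column position.
pairs = list(enumerate(df.items()))
for i, (name, ser) in pairs:
    print(i, name, type(ser).__name__, ser.tolist())
```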

jorisvandenbossche · 2020-04-07T07:14:09Z

> I don't think we should be using internal functions except in the internals. This is not internal by any stretch (I would say that pandas.io is off-limits entirely).

Can you please explain a bit more why you don't want a method like DataFrame._iter_column_arrays to be used outside of frame.py?

Yes, we want to use public functions as much as possible in the other pandas modules. But we do have a set of "private" methods that we use ourselves, which I would see as "internal developer APIs". For example, we also use Series._values, Series._can_hold_na, and Index._get_level_values (these are a few examples currently used in sql.py), and many others, outside core/series.py or core/indexes/.
Or do you find that those should not be used either?

Or, do you want to limit the use here because it is outside pandas/core? (initially, you commented you didn't want private DataFrame methods to be used outside of frame.py, but maybe you meant outside pandas/core, and within all of /core it is fine to use those? In which case the argument would only be about pandas.io, pandas.plotting, pandas.tseries, more or less)
Just trying to understand your argument.

jbrockmendel · 2020-04-07T14:53:26Z

> Can you please explain a bit more why you don't want a method like DataFrame._iter_column_arrays to be used outside of frame.py?

Can we please bike-shed this elsewhere?

WillAyd

lgtm, I think this is a nice cleanup

REF: sql insert_data operate column-wise to avoid internals

930a6ec

WillAyd reviewed Apr 2, 2020

WillAyd added the Clean and IO SQL (to_sql, read_sql, read_sql_query) labels Apr 2, 2020

jorisvandenbossche reviewed Apr 2, 2020

jbrockmendel added 2 commits April 2, 2020 14:39

Merge branch 'master' of https://github.com/pandas-dev/pandas into no…

b69f466

…-data-sql

iloc->_ixs

8834ac1

jorisvandenbossche mentioned this pull request Apr 3, 2020

INT: provide helpers for accessing the values of DataFrame columns #33252

Merged

jreback requested changes Apr 3, 2020

jorisvandenbossche requested changes Apr 6, 2020

Merge branch 'master' of https://github.com/pandas-dev/pandas into no…

7328cec

…-data-sql

jreback requested changes Apr 6, 2020

jbrockmendel added 3 commits April 6, 2020 15:48

Merge branch 'master' of https://github.com/pandas-dev/pandas into no…

a480ddd

…-data-sql

Merge branch 'master' of https://github.com/pandas-dev/pandas into no…

6ce360b

…-data-sql

use items

eae5b1e

WillAyd approved these changes Apr 7, 2020

jorisvandenbossche merged commit b195a67 into pandas-dev:master Apr 7, 2020

jbrockmendel deleted the no-data-sql branch April 7, 2020 18:26
