BUG: convert nan to None before insert data into mysql #4200

Closed
wants to merge 1 commit into from

Conversation

@simomo commented Jul 11, 2013

For issue #4199

@hayd (Contributor) commented Jul 11, 2013

cc #2754 @danielballan @jreback

@danielballan (Contributor)

@simomo Can you include a test demonstrating expected behavior? See pandas/io/tests/test_sql.py for examples.

Also, of course including None requires the object dtype. Does this come with a performance cost? @jreback?

In [30]: frame
Out[30]: 
   0         1
0  2       NaN
1  3 -1.029046

In [31]: frame.dtypes
Out[31]: 
0      int64
1    float64
dtype: object

In [32]: frame.where(pd.notnull(frame), None).dtypes
Out[32]: 
0     int64
1    object
dtype: object

@jreback (Contributor) commented Jul 11, 2013

This is quite complicated and shouldn't be done this way. The main issue is different reprs for datetime/non-datetime. There are better ways of doing this (these are internal routines), e.g. see core/format.py/CSVFormatter/_save_chunk. This has to do with how things are converted/passed to SQL, e.g. whether they need to be stringified or not.

You are going to need to segregate by block type, then convert (or not) as needed, substituting appropriate 'null' sentinels (which might be different for different flavors?)
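
A rough sketch of the kind of per-block segregation being suggested (the helper name and the choice of None as the sentinel are illustrative, not pandas internals):

import numpy as np
import pandas as pd

def frame_to_insert_values(df, null_sentinel=None):
    # hypothetical helper: build rows of plain Python objects, substituting
    # a backend-appropriate sentinel for missing values, per column dtype
    columns = []
    for col in df.columns:
        s = df[col]
        mask = pd.isnull(s).values
        if np.issubdtype(s.dtype, np.datetime64):
            # datetime block: stringify values, then drop the sentinel in for NaT
            vals = s.astype(str).values.astype(object)
        else:
            # numeric / object blocks: just upcast to object
            vals = s.values.astype(object)
        vals[mask] = null_sentinel
        columns.append(vals)
    # rows ready for an executemany-style insert
    return list(zip(*columns))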

@hayd (Contributor) commented Jul 11, 2013

@jreback Why is it necessary to have different sentinels for NaN and NaT?

(I agree this should be done on write/like _save_chunk...)

@jreback (Contributor) commented Jul 11, 2013

It's not necessary per se, but I suspect that the different SQL flavors have different sentinels (if None works for everything then great)....

for perf though...this may need to be optimized

@hayd (Contributor) commented Jul 11, 2013

Idea being to abstract the problem of None away to SQLAlchemy, assuming it Just Works™. Which I thought was kind of the point of it...

Yeah, perf could be an issue - in which case we'll end up writing a load of platform-specific stuff? :s

@jreback (Contributor) commented Jul 11, 2013

I am not sure what the perf diff will be, just have to profile it. You might simply want to do something like:

values = df.values.astype(object)
values[pd.isnull(df)] = None

prob should work and be pretty fast (not 100% sure what this will do to dates though)
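
Fleshed out into a standalone snippet (reusing the two-row frame from earlier in the thread):

import numpy as np
import pandas as pd

frame = pd.DataFrame({0: [2, 3], 1: [np.nan, -1.029046]})

# upcast to object so None can be stored, then overwrite the missing cells
values = frame.values.astype(object)
values[pd.isnull(frame).values] = None

print(values)
# [[2.0 None]
#  [3.0 -1.029046]]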

@jreback (Contributor) commented Jul 11, 2013

I think you are going to have to do backend-specific conversions (mostly on NaN/None, but also datetimes). IIRC mysql stores dates as strings? Though some of this may be converted from datetimes.

@hayd (Contributor) commented Jul 11, 2013

@danielballan do we care how it's stored... provided the roundtrip works (and probably also that the dtype is sensible) it's all good...?

Do we need to know in order to query (for None)? Can't SQLAlchemy compile your query in a clever way (worrying about the platform-specific bit), or maybe I've got it totally wrong?

@danielballan (Contributor)

No, I don't think we care. The second comment on the SO question is troubling, or at least confusing to me. I think all flavors of SQL just have NULL, and we'll want those to ultimately come out as np.nan. Certainly not 'NaN'.

@jreback (Contributor) commented Jul 11, 2013

@danielballan you will for sure need to do type conversions on the readback, e.g. make sure dates are correct (you can just use convert_objects(convert_dates='coerce'))

you also may want to do convert_objects(convert_numeric=True) on the numeric columns (may only be necessary depending on how results are returned)
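
(convert_objects has since been removed from pandas; the same coercions can be sketched with pd.to_datetime / pd.to_numeric. Column names below are made up.)

import pandas as pd

# object columns as they might come back from the SQL driver
raw = pd.DataFrame({
    "when": ["2013-07-11", None, "2013-07-12"],
    "amount": ["1.5", "2.0", None],
})

# the rough equivalent of convert_objects(convert_dates='coerce')
raw["when"] = pd.to_datetime(raw["when"], errors="coerce")

# and of convert_objects(convert_numeric=True)
raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")

print(raw.dtypes)
# when      datetime64[ns]
# amount           float64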

@jreback (Contributor) commented Jul 20, 2013

related to #4163

@hayd mentioned this pull request Jul 20, 2013
@stared commented Sep 11, 2013

Filling NaN with None:

df['col1'].fillna(None)

produces an error:

ValueError: must specify a fill method or value

Is it the same bug as reported in this thread?

@jreback (Contributor) commented Sep 11, 2013

@stared what are you trying to do?

@stared commented Sep 12, 2013

@jreback Convert np.nan fields to None values (for dtype=object, of course).

At the same time I can do (i.e. there is no error):

df['col1'].apply(lambda x: None if pd.isnull(x) else x)

which seems to be equivalent to:

df['col1'].fillna(None)
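
A vectorized way to get the same result is the object-array trick from the snippet above, applied to a single column (a minimal sketch with made-up data):

import numpy as np
import pandas as pd

col = pd.Series([1.0, np.nan, 3.0], name='col1')

# upcast to object, then overwrite the missing positions with None
vals = col.values.astype(object)
vals[pd.isnull(col).values] = None
col_with_none = pd.Series(vals, index=col.index, name=col.name)

print(col_with_none.tolist())
# [1.0, None, 3.0]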

@jreback (Contributor) commented Sep 12, 2013

@stared

Well, aside from apply being MUCH slower, fillna also does dtype inference.

I meant what is your purpose in doing this?

@stared commented Sep 12, 2013

@jreback I am performing an outer join of two tables, so I am getting np.nans.
Later I am using this data to interact with MongoDB and I want to have None for missing fields. (I don't want to make conversions each time I read from or write to the database.)

BTW: why is apply much slower? Or, in general, what should be used for mapping columns?

@jreback (Contributor) commented Sep 12, 2013

ahh...then this is the same issue (it has to do with exporting np.nan -> None, or the appropriate sentinel if, say, it's NaT)

well, apply is not vectorized, so you should avoid it if at all possible; fillna is cython based so it's pretty fast.

apply is very general though

@cpcloud (Member) commented Sep 12, 2013

@stared apply will be much slower than most (all?) ops that are built into pandas already. E.g., fillna does a specific thing, so it doesn't need to accept an arbitrary Python function like apply does. It is therefore free to use whatever numpy and maybe Cython code is available to do its job. apply must be very general, so it is going to be slow.
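
A rough way to see the gap (timings vary by machine; fillna(0) stands in for the comparison because fillna(None) raises, as noted above):

import timeit
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(1_000_000))
s[::10] = np.nan

t_fillna = timeit.timeit(lambda: s.fillna(0), number=10)
t_apply = timeit.timeit(lambda: s.apply(lambda x: 0 if pd.isnull(x) else x), number=10)

print("fillna: %.3fs  apply: %.3fs" % (t_fillna, t_apply))
# expect apply to be one to two orders of magnitude slower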

@jreback (Contributor) commented Sep 28, 2013

@hayd I believe you are going to do this as part of the big SQL refactor (and it's already linked), so closing
