Skip to content

BUG: Fix issue with old-style usage in convert_objects #10602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 6, 2015

Conversation

bashtage
Copy link
Contributor

Fix to temporary allow passing 'coerce' to variables

closes #10601

@jreback
Copy link
Contributor

jreback commented Jul 16, 2015

lgtm

@jreback jreback added Bug Deprecate Functionality to remove in pandas labels Jul 16, 2015
@jreback jreback added this to the 0.17.0 milestone Jul 16, 2015
@jorisvandenbossche
Copy link
Member

I don't think this handles the case where you did two different types at the same time? Eg df.convert_objects(convert_numeric=True, convert_dates='coerce') ?

@bashtage
Copy link
Contributor Author

That case was wrong in 0.16 so I don't think it should be fixed. Coerce should mean one and only one type allowed.

The only question is whether I should be setting the non-coerce string to False, or just pass them through.

A different strategy would be to set any input that is coerce to True and ensure coerce=True and then let the error raise.

@jorisvandenbossche
Copy link
Member

An example with simultaneous numeric and dates, which already worked in 0.15. Was it a bug that it worked?

In [1]: df = pd.DataFrame({'a':[1.0, '2.3'], 'b':['1', "4"], 'c':['a', pd.Timestamp('2012-01-01') ]})

In [2]: df.dtypes
Out[2]:
a    object
b    object
c    object
dtype: object

In [3]: df.convert_objects(convert_numeric=True, convert_dates='coerce')
Out[3]:
     a  b          c
0  1.0  1        NaT
1  2.3  4 2012-01-01

In [4]: df.convert_objects(convert_numeric=True, convert_dates='coerce').dtypes
Out[4]:
a           float64
b             int64
c    datetime64[ns]
dtype: object

In [5]: pd.__version__
Out[5]: '0.15.2'

@bashtage
Copy link
Contributor Author

import datetime as dt
df = pd.DataFrame({'a':['1.','2010-01-01',dt.datetime.now()-dt.datetime.now()]})

Any of these either don't work or don't make sense:

df.convert_objects(convert_dates='coerce', convert_numeric='coerce',convert_timedeltas=True)
df.convert_objects(convert_numeric=True,convert_timedelta='coerce')
df.convert_objects(convert_dates=True, convert_numeric=True,convert_timedeltas=True)

The issue is there are was a a very specific order as to what happens, and once one has satisied some criteria, then the others weren't attempted. For example, in the final one, why is the result numeric?

@jreback
Copy link
Contributor

jreback commented Jul 17, 2015

it only did the conversions (even soft conversions) if it had an object dtype. So all were True (which is weird, but I suppose possible), then it would attemp them in order : datetime, numeric, timedelta. numeric would almost always succeed before timedelta (which is only a soft conversion anyhow).

You will warn if any of the above examples are passed, so that is ok!

@bashtage bashtage force-pushed the convert-object-dep-fix branch 2 times, most recently from 9a5728f to bf1d34e Compare July 20, 2015 15:49
@jorisvandenbossche
Copy link
Member

@bashtage @jreback coming back to this:

  • Ah, I see that now it is disallowed to do multiple conversions when coerce=True. But, nonetheless, it did work previously (with eg df.convert_objects(convert_numeric=True, convert_dates='coerce'), see my example above).
    I was thinking: would it be better to just pass this through, so you get an error instead of a warning?
    it seems reasonable to me to ensure that if someone did this before with the old style, this keeps working (his/her code does not break, the example I give above does not work now on master, and also not yet with this PR I think), but this is not possible with the new logic?
  • Maybe we should also add a note in the docstring of convert_objects that if multiple conversions are done at the same time, this happens with a particular order? And also, that this is not allowed when using coerce=True (you get an error for that, but it is not mentioned in the docstring)

Further, the default was previously to do date conversions. Now it is stated in the docs that the default is changed to do nothing. However, if I try on master, it seems that by default it does do numeric conversions:

In [29]: pd.__version__
Out[29]: '0.16.2+220.ga677217'

In [30]: df = pd.DataFrame({'a':[1.0, '2.3'], 'b':['1', "4"], 'c':['a', pd.Times
tamp('2012-01-01') ]})

In [31]: df.dtypes
Out[31]:
a    object
b    object
c    object
dtype: object

In [32]: df.convert_objects()
Out[32]:
     a  b                    c
0  1.0  1                    a
1  2.3  4  2012-01-01 00:00:00

In [33]: df.convert_objects().dtypes
Out[33]:
a    float64
b      int64
c     object
dtype: object

In [34]: df.convert_objects(numeric=False).dtypes
Out[34]:
a    float64
b      int64
c     object
dtype: object

So it seems that the numeric keyword has no effect.

@jreback
Copy link
Contributor

jreback commented Jul 27, 2015

@jorisvandenbossche The purpose of .convert_objects is conversion. whether it be soft or force. Yes it defaulted to convert datetimes (but only in a trivial case when it was object dtyped but was actually an array of Timestamps). So calling it will do some conversions (like above its a soft numeric conversion).

So this should not generally be necessary from a callers perspective for datetimes/timedeltas as we already have a well-defined way of doing this (e.g. to_datetime/to_timedelta). (though it IS used internally).

How about adding a to_numeric analagously? and making .convert_objects private. These are kind of like .astype but more general.

Downside is you cannot then say to pandas "just figure this out", by doing .convert_objects(datetime=True, timedelta=True, numeric=True). (which IMHO is kind of dumb anyhow though).

@jreback
Copy link
Contributor

jreback commented Aug 16, 2015

@bashtage can you update and respond to above comments.

@bashtage
Copy link
Contributor Author

What was the decision? Make convert_objects private and add a to_numeric, so that there would be a trio of primitive coercers? I suppoe if this is the way forward I would restore convert_objects to its previous version (and fix the big bug in it) and then start its deprecation.

@jreback
Copy link
Contributor

jreback commented Aug 17, 2015

I think it would be ok to add to_numeric as we already have to_datetime/to_timedelta. We could leave .convert_objects as is (even though you are slightly changing the defaults). I don't think that is a big deal. What is the bug?

@bashtage
Copy link
Contributor Author

The original covert_objects has a path (with datetime conversion) that doesn't return the right thing. Something like

new_values = some_fn(x)
# A step like this is missing
if not np.isnan(new_values).all():
    values = new_values
# End missing
return values

This isn't in master since my modified convert_objects is quite a bit different.

@jreback
Copy link
Contributor

jreback commented Aug 17, 2015

ok so their was an old bug which you fixed, but what are we missing with the new one that is necessary for back-compat? can you give an exmple

@bashtage
Copy link
Contributor Author

ok so their was an old bug which you fixed, but what are we missing with the new one that is necessary for back-compat? can you give an exmple

The version currently in master has a different signature. In particular, coerce is now a bool. Previously coercing was done using arguments like convert_numeric='coerce'. This was done mostly to make it clear that one can only coerce one type. For example, you can run

df = pd.DataFrame([pd.NaT, '1.', dt.datetime(2015,1,1)]
df.convert_objects(convert_dates='coerce', convert_numeric='coerce'

which always struck me an horribly ambiguous as to what you get - the returned type depends on the order of the code in the function, which seems like a fragile ("magic") design.

This can't happen with the new one since

df = pd.DataFrame([pd.NaT, '1.', dt.datetime(2015,1,1)]
df.convert_objects(datetimes=True, numeric=True, coerce=True)

will raise since this is ambiguous.

I suppose there are a few different options:

  1. Restore to the 0.16.2 version, plus fix the bug
  2. Option 1 plus check for ambiguous calls
  3. Option 2 plus keep new keyword argument names
  4. Go forward with a fundamentally different signature, which is in master

There is one final issue with covert_objects, which is what started me down this path, which is what to do about convert_numeric='coerce' which does not actually coerce if there are no numeric types.

Fix to temporary allow passing 'coerce' to variables

closes pandas-dev#10601
@bashtage bashtage force-pushed the convert-object-dep-fix branch from bf1d34e to 6ceb303 Compare August 18, 2015 13:57
@jreback
Copy link
Contributor

jreback commented Aug 20, 2015

@bashtage I think that you have preserved back compat enough. I would add a warning box in the whatsnew and the docs saying if you are relying upon the 'magic' then you need to be forewarned.

Separately I think we should go forward with to_numeric, can you make an issue? (We can just leave .convert_objects or deprecate the entire method in the future then).

@bashtage
Copy link
Contributor Author

I have a PR nearly ready for to_numeric should have that soon.

I'm all for not changing things, but I was under the impression that @jorisvandenbossche felt strongly about supporting the same arguments.

@jreback
Copy link
Contributor

jreback commented Aug 20, 2015

If I am reading the above correctly, then the objection is that the following did work.

df.convert_objects(convert_numeric=True, convert_dates='coerce')

It seems to me that this is easily cause though and might as well raise on this. (e.g. you are seeing the deprecated args, you know that the user is trying to do).

It may be that intercepting this via the decorator is a bit tricky though (as you are using the mapping; maybe preserving the args and then just warning as you see them is better)?

@jreback
Copy link
Contributor

jreback commented Sep 1, 2015

@jorisvandenbossche comments? this lgtm

@jreback
Copy link
Contributor

jreback commented Sep 5, 2015

@jorisvandenbossche ?

jreback added a commit that referenced this pull request Sep 6, 2015
BUG: Fix issue with old-style usage in convert_objects
@jreback jreback merged commit b2756a7 into pandas-dev:master Sep 6, 2015
@jreback
Copy link
Contributor

jreback commented Sep 6, 2015

merging. I think this is fine.

@bashtage bashtage deleted the convert-object-dep-fix branch February 16, 2016 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG in dealing with deprecated argument in convert_objects
3 participants