Skip to content

MAINT: Drop convert_objects from NDFrame #15757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Mar 21, 2017

Deprecated since 0.17.0

xref #11173

""" attempt to coerce any object types to better types return a copy of
the block (if copy = True) by definition we ARE an ObjectBlock!!!!!
"""
Attempt to coerce any object types to more specific data types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a whole bunch of code here that i think u can ditch now that was for compat with convert_objects

Copy link
Member Author

@gfyoung gfyoung Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by compat with convert_objects? How so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of this code iirc was to support multiple convert args at the same time (via convert_objects) which is not needed by convert - look at the blame and u shall see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really follow you here. convert_objects was nowhere in the implementation of convert, and it seems like this function is used as a pathway to the replacement for the object conversion functions (e.g. to_timedelta)

@codecov
Copy link

codecov bot commented Mar 21, 2017

Codecov Report

Merging #15757 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15757      +/-   ##
==========================================
- Coverage   90.96%   90.96%   -0.01%     
==========================================
  Files         161      143      -18     
  Lines       49301    49416     +115     
==========================================
+ Hits        44849    44953     +104     
- Misses       4452     4463      +11
Flag Coverage Δ
#multiple ?
#single ?
Impacted Files Coverage Δ
pandas/core/frame.py 97.56% <ø> (-0.25%) ⬇️
pandas/core/generic.py 96.25% <ø> (+3.95%) ⬆️
pandas/core/internals.py 93.62% <ø> (+0.18%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/testing.py 80.85% <0%> (-19.15%) ⬇️
pandas/tools/plotting.py 71.77% <0%> (-10.05%) ⬇️
pandas/io/json/json.py 90.27% <0%> (-9.73%) ⬇️
pandas/conftest.py 94.11% <0%> (-2.32%) ⬇️
pandas/types/concat.py 98.06% <0%> (-1.94%) ⬇️
pandas/io/pickle.py 79.54% <0%> (-1.71%) ⬇️
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9421af...b18e93c. Read the comment docs.

@@ -1845,12 +1845,13 @@ def is_bool(self):
"""
return lib.is_bool_array(self.values.ravel())

# TODO: Refactor when convert_objects is removed since there will be 1 path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look and see when this comment was put in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the fn_kwargs stuff can go back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be removed i mean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did. #11173 - I'm not sure I follow what the logic was back then.

Copy link
Member Author

@gfyoung gfyoung Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove fn_kwargs? convert appears to be just a wrapper around all of the object conversion functions as I mentioned above. Doesn't that allow us to pass in keyword arguments specific to each of those functions?

@jorisvandenbossche
Copy link
Member

Can we hold off on this one until we decide on #11221 ?

(@gfyoung, sorry, when I saw you were doing all the deprecations, I wanted to raise this before you did this PR, but forgot it)

@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

@jorisvandenbossche you would have to make a clear case for a useful conversion function
lots of arguments against

not sure it's worth the name either

i would suggest a recipe or expand to_* to a) 2-d b) add a soft option

@jorisvandenbossche
Copy link
Member

you would have to make a clear case

See my comment in #11221 (comment) (at that time I was convinced of its usefulness, would have to dig in again for the details), and some of the other reactions.
But lets keep that discussion there.

@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

@jorisvandenbossche i saw your comment

reviving a bad api is not going to happen
you are welcome to propose a new one that has a new name
but that is orthogonal to this issue

@jreback jreback added the Deprecate Functionality to remove in pandas label Mar 21, 2017
@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

@gfyoung what I mean is that _possibly_convert_objects should just become part of _soft_convert_objects (same file), prob rename -> _maybe_convert_objects. This would be much easier once convert_objects is removed.

This new function is pretty much what @jorisvandenbossche is looking for and potentially could be exposed. It allows soft & hard conversions.

I just ran all tests. It seems that with this PR. _possibly_convert_objects is always called with the default arguments. I believe its actually called with NO arguments (and hence they are the defaults). except for 1 validation test. e.g.

diff --git a/pandas/types/cast.py b/pandas/types/cast.py
index 11a837d..59d0b55 100644
--- a/pandas/types/cast.py
+++ b/pandas/types/cast.py
@@ -568,6 +568,9 @@ def _possibly_convert_objects(values, convert_dates=True, convert_numeric=True,
                               convert_timedeltas=True, copy=True):
     """ if we have an object dtype, try to coerce dates and/or numbers """
 
+    if not (convert_dates == True and convert_numeric == True and convert_timedeltas == True):
+        raise NotImplementedError
+
     # if we have passed in a list or scalar
     if isinstance(values, (list, tuple)):
         values = np.array(values, dtype=np.object_)

pandas.tests.series.test_internals.TestSeriesInternals testMethod=test_convert_objects

@chris-b1
Copy link
Contributor

I think it would be worth exposing whatever the new soft convert api is 0.20 (I haven't looked at it in detail), referencing it in the convert_objects depr message, then deferring convert_objects to the next version, if possible.

I say this because I know there are people (for example, me) who have ignored the convert_objects depr message in a couple cases, in particular working with data where you don't necessarily know the columns. Real instance:

df = pd.read_html(source)[0]  # poorly formatted table, everything inferred to object
                                                # exact columns can vary

df.columns = df.loc[0, :]
df = df.drop(0).dropna()

df = df.convert_objects()

Looking at this again, I realizedf.apply(lambda x: pd.to_numeric(x, errors='ignore')) would also work fine in this case, but that wasn't immediately obvious, and I'm not sure we've done enough handholding (for lack of a better term) to help people transition.

@jorisvandenbossche
Copy link
Member

@chris-b1 Can you comment that in #11221 ?

@gfyoung
Copy link
Member Author

gfyoung commented Mar 21, 2017

@jorisvandenbossche @chris-b1 : You both raise some valid arguments regarding the keeping of convert_objects, so then I guess we have to back to @bashtage and ask it was deprecated in the first place 😄. I am perfectly fine with holding with regards to removal, though there were some minor doc fixes that I would still like to push through at minimum. How does that sound?

@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

@gfyoung

it was deprecated because of the magic "if it has 1 valid value convert it, but if nothing converts then revert it" (maybe should make a song!)

So it becomes data dependent, not a great thing for a default. Sure it could be an option to the user which would be fine.

This further was exacerbated because the conversion would first try to convert to datetime, then if that failed to timdelta, then to numeric. Even more magic here.

So certainly we could add this back, but it would have to be an explict user decision.

@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

this is basically what it does on top of the current functions: #11221 (comment)

@bashtage
Copy link
Contributor

My opinion of convert_objects hasn't changed. As @jreback said, it has a lot of magic and the precise behavior in all situations could only be described by the actual code being executed. This does not strike me as good design.

The real issue here is that the well-defined rules for converting objects have already failed by the time that convert_objects was usually called (e.g., text data was imported from read_* and object was returned). Given that there are not nearly lossless choices available, it seems very difficult to arrive at a reasonable set of rules that don't seem arbitrary.

If there was a need for a convert_objects-like function, then I think it would need to start from scratch by first describing a concrete set of actions that are more aggressive than the default converters but not so as to destroy the utility of this function.

@gfyoung gfyoung force-pushed the convert-objects-remove branch from 05e7910 to 6eb86ff Compare March 21, 2017 17:44
@gfyoung
Copy link
Member Author

gfyoung commented Mar 21, 2017

@jreback : _possibly_convert_objects is in cast.py, which is not affected by this removal. The functions which had the TODO are in core/internals.py I think your proposed refactoring is beyond the scope of this removal, though it can implications on how to rebuild the API afterwards.

@gfyoung gfyoung force-pushed the convert-objects-remove branch 5 times, most recently from 69eeb52 to 145adda Compare March 22, 2017 23:38
@jreback
Copy link
Contributor

jreback commented Mar 22, 2017

@gfyoung btw, no reason for you to keep rebasing this, unless something is actually changing.

@gfyoung
Copy link
Member Author

gfyoung commented Mar 23, 2017

Yeah, sorry, I have been making changes locally, but I haven't pushed anything yet.

@jreback
Copy link
Contributor

jreback commented Mar 23, 2017

np by all means if u r changing things

@gfyoung gfyoung force-pushed the convert-objects-remove branch from 145adda to 9b64ba2 Compare March 24, 2017 07:56
@gfyoung
Copy link
Member Author

gfyoung commented Mar 24, 2017

@jreback : Is there anything else that needs to be changed here? Even your proposed diff has already been incorporated into master already it seems.

@jreback
Copy link
Contributor

jreback commented Mar 25, 2017

i think this is fine

even IF we want to have a future conversion function could always bring back the name

i am
+1 in merging

@jorisvandenbossche
Copy link
Member

I am -1 on merging this before a resolution on #11221

@gfyoung gfyoung force-pushed the convert-objects-remove branch 7 times, most recently from e47b85d to 8f5c9bd Compare June 15, 2017 06:44
@gfyoung gfyoung force-pushed the convert-objects-remove branch 4 times, most recently from 22839ea to 8feab44 Compare June 21, 2017 15:27
@gfyoung gfyoung force-pushed the convert-objects-remove branch 4 times, most recently from e501b44 to cf81b37 Compare June 29, 2017 03:24
@gfyoung gfyoung force-pushed the convert-objects-remove branch 5 times, most recently from 9ef4cda to ffe4361 Compare July 7, 2017 05:23
@jreback
Copy link
Contributor

jreback commented Jul 7, 2017

@jorisvandenbossche I don't think waiting on this is healthy. We haven't had any activity w.r.t. #11221. if someone really needs this then a new API can be designed.

@chris-b1

@gfyoung gfyoung force-pushed the convert-objects-remove branch 2 times, most recently from 57dc4db to 73b67d7 Compare July 8, 2017 17:59
@gfyoung
Copy link
Member Author

gfyoung commented Jul 9, 2017

@jreback : I might also point out that you had mentioned moving forward on this PR about a month ago here with no response.

Thus, I am also in favor with merging this AND closing the related issue, which can always be revived in the future by anyone who's interested.

@gfyoung gfyoung force-pushed the convert-objects-remove branch from 73b67d7 to f84278a Compare July 10, 2017 15:07
Deprecated since 0.17.0

xref pandas-devgh-11173

[ci skip]
@gfyoung gfyoung force-pushed the convert-objects-remove branch from f84278a to b18e93c Compare July 12, 2017 04:29
@jreback
Copy link
Contributor

jreback commented Jul 13, 2017

yes, let's close this and proceed as indicated in #11221, create infer_object and change the deprecation message on convert_objects (and schedule for next release). as indicated #11221 (comment)

@jreback jreback closed this Jul 13, 2017
@gfyoung gfyoung deleted the convert-objects-remove branch July 13, 2017 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants