Skip to content

BUG: Fix pd.to_numeric to have consistent behavior for date-like arguments #43315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hec10r
Copy link

@hec10r hec10r commented Aug 31, 2021

pd.to_numeric was having an inconsistent behavior when using in datelike objects and was returning wrong result when using with pd.NaT

In master:

>>> import pandas as pd
>>> from datetime import datetime
>>> from functools import partial
>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan
>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64
>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
>>>
>>> pd.to_numeric(pd.NaT, errors="coerce")
nan
>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64
>>> pd.Series(pd.NaT).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

In this PR:

>>> import pandas as pd
>>> from datetime import datetime
>>> from functools import partial
>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan
>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0   NaN
dtype: float64
>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
>>>
>>> pd.to_numeric(pd.NaT, errors="coerce")
nan
>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   NaN
dtype: float64
>>> pd.Series(pd.NaT).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Please let me know if this approach makes sense so I can add the tests.

@hec10r hec10r changed the title Fix datetime behavior to numeric Fix pd.to_numeric to have consistent behavior for date-like arguments Aug 31, 2021
@hec10r hec10r changed the title Fix pd.to_numeric to have consistent behavior for date-like arguments BUG: Fix pd.to_numeric to have consistent behavior for date-like arguments Aug 31, 2021
@pep8speaks
Copy link

pep8speaks commented Aug 31, 2021

Hello @hec10r! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-31 14:35:27 UTC

@Navaneethan2503
Copy link

i thing already data is consider as object , then why you again defining as a object

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #43280 (comment) as the behavior here is largely correct.

If you would like to address the other bugs noted there, that would be great!

@@ -137,6 +141,15 @@ def to_numeric(arg, errors="raise", downcast=None):
if errors not in ("ignore", "raise", "coerce"):
raise ValueError("invalid error value specified")

# Handle inputs of "date" type as objects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm what is this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to change the behavior of pd.to_datetime for date-like objects. Refer to this discussion: #43280 (comment)

result = to_numeric(lis, **kwargs)
expected = to_numeric(ser, **kwargs).values

tm.assert_numpy_array_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you comparing vs a numpy array?

these should use assert_series_equal at a minimum

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because one input was a list and the other input was a pd.Series. So the first ouput is a np.array and the output for series.values is a np.array when series is instance of pd.Series. It could have been done in other way tough;

...
result = to_numeric(lis, **kwargs)
expected = to_numeric(ser, **kwargs)

tm.assert_series_equal(pd.Series(result), expected)

@@ -372,39 +417,6 @@ def test_str(data, exp, transform_assert_equal):
assert_equal(result, expected)


def test_datetime_like(tz_naive_fixture, transform_assert_equal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I was changing the behavior of the function for date-like objects

@hec10r
Copy link
Author

hec10r commented Sep 1, 2021

@jreback Thanks for your review.

This PR will likely be close and I will open a new one based on the discussion that we had in #43280. It would be great to have your input there as well.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 2, 2021
@mroeschke
Copy link
Member

Thanks for the PR but as mentioned in #43280 (comment), this will probably need a different approach. Closing due to being stale

@mroeschke mroeschke closed this Oct 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.to_numeric has an inconsistent behavior for datetime objects
5 participants