-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: astype falsely converts inf to integer, patch for Numpy (GH14265) #14343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: astype falsely converts inf to integer, patch for Numpy (GH14265) #14343
Conversation
Current coverage is 85.26% (diff: 100%)@@ master #14343 diff @@
==========================================
Files 144 144
Lines 50981 50980 -1
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 43470 43469 -1
Misses 7511 7511
Partials 0 0
|
@@ -529,6 +529,10 @@ def _astype_nansafe(arr, dtype, copy=True): | |||
|
|||
if np.isnan(arr).any(): | |||
raise ValueError('Cannot convert NA to integer') | |||
elif np.isinf(arr).any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can combine both checks using one not np.isfinite(arr).all()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took a while to get to this. You're totally right, hadn't even seen that function. Thanks Joris, I'll change it right now.
82403d5
to
8adc147
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, looks good for the rest!
if np.isnan(arr).any(): | ||
raise ValueError('Cannot convert NA to integer') | ||
if not np.isfinite(arr).all(): | ||
raise ValueError('Cannot convert non-finite values to integer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add between brackets after "non-finite" something like "(NA or inf)", then it is clearer what is meant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, no problem. I debated adding it, but went for succinctness. I agree it is informative, adding it now.
@@ -79,3 +79,5 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Bug in Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't call this a "bug in numpy". It is defined behaviour of numpy, which we just don't want to expose to our users and therefore provide an error (as we already do with NaN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, correcting it now.
8adc147
to
f5157d5
Compare
Looks good to me (for 0.20, but can be merged) |
@@ -79,3 +79,5 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shawnheide Can you add that it now raises an error instead?
@@ -79,3 +79,5 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double-backticks on inf
@@ -79,3 +79,5 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say something more like:
Bug in .astype()
where inf
values incorrectly converted to integers.
class TestAsType(tm.TestCase): | ||
|
||
def test_astype_raises(self): | ||
types = [np.int32, np.int64] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment
def test_astype_raises(self): | ||
types = [np.int32, np.int64] | ||
values = [np.nan, np.inf] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to go in tests/series/tests_dtypes.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I already found it strange I couldn't see any other astype tests
for this_val in values: | ||
s = Series([this_val]) | ||
self.assertRaises(ValueError, s.astype, this_type) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test on DataFrames in the analagous file in frames.
f5157d5
to
86b905a
Compare
Sorry @jorisvandenbossche and @jreback it took so long to make these simple fixes, but I just got them done. I ended up merging the tests in series and frame with the previous test for NaN conversion. Let me know if you have any more requests. Cheers. |
can you rebase (so can check with current code) |
86b905a
to
e8f49eb
Compare
Just rebased and re-ran the tests, everything looks good. Let me know if there's anything else I need to do. |
@@ -361,8 +361,7 @@ def test_astype(self): | |||
arr.astype('i8') | |||
|
|||
arr = SparseArray([0, np.nan, 0, 1], fill_value=0) | |||
msg = "Cannot convert NA to integer" | |||
with tm.assertRaisesRegexp(ValueError, msg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the message the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u want to assert the new message would be great
…) (+2 squashed commit) Squashed commits: Update test messages to catch new ValueError Change sparse test for astype to assertRaises(ValueError) instead of regex version
e8f49eb
to
a9a702e
Compare
Thanks for catching that @jreback. Hopefully that should do it. |
@shawnheide Thanks! |
* origin/master: (22 commits) BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343) BUG: Apply min_itemsize to index even when not appending DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788) BLD: missing - on secure BLD: new access token on pandas-dev TST: Test DatetimeIndex weekend offset (pandas-dev#14853) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ...
git diff upstream/master | flake8 --diff
Bug in Numpy causes inf values to be falsely converted to integers. I added a ValueError exception similar to the exception for trying to convert NaN to an integer.