Skip to content

BUG: astype falsely converts inf to integer, patch for Numpy (GH14265) #14343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

shawnheide
Copy link
Contributor

Bug in Numpy causes inf values to be falsely converted to integers. I added a ValueError exception similar to the exception for trying to convert NaN to an integer.

@codecov-io
Copy link

codecov-io commented Oct 4, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14343 into master will decrease coverage by <.01%

@@             master     #14343   diff @@
==========================================
  Files           144        144          
  Lines         50981      50980     -1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43470      43469     -1   
  Misses         7511       7511          
  Partials          0          0          

Powered by Codecov. Last update e1f9b96...a9a702e

@@ -529,6 +529,10 @@ def _astype_nansafe(arr, dtype, copy=True):

if np.isnan(arr).any():
raise ValueError('Cannot convert NA to integer')
elif np.isinf(arr).any():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine both checks using one not np.isfinite(arr).all()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took a while to get to this. You're totally right, hadn't even seen that function. Thanks Joris, I'll change it right now.

@jorisvandenbossche jorisvandenbossche added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 4, 2016
@shawnheide shawnheide force-pushed the hotfix/BUG_astype_numpy_inf branch 2 times, most recently from 82403d5 to 8adc147 Compare October 8, 2016 19:25
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, looks good for the rest!

if np.isnan(arr).any():
raise ValueError('Cannot convert NA to integer')
if not np.isfinite(arr).all():
raise ValueError('Cannot convert non-finite values to integer')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add between brackets after "non-finite" something like "(NA or inf)", then it is clearer what is meant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no problem. I debated adding it, but went for succinctness. I agree it is informative, adding it now.

@@ -79,3 +79,5 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Bug in Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call this a "bug in numpy". It is defined behaviour of numpy, which we just don't want to expose to our users and therefore provide an error (as we already do with NaN)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, correcting it now.

@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Oct 10, 2016
@shawnheide shawnheide force-pushed the hotfix/BUG_astype_numpy_inf branch from 8adc147 to f5157d5 Compare October 11, 2016 00:39
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 15, 2016

Looks good to me (for 0.20, but can be merged)
@jreback

@@ -79,3 +79,5 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawnheide Can you add that it now raises an error instead?

@@ -79,3 +79,5 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double-backticks on inf

@@ -79,3 +79,5 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Numpy ``astype()`` caused `inf` values to be falsely converted to integers. Affected ``astype()`` in Series and DataFrames (:issue:`14265`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say something more like:

Bug in .astype() where inf values incorrectly converted to integers.

class TestAsType(tm.TestCase):

def test_astype_raises(self):
types = [np.int32, np.int64]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

def test_astype_raises(self):
types = [np.int32, np.int64]
values = [np.nan, np.inf]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to go in tests/series/tests_dtypes.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I already found it strange I couldn't see any other astype tests

for this_val in values:
s = Series([this_val])
self.assertRaises(ValueError, s.astype, this_type)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test on DataFrames in the analagous file in frames.

@shawnheide shawnheide force-pushed the hotfix/BUG_astype_numpy_inf branch from f5157d5 to 86b905a Compare October 26, 2016 04:24
@shawnheide
Copy link
Contributor Author

Sorry @jorisvandenbossche and @jreback it took so long to make these simple fixes, but I just got them done. I ended up merging the tests in series and frame with the previous test for NaN conversion. Let me know if you have any more requests. Cheers.

@jreback
Copy link
Contributor

jreback commented Dec 6, 2016

can you rebase (so can check with current code)

@shawnheide shawnheide force-pushed the hotfix/BUG_astype_numpy_inf branch from 86b905a to e8f49eb Compare December 7, 2016 05:00
@shawnheide
Copy link
Contributor Author

Just rebased and re-ran the tests, everything looks good. Let me know if there's anything else I need to do.

@@ -361,8 +361,7 @@ def test_astype(self):
arr.astype('i8')

arr = SparseArray([0, np.nan, 0, 1], fill_value=0)
msg = "Cannot convert NA to integer"
with tm.assertRaisesRegexp(ValueError, msg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the message the same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u want to assert the new message would be great

…) (+2 squashed commit)

Squashed commits:
Update test messages to catch new ValueError
Change sparse test for astype to assertRaises(ValueError) instead of regex version
@shawnheide shawnheide force-pushed the hotfix/BUG_astype_numpy_inf branch from e8f49eb to a9a702e Compare December 11, 2016 18:27
@shawnheide
Copy link
Contributor Author

Thanks for catching that @jreback. Hopefully that should do it.

@jorisvandenbossche jorisvandenbossche merged commit 0c82abe into pandas-dev:master Dec 11, 2016
@jorisvandenbossche
Copy link
Member

@shawnheide Thanks!

yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* origin/master: (22 commits)
  BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343)
  BUG: Apply min_itemsize to index even when not appending
  DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788)
  BLD: missing - on secure
  BLD: new access token on pandas-dev
  TST: Test DatetimeIndex weekend offset (pandas-dev#14853)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ...
@shawnheide shawnheide deleted the hotfix/BUG_astype_numpy_inf branch December 14, 2016 05:10
ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

astype(np.int32) converts np.inf to int32 min value
4 participants