
Numpy 1.8 DeprecationWarning in compat/scipy.py #5824


Closed
gdraps opened this issue Jan 2, 2014 · 20 comments · Fixed by #6810
Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments


gdraps commented Jan 2, 2014

Not sure how pressing this is, but with DeprecationWarning enabled, I noticed that numpy 1.8 raises a warning during the following call to describe(). [Side note: I enabled DeprecationWarning in my test suite after learning that it was changed to "ignore" by default in Python 2.7.]

import pandas as pd
import warnings
warnings.simplefilter("once", DeprecationWarning)

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]})
print df.groupby('A')['B'].describe()

stdout:

$ python test_fail.py 
.../pandas/compat/scipy.py:68: DeprecationWarning: using a non-integer
number instead of an integer will result in an error in the future
  score = values[idx]

Here's the full traceback with DeprecationWarning escalated to an error (warnings.simplefilter("error", DeprecationWarning)):

Traceback (most recent call last):
  File "test_fail.py", line 6, in <module>
    print df.groupby('A')['B'].describe()
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 343, in wrapper
    return self.apply(curried)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 424, in apply
    return self._python_apply_general(f)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 427, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 883, in apply
    res = f(group)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 422, in f
    return func(g, *args, **kwargs)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 329, in curried
    return f(x, *args, **kwargs)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/series.py", line 1386, in describe
    lb), self.median(), self.quantile(ub),
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/series.py", line 1316, in quantile
    result = _quantile(valid_values, q * 100)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/compat/scipy.py", line 68, in scoreatpercentile
    score = values[idx]
IndexError: cannot convert index to integer

jreback commented Jan 2, 2014

Can you show what values and idx are at that point (and what valid_values and q are coming in)?


gdraps commented Jan 2, 2014

values == array([ 1.2])
idx == 0.0
valid_values == array([ 1.2])
q == 0.25

I can reproduce this with plain numpy:

In [1]: import warnings
In [2]: warnings.simplefilter('error', DeprecationWarning)
In [3]: np.array(range(10))[1.0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-2e544ff0834f> in <module>()
----> 1 np.array(range(10))[1.0]

IndexError: cannot convert index to integer
In [4]: warnings.simplefilter('ignore', DeprecationWarning)
In [5]: np.array(range(10))[1.0]
Out[5]: 1


jreback commented Jan 2, 2014

ahh...I see, you are indexing with a float (when it's convertible to an int)....it's 'accepted'...but in general not a good idea.

I am going to warn on this in 0.14 too, see here: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#float64index-api-change (the very last part).
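For reference, a minimal sketch of the point (the names here are illustrative, not pandas internals): casting the index to int before indexing avoids the warning.

import numpy as np

arr = np.arange(10)
idx = 1.0            # a float index produced somewhere upstream

# arr[idx] triggers the DeprecationWarning on numpy 1.8;
# an explicit cast to int does not.
val = arr[int(idx)]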


gdraps commented Jan 2, 2014

I think pandas is generating the float index internally in compat/scipy.py. If I call scipy.stats.scoreatpercentile(np.array([ 1.2]), 25.0) directly with scipy version 0.13.2, I don't see this warning.


jreback commented Jan 2, 2014

If you make your example above (the top one) fail again, can you print out those values (they should be the valid_values)? Probably a float. Not sure why that would cause a problem (it's a numpy array).


gdraps commented Jan 2, 2014

Here are the values again for the original example:
values == array([ 1.2]), idx == 0.0, valid_values == array([ 1.2]), q == 0.25

Looks to me like pandas has an old copy of scipy's scoreatpercentile and needs to pull in a newer version so it doesn't index a numpy ndarray with a float, which was deprecated in numpy 1.8.
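A minimal sketch of the kind of update described above (illustrative only, not the actual compat/scipy.py or scipy source): compute the integer neighbours of the fractional position and interpolate, so the ndarray is never indexed with a float.

import numpy as np

def scoreatpercentile_sketch(a, per):
    # Sort the data and find the (possibly fractional) position of the
    # requested percentile.
    values = np.sort(np.asarray(a))
    pos = per / 100.0 * (values.shape[0] - 1)
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    if lo == hi:                      # exact order statistic
        return values[lo]
    fraction = pos - lo
    # Linear interpolation between the two neighbouring order statistics.
    return values[lo] + (values[hi] - values[lo]) * fraction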


jreback commented Jan 2, 2014

idx is computed by scipy
afaik pandas is just passing simple stuff
can you see where scipy deprecated this (e.g. the original issue)?
maybe we need to call a different routine?


gdraps commented Jan 2, 2014

I didn't have scipy installed when I saw the failure, so I think that idx is
computed by a copy of a scipy function that lives in pandas. According to the
compat/scipy.py header, it was copied to avoid a dependency on scipy.

Looks like scipy.stats.scoreatpercentile was last changed in commit
jjhelmus/scipy@1cdd08b to accept sequences of percentiles and to stop
indexing an ndarray with a float.


gdraps commented Jan 2, 2014

For reference:
pandas/compat/scipy.py created by #1092
non-integer ndarray indexing deprecated in numpy/numpy#3243


jreback commented Jan 2, 2014

ahh...I see now...

ok...so basically that module then needs updating....

care to submit a PR? (and we'll prob need some tests there)

we have a 'soft' dep on scipy...but this is such a common thing it's fine to have it 'built' in

so will call this a 'bug' then


gdraps commented Jan 2, 2014

Sure, I will draft a PR.

@juliantaylor

Note that the (still unreleased) numpy 1.9 will have a percentile that should be able to replace scipy's scoreatpercentile in both performance and features.
It uses partition instead of sort, which is faster, and it will support extended axes (the extended-axes part is not merged yet but should be soon).
As a user of percentile, maybe you want to give numpy 1.9.dev a try and see if it works for you.


jreback commented Feb 18, 2014

@gdraps could you submit a PR for this?

(and for numpy 1.9 we should take advantage of the changes)....


jreback commented Mar 29, 2014

@gdraps PR for this?


gdraps commented Mar 30, 2014

Sent PR #6740 to fix the core issue, though it doesn't take advantage of numpy.percentile, which has been in numpy in a form that appears compatible with pandas's usage since 1.5, best I can tell.

When I tried simply replacing scoreatpercentile in core/frame.py and core/series.py with numpy.percentile, while using numpy 1.8, the two tests below failed.

======================================================================
FAIL: test_timedelta_ops (pandas.tseries.tests.test_timedeltas.TestTimedeltas)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmd/github_shadow/pandas/pandas/tseries/tests/test_timedeltas.py",
  line 203, in test_timedelta_ops
    tm.assert_almost_equal(result, expected)
  File "testing.pyx", line 58, in pandas._testing.assert_almost_equal
  (pandas/srcAssertionError: numpy.timedelta64(2599999999,'ns') !=
  numpy.timedelta64(2600000000,'ns')

======================================================================
FAIL: test_quantile (pandas.tests.test_series.TestSeries)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmd/github_shadow/pandas/pandas/tests/test_series.py", line 2115,
  in test_quantile
    self.assertEqual(q, scoreatpercentile(self.ts.valid(), 10))
AssertionError: -1.2926251727455667 != -1.2926251727455669
----------------------------------------------------------------------

@juliantaylor

The way the fractions are computed is not the same in your function and in numpy, so you get slight rounding errors. numpy computes:

(1 - fraction) * low + fraction * high

while your code has one operation fewer:

low + (high - low) * fraction

Maybe we could change numpy to that method if you expect it to cause issues, but relying on exact results for floating-point operations is usually not a good idea in high-level programs without tight control over the operations and rounding modes.
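A quick way to see the effect (arbitrary illustrative values; the two expressions are algebraically equal but can round differently):

import numpy as np

rng = np.random.RandomState(0)
low, high, fraction = rng.randn(), rng.randn(), rng.rand()

a = (1 - fraction) * low + fraction * high   # numpy's formula
b = low + (high - low) * fraction            # the compat/scipy.py formula

# Mathematically identical, but the results may disagree in the last unit
# of precision, which is enough to break exact-equality assertions.
print(a, b, a == b)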


jreback commented Mar 30, 2014

@gdraps I would be happy just dropping pandas.compat.scipy/_quantile entirely in favor of using the numpy method. Then just change the test_quantile test to compare against the numpy method itself (and just remove this part of the scipy dep).

Not sure why this was not done originally. Pls also add a test using datetime64[ns], as I suspect this fails (look at the isin method to see how to do this).
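For what it's worth, a rough sketch of what that could look like (quantile_via_numpy is a hypothetical helper, not the actual pandas implementation):

import numpy as np
import pandas as pd

def quantile_via_numpy(series, q):
    # Delegate to numpy: drop missing values and let np.percentile
    # do the interpolation.
    valid = series.dropna().values
    return np.percentile(valid, q * 100)

s = pd.Series(np.random.randn(100))
# In tests, compare against np.percentile itself, with a tolerance,
# rather than asserting exact equality with the old scoreatpercentile.
assert np.isclose(quantile_via_numpy(s, 0.1),
                  np.percentile(s.dropna().values, 10))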


gdraps commented Apr 5, 2014

re: datetime64[ns], with pandas 0.11 and numpy 1.7.1, .quantile on a datetime64[ns] series returns an np.datetime64 object:

In [35]: s
Out[35]: 
0   2014-01-31 00:00:00
1   2014-02-28 00:00:00
2   2014-03-31 00:00:00
3   2014-04-30 00:00:00
4   2014-05-31 00:00:00
Name: 0, dtype: datetime64[ns]

In [36]: s.quantile(.25)
Out[36]: numpy.datetime64('2014-02-27T19:00:00.000000000-0500')

Is the intended behavior for .quantile to return a Timestamp object? e.g.:

In [63]: s.quantile(.25)
Out[63]: Timestamp('2014-02-28 00:00:00')


jreback commented Apr 5, 2014

hmm..it should, but I suspect it's not being inferred at all (IIRC I had to put it in manually for timedeltas, so we prob need a similar check for datetimes).

want to add that and a test or 2? (and add a separate release note for that change as well)
same PR is fine
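A rough sketch of the kind of boxing being discussed (datetime_quantile is a hypothetical helper, assuming an i8 view of the datetime64[ns] values; not the actual pandas code):

import numpy as np
import pandas as pd

def datetime_quantile(series, q):
    # Compute the percentile on the underlying int64 nanosecond values,
    # then box the result back into a Timestamp.
    vals = series.dropna().values.view('i8')
    result = np.percentile(vals, q * 100)
    return pd.Timestamp(int(result))   # integer input is taken as ns

s = pd.Series(pd.to_datetime(['2014-01-31', '2014-02-28', '2014-03-31',
                              '2014-04-30', '2014-05-31']))
print(datetime_quantile(s, 0.25))      # Timestamp('2014-02-28 00:00:00'), not a datetime64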


gdraps commented Apr 5, 2014

Yup. Can add that later today.
