
Numpy 1.8 DeprecationWarning in compat/scipy.py #5824


Closed
gdraps opened this issue Jan 2, 2014 · 20 comments · Fixed by #6810
Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments


gdraps commented Jan 2, 2014

Not sure how pressing this is, but with DeprecationWarning enabled, I noticed that numpy 1.8 raises a warning during the following call to describe(). [Side note: I enabled DeprecationWarning in my test suite after learning that it was changed to "ignore" by default in Python 2.7.]

import pandas as pd
import warnings
warnings.simplefilter("once", DeprecationWarning)

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]})
print df.groupby('A')['B'].describe()

stdout:

$ python test_fail.py 
.../pandas/compat/scipy.py:68: DeprecationWarning: using a non-integer
number instead of an integer will result in an error in the future
  score = values[idx]

Here's the full traceback with DeprecationWarning escalated to an error (warnings.simplefilter("error", DeprecationWarning)):

Traceback (most recent call last):
  File "test_fail.py", line 6, in <module>
    print df.groupby('A')['B'].describe()
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 343, in wrapper
    return self.apply(curried)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 424, in apply
    return self._python_apply_general(f)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 427, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 883, in apply
    res = f(group)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 422, in f
    return func(g, *args, **kwargs)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/groupby.py", line 329, in curried
    return f(x, *args, **kwargs)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/series.py", line 1386, in describe
    lb), self.median(), self.quantile(ub),
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/core/series.py", line 1316, in quantile
    result = _quantile(valid_values, q * 100)
  File "/home/gmd/ENV/pandas-master-2/lib/python2.7/site-packages/pandas-0.13.0_29_g97860a1-py2.7-linux-i686.egg/pandas/compat/scipy.py", line 68, in scoreatpercentile
    score = values[idx]
IndexError: cannot convert index to integer

jreback commented Jan 2, 2014

Can you show what values and idx are at that point (and what valid_values and q are coming in)?


gdraps commented Jan 2, 2014

values == array([ 1.2])
idx == 0.0
valid_values == array([ 1.2])
q == 0.25

I can reproduce this with plain numpy:

In [1]: import warnings
In [2]: warnings.simplefilter('error', DeprecationWarning)
In [3]: np.array(range(10))[1.0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-2e544ff0834f> in <module>()
----> 1 np.array(range(10))[1.0]

IndexError: cannot convert index to integer
In [4]: warnings.simplefilter('ignore', DeprecationWarning)
In [5]: np.array(range(10))[1.0]
Out[5]: 1


jreback commented Jan 2, 2014

ahh...I see, you are indexing with a float (when it's convertible to an int)....it's 'accepted'...but in general not a good idea.

I am going to warn on this in 0.14 too, see here: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#float64index-api-change (the very last part).
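For reference, a minimal sketch of the point (the names here are illustrative, not pandas internals): casting the index to int before indexing avoids the warning.

import numpy as np

arr = np.arange(10)
idx = 1.0            # a float index produced somewhere upstream

# arr[idx] triggers the DeprecationWarning on numpy 1.8;
# an explicit cast to int does not.
val = arr[int(idx)]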


gdraps commented Jan 2, 2014

I think pandas is generating the float index internally in compat/scipy.py. If I call scipy.stats.scoreatpercentile(np.array([ 1.2]), 25.0) directly with scipy version 0.13.2, I don't see this warning.


jreback commented Jan 2, 2014

If you make your example above (the top one) fail again, can you print out those values (they should be the valid_values)? Probably a float. Not sure why that would cause a problem (it's a numpy array).


gdraps commented Jan 2, 2014

Here are the values again for the original example:
values == array([ 1.2]), idx == 0.0, valid_values == array([ 1.2]), q == 0.25

Looks to me like pandas has an old copy of scipy's scoreatpercentile and needs to pull in a newer version so it doesn't index a numpy ndarray with a float, which was deprecated in numpy 1.8.
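A minimal sketch of the kind of update described above (illustrative only, not the actual compat/scipy.py or scipy source): compute the integer neighbours of the fractional position and interpolate, so the ndarray is never indexed with a float.

import numpy as np

def scoreatpercentile_sketch(a, per):
    # Sort the data and find the (possibly fractional) position of the
    # requested percentile.
    values = np.sort(np.asarray(a))
    pos = per / 100.0 * (values.shape[0] - 1)
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    if lo == hi:                      # exact order statistic
        return values[lo]
    fraction = pos - lo
    # Linear interpolation between the two neighbouring order statistics.
    return values[lo] + (values[hi] - values[lo]) * fraction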


jreback commented Jan 2, 2014

idx is computed by scipy
afaik pandas is just passing simple stuff
can you see where scipy deprecated this (e.g. the original issue)?
maybe we need to call a different routine?


gdraps commented Jan 2, 2014

I didn't have scipy installed when I saw the failure, so I think that idx is
computed by a copy of a scipy function that lives in pandas. According to the
compat/scipy.py header, it was copied to avoid a dependency on scipy.

Looks like scipy.stats.scoreatpercentile was last changed in commit
jjhelmus/scipy@1cdd08b to accept sequences of percentiles and to stop
indexing an ndarray with a float.


gdraps commented Jan 2, 2014

For reference:
pandas/compat/scipy.py created by #1092
non-integer ndarray indexing deprecated in numpy/numpy#3243


jreback commented Jan 2, 2014

ahh...I see now...

ok...so basically that module then needs updating....

care to submit a PR? (and we'll prob need some tests there)

we have a 'soft' dep on scipy...but this is such a common thing it's fine to have it 'built' in

so will call this a 'bug' then


gdraps commented Jan 2, 2014

Sure, I will draft a PR.

@juliantaylor

Note that the (still unreleased) numpy 1.9 will have a percentile that should be able to replace scipy's scoreatpercentile in both performance and features.
It uses partition instead of sort, which is faster, and it will support extended axes (the extended-axes part is not merged yet but should be soon).
As a user of percentile, maybe you want to give numpy 1.9.dev a try and see if it works for you.


jreback commented Feb 18, 2014

@gdraps could you submit a PR for this?

(and for numpy 1.9 we should take advantage of the changes)....


jreback commented Mar 29, 2014

@gdraps PR for this?


gdraps commented Mar 30, 2014

Sent PR #6740 to fix the core issue, though it doesn't take advantage of numpy.percentile, which has been in numpy in a form that appears compatible with pandas's usage since 1.5, best I can tell.

When I tried simply replacing scoreatpercentile in core/frame.py and core/series.py with numpy.percentile, while using numpy 1.8, the two tests below failed.

======================================================================
FAIL: test_timedelta_ops (pandas.tseries.tests.test_timedeltas.TestTimedeltas)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmd/github_shadow/pandas/pandas/tseries/tests/test_timedeltas.py",
  line 203, in test_timedelta_ops
    tm.assert_almost_equal(result, expected)
  File "testing.pyx", line 58, in pandas._testing.assert_almost_equal
  (pandas/srcAssertionError: numpy.timedelta64(2599999999,'ns') !=
  numpy.timedelta64(2600000000,'ns')

======================================================================
FAIL: test_quantile (pandas.tests.test_series.TestSeries)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmd/github_shadow/pandas/pandas/tests/test_series.py", line 2115,
  in test_quantile
    self.assertEqual(q, scoreatpercentile(self.ts.valid(), 10))
AssertionError: -1.2926251727455667 != -1.2926251727455669
----------------------------------------------------------------------

@juliantaylor

The way the fractions are computed is not the same in your function and in numpy, so you get slight rounding errors. numpy computes:

(1 - fraction) * low + fraction * high

while your code has one operation fewer:

low + (high - low) * fraction

Maybe we could change numpy to that method if you expect it to cause issues, but relying on exact results for floating-point operations is usually not a good idea in high-level programs without tight control over the operations and rounding modes.
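A quick way to see the effect (arbitrary illustrative values; the two expressions are algebraically equal but can round differently):

import numpy as np

rng = np.random.RandomState(0)
low, high, fraction = rng.randn(), rng.randn(), rng.rand()

a = (1 - fraction) * low + fraction * high   # numpy's formula
b = low + (high - low) * fraction            # the compat/scipy.py formula

# Mathematically identical, but the results may disagree in the last unit
# of precision, which is enough to break exact-equality assertions.
print(a, b, a == b)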


jreback commented Mar 30, 2014

@gdraps I would be happy just dropping pandas.compat.scipy/_quantile entirely in favor of using the numpy method. Then just change the test_quantile test to compare against the numpy method itself (and just remove this part of the scipy dep).

Not sure why this was not done originally. Pls also add a test using datetime64[ns], as I suspect this fails (look at the isin method to see how to do this).
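For what it's worth, a rough sketch of what that could look like (quantile_via_numpy is a hypothetical helper, not the actual pandas implementation):

import numpy as np
import pandas as pd

def quantile_via_numpy(series, q):
    # Delegate to numpy: drop missing values and let np.percentile
    # do the interpolation.
    valid = series.dropna().values
    return np.percentile(valid, q * 100)

s = pd.Series(np.random.randn(100))
# In tests, compare against np.percentile itself, with a tolerance,
# rather than asserting exact equality with the old scoreatpercentile.
assert np.isclose(quantile_via_numpy(s, 0.1),
                  np.percentile(s.dropna().values, 10))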


gdraps commented Apr 5, 2014

re: datetime64[ns], with pandas 0.11 and numpy 1.7.1, .quantile on a datetime64[ns] series returns an np.datetime64 object:

In [35]: s
Out[35]: 
0   2014-01-31 00:00:00
1   2014-02-28 00:00:00
2   2014-03-31 00:00:00
3   2014-04-30 00:00:00
4   2014-05-31 00:00:00
Name: 0, dtype: datetime64[ns]

In [36]: s.quantile(.25)
Out[36]: numpy.datetime64('2014-02-27T19:00:00.000000000-0500')

Is the intended behavior for .quantile to return a Timestamp object? e.g.:

In [63]: s.quantile(.25)
Out[63]: Timestamp('2014-02-28 00:00:00')


jreback commented Apr 5, 2014

hmm..it should, but I suspect it's not being inferred at all (IIRC I had to put it in manually for timedeltas, so we prob need a similar check for datetimes).

want to add that and a test or 2? (and add a separate release note for that change as well)
same PR is fine
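A rough sketch of the kind of boxing being discussed (datetime_quantile is a hypothetical helper, assuming an i8 view of the datetime64[ns] values; not the actual pandas code):

import numpy as np
import pandas as pd

def datetime_quantile(series, q):
    # Compute the percentile on the underlying int64 nanosecond values,
    # then box the result back into a Timestamp.
    vals = series.dropna().values.view('i8')
    result = np.percentile(vals, q * 100)
    return pd.Timestamp(int(result))   # integer input is taken as ns

s = pd.Series(pd.to_datetime(['2014-01-31', '2014-02-28', '2014-03-31',
                              '2014-04-30', '2014-05-31']))
print(datetime_quantile(s, 0.25))      # Timestamp('2014-02-28 00:00:00'), not a datetime64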


gdraps commented Apr 5, 2014

Yup. Can add that later today.
