BUG 0.20.0rc1: Empty series for var/std for frame with non-numeric columns #16116

Closed
jcrist opened this issue Apr 24, 2017 · 10 comments · Fixed by #16124
Labels: Bug, Regression (functionality that used to work in a prior pandas version)

@jcrist
Contributor

jcrist commented Apr 24, 2017

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.20.0rc1'

In [3]: df = pd.DataFrame({'int': [1, 2, 3, 4],
   ...:                    'float': [1., 2., 3., 4.],
   ...:                    'str': ['a', 'b', 'c', 'd']})

In [4]: df.mean()
Out[4]:
float    2.5
int      2.5
dtype: float64

In [5]: df.std()
Out[5]:
Series([], dtype: float64)

In [6]: df.var()
Out[6]:
Series([], dtype: float64)
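
A regression check along these lines might guard against the behavior reported above. This is only a sketch: the helper name `reductions_agree` is hypothetical, and `numeric_only=True` is passed explicitly on the assumption that newer pandas versions no longer drop non-numeric columns silently.

```python
import pandas as pd

df = pd.DataFrame({'int': [1, 2, 3, 4],
                   'float': [1., 2., 3., 4.],
                   'str': ['a', 'b', 'c', 'd']})


def reductions_agree(frame):
    """Hypothetical helper: True if mean, std and var cover the same columns."""
    expected = sorted(frame.mean(numeric_only=True).index)
    return all(
        sorted(getattr(frame, name)(numeric_only=True).index) == expected
        for name in ('std', 'var')
    )


std = df.std(numeric_only=True)
print(reductions_agree(df))   # False would indicate the regression
print(sorted(std.index))      # the numeric column names
```

On an affected build the helper would be expected to return `False`, since `std` and `var` come back empty while `mean` does not.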
@jorisvandenbossche added the Bug and Regression labels Apr 24, 2017
@jorisvandenbossche added this to the 0.20.0 milestone Apr 24, 2017
@TomAugspurger
Contributor

This is strange... I can reproduce on 0.20rc1, but not master. Rebuilding my dev environment now.

@jorisvandenbossche
Member

@jcrist Thanks for the report!

@TomAugspurger
Contributor

Ah, so this is only an issue if bottleneck is not installed.

@jorisvandenbossche
Member

Hmm, I could reproduce this in my dev env, which has bottleneck installed.

@TomAugspurger
Contributor

TomAugspurger commented Apr 24, 2017

Huh. So running this in script.py:

try:
    import bottleneck
    print("have bottleneck")
except ImportError:
    print("No bottleneck")

import pandas as pd
df = pd.DataFrame({'int': [1, 2, 3, 4], 'float': [1., 2., 3., 4.], 'str': ['a', 'b', 'c', 'd']})
print(df.std())

(py3) bash-4.4$ python script.py
have bottleneck
float    1.290994
int      1.290994
dtype: float64
(py3) bash-4.4$ pip uninstall -y bottleneck
Uninstalling Bottleneck-1.2.0:
  Successfully uninstalled Bottleneck-1.2.0
(py3) bash-4.4$ python script.py
No bottleneck
Series([], dtype: float64)
(py3) bash-4.4$

@jorisvandenbossche
Member

Hmm, OK, not sure why, but I updated my env to latest master (it was still from last week) and now it works correctly. That is in accordance with your findings (as I have bottleneck installed), so ignore my comment above.

And I can confirm it is indeed triggered by not using bottleneck:

In [5]: df.std()
Out[5]: 
float    1.290994
int      1.290994
dtype: float64

In [6]: pd.core.nanops._USE_BOTTLENECK = False

In [7]: df.std()
Out[7]: Series([], dtype: float64)
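
The flag toggled above is a private attribute. As a sketch, assuming a pandas version that exposes the public `compute.use_bottleneck` option (an assumption, so check your version), a similar comparison can be made without touching `nanops` internals. The string column is left out here so the example does not depend on how non-numeric columns are handled.

```python
import pandas as pd

df = pd.DataFrame({'int': [1, 2, 3, 4],
                   'float': [1., 2., 3., 4.]})

# Default setting: bottleneck is used only if it is installed.
with_default = df.std()

# Force the non-bottleneck code path for the duration of the block.
with pd.option_context('compute.use_bottleneck', False):
    without_bottleneck = df.std()

# Both paths should agree on the columns and, approximately, on the values.
same_columns = list(with_default.index) == list(without_bottleneck.index)
max_diff = (with_default - without_bottleneck).abs().max()
print(same_columns, max_diff)
```

Using `option_context` restores the previous setting on exit, so the rest of the session is not affected.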

@TomAugspurger
Contributor

Here's the underlying issue, which gets swallowed because of how std is implemented using .apply:

In [3]: pd.core.nanops.nanstd(pd.Series([1, 2, 3]))
Out[3]: 1.0

In [4]: pd.core.nanops._USE_BOTTLENECK = False

In [5]: pd.core.nanops.nanstd(pd.Series([1, 2, 3]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    106                 else:
--> 107                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    108             except Exception:

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanvar(values, axis, skipna, ddof)
    396         values = values.copy()
--> 397         np.putmask(values, mask, 0)
    398

TypeError: putmask() argument 1 must be numpy.ndarray, not Series

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    106                 else:
--> 107                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    108             except Exception:

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanstd(values, axis, skipna, ddof)
    375 def nanstd(values, axis=None, skipna=True, ddof=1):
--> 376     result = np.sqrt(nanvar(values, axis=axis, skipna=skipna, ddof=ddof))
    377     return _wrap_results(result, values.dtype)

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     49                 with np.errstate(invalid='ignore'):
---> 50                     return f(*args, **kwargs)
     51             except ValueError as e:

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    109                 try:
--> 110                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    111                 except ValueError as e:

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanvar(values, axis, skipna, ddof)
    396         values = values.copy()
--> 397         np.putmask(values, mask, 0)
    398

TypeError: putmask() argument 1 must be numpy.ndarray, not Series

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    106                 else:
--> 107                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    108             except Exception:

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanvar(values, axis, skipna, ddof)
    396         values = values.copy()
--> 397         np.putmask(values, mask, 0)
    398

TypeError: putmask() argument 1 must be numpy.ndarray, not Series

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-7b66e79a9b96> in <module>()
----> 1 pd.core.nanops.nanstd(pd.Series([1, 2, 3]))

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     48             try:
     49                 with np.errstate(invalid='ignore'):
---> 50                     return f(*args, **kwargs)
     51             except ValueError as e:
     52                 # we want to transform an object array

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    108             except Exception:
    109                 try:
--> 110                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    111                 except ValueError as e:
    112                     # we want to transform an object array

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanstd(values, axis, skipna, ddof)
    374 @bottleneck_switch(ddof=1)
    375 def nanstd(values, axis=None, skipna=True, ddof=1):
--> 376     result = np.sqrt(nanvar(values, axis=axis, skipna=skipna, ddof=ddof))
    377     return _wrap_results(result, values.dtype)
    378

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     48             try:
     49                 with np.errstate(invalid='ignore'):
---> 50                     return f(*args, **kwargs)
     51             except ValueError as e:
     52                 # we want to transform an object array

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    108             except Exception:
    109                 try:
--> 110                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    111                 except ValueError as e:
    112                     # we want to transform an object array

/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in nanvar(values, axis, skipna, ddof)
    395     if skipna:
    396         values = values.copy()
--> 397         np.putmask(values, mask, 0)
    398
    399     # xref GH10242

TypeError: putmask() argument 1 must be numpy.ndarray, not Series

Looking into it now.

@jreback
Contributor

jreback commented Apr 25, 2017

pd.core.nanops.nanstd(pd.Series([1, 2, 3]))

@TomAugspurger these are NOT defined for Series and only accept arrays.
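
To illustrate the point about array-only inputs: `np.putmask` rejects anything that is not an `ndarray`, which matches the `TypeError` at the bottom of the traceback above. A minimal sketch using only NumPy and the public `Series.values` attribute (the variable names are illustrative, and this is not the `nanops` code itself):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
mask = np.zeros(len(s), dtype=bool)  # nothing is masked in this example

# Passing the Series directly is rejected by NumPy.
try:
    np.putmask(s, mask, 0)
    series_accepted = True
except TypeError as exc:
    series_accepted = False
    print(exc)

# Passing a copy of the underlying ndarray works.
values = s.values.copy()
np.putmask(values, mask, 0)
sample_std = values.std(ddof=1)
print(series_accepted, sample_std)
```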

@TomAugspurger
Contributor

Right, I think that last traceback isn't the real failure.

Do you know what in your .agg PR changed that caused this? I still don't quite see what's going on.

@jreback
Contributor

jreback commented Apr 25, 2017

Just pushed a PR to fix this. It was an oversight in nanvar (not directly related).

That apply code is complicated; it was supposed to raise to break out of that, but was raising the wrong exception.
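
The remark about raising the wrong exception can be illustrated with a toy fallback loop. This is not the pandas implementation: `reduce_columns`, `strict_std`, `buggy_std`, and the choice of exception types are all hypothetical, made up only to show how a broad `except` can turn an unexpected error into an empty result.

```python
import math


def _is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)


def strict_std(values):
    """Toy reducer: sample standard deviation; TypeError on non-numbers."""
    if not _is_numeric(values):
        raise TypeError("non-numeric data")
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def buggy_std(values):
    """Same reducer, but signalling the problem with the 'wrong' exception."""
    try:
        return strict_std(values)
    except TypeError as exc:
        raise RuntimeError(str(exc)) from exc


def reduce_columns(columns, func):
    """Hypothetical apply-style loop with a numeric-only fallback."""
    try:
        return {name: func(vals) for name, vals in columns.items()}
    except TypeError:
        # Expected signal: retry on the numeric columns only.
        numeric = {n: v for n, v in columns.items() if _is_numeric(v)}
        return {name: func(vals) for name, vals in numeric.items()}
    except Exception:
        # Anything else is swallowed, so the caller just sees "no results".
        return {}


columns = {'int': [1, 2, 3, 4], 'str': ['a', 'b', 'c', 'd']}
good = reduce_columns(columns, strict_std)
bad = reduce_columns(columns, buggy_std)
print(good)
print(bad)
```

In this toy version, the reducer that raises the expected exception type yields a result for the numeric column, while the one that raises a different type yields an empty mapping, which loosely mirrors the empty Series seen in the report.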

4 participants