Skip to content

ENH: convert masked arrays for Series #20427

Open
@alorenzo175

Description

@alorenzo175

Problem description

When a Series is constructed from a float32, masked numpy array, calling mean() on a resample produces NaNs. This doesn't occur with float64, masked arrays or non-masked float32 arrays. Some operations like first() work while median() raises a value error.

Code Sample, a copy-pastable example if possible

import numpy as np                                                                                                                                             
import pandas as pd                                                                                                                                            


arr32 = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, False], dtype='float32')                                                                              
arr64 = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, False], dtype='float64')
index = pd.date_range(start='2018-03-01 12:00:00Z', end='2018-03-01 12:10:00Z',
                      freq='5min')
                                                                                                                                                                                                                                                                                                                              
ser32 = pd.Series(arr32, index=index)
ser64 = pd.Series(arr64, index=index)

print('float32 masked array')                                                                                                                                  
print(ser32.resample('5min').mean())
print(ser32.resample('10min').mean())

print('float64 masked array')                                                                                                                                  
print(ser64.resample('5min').mean())
print(ser64.resample('10min').mean())

print('non-masked float32')                                                                                                                                    
print(pd.Series(arr32.data, index=index).resample('5min').mean())

ser32.resample('5min').median()

which outputs

float32 masked array                                                           
2018-03-01 12:00:00+00:00   NaN                                                
2018-03-01 12:05:00+00:00   NaN                                                
2018-03-01 12:10:00+00:00   NaN                                                
Freq: 5T, dtype: float32                                                       
2018-03-01 12:00:00+00:00   NaN                                                
2018-03-01 12:10:00+00:00   NaN                                                
Freq: 10T, dtype: float32                                                      
float64 masked array                                                           
2018-03-01 12:00:00+00:00    1.0                                               
2018-03-01 12:05:00+00:00    2.0                                               
2018-03-01 12:10:00+00:00    3.0                                               
Freq: 5T, dtype: float64                                                       
2018-03-01 12:00:00+00:00    1.5                                               
2018-03-01 12:10:00+00:00    3.0                                               
Freq: 10T, dtype: float64                                                      
non-masked float32                                                             
2018-03-01 12:00:00+00:00    1.0                                               
2018-03-01 12:05:00+00:00    2.0                                               
2018-03-01 12:10:00+00:00    3.0                                               
Freq: 5T, dtype: float32 

Traceback (most recent call last):
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 1145, in median
return self._cython_agg_general('median', **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 921, in _cython_agg_general
min_count=min_count)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 2314, in aggregate
min_count=min_count)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 2242, in _cython_operation
values = _ensure_float64(values)
File "pandas/_libs/algos_common_helper.pxi", line 3182, in pandas._libs.algos.ensure_float64
File "pandas/_libs/algos_common_helper.pxi", line 3187, in pandas._libs.algos.ensure_float64
TypeError: astype() got an unexpected keyword argument 'copy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/nanops.py", line 128, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/nanops.py", line 386, in nanmedian
values = values.ravel()
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/numpy/ma/core.py", line 4532, in ravel
r._mask = ndarray.ravel(self._mask, order=order).reshape(r.shape)
ValueError: cannot reshape array of size 0 into shape (1,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bad_pandas.py", line 26, in <module>
ser32.resample('5min').median()
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/resample.py", line 621, in f
return self._downsample(_method)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/resample.py", line 773, in _downsample
self.grouper, axis=self.axis).aggregate(how, **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 3121, in aggregate
return getattr(self, func_or_funcs)(*args, **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 1156, in median
return self._python_agg_general(f)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 939, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 2591, in agg_series
return grouper.get_result()
File "pandas/_libs/src/reduce.pyx", line 279, in pandas._libs.lib.SeriesBinGrouper.get_result
File "pandas/_libs/src/reduce.pyx", line 265, in pandas._libs.lib.SeriesBinGrouper.get_result
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 933, in <lambda>
f = lambda x: func(x, *args, **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/groupby.py", line 1155, in f
return x.median(axis=self.axis, **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/generic.py", line 7315, in stat_func
numeric_only=numeric_only)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/series.py", line 2577, in _reduce
return op(delegate, skipna=skipna, **kwds)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/nanops.py", line 77, in _f
return f(*args, **kwargs)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/nanops.py", line 131, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/pandas/core/nanops.py", line 386, in nanmedian
values = values.ravel()
File "/home/user/anaconda/envs/testpd/lib/python3.6/site-packages/numpy/ma/core.py", line 4532, in ravel
r._mask = ndarray.ravel(self._mask, order=order).reshape(r.shape)
ValueError: cannot reshape array of size 0 into shape (1,)

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.9-300.fc27.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.4.2
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.0
pytz: 2018.3
blosc: 1.5.1
bottleneck: None
tables: None
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementNeeds DiscussionRequires discussion from core team before further actionResampleresample method

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions