BUG: fix Panel.fillna() ignoring axis parameter #8395

Closed
wants to merge 29 commits into from
29 commits
b6692a2
BUG: setitem fails on mixed-type Panel4D
stahlous Nov 1, 2014
6082542
added note to whatsnew
stahlous Nov 1, 2014
a2de1df
BUG: fix panel fillna ignoring axis parameter
stahlous Sep 26, 2014
76998e9
added/fixed tests and re-ordered _is_mixed_type check
stahlous Sep 26, 2014
158d11b
fill axis defaults to stat axis
stahlous Sep 26, 2014
9b1b9c2
implement inplace; simplify a bit
stahlous Sep 26, 2014
194eb87
changed ffill and bfill default axis to None
stahlous Sep 26, 2014
78d484e
fixed inplace implementation
stahlous Sep 26, 2014
384d896
added in @staple's tests
stahlous Sep 28, 2014
d55883f
return explicitly for inplace
stahlous Sep 28, 2014
a527cba
trying Panel.apply
stahlous Sep 28, 2014
507f665
more apply work
stahlous Sep 29, 2014
142ee9c
merge in new tests
stahlous Sep 29, 2014
3e1272e
simplify return
stahlous Sep 29, 2014
1fab8a4
update tests with issue num and spacing before comments
stahlous Sep 30, 2014
6f02fa5
added Panel4D.fillna suport and tests
stahlous Oct 1, 2014
58d450b
added updates to v0.15.0.txt and updated fillna docstring
stahlous Oct 1, 2014
fa3f34b
more straightforward fillna method
stahlous Oct 5, 2014
9548076
fix panel4d tests
stahlous Oct 5, 2014
40a757c
remove 'convert_numeric=True'
stahlous Oct 12, 2014
b1ecb43
implement inplace in memory efficient way
stahlous Oct 13, 2014
a147061
invoke convert_objects with copy=False
stahlous Oct 13, 2014
adb9379
implement categorical preservation
stahlous Oct 15, 2014
36d0dd8
rework categories implementation
stahlous Oct 16, 2014
3eef736
minor cleanup
stahlous Oct 16, 2014
cfab77e
another way of going about it
stahlous Oct 31, 2014
6e7d60d
Merge branch 'p4d_setitem_bug' into fillna_bug
stahlous Nov 1, 2014
a670571
more fixing
stahlous Nov 1, 2014
4cb00c5
Merge branch 'p4d_setitem_bug' into fillna_bug
stahlous Nov 1, 2014
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.15.1.txt
@@ -227,3 +227,5 @@ Bug Fixes
- Fixed a bug where plotting a column ``y`` and specifying a label would mutate the index name of the original DataFrame (:issue:`8494`)

- Bug in ``date_range`` where partially-specified dates would incorporate current date (:issue:`6961`)

- Fixed a bug that prevented setting values in a mixed-type Panel4D
88 changes: 42 additions & 46 deletions pandas/core/generic.py
@@ -21,10 +21,14 @@
from pandas.core.common import (isnull, notnull, is_list_like,
_values_from_object, _maybe_promote,
_maybe_box_datetimelike, ABCSeries,
SettingWithCopyError, SettingWithCopyWarning)
SettingWithCopyError, SettingWithCopyWarning,
CategoricalDtype)
import pandas.core.nanops as nanops
from pandas.util.decorators import Appender, Substitution, deprecate_kwarg
from pandas.core import config
from pandas.core.categorical import Categorical

from itertools import product

# goal is to be able to define the docs close to function, while still being
# able to share
@@ -2237,33 +2241,32 @@ def convert_objects(self, convert_dates=True, convert_numeric=False,
#----------------------------------------------------------------------
# Filling NA's

def fillna(self, value=None, method=None, axis=0, inplace=False,
def fillna(self, value=None, method=None, axis=None, inplace=False,
limit=None, downcast=None):
"""
Fill NA/NaN values using the specified method

Parameters
----------
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of
values specifying which value to use for each index (for a Series) or
column (for a DataFrame). (values not in the dict/Series/DataFrame will not be
filled). This value cannot be a list.
axis : {0, 1}, default 0
* 0: fill column-by-column
* 1: fill row-by-row
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
Contributor: Don't change the argument order.

Contributor Author: I didn't change the order in the function signature. I just noticed the docstring was not ordered the same way as the signature, so I made them consistent. I can change them back.

Contributor: No, you are right. This is fine.

pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
axis : {0, 1, 2, 3}, defaults to the stat axis
The stat axis is 0 for Series and DataFrame, 1 for Panel, and 2 for Panel4D
inplace : boolean, default False
If True, fill in place. Note: this will modify any
other views on this object, (e.g. a no-copy slice for a column in a
DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible,
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible)

@@ -2275,54 +2278,47 @@ def fillna(self, value=None, method=None, axis=0, inplace=False,
-------
filled : same type as caller
"""
if isinstance(value, (list, tuple)):
raise TypeError('"value" parameter must be a scalar or dict, but '
'you passed a "{0}"'.format(type(value).__name__))
self._consolidate_inplace()

axis = self._get_axis_number(axis)
method = com._clean_fill_method(method)
if axis is None:
axis = self._stat_axis_number
else:
axis = self._get_axis_number(axis)

from pandas import DataFrame
if value is None:
if method is None:
raise ValueError('must specify a fill method or value')
if self._is_mixed_type and axis == 1:
if inplace:
raise NotImplementedError()
result = self.T.fillna(method=method, limit=limit).T

# need to downcast here because of all of the transposes
result._data = result._data.downcast()

return result

# > 3d
if self.ndim > 3:
raise NotImplementedError(
'Cannot fillna with a method for > 3dims'
)

# 3d
elif self.ndim == 3:
method = com._clean_fill_method(method)

# fill in 2d chunks
result = dict([(col, s.fillna(method=method, value=value))
for col, s in compat.iteritems(self)])
return self._constructor.from_dict(result).__finalize__(self)
off_axes = list(range(self.ndim))
off_axes.remove(axis)
expanded = [list(range(self.shape[x])) for x in off_axes]
frame = self if inplace else self.copy()
for axes_prod in product(*expanded):
slicer = list(axes_prod)
slicer.insert(axis, slice(None))
sl = tuple(slicer)
piece = frame.iloc[sl]
new_data = piece._data.interpolate(method=method,
limit=limit,
inplace=True,
coerce=True)
frame.iloc[sl] = piece._constructor(new_data)

new_data = frame._data
if downcast:
new_data = new_data.downcast(dtypes=downcast)

# 2d or less
method = com._clean_fill_method(method)
new_data = self._data.interpolate(method=method,
axis=axis,
limit=limit,
inplace=inplace,
coerce=True,
downcast=downcast)
else:
if method is not None:
raise ValueError('cannot specify both a fill method and value')

if isinstance(value, (list, tuple)):
raise TypeError('"value" parameter must be a scalar or dict, but '
'you passed a "{0}"'.format(type(value).__name__))

if len(self._get_axis(axis)) == 0:
return self

@@ -2368,12 +2364,12 @@ def fillna(self, value=None, method=None, axis=0, inplace=False,
else:
return self._constructor(new_data).__finalize__(self)

def ffill(self, axis=0, inplace=False, limit=None, downcast=None):
def ffill(self, axis=None, inplace=False, limit=None, downcast=None):
"Synonym for NDFrame.fillna(method='ffill')"
return self.fillna(method='ffill', axis=axis, inplace=inplace,
limit=limit, downcast=downcast)

def bfill(self, axis=0, inplace=False, limit=None, downcast=None):
def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
"Synonym for NDFrame.fillna(method='bfill')"
return self.fillna(method='bfill', axis=axis, inplace=inplace,
limit=limit, downcast=downcast)
Expand Down
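The new method-fill branch in `fillna` visits every 1-D slice along the fill axis by taking the product of the indices of all other axes and inserting `slice(None)` at the fill axis. The following is a minimal sketch of that idea, not the PR's code. It uses a plain NumPy array in place of a Panel, and `STAT_AXIS`, `ffill_1d` and `ffill_nd` are hypothetical names introduced for illustration. The default axes follow the docstring in the diff (0 for 1-D and 2-D, 1 for 3-D, 2 for 4-D).

```python
from itertools import product

import numpy as np

# Assumed mapping from number of dimensions to the default ("stat") axis,
# taken from the docstring: Series/DataFrame -> 0, Panel -> 1, Panel4D -> 2.
STAT_AXIS = {1: 0, 2: 0, 3: 1, 4: 2}


def ffill_1d(values, limit=None):
    """Forward fill NaNs in a 1-D float array, modifying it in place."""
    last_valid = np.nan
    run = 0  # consecutive NaNs seen since the last valid value
    for i in range(len(values)):
        if np.isnan(values[i]):
            run += 1
            if not np.isnan(last_valid) and (limit is None or run <= limit):
                values[i] = last_valid
        else:
            last_valid = values[i]
            run = 0
    return values


def ffill_nd(arr, axis=None, limit=None):
    """Return a float copy of ``arr`` forward filled along ``axis``."""
    if axis is None:
        axis = STAT_AXIS[arr.ndim]
    out = arr.astype(float)  # astype returns a new array by default
    off_axes = [ax for ax in range(out.ndim) if ax != axis]
    for combo in product(*(range(out.shape[ax]) for ax in off_axes)):
        slicer = list(combo)
        slicer.insert(axis, slice(None))  # full range along the fill axis
        # Basic indexing returns a view, so the in-place fill updates ``out``.
        ffill_1d(out[tuple(slicer)], limit=limit)
    return out


panel = np.array([
    [[1.0, 2.0], [np.nan, 4.0]],        # item 0
    [[np.nan, np.nan], [np.nan, 8.0]],  # item 1
])
by_items = ffill_nd(panel, axis=0)  # values flow from item 0 into item 1
by_default = ffill_nd(panel)        # axis=None resolves to axis 1 for 3-D input
```

With `axis=0`, the first row of item 1 is filled from item 0, while a cell that is NaN in both items stays NaN. The PR performs the per-slice fill through the block manager's `interpolate` rather than a Python loop like `ffill_1d`.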
4 changes: 2 additions & 2 deletions pandas/core/indexing.py
@@ -355,7 +355,7 @@ def _setitem_with_indexer(self, indexer, value):
# if we have a partial multiindex, then need to adjust the plane
# indexer here
if (len(labels) == 1 and
isinstance(self.obj[labels[0]].index, MultiIndex)):
isinstance(self.obj[labels[0]].axes[0], MultiIndex)):
item = labels[0]
obj = self.obj[item]
index = obj.index
@@ -421,7 +421,7 @@ def can_do_equal_len():

l = len(value)
item = labels[0]
index = self.obj[item].index
index = self.obj[item].axes[0]

# equal len list/ndarray
if len(index) == l:
Expand Down
72 changes: 71 additions & 1 deletion pandas/tests/test_panel.py
@@ -1356,20 +1356,90 @@ def test_fillna(self):
assert_frame_equal(filled['ItemA'],
panel['ItemA'].fillna(method='backfill'))

# Fill forward.
filled = self.panel.fillna(method='ffill')
assert_frame_equal(filled['ItemA'],
self.panel['ItemA'].fillna(method='ffill'))

# With limit.
filled = self.panel.fillna(method='backfill', limit=1)
assert_frame_equal(filled['ItemA'],
self.panel['ItemA'].fillna(method='backfill', limit=1))

# With downcast.
rounded = self.panel.apply(lambda x: x.apply(np.round))
filled = rounded.fillna(method='backfill', downcast='infer')
assert_frame_equal(filled['ItemA'],
rounded['ItemA'].fillna(method='backfill', downcast='infer'))

# Now explicitly request axis 1.
filled = self.panel.fillna(method='backfill', axis=1)
assert_frame_equal(filled['ItemA'],
self.panel['ItemA'].fillna(method='backfill', axis=0))

# Fill along axis 2, equivalent to filling along axis 1 of each
# DataFrame.
filled = self.panel.fillna(method='backfill', axis=2)
assert_frame_equal(filled['ItemA'],
self.panel['ItemA'].fillna(method='backfill', axis=1))

# Fill an empty panel.
empty = self.panel.reindex(items=[])
filled = empty.fillna(0)
assert_panel_equal(filled, empty)

# either method or value must be specified
self.assertRaises(ValueError, self.panel.fillna)
# method and value can not both be specified
self.assertRaises(ValueError, self.panel.fillna, 5, method='ffill')

# can't pass list or tuple, only scalar
self.assertRaises(TypeError, self.panel.fillna, [1, 2])
self.assertRaises(TypeError, self.panel.fillna, (1, 2))

# limit not implemented when only value is specified
p = Panel(np.random.randn(3,4,5))
p.iloc[0:2,0:2,0:2] = np.nan
self.assertRaises(NotImplementedError, lambda : p.fillna(999,limit=1))
self.assertRaises(NotImplementedError, lambda : p.fillna(999, limit=1))

def test_fillna_axis_0(self):
# GH 8395

# Forward fill along axis 0, interpolating values across DataFrames.
filled = self.panel.fillna(method='ffill', axis=0)
nan_indexes = self.panel['ItemB']['C'].index[
self.panel['ItemB']['C'].apply(np.isnan)]

# Values from ItemA are filled into ItemB.
assert_series_equal(filled['ItemB']['C'][nan_indexes],
self.panel['ItemA']['C'][nan_indexes])

# Backfill along axis 0.
filled = self.panel.fillna(method='backfill', axis=0)

# The test data lacks values that can be backfilled on axis 0.
assert_panel_equal(filled, self.panel)

# Reverse the panel and backfill along axis 0, to properly test
# backfill.
reverse_panel = self.panel.reindex_axis(reversed(self.panel.axes[0]))
filled = reverse_panel.fillna(method='bfill', axis=0)
nan_indexes = reverse_panel['ItemB']['C'].index[
reverse_panel['ItemB']['C'].apply(np.isnan)]
assert_series_equal(filled['ItemB']['C'][nan_indexes],
reverse_panel['ItemA']['C'][nan_indexes])

# Fill along axis 0 with limit.
filled = self.panel.fillna(method='ffill', axis=0, limit=1)
a_nan = self.panel['ItemA']['C'].index[
self.panel['ItemA']['C'].apply(np.isnan)]
b_nan = self.panel['ItemB']['C'].index[
self.panel['ItemB']['C'].apply(np.isnan)]

# Cells that are nan in ItemB but not in ItemA remain unfilled in
# ItemC.
self.assertTrue(
filled['ItemC']['C'][b_nan.diff(a_nan)].apply(np.isnan).all())

def test_ffill_bfill(self):
assert_panel_equal(self.panel.ffill(),
Expand Down
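The error cases exercised in `test_fillna` above (no arguments, both `value` and `method`, and a list or tuple as `value`) can be summarized in a small standalone check. This is a sketch only: `check_fillna_args` and `outcome` are hypothetical helpers, and the order of the checks is an assumption, since the flattened diff does not show which copy of the list/tuple check was kept.

```python
def check_fillna_args(value=None, method=None):
    """Hypothetical helper mirroring the argument checks in fillna."""
    if isinstance(value, (list, tuple)):
        raise TypeError('"value" parameter must be a scalar or dict, but '
                        'you passed a "{0}"'.format(type(value).__name__))
    if value is None and method is None:
        raise ValueError('must specify a fill method or value')
    if value is not None and method is not None:
        raise ValueError('cannot specify both a fill method and value')


def outcome(**kwargs):
    """Return 'ok' or the name of the exception raised by the check."""
    try:
        check_fillna_args(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


print(outcome(value=0))                  # ok
print(outcome(method="ffill"))           # ok
print(outcome())                         # ValueError
print(outcome(value=5, method="ffill"))  # ValueError
print(outcome(value=[1, 2]))             # TypeError
print(outcome(value=(1, 2)))             # TypeError
```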
98 changes: 97 additions & 1 deletion pandas/tests/test_panel4d.py
@@ -845,11 +845,107 @@ def test_sort_index(self):
# assert_panel_equal(sorted_panel, self.panel)

def test_fillna(self):
# GH 8395
self.assertFalse(np.isfinite(self.panel4d.values).all())
filled = self.panel4d.fillna(0)
self.assertTrue(np.isfinite(filled.values).all())

self.assertRaises(NotImplementedError, self.panel4d.fillna, method='pad')
filled = self.panel4d.fillna(method='backfill')
assert_frame_equal(filled['l1']['ItemA'],
self.panel4d['l1']['ItemA'].fillna(method='backfill'))

panel4d = self.panel4d.copy()
panel4d['str'] = 'foo'

filled = panel4d.fillna(method='backfill')
assert_frame_equal(filled['l1']['ItemA'],
panel4d['l1']['ItemA'].fillna(method='backfill'))

# Fill forward.
filled = self.panel4d.fillna(method='ffill')
assert_frame_equal(filled['l1']['ItemA'],
self.panel4d['l1']['ItemA'].fillna(method='ffill'))

# With limit.
filled = self.panel4d.fillna(method='backfill', limit=1)
assert_frame_equal(filled['l1']['ItemA'],
self.panel4d['l1']['ItemA'].fillna(method='backfill', limit=1))

# With downcast.
rounded = self.panel4d.apply(lambda x: x.apply(np.round))
filled = rounded.fillna(method='backfill', downcast='infer')
assert_frame_equal(filled['l1']['ItemA'],
rounded['l1']['ItemA'].fillna(method='backfill', downcast='infer'))

# Now explicitly request axis 2.
filled = self.panel4d.fillna(method='backfill', axis=2)
assert_frame_equal(filled['l1']['ItemA'],
self.panel4d['l1']['ItemA'].fillna(method='backfill', axis=0))

# Fill along axis 3, equivalent to filling along axis 1 of each
# DataFrame.
filled = self.panel4d.fillna(method='backfill', axis=3)
assert_frame_equal(filled['l1']['ItemA'],
self.panel4d['l1']['ItemA'].fillna(method='backfill', axis=1))

# Fill an empty panel.
empty = self.panel4d.reindex(items=[])
filled = empty.fillna(0)
assert_panel4d_equal(filled, empty)

# either method or value must be specified
self.assertRaises(ValueError, self.panel4d.fillna)
# method and value can not both be specified
self.assertRaises(ValueError, self.panel4d.fillna, 5, method='ffill')

# can't pass list or tuple, only scalar
self.assertRaises(TypeError, self.panel4d.fillna, [1, 2])
self.assertRaises(TypeError, self.panel4d.fillna, (1, 2))

# limit not implemented when only value is specified
p = Panel4D(np.random.randn(3,4,5,6))
p.iloc[0:2,0:2,0:2,0:2] = np.nan
self.assertRaises(NotImplementedError, lambda : p.fillna(999, limit=1))

def test_fillna_axis_0(self):
# GH 8395

# Back fill along axis 0, interpolating values across Panels
filled = self.panel4d.fillna(method='bfill', axis=0)
nan_indexes = self.panel4d['l1']['ItemB']['C'].index[
self.panel4d['l1']['ItemB']['C'].apply(np.isnan)]

# Values from ItemC are filled into ItemB.
assert_series_equal(filled['l1']['ItemB']['C'][nan_indexes],
self.panel4d['l1']['ItemC']['C'][nan_indexes])

# Forward fill along axis 0.
filled = self.panel4d.fillna(method='ffill', axis=0)

# The test data lacks values that can be forward filled on axis 0.
assert_panel4d_equal(filled, self.panel4d)

# Reverse the panel and forward fill along axis 0, to properly test
# forward fill.
reverse_panel = self.panel4d.reindex_axis(reversed(self.panel4d.axes[0]))
filled = reverse_panel.fillna(method='ffill', axis=0)
nan_indexes = reverse_panel['l3']['ItemB']['C'].index[
reverse_panel['l3']['ItemB']['C'].apply(np.isnan)]
assert_series_equal(filled['l3']['ItemB']['C'][nan_indexes],
reverse_panel['l1']['ItemB']['C'][nan_indexes])

# Fill along axis 0 with limit.
filled = self.panel4d.fillna(method='bfill', axis=0, limit=1)
c_nan = self.panel4d['l1']['ItemC']['C'].index[
self.panel4d['l1']['ItemC']['C'].apply(np.isnan)]
b_nan = self.panel4d['l1']['ItemB']['C'].index[
self.panel4d['l1']['ItemB']['C'].apply(np.isnan)]

# Cells that are nan in ItemB but not in ItemC remain unfilled in
# ItemA.
self.assertTrue(
filled['l1']['ItemA']['C'][b_nan.diff(c_nan)].apply(np.isnan).all())


def test_swapaxes(self):
result = self.panel4d.swapaxes('labels', 'items')
Expand Down
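The axis-2 and axis-3 assertions in `test_fillna` above rest on an indexing fact: once the first two axes of a 4-D object are fixed, its axes 2 and 3 become axes 0 and 1 of the resulting 2-D slice. A small sketch of that correspondence, assuming a plain NumPy array as a stand-in for Panel4D and `np.cumsum` as a stand-in for any axis-wise operation such as a fill:

```python
import numpy as np

# Stand-in for a Panel4D with axes (labels, items, major_axis, minor_axis).
p4d = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# Apply an axis-wise operation to the whole 4-D array...
along_2 = np.cumsum(p4d, axis=2)
along_3 = np.cumsum(p4d, axis=3)

# ...and to a single 2-D slice, with the axis number shifted down by two.
frame = p4d[0, 0]
same_as_axis_0 = np.array_equal(along_2[0, 0], np.cumsum(frame, axis=0))
same_as_axis_1 = np.array_equal(along_3[0, 0], np.cumsum(frame, axis=1))
print(same_as_axis_0, same_as_axis_1)
```

This is why the tests compare `fillna(..., axis=2)` on the Panel4D with `fillna(..., axis=0)` on `['l1']['ItemA']`, and `axis=3` with `axis=1`.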