Skip to content

BUG: fix Panel.fillna() ignoring axis parameter (re-submission) #11445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

stahlous
Copy link
Contributor

closes #3570
closes #8251

This is a re-submission of PR #8395 and addresses issue #8251. My apologies for the duplicate PR, but despite rebasing to pydata/pandas master, the original PR would not update with my new commits and I couldn't get rid of the LooseVersion() errors in the Travis build.

This PR may need some fleshing out still, but I wanted to submit this to see what the thoughts are on this approach. For this PR I've created a new interpolation mechanism at the block level to implement filling across 3 or more dimensions. This avoids the fiddly problems of trying to implement filling of 3+ dimensions at the frame level. However, it isn't possible with this technique to fill across blocks of different dtypes, although that seems like that should be a rare occurrence.

Incidentally, this will also address #3570, which is one of the issues referenced in #9862.

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Oct 27, 2015
@@ -1764,6 +1764,35 @@ def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None, dt
return values


def interpolate_nd(values, method='pad', axis=0, limit=None, fill_value=None, dtype=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are going to do this, ok, but I think a reorg of all of these little functions is in order. If before this PR (e.g. send another), put all of the interp methods that are in core/common.py into core/missing.py would be helpful

@stahlous
Copy link
Contributor Author

I merged the previous interpolate_2d() and interpolate_nd() functions into a single function and then renamed it simply pad(). I renamed interpolated_1d() to simply interpolate().

Looking through the code, I see a number of places where the terms "pad", "fill", and "interpolate" are used semi-interchangably and with different meanings. I think it would clear things up if, at least internally, these terms had these fixed definitions:

fill: fill in missing values with a static value
pad: fill in missing values with the most recent non-missing value
interpolate: interpolate in the scipy.interpolate sense

I'm not suggesting any changes to the public API, but I would be willing to work on a new PR to try and makes things internally consistent. I don't want to confuse this PR by trying to roll it in here.

In any event, this fix seems to work OK. Not sure if it needs more documentation other than mention of bugs fixed in whatsnew.

if ndim == 3:
if axis == 0:
for n in range(shape[1]):
values[:,n] = func(values[:,n], axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks quite duplicative.
this type of if/then is so error prone. just compute the axis to use.

@stahlous
Copy link
Contributor Author

stahlous commented Nov 3, 2015

updated

limit=limit,
fill_value=fill_value,
dtype=self.dtype)
values = mis.pad(values,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this import as missing and not mis

@stahlous
Copy link
Contributor Author

stahlous commented Nov 5, 2015

Implemented most of the suggestions - just need clarification on what is meant by about the ndims being the same.


if ndim == 3:
axis = 0 if (axis > 1) else 1
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are changing the meaning of axis here. it is exactly [slice(None)]*ndim indexed by axis for all types.
this is equivalent to .loc[.....] and NOT repeated sectioning. I know that may be confusing, but I said that for your tests are doing incorrect indexing. pls make the changes.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2015

@stahlous do you want to rebase / update. fyi, you might want to squash as well.

@stahlous
Copy link
Contributor Author

stahlous commented Dec 1, 2015

I squashed the commits. Sorry for the delay - had other things going on the past couple weeks. Can you take a look and let me know if I've addressed everything?

@@ -2774,40 +2774,28 @@ def fillna(self, value=None, method=None, axis=None, inplace=False,
# set the default here, so functions examining the signaure
# can detect if something was set (e.g. in groupby) (GH9221)
if axis is None:
axis = 0
axis = self._stat_axis_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to document this 'change' in the doc-string

@jreback
Copy link
Contributor

jreback commented Jan 20, 2016

closing, we are going to deprecate Panel in 0.18.0

@jreback jreback closed this Jan 20, 2016
@jreback jreback modified the milestone: 0.18.0 Jan 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
2 participants