Skip to content

BUG: fix Panel.fillna() ignoring axis parameter #8395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 29 commits into from
Closed

BUG: fix Panel.fillna() ignoring axis parameter #8395

wants to merge 29 commits into from

Conversation

stahlous
Copy link
Contributor

This fixes #8251. It may need some more comprehensive tests but before I go too much farther I want to make sure the axis numbering scheme makes sense . The default behavior of fillna is to fill along axis=0. In the case of a DataFrame this fills along the columns. However, in the case of a Panel, filling along axis=0 means you're filling along items. I'm not sure if that's the behavior that most users would think is the default. You might note that I also removed the references to what the behavior of each axis value is in the docstring since it gets a bit arbitrary when you consider both DataFrames and Panels.

@stahlous
Copy link
Contributor Author

@0zeroth please take a look if you can

* 0: fill column-by-column
* 1: fill row-by-row
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change arguments orders

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change the order in the function signature. I just noticed the docstring was not ordered the same way as the signature, so I made them consistent. I can change them back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no you are right. this is fine

@jreback
Copy link
Contributor

jreback commented Sep 26, 2014

default filling should be along rows for frames (axis=0), this is the stat_axis
I think could be the same for Panel which has a stat_axis of 1

it just needs to be documented
look through other functions and see if this is consistent

# need to downcast here because of all of the transposes
result._data = result._data.downcast()

return result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you taking this out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't take it out. I moved it after the check for if it's a 3d object. It doesn't make sense to run this check on a Panel and it can cause errors if you happen to have a mixed type Panel that you're filling along axis 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is a bit tricky I think. I suppose its possible to fill a mixed-type Panel in some cases. But if you see below, I think we should use Panel.apply (which can then call DataFrame.fillna) to fix/solve this problem, so this restrictino needs to remain in place. I suspect some more tests will find if what I am saying is true (or I am FOS). :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should still be functional for panels with mixed_type data. The way panel filling works is the panels are broken down into their constituent dataframes and then those are filled quasi-recursively. When the mixed-type dataframes come around the check will be performed and they'll be handled appropriately.

I will definitely add some tests to make sure this is the case.

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Sep 26, 2014
@jreback jreback added this to the 0.15.1 milestone Sep 26, 2014
@stahlous
Copy link
Contributor Author

I've been trying to implement this with Panel.apply but there's an issue that perhaps I'm just not savvy enough to work around. It's simple enough to write the algorithm using Panel.apply passing the chosen axis. However for a mixed-type Panel each constituent series get passed with a dtype=object even if the series it's passing are not of dtype=object to begin with. Hence, the dtype for all items in the panel becomes object.

In [50]:
from pandas.util.testing import makePanel
p = makePanel()
p.dtypes

Out[50]:
ItemA    float64
ItemB    float64
ItemC    float64
dtype: object


In [51]:
p['str'] = 'foo'
p.dtypes

Out[51]:
ItemA    float64
ItemB    float64
ItemC    float64
str       object
dtype: object


In [52]:
p.apply(lambda s: s).dtypes

Out[52]:
ItemA    object
ItemB    object
ItemC    object
str      object
dtype: object

Using Panel.apply, the resulting panels look ostensibly like they should, but their dtypes are wrong. Any ideas on how to work around this?

@jreback
Copy link
Contributor

jreback commented Sep 29, 2014

In [15]: p = tm.makePanel()

In [16]: p['str'] = 'foo'

In [17]: p.apply(lambda df: df.dtypes,axis=['items','major_axis'])
Out[17]: 
             A        B        C        D
ItemA  float64  float64  float64  float64
ItemB  float64  float64  float64  float64
ItemC  float64  float64  float64  float64
str     object   object   object   object

you need to specifiy multiple axes in order to get a frame.

@stahlous
Copy link
Contributor Author

Oh, I didn't mention my attempts with passing dataframes, which was actually my original idea. I played around with passing dual axes as you suggested, but the axes get transposed. That leaves you having to manually transpose the axes again, which is what we were trying to avoid.

In [35]: p
Out[35]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 30 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2000-01-03 00:00:00 to 2000-02-11 00:00:00
Minor_axis axis: A to D

In [36]: p.apply(lambda df: df, axis=(0,1))
Out[36]:
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 30 (major_axis) x 3 (minor_axis)
Items axis: A to D
Major_axis axis: 2000-01-03 00:00:00 to 2000-02-11 00:00:00
Minor_axis axis: ItemA to ItemC

However, now that I reconsider this, it guess it might be preferable to use apply and transpose rather than collect everything into a dict and rebuild the panel from that.

@stahlous
Copy link
Contributor Author

I implemented the Panel.apply approach. Please take a look. If it looks OK I'll squash the commits.

@@ -2288,11 +2281,32 @@ def fillna(self, value=None, method=None, axis=0, inplace=False,

# 3d
elif self.ndim == 3:
if axis == 0:
new_axis = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use something like this

axes = list(range(self.ndim))
axes.remove(axis)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. The apply_axes need to contain the axis which is being filled.

@stahlous
Copy link
Contributor Author

stahlous commented Oct 1, 2014

Realized that this method was almost trivial to extend to Panel4D, so I implemented that (along with tests).

Unless I'm dopey, which is totally possible, I don't think your suggestion of using range and remove for determining the apply_axes is workable. In any event, as it turns out, for both Panel and Panel4D you want the apply_axes=(0, 1) for axis 0 and apply_axes=(1, 2) for any other chosen axis.

@stahlous
Copy link
Contributor Author

stahlous commented Oct 1, 2014

Also added some documentation

)

# 3d or 4d
if self.ndim >= 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to fix this to avoid doing all these ndim specific things. (and then transposing and such)

@stahlous
Copy link
Contributor Author

This works... mostly. Unfortunately it fails for categoricals since they don't have apply support yet. One option would be to hold off on this PR until they do support apply operations. The other options I suppose would be invoking one of the less elegant methods that I've proposed in earlier commits as they don't seem to suffer from the same problem.

I would also point out that I needed to invoke a convert_objects operation since Panel.apply converts everything to a dtype of object if anything in the frame is an object dtype. At first I thought this was a code smell, but I discovered DataFrame.apply uses the same technique. Perhaps an issue should be opened to address this point with Panel.apply.

@stahlous
Copy link
Contributor Author

I implemented a method for restoring categorical data. Not sure if it's robust enough to be widely applicable, but it does work for this use case and the categorical tests pass.

@jreback
Copy link
Contributor

jreback commented Oct 16, 2014

will have to look after 0.15.0 is released (end of the week).

@stahlous
Copy link
Contributor Author

If you want to take a look, I have time to work on this PR this week.

@@ -324,6 +324,8 @@ API changes
- Merge errors will now be sub-classes of ``ValueError`` rather than raw ``Exception`` (:issue:`8501`)
- ``DataFrame.plot`` and ``Series.plot`` keywords are now have consistent orders (:issue:`8037`)

- ``Panel.fillna()`` now defaults to the stat axis (axis 1) when filling with a method. Previously the default axis was 0. Similarly, ``Panel.ffill()`` and ``Panel.bfill()`` default to axis 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need to move to v0.15.1.txt

@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

@stahlous can you rebase / revisit?

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jreback
Copy link
Contributor

jreback commented Jul 12, 2015

pls rebase/update. closing, but reopen if you want to work on.

@jreback jreback closed this Jul 12, 2015
@stahlous
Copy link
Contributor Author

Revisiting this after an extended hiatus. Do I need to submit another pull request, or can this one be re-opened?

@jreback
Copy link
Contributor

jreback commented Oct 26, 2015

no, you can just push to this one

@stahlous
Copy link
Contributor Author

I pushed to the fillna_bug branch of my fork, but it doesn't seem to have any effect here. I still only see the old commits in this PR.

Also, I'm having an odd problem with Travis. Anywhere that LooseVersion() is used I get an error. For example:

======================================================================
ERROR: test_deprecated_levels (pandas.tests.test_categorical.TestCategorical)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/stahlous/pandas/pandas/tests/test_categorical.py", line 1261, in test_deprecated_levels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/distutils/version.py", line 70, in __ge__
    c = self._cmp(other)
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/distutils/version.py", line 337, in _cmp
    if self.version < other.version:
TypeError: unorderable types: str() < int()

It's like LooseVersion() has lost its mind on the Travis machine. I can't reproduce the problem on my machine and the tests pass without a problem. Have you seen anything like this?

@jreback
Copy link
Contributor

jreback commented Oct 26, 2015

you need to rebase on master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panel.fillna with method='ffill' ignores the axis parameter and only fills along axis=1
2 participants