ENH: GH9746 DataFrame.unstack and Series.unstack now take fill_value … #10246

amcpherson · 2015-06-01T19:36:24Z

…kw for filling NaN when unstack results in a sparse DataFrame

closes #9746
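
As a rough illustration of the behaviour described above (a minimal sketch, assuming a pandas version that includes this change; the sample Series and the fill value -1 are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: 'y' has no 'a' entry and 'z' has no 'b' entry.
index = pd.MultiIndex.from_tuples(
    [('x', 'a'), ('x', 'b'), ('y', 'b'), ('z', 'a')])
data = pd.Series([1, 2, 3, 4], index=index, dtype='int64')

# Without fill_value, the missing cells become NaN and the ints are upcast.
default = data.unstack()

# With fill_value, the missing cells get -1 and the integer dtype is kept.
filled = data.unstack(fill_value=-1)

print(default.dtypes)
print(filled.dtypes)
```

The two printed dtype listings should differ: the default result is upcast to a float dtype to hold NaN, while the filled result keeps the original integer dtype.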

shoyer · 2015-06-01T20:47:44Z

This may interact with #9023

amcpherson · 2015-06-12T17:16:30Z

I can resolve conflicts where necessary; let me know when and how to proceed.

jreback · 2015-10-11T15:47:36Z

pandas/core/reshape.py

@@ -179,6 +180,10 @@ def get_new_values(self):
        if self.mask.all():
            dtype = values.dtype
            new_values = np.empty(result_shape, dtype=dtype)
+        elif self.fill_value is not None:


I think this should happen before the self.mask.all() check.

amcpherson · 2015-10-21T20:44:51Z

Reworked and simplified; fill_value is now just passed to _maybe_promote, which I believe is more correct. As a consequence, I had to handle a None fill_value passed to _maybe_promote, since None is the default for unstack(). Alternatively, the default for unstack() could be fill_value=np.nan.

jreback · 2015-10-21T20:51:22Z

doc/source/whatsnew/v0.17.0.txt

@@ -865,6 +865,7 @@ Changes to ``Categorical.unique``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 ``Categorical.unique`` now returns new ``Categoricals`` with ``categories`` and ``codes`` that are unique, rather than returning ``np.array`` (:issue:`10508`)
+- ``DataFrame.unstack`` and ``Series.unstack`` now take ``fill_value`` keyword to allow direct replacement of missing values when an unstack results in missing values in the resulting ``DataFrame``.  As an added benefit, specifying ``fill_value`` will preserve the data type of the original stacked data.


move to 0.17.1

jreback · 2015-10-25T14:10:43Z

pandas/tests/test_frame.py

+        data.index = MultiIndex.from_tuples(
+            [('x', 'a'), ('x', 'b'), ('y', 'b'), ('z', 'a')])
+
+        result = data.unstack()


shouldn't this also test with fill_value?

Yes... it's unfortunate that TestDataFrame inherits from unittest.TestCase. It would be nice to use nose's generator functionality to yield a set of similar tests, but that doesn't work for unittest.TestCase classes. I may move the tests to another class in test_frame.py.

you can just add another class as a mix-in. nose really doesn't support parameterized tests, so just use a loop.

I noticed other people have done the same (added mix-in classes) for specific tests where they need generator-style test creation with nose. I assume this is because the main test class TestDataFrame inherits from unittest.TestCase (to allow use of the self.assert* functions), but inheriting from unittest.TestCase prevents use of generator-style tests with nose. Seems a bit of a hack.

I don't have a better idea that doesn't involve copying the assert* functions into util.testing.TestCase or doing something hacky with getattr in util.testing.TestCase, so I'll just add a mix-in class.

Went with the simplest approach and just added a few more tests, as it is more readable anyway.
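
The loop-based alternative suggested above might look roughly like this. It is a sketch only: the class name, dtypes, and fill values are hypothetical, and these are not the tests that were added in this PR.

```python
import unittest

import numpy as np
import pandas as pd


class TestUnstackFill(unittest.TestCase):
    # Hypothetical test class; a plain loop stands in for nose generators.

    def test_unstack_fill_value_keeps_dtype(self):
        index = pd.MultiIndex.from_tuples(
            [('x', 'a'), ('x', 'b'), ('y', 'b'), ('z', 'a')])

        # Each case pairs a dtype with a fill value representable in it.
        cases = [(np.int16, -1), (np.int64, 0), (np.float64, 0.5)]
        for dtype, fill in cases:
            data = pd.Series([1, 2, 4, 5], index=index, dtype=dtype)
            result = data.unstack(fill_value=fill)

            # The missing ('y', 'a') and ('z', 'b') cells take the fill value.
            self.assertEqual(result.loc['y', 'a'], fill)
            self.assertEqual(result.loc['z', 'b'], fill)

            # No upcast should have been needed for any column.
            expected = np.dtype(dtype)
            self.assertTrue(all(dt == expected for dt in result.dtypes))
```

A loop keeps everything inside one unittest.TestCase method, at the cost of stopping at the first failing case.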

jreback · 2015-11-18T20:17:28Z

can you rebase/update according to comments.

jreback · 2015-12-11T01:13:45Z

doc/source/reshaping.rst

+
+.. ipython:: python
+
+   df3 = df.ix[[0, 1, 4, 7], [1, 2]]


use .loc or .iloc
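
For reference, a positional selection like the one in this hunk can be written with .iloc. The frame below is a stand-in built for illustration and is an assumption, not the df defined in reshaping.rst:

```python
import numpy as np
import pandas as pd

# Stand-in frame (assumed): 8 rows on a two-level index, 3 columns.
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'])
df = pd.DataFrame(np.arange(24).reshape(8, 3),
                  index=index, columns=['A', 'B', 'C'])

# Positional rows 0, 1, 4, 7 and positional columns 1, 2.
df3 = df.iloc[[0, 1, 4, 7], [1, 2]]

# 'foo' has no 'two' row and 'qux' has no 'one' row, so unstacking
# leaves holes that are filled with NaN by default.
unstacked = df3.unstack()
print(unstacked)
```

.iloc is the positional indexer; .loc would need the index labels (tuples here) rather than integer positions.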

jreback · 2015-12-11T01:19:30Z

@amcpherson only a couple of doc changes. pls ping when ready & green.

Thanks for the patience. I periodically cycle through PRs and sometimes don't make it all the way through (before I start again) :)

jreback · 2016-01-02T23:17:15Z

can you run: git diff master | flake8 --diff and fix anything. going to shortly be enforcing PEP8.

jreback · 2016-01-20T14:15:26Z

can you rebase / update according to comments

amcpherson · 2016-01-20T16:44:50Z

Apologies for the delay, let me know how it looks now.

jreback · 2016-01-20T16:46:41Z

doc/source/reshaping.rst

+.. versionadded: 0.18.0
+
+Unstacking can result in missing values if subgroups do not have the same
+set of labels.  By default, missing values will be replaced with NaN.


Put double back-ticks around NaN. Also mention that datetimelike values will be filled with NaT, so maybe say that the default is the fill value for that dtype.
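
The distinction raised here can be seen in a small sketch (the sample data is made up; only the default fill behaviour is being illustrated):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('x', 'a'), ('x', 'b'), ('y', 'b'), ('z', 'a')])

# Numeric data: the missing ('y', 'a') cell is filled with NaN.
numeric = pd.Series([1.0, 2.0, 3.0, 4.0], index=index).unstack()

# Datetime data: the same missing cell is filled with NaT instead,
# and the columns stay datetime-typed.
dates = pd.Series(pd.date_range('2016-01-01', periods=4),
                  index=index).unstack()

print(numeric)
print(dates)
```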

jreback · 2016-01-30T15:22:20Z

lmk when you have a chance to update


amcpherson · 2016-01-30T16:26:36Z

Some tests failing, checking now.

amcpherson · 2016-01-30T18:41:39Z

Never mind, the tests were failing due to an out-of-date .so; all looks good.

jreback · 2016-01-30T19:28:24Z

@amcpherson thanks!

jreback · 2016-01-30T19:29:37Z

pandas/core/common.py

@@ -1127,6 +1127,12 @@ def _maybe_promote(dtype, fill_value=np.nan):
                    # the proper thing to do here would probably be to upcast
                    # to object (but numpy 1.6.1 doesn't do this properly)
                    fill_value = tslib.iNaT
+            elif issubclass(dtype.type, np.timedelta64):
+                try:
+                    fill_value = lib.Timedelta(fill_value).value


@amcpherson I think we can eliminate this section and just upcast to np.object_ if the fill_value cannot be coerced (the same applies to the datetime case above). Can you create an issue for this? (A PR would be great as well!) Thanks.

This is some pretty old code I think
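
One way to read the suggestion above, as a sketch only: promote_timedelta_fill is a hypothetical helper written for illustration. It is not pandas' _maybe_promote and makes no claim about how pandas implemented this later.

```python
import numpy as np
import pandas as pd


def promote_timedelta_fill(dtype, fill_value):
    """Hypothetical helper: return (dtype, fill_value) for a timedelta dtype.

    If fill_value can be coerced to a Timedelta, the dtype is kept.
    Otherwise the dtype is upcast to object and fill_value is left as-is.
    """
    try:
        coerced = pd.Timedelta(fill_value)
    except (ValueError, TypeError):
        # Not coercible: fall back to object instead of guessing a value.
        return np.dtype(object), fill_value
    return dtype, coerced


td_dtype = np.dtype('m8[ns]')
kept = promote_timedelta_fill(td_dtype, '1 day')
upcast = promote_timedelta_fill(td_dtype, 'not a duration')
print(kept)
print(upcast)
```

The point of the design is that an uncoercible fill value changes the result dtype rather than being silently replaced.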

amcpherson force-pushed the unstackfillvalue branch from 18ee145 to b48791a Compare August 10, 2015 20:05

jreback reviewed Oct 11, 2015

jreback added the Missing-data, Reshaping, and API Design labels Oct 11, 2015

amcpherson force-pushed the unstackfillvalue branch from b48791a to 7070031 Compare October 21, 2015 17:17

jreback reviewed Oct 21, 2015

amcpherson force-pushed the unstackfillvalue branch 2 times, most recently from 56ce589 to a66be32 Compare October 22, 2015 14:59

jreback reviewed Oct 25, 2015

amcpherson force-pushed the unstackfillvalue branch 2 times, most recently from d030f7e to 9700827 Compare October 28, 2015 18:13

amcpherson force-pushed the unstackfillvalue branch from 9700827 to 2f9fe02 Compare November 18, 2015 20:31

jreback reviewed Dec 11, 2015

amcpherson force-pushed the unstackfillvalue branch from 2f9fe02 to a86002f Compare January 20, 2016 16:43

jreback reviewed Jan 20, 2016

jreback added this to the 0.18.0 milestone Jan 20, 2016

amcpherson force-pushed the unstackfillvalue branch 2 times, most recently from 286660c to fcf2f91 Compare January 20, 2016 19:13

ENH: GH9746 DataFrame.unstack and Series.unstack now take fill_value …

7c2f9d1

…kw for filling NaN when unstack results in a sparse DataFrame

amcpherson force-pushed the unstackfillvalue branch from fcf2f91 to 7c2f9d1 Compare January 30, 2016 16:07

jreback closed this in de46056 Jan 30, 2016

jreback reviewed Jan 30, 2016

jreback mentioned this pull request Apr 6, 2016

BUG: unstacking with object columns defaults to None as a fill value #12815

Closed
