BUG: pandas.DataFrame().stack() raise an error, while expected is empty #36185

steveya · 2020-09-07T10:07:24Z

closes BUG: pandas.DataFrame().stack() raise an error, while expected is empty #36113
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2020-09-07T15:31:22Z

Hello @steveya! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-26 10:50:16 UTC

jreback · 2020-09-07T19:40:18Z

@steveya is this the correct issue reference?

steveya · 2020-09-08T02:06:15Z

@jreback, it should fix the error seen in #36113. This is my first contribution, so please let me know if I have made any errors, even obvious ones. Thanks.

ivanovmg · 2020-09-08T08:41:54Z

pandas/tests/frame/test_reshape.py

+    tm.assert_series_equal(
+        DataFrame().stack(), Series(index=MultiIndex([[], []], [[], []]), dtype=object)


Why do you expect that there should be MultiIndex?
Would it be reasonable to expect Series([], dtype=object)?

Note that Series([1, 2, 3]).unstack() raises an error because it does not know what to put in the columns: the Series has only one index level. I think the same should apply to an empty Series, so Series([]).unstack() should raise the same error rather than return an empty DataFrame.

Given this constraint, for the stack/unstack round trip to work, DataFrame([]).stack() needs to return an empty Series with an empty two-level MultiIndex: one level from its original index and one from its (empty) columns.
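A minimal sketch of the two behaviours this argument relies on, assuming pandas 1.2 or later (the milestone this PR was added to). On pandas 2.1 and later the legacy stack implementation emits a FutureWarning, which the sketch silences. The dtype of the stacked result is deliberately not checked, since only the shape of the index matters here.

```python
import warnings

import pandas as pd

# Stacking a frame with no rows and no columns should give an empty Series
# whose index is a two-level MultiIndex (original index + columns).
with warnings.catch_warnings():
    # pandas >= 2.1 warns that the previous stack implementation is deprecated
    warnings.simplefilter("ignore", FutureWarning)
    stacked = pd.DataFrame().stack()

print(type(stacked).__name__, len(stacked), stacked.index.nlevels)

# Unstacking needs a second index level to move into the columns, so a
# Series with a flat (single-level) index is rejected with a ValueError.
unstack_error = None
try:
    pd.Series([1, 2, 3]).unstack()
except ValueError as err:
    unstack_error = str(err)

print(unstack_error)
```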

ivanovmg · 2020-09-08T08:44:17Z

pandas/core/reshape/reshape.py

@@ -517,7 +517,7 @@ def factorize(index):
        # For homogeneous EAs, frame._values will coerce to object. So
        # we concatenate instead.
        dtypes = list(frame.dtypes._values)
-        dtype = dtypes[0]
+        dtype = dtypes[0] if len(dtypes) > 0 else object


What if you return the Series right away if the dataframe is empty?

if not frame.empty:
    dtypes = list(frame.dtypes._values)
    dtype = dtypes[0]
else:
    return Series([], dtype=object)

But you would still need to solve unstacking from an empty series.

jreback · 2020-09-09T12:22:31Z

pandas/tests/frame/test_reshape.py

@@ -1273,6 +1273,18 @@ def test_stack_timezone_aware_values():
    tm.assert_series_equal(result, expected)


+def test_stack_empty_frame():
+    tm.assert_series_equal(


use

result = ...
expected = ...
tm.assert_series_equal(result, expected) (or tm.assert_frame_equal)

please parametrize these cases

add a comment with the issue number
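One way to follow these suggestions, sketched for the stack/unstack round-trip case: compute result and expected separately, compare them with tm.assert_frame_equal, parametrize the varying argument, and cite the issue in a comment. The test name and the choice of fill_value as the parameter are illustrative assumptions; pandas._testing is the internal testing module the pandas test suite imports as tm, and the FutureWarning filter only matters on pandas 2.1 and later.

```python
import warnings

import numpy as np
import pandas as pd
import pandas._testing as tm
import pytest


@pytest.mark.parametrize("fill_value", [None, 0])
def test_stack_unstack_empty_frame(fill_value):
    # GH 36113
    with warnings.catch_warnings():
        # pandas >= 2.1 warns that the previous stack implementation is deprecated
        warnings.simplefilter("ignore", FutureWarning)
        result = pd.DataFrame(dtype=np.int64).stack().unstack(fill_value=fill_value)
    expected = pd.DataFrame(dtype=np.int64)
    tm.assert_frame_equal(result, expected)
```

Under pytest each parameter becomes its own test case; the decorated function can also be called directly with a fill_value.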

jreback · 2020-09-09T12:25:16Z

pandas/core/reshape/reshape.py

@@ -517,7 +517,7 @@ def factorize(index):
        # For homogeneous EAs, frame._values will coerce to object. So
        # we concatenate instead.
        dtypes = list(frame.dtypes._values)
-        dtype = dtypes[0]
+        dtype = dtypes[0] if len(dtypes) > 0 else object


likely you can just add not frame.empty and frame._is_homogeneous_type on L516; that might work
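The guard suggested here can be sketched in plain pandas. The helper name pick_homogeneous_dtype is hypothetical, and it approximates the private frame._is_homogeneous_type check with a set of frame.dtypes, so it illustrates the idea rather than reproducing the code that landed in reshape.py. It also shows why the unguarded dtypes[0] line fails: a frame without columns has no dtypes to index into.

```python
import pandas as pd


def pick_homogeneous_dtype(frame: pd.DataFrame):
    """Hypothetical helper mirroring the suggested guard.

    Return the shared dtype only when the frame has data and all of its
    columns have the same dtype; otherwise return None so the caller can
    take the generic (mixed-dtype) path instead of indexing dtypes[0].
    """
    dtypes = list(frame.dtypes)
    if not frame.empty and len(set(dtypes)) == 1:
        return dtypes[0]
    return None


# Without the guard, an empty frame has no dtypes to index into.
try:
    list(pd.DataFrame().dtypes)[0]
except IndexError:
    unguarded_fails = True
else:
    unguarded_fails = False

print(unguarded_fails)                                               # True
print(pick_homogeneous_dtype(pd.DataFrame()))                        # None
print(pick_homogeneous_dtype(pd.DataFrame({"a": [1], "b": [2]})))    # int64
print(pick_homogeneous_dtype(pd.DataFrame({"a": [1], "b": [0.5]})))  # None
```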

steveya · 2020-09-11T05:25:01Z

@jreback I have made the suggested changes.

pandas/tests/frame/test_reshape.py

steveya · 2020-09-13T12:35:41Z

@jreback @TomAugspurger I have updated the code to raise a proper error when a Series with a single-level index is unstacked. A DataFrame with a single-level index and single-level columns will no longer raise an exception when unstack is called (it will return a Series). I will dig further into @TomAugspurger's example.

jreback · 2020-09-13T12:49:53Z

doc/source/whatsnew/v1.2.0.rst

@@ -322,7 +322,7 @@ Reshaping
 - Bug in :meth:`DataFrame.pivot_table` with ``aggfunc='count'`` or ``aggfunc='sum'`` returning ``NaN`` for missing categories when pivoted on a ``Categorical``. Now returning ``0`` (:issue:`31422`)
 - Bug in :func:`union_indexes` where input index names are not preserved in some cases. Affects :func:`concat` and :class:`DataFrame` constructor (:issue:`13475`)
 - Bug in func :meth:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
-
+- Bug in :meth:`DataFrame.stack` for empty DataFrame (:issue:`36113`)


can you elaborate a bit on what is changing.

I have elaborated on this a bit in the latest commit.

jreback · 2020-09-13T12:50:13Z

pandas/core/reshape/reshape.py

+        # GH 36113
+        # Give nicer error messages when unstack a  Series whose
+        # Index is not a MultiIndex.
+        raise ValueError("index must be a MultiIndex to unstack")


does this have a test that hits it?

can you add: f'{type(obj.index)} was passed'
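A sketch of the requested error message, modelled on the ValueError in the diff above with the suggested f-string appended. The function require_multiindex is a hypothetical stand-alone check, not the pandas function the diff modifies.

```python
import pandas as pd


def require_multiindex(obj: pd.Series) -> None:
    """Hypothetical check: refuse to unstack a Series whose index is not a
    MultiIndex, and name the index type that was actually passed."""
    if not isinstance(obj.index, pd.MultiIndex):
        raise ValueError(
            f"index must be a MultiIndex to unstack, {type(obj.index)} was passed"
        )


message = None
try:
    require_multiindex(pd.Series([1, 2, 3]))
except ValueError as err:
    message = str(err)

# The message names the RangeIndex class that the flat Series carries.
print(message)

# A two-level index passes the check silently.
require_multiindex(
    pd.Series([1, 2], index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2)]))
)
```

Including the offending type makes the failure self-explanatory when the index is, for example, a DatetimeIndex rather than the expected MultiIndex.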

pandas/tests/frame/test_reshape.py

tsu-shiuan · 2020-09-14T10:43:31Z

Could we please create a patch for 0.25.3 when releasing?

jreback · 2020-09-14T10:51:40Z

@tsu-shiuan this won't even be backported to 1.x, let alone 0.25.x

tsu-shiuan · 2020-09-14T10:55:21Z

@jreback Okay! ☹️

steveya · 2020-11-05T00:19:20Z

@jreback Sorry, I am not sure how to do that. On my end, I have tried this:

% git pull origin GH36113
From https://github.com/steveya/pandas
 * branch GH36113 -> FETCH_HEAD
Already up to date.
% git merge master
Already up to date.
% git push origin GH36113
Everything up-to-date

This is what I did 12 days ago, and nothing seemed to happen. Could you provide more guidance, please? Thank you.

arw2019 · 2020-11-05T00:22:09Z

do

git fetch upstream
git merge upstream/master
git push origin GH36113

You may have to resolve conflicts

steveya · 2020-11-05T11:49:57Z

@jreback I checked out the tests that failed. I cannot reproduce these errors locally and I am not sure how to fix them. Any suggestions?

ivanovmg · 2020-11-05T11:57:21Z

pandas/tests/frame/test_stack_unstack.py

+        Series().unstack()
+


You may want to specify dtype here in the Series to something (like dtype='float64', maybe).

I just looked at the test failure and it seems that it is caused by the warning in repr.

if is_empty_data(data) and dtype is None:
    # gh-17261
>   warnings.warn(
        "The default dtype for empty Series will be 'object' instead "
        "of 'float64' in a future version. Specify a dtype explicitly "
        "to silence this warning.",
        DeprecationWarning,
        stacklevel=2,
    )
E   DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.

pandas/core/series.py:234: DeprecationWarning
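A sketch of the suggested fix for the test: give the empty Series an explicit dtype so the constructor has no reason to emit the "default dtype for empty Series" DeprecationWarning (pandas 1.x), while unstack still rejects the flat index. The float64 choice follows the suggestion above; the warning is matched by its message text so unrelated DeprecationWarnings from other libraries are ignored.

```python
import warnings

import pandas as pd

# With an explicit dtype, constructing the empty Series should not trigger
# the empty-Series default-dtype deprecation.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty = pd.Series(dtype="float64")

deprecations = [
    w for w in caught if "default dtype for empty Series" in str(w.message)
]
print(len(deprecations))  # 0

# The empty Series still has a flat index, so unstacking it is rejected.
unstack_error = None
try:
    empty.unstack()
except ValueError as err:
    unstack_error = str(err)

print(unstack_error)
```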

ivanovmg · 2020-11-05T12:00:25Z

pandas/tests/frame/test_stack_unstack.py

+@pytest.mark.parametrize("fill_value", [None, 0])
+def test_stack_unstack_empty_frame(dropna, fill_value):
+    # GH 36113
+    result = DataFrame().stack(dropna=dropna).unstack(fill_value=fill_value)


This test fails on 32-bit.

>   def groupsort_indexer(const int64_t[:] index, Py_ssize_t ngroups):
E   ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'

Looks like this is also a problem with the dtype.
I am not sure if that is critical, but what if you specify dtype=np.intp here?

ivanovmg · 2020-11-06T16:29:02Z

I see that the failures on py37-32bit persist.
Is it possible that dtype changes when performing stack/unstack?

=================================== FAILURES ===================================
__________________ test_stack_unstack_empty_frame[None-True] ___________________

dropna = True, fill_value = None

@pytest.mark.parametrize("dropna", [True, False])
@pytest.mark.parametrize("fill_value", [None, 0])
def test_stack_unstack_empty_frame(dropna, fill_value):
    # GH 36113
    result = (
        DataFrame(dtype=np.intp).stack(dropna=dropna).unstack(fill_value=fill_value)

pandas/tests/frame/test_stack_unstack.py:1191:

pandas/core/series.py:3872: in unstack
return unstack(self, level, fill_value)
pandas/core/reshape/reshape.py:431: in unstack
obj.index, level=level, constructor=obj._constructor_expanddim
pandas/core/reshape/reshape.py:118: in __init__
self._make_selectors()
pandas/core/reshape/reshape.py:152: in _make_selectors
remaining_labels = self.sorted_labels[:-1]
pandas/_libs/properties.pyx:33: in pandas._libs.properties.CachedProperty.__get__
val = self.func(obj)
pandas/core/reshape/reshape.py:139: in sorted_labels
indexer, to_sort = self._indexer_and_to_sort
pandas/_libs/properties.pyx:33: in pandas._libs.properties.CachedProperty.__get__
val = self.func(obj)
pandas/core/reshape/reshape.py:132: in _indexer_and_to_sort
indexer = libalgos.groupsort_indexer(comp_index, ngroups)[0]

def groupsort_indexer(const int64_t[:] index, Py_ssize_t ngroups):
E ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'

pandas/_libs/algos.pyx:177: ValueError

jreback

comment on the 32-bit, ping on green (or ping if this doesn't fix it)

pandas/tests/frame/test_stack_unstack.py

steveya · 2020-11-20T09:20:01Z

@jreback the error persists on 32-bit after the change

ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'

jreback · 2020-11-20T16:36:39Z

ok try this:

pandas/pandas/core/sorting.py

Line 613 in bc537b7

return comp_ids, obs_group_ids

here you want to add
return ensure_int64(comp_ids), ensure_int64(obs_group_ids)

If you can, also update the Returns section to indicate that these are int64 indexers.
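Why forcing int64 helps, sketched with NumPy only. pandas' own ensure_int64 is internal and is not reproduced here; ensure_int64_ids is a hypothetical stand-in that uses ndarray.astype with copy=False as an assumed equivalent. On 32-bit builds np.intp is 32 bits wide, so index arrays built from it do not match a Cython signature typed const int64_t[:], which is the "got 'int'" mismatch in the traceback above.

```python
import numpy as np


def ensure_int64_ids(comp_ids, obs_group_ids):
    """Hypothetical stand-in for
    return ensure_int64(comp_ids), ensure_int64(obs_group_ids).

    Casting both outputs to int64 makes the return type the same on every
    platform; copy=False skips the copy when the input is already int64.
    """
    return (
        np.asarray(comp_ids).astype(np.int64, copy=False),
        np.asarray(obs_group_ids).astype(np.int64, copy=False),
    )


# Simulate the 32-bit situation explicitly with int32 inputs.
comp_ids, obs_ids = ensure_int64_ids(
    np.array([0, 1, 1], dtype=np.int32), np.array([0, 1], dtype=np.int32)
)
print(comp_ids.dtype, obs_ids.dtype)  # int64 int64

# Arrays that are already int64 pass through without a copy.
already = np.array([2, 3], dtype=np.int64)
print(ensure_int64_ids(already, already)[0] is already)  # True
```

Doing the cast once at the point where the group ids are produced keeps every downstream consumer, such as groupsort_indexer, on a single dtype.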

steveya · 2020-11-22T11:30:10Z

@jreback by the Returns section you mean the doc for "compress_group_index" in sorting.py?

jreback · 2020-11-24T13:49:51Z

actually this should pass. @steveya, can you merge master one more time to fix the 32-bit build?

steveya · 2020-11-25T12:42:42Z

@jreback Yay, there is only one remaining error, in test_pivot.py on the 32-bit build.

jreback · 2020-11-25T21:58:45Z

_ TestPivot.test_pivot_empty __________________________
[XPASS(strict)] GH 36579: fail on 32-bit system

oh, this is easy: you fixed the test, so remove the xfail.

merge master again

jreback · 2020-11-26T15:54:54Z

thanks @steveya very nice. thanks for sticking with it!

steveya · 2020-11-27T07:27:53Z

@jreback @arw2019 @ivanovmg , thank you all for helping me with my first PR.

steveya added 2 commits September 7, 2020 17:50

BUG: GH36113

165fd72

modify tests to avoid deprrecated errors

e0c1a8d

steveya added 3 commits September 7, 2020 23:32

PEP 8 compliant

f765acf

remove trailing white space

109d312

black format checked

519a140

ivanovmg reviewed Sep 8, 2020

steveya added 4 commits September 9, 2020 14:15

DataFrame().stack should return an empty Series with dtype np.float64 instead of object

9d20ff5

PEP8 again.

d460db6

remove trailing space...\

bae2bd8

add a comma to pass black lint

047ae40

jreback changed the title from "BUG: GH36113" to "BUG: pandas.DataFrame().stack() raise an error, while expected is empty" Sep 9, 2020

jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 9, 2020

jreback requested changes Sep 9, 2020

simply fixes and parameterize tests

c0fffe8

jreback added this to the 1.2 milestone Sep 11, 2020

jreback requested changes Sep 11, 2020

pandas/tests/frame/test_reshape.py (outdated, resolved)

steveya added 3 commits September 12, 2020 22:52

add error messages when unstack frame and series with single level index

6b2b9bd

apply ValueError location

efc0603

change the place where error is raised

dac5f32

jreback requested changes Sep 13, 2020

steveya added 2 commits November 5, 2020 13:18

resolve doc/source/whatsnew/v1.2.0.rst conflicts

07d9ad5

fix unittest assert error message

148b77d

ivanovmg suggested changes Nov 5, 2020

steveya added 2 commits November 5, 2020 20:34

change dtype of empty series and dataframe in test

99f8280

formatting

c4e244a

jreback requested changes Nov 18, 2020

pandas/tests/frame/test_stack_unstack.py (outdated, resolved)

steveya added 2 commits November 20, 2020 14:19

change intp to int64 in testing of stack unstack empty frame

668189f

Merge remote-tracking branch 'upstream/master' into GH36113

20858db

ensure indexer is of type int64

4f95523

jreback removed this from the 1.2 milestone Nov 24, 2020

Merge remote-tracking branch 'upstream/master' into GH36113

7ab1155

jreback added this to the 1.2 milestone Nov 25, 2020

steveya added 3 commits November 26, 2020 17:31

remove xfail

475f158

remove unsed import

bdf49d3

Merge remote-tracking branch 'upstream/master' into GH36113

f96453e

jreback approved these changes Nov 26, 2020

jreback merged commit 0787b53 into pandas-dev:master Nov 26, 2020

steveya deleted the GH36113 branch November 27, 2020 07:29
