BUG: pandas.DataFrame().stack() raises an error when an empty result is expected #36185
@steveya is this the correct issue reference?
@jrreback, it should fix the error seen in #36113. This is my first contribution, so please let me know the specifics if I have made any errors, even obvious ones. Thanks.
pandas/tests/frame/test_reshape.py
tm.assert_series_equal(
    DataFrame().stack(), Series(index=MultiIndex([[], []], [[], []]), dtype=object)
)
Why do you expect a MultiIndex here? Would it be reasonable to expect Series([], dtype=object)?
Note that Series([1, 2, 3]).unstack() will throw an error because the Series has only one index level, so there is nothing to put on the columns. I think the same should apply to an empty Series: Series([]).unstack() should throw the same error rather than return an empty DataFrame.
Given this constraint, for a stack/unstack round trip to work, DataFrame([]).stack() needs to return an empty Series with an empty two-level MultiIndex: one level from its original index and one from its (empty) columns.
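The round-trip argument can be sketched with a toy model in plain Python. This is not pandas code: stack and unstack below are hypothetical stand-ins in which a stacked result records its number of index levels explicitly. Stacking always produces two levels (rows and columns), and unstacking refuses a single-level input, so an empty frame round-trips only because the empty stacked result still carries two levels.

```python
# Toy model (not pandas): a "frame" is (row_labels, col_labels, cells) and a
# stacked "series" is (nlevels, data) with tuple keys.


def stack(frame):
    rows, cols, cells = frame
    # One key level from the rows and one from the columns: always 2 levels,
    # even when there are no rows or no columns. Assumes every cell is present.
    data = {(r, c): cells[(r, c)] for r in rows for c in cols}
    return (2, data)


def unstack(series):
    nlevels, data = series
    if nlevels < 2:
        # A single-level index has nothing that could become the columns.
        raise ValueError("index must be a MultiIndex to unstack")
    rows = sorted({key[0] for key in data})
    cols = sorted({key[1] for key in data})
    return (rows, cols, dict(data))


print(unstack(stack(([], [], {}))))  # ([], [], {})

try:
    unstack((1, {}))  # an empty series with a single-level index
except ValueError as exc:
    print(exc)  # index must be a MultiIndex to unstack
```

In this model an empty result that carries only one level, such as (1, {}), cannot be unstacked, which is the concern with returning Series([], dtype=object) directly from stack.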
pandas/core/reshape/reshape.py
@@ -517,7 +517,7 @@ def factorize(index):
         # For homogeneous EAs, frame._values will coerce to object. So
         # we concatenate instead.
         dtypes = list(frame.dtypes._values)
-        dtype = dtypes[0]
+        dtype = dtypes[0] if len(dtypes) > 0 else object
What if you return the Series right away if the dataframe is empty?
if not frame.empty:
    dtypes = list(frame.dtypes._values)
    dtype = dtypes[0]
else:
    return Series([], dtype=object)
But you would still need to solve unstacking from an empty series.
likely you can just change the condition on L516 to not frame.empty and frame._is_homogeneous_type; that might work
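A simplified, hypothetical model of the branch being discussed, to show why dtypes[0] failed for DataFrame() and how checking emptiness first avoids it. Dtypes are plain strings, EXTENSION_DTYPES is an assumed stand-in for pandas extension dtypes, and is_homogeneous assumes that frame._is_homogeneous_type is True when there are no columns at all.

```python
# Hypothetical, simplified model of the branch discussed above (not pandas code).
# Dtypes are plain strings; EXTENSION_DTYPES is an assumed stand-in.
EXTENSION_DTYPES = {"Int64", "category"}


def is_homogeneous(dtypes):
    # True for zero columns as well, which is assumed to match
    # frame._is_homogeneous_type on an empty DataFrame.
    return len(set(dtypes)) <= 1


def pick_path_unguarded(dtypes):
    # Original shape of the code: dtypes[0] raises IndexError with no columns.
    if is_homogeneous(dtypes):
        dtype = dtypes[0]
        if dtype in EXTENSION_DTYPES:
            return "concatenate"
    return "generic"


def pick_path_guarded(dtypes, is_empty):
    # Suggested shape: skip the extension-array branch for empty frames.
    if not is_empty and is_homogeneous(dtypes):
        if dtypes[0] in EXTENSION_DTYPES:
            return "concatenate"
    return "generic"


print(pick_path_guarded([], is_empty=True))  # generic
print(pick_path_guarded(["Int64", "Int64"], is_empty=False))  # concatenate
```

The PR's first version instead fell back to object when dtypes was empty; guarding the condition keeps the extension-array branch from running at all on an empty frame.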
pandas/tests/frame/test_reshape.py
@@ -1273,6 +1273,18 @@ def test_stack_timezone_aware_values():
    tm.assert_series_equal(result, expected)


def test_stack_empty_frame():
    tm.assert_series_equal(
Use result = ... and expected = ..., then compare them with tm.assert_series_equal (or tm.assert_frame_equal). Please parametrize these cases and add a comment with the issue number.
@jreback I have made the suggested changes.
@jreback @TomAugspurger I have updated the code so that unstacking a Series with a single-level index raises a proper error. A DataFrame with a single level of index and columns will no longer raise an exception when unstack is called (it will return a Series). I will dig further into @TomAugspurger's example.
doc/source/whatsnew/v1.2.0.rst
@@ -322,7 +322,7 @@ Reshaping
- Bug in :meth:`DataFrame.pivot_table` with ``aggfunc='count'`` or ``aggfunc='sum'`` returning ``NaN`` for missing categories when pivoted on a ``Categorical``. Now returning ``0`` (:issue:`31422`)
- Bug in :func:`union_indexes` where input index names are not preserved in some cases. Affects :func:`concat` and :class:`DataFrame` constructor (:issue:`13475`)
- Bug in func :meth:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
-
- Bug in :meth:`DataFrame.stack` for empty DataFrame (:issue:`36113`)
Can you elaborate a bit on what is changing?
I have elaborated on this a bit in the latest commit.
pandas/core/reshape/reshape.py
# GH 36113
# Give nicer error messages when unstack a Series whose
# Index is not a MultiIndex.
raise ValueError("index must be a MultiIndex to unstack")
does this have a test that hits it?
can you add: f'{type(obj.index)} was passed'
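A dependency-free sketch of the suggested message. pandas itself checks whether obj.index is a MultiIndex; the hypothetical ensure_unstackable helper below only looks at an nlevels attribute and treats objects without one as single-level. A test that hits this branch would assert on the message, for example with pytest.raises(ValueError, match=...).

```python
def ensure_unstackable(index):
    # Hypothetical helper: objects without an `nlevels` attribute are treated
    # as single-level, mirroring "index must be a MultiIndex to unstack".
    if getattr(index, "nlevels", 1) < 2:
        raise ValueError(
            f"index must be a MultiIndex to unstack, {type(index)} was passed"
        )


try:
    ensure_unstackable(range(3))
except ValueError as exc:
    print(exc)  # index must be a MultiIndex to unstack, <class 'range'> was passed
```

Including type(index) in the message tells the user which kind of index reached unstack, which is what the review comment asks for.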
Could we please create a patch for 0.25.3 when releasing?
@tsu-shiuan this won't even be backported to 1.x, let alone 0.25.x.
@jreback Okay!
@jreback Sorry, I am not sure how to do that. On my end, I tried this: % git pull origin GH36113
That is what I did 12 days ago, and nothing seemed to happen. Could you provide more guidance, please? Thank you.
do
You may have to resolve conflicts.
@jreback I looked at the tests that failed. I cannot reproduce these errors locally and I am not sure how to fix them. Any suggestions?
Series().unstack()
You may want to specify a dtype for the Series here (e.g. dtype='float64').
I just looked at the test failure, and it seems to be caused by this warning:
if is_empty_data(data) and dtype is None:
# gh-17261
> warnings.warn(
"The default dtype for empty Series will be 'object' instead "
"of 'float64' in a future version. Specify a dtype explicitly "
"to silence this warning.",
DeprecationWarning,
stacklevel=2,
)
E DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
pandas/core/series.py:234: DeprecationWarning
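The failure mode can be reproduced with the standard library alone. make_empty_series below is a hypothetical stand-in for the constructor check shown in the traceback (it is not the pandas Series constructor): once DeprecationWarning is escalated to an error, creating an empty series without a dtype fails, while passing a dtype explicitly does not.

```python
import warnings


def make_empty_series(data=None, dtype=None):
    # Hypothetical stand-in for the gh-17261 check in the traceback above.
    if not data and dtype is None:
        warnings.warn(
            "The default dtype for empty Series will be 'object' instead of "
            "'float64' in a future version. Specify a dtype explicitly "
            "to silence this warning.",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"data": list(data or []), "dtype": dtype or "float64"}


# Escalate DeprecationWarning to an exception, as in the failing traceback.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        make_empty_series()
    except DeprecationWarning as exc:
        print("fails:", exc)
    print(make_empty_series(dtype="float64"))  # explicit dtype: no warning
```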
@pytest.mark.parametrize("fill_value", [None, 0])
def test_stack_unstack_empty_frame(dropna, fill_value):
    # GH 36113
    result = DataFrame().stack(dropna=dropna).unstack(fill_value=fill_value)
This test fails on 32-bit builds:
> def groupsort_indexer(const int64_t[:] index, Py_ssize_t ngroups):
E ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'
This also looks like a dtype problem. I am not sure whether that is critical, but what if you specify dtype=np.intp here?
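The mismatch comes from integer width: groupsort_indexer is declared to take a const int64_t buffer, but on 32-bit platforms the default platform integer, and numpy.intp as well, is 4 bytes wide. The sketch below models this with the standard array module: typecode 'i' (a 4-byte C int) plays the role of the 32-bit platform integer, and groupsort_indexer_model is a hypothetical, simplified stand-in (it only returns an argsort) that accepts only 8-byte 'q' buffers. Converting the indexer to int64 first avoids the error; in NumPy code that conversion would be .astype(np.int64).

```python
import array


def groupsort_indexer_model(index):
    # Hypothetical stand-in for a Cython function typed `const int64_t[:]`:
    # only buffers of 8-byte signed integers (format 'q') are accepted.
    view = memoryview(index)
    if view.format != "q":
        raise ValueError(
            f"Buffer dtype mismatch, expected int64 but got format {view.format!r}"
        )
    # Simplified: return the positions that would sort the values.
    return sorted(range(len(view)), key=view.__getitem__)


platform_int = array.array("i", [2, 0, 1])  # simulates a 32-bit platform int
try:
    groupsort_indexer_model(platform_int)
except ValueError as exc:
    print(exc)

print(groupsort_indexer_model(array.array("q", platform_int)))  # [1, 2, 0]
```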
I see that the failures on py37-32bit persist.
=================================== FAILURES ===================================
dropna = True, fill_value = None
pandas/tests/frame/test_stack_unstack.py:1191:
pandas/core/series.py:3872: in unstack
pandas/_libs/algos.pyx:177: ValueError
Add a comment about the 32-bit issue, and ping on green (or ping if this doesn't fix it).
@jreback the error persists on 32-bit after the change: ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'
ok try this: Line 613 in bc537b7
here you want to add (…). If you can, also update the Returns section to indicate that these are int64 indexers.
@jreback By the Returns section, do you mean the docstring of compress_group_index in sorting.py?
Actually this should pass. @steveya, can you merge master one more time to fix the 32-bit build?
@jreback Yay, there is only one remaining error in the 32-bit build, in test_pivot.py.
Oh, this is easy: you fixed the test, so remove the xfail.
Thanks @steveya, very nice. Thanks for sticking with it!
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff