-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: conversion of empty DataFrame to SparseDtype (#33113) #33118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: conversion of empty DataFrame to SparseDtype (#33113) #33118
Conversation
rebase on master as have merged #33110 which might fix this (if so, you can just push the test itself) |
@jreback now this is synced with master, i removed my changes in |
then there maybe still an issue |
pandas/tests/frame/test_dtypes.py
Outdated
@@ -552,6 +553,10 @@ def test_astype_to_incorrect_datetimelike(self, unit): | |||
with pytest.raises(TypeError, match=msg): | |||
df.astype(dtype) | |||
|
|||
def test_astype_sparsedtype_empty_dataframe(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the same issue occurs with all extension array dtypes can you add a test to pandas\tests\extension\base\dtype.py instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
@simonjayhawkins i've added the new test. |
pandas/tests/extension/base/dtype.py
Outdated
|
||
def test_astype_empty_dataframe(self, dtype): | ||
empty_dataframe = pd.DataFrame() | ||
assert empty_dataframe.astype(dtype) == empty_dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use tm.assert_frame_equal(result, empty_dataframe) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
i've added a small fix by simply returning an empty dataframe when there's nothing to concatenate. |
Thanks @tgy can you add a whatsnew |
@simonjayhawkins done! |
@simonjayhawkins @jreback it seems that using
maybe i should only do what do you recommend? there's also a typing validation error:
what should i do about that? |
That was my mistake asking for the test to be moved in #33118 (comment). We'll probably need to keep the test where is was and use parameterization instead. We don't have a fixture specifically for extension dtypes in conftest.py. (It maybe elsewhere), so could manually copy the dtypes from each of the /tests/extensions/.. tests for now. |
The issue only affects DataFrame? but for mypy the return type needs to the same as the input type. maybe the original fix, see #33118 (comment), is more appropriate? |
@jreback @simonjayhawkins I've updated the fix and moved the test. What do you guys think? |
Thanks @tgy for being patient here and sorry for reviewing sooner
ah my bad. should have been i've moved the test (again!) to pandas\tests\extension\base\casting.py can you add a release note to doc\source\whatsnew\v1.1.0.rst |
…as into sparsedtype-conversion-empty-df
@simonjayhawkins done! I was the one who postponed that issue :-) Thanks for reviewing so quickly on a Sunday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tgy generally lgtm.
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -1068,6 +1068,7 @@ Sparse | |||
- Bug in :meth:`Series.sum` with ``SparseArray`` raises ``TypeError`` (:issue:`25777`) | |||
- Bug where :class:`DataFrame` containing :class:`SparseArray` filled with ``NaN`` when indexed by a list-like (:issue:`27781`, :issue:`29563`) | |||
- The repr of :class:`SparseDtype` now includes the repr of its ``fill_value`` attribute. Previously it used ``fill_value``'s string representation (:issue:`34352`) | |||
- Fixed bug where :class:`DataFrame` or :class:`Series` could not be cast to :class:`SparseDtype` when empty (:issue:`33113`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this fix is only for DataFrame. Series.astype works fine on master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I tried and Series.astype
works fine on master
. Will fix the entry.
@@ -5548,6 +5548,9 @@ def astype( | |||
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,) | |||
return self._constructor(new_data).__finalize__(self, method="astype") | |||
|
|||
if not results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be in elif is_extension_array_dtype(dtype) and self.ndim > 1:
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although then the issue mentioned in #33113 (comment) would not be fixed. maybe keep this where it is and a test for #33113 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, did not see your second comment. Ok I'll do that. Anyhow if we reach the "concat" part of the code and results
is empty, then we have a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, one could argue that pd.concat
of an empty list could return an empty frame or series instead of raising an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins Actually starting to think the other issue was not an issue but an error on the author's part. See #33113 (comment). Awaiting your answer to proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the other issue is relevant.
on master and in this PR (latest commit)...
>>> df = pd.DataFrame([[1, 2], [3, 4]])
>>> df
0 1
0 1 2
1 3 4
>>>
>>> res = df.astype(dict())
>>> res
0 1
0 1 2
1 3 4
>>>
>>> res is df
False
>>>
>>> df = pd.DataFrame([dict(), dict()])
>>>
>>> df.astype(dict())
Traceback (most recent call last):
...
ValueError: No objects to concatenate
>>>
this PR before moving #33118 (comment) ...
>>> df = pd.DataFrame([dict(), dict()])
>>>
>>> df.astype(dict())
Empty DataFrame
Columns: []
Index: [0, 1]
>>>
>>> res is df
False
so it appears astype with an empty dict is a valid op with a copy returned. so this should also work on an empty DataFrame (even if created with empty dict).
these sort of circumstances can arise on data pipelines and having consistency reduces the complexity of downstream code.
so I think need to move the changes back and add a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will revert the changes back and add a test. I still find it very strange to call .astype(T)
where T
is an instance of an object and not a type (except of course for string-defined such as .astype("float64")
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame.astype can also accept a dict of column name -> data type as the dtype argument, https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.astype.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh that makes a lot of sense now.
@simonjayhawkins let me know what you think :) |
see #33118 (comment) |
Thanks @tgy for adding the test. The test is not applicable to the extension array dtype tests. can you move it to pandas\tests\frame\methods\test_astype.py |
The test will still fail with you can run it locally
can you also add |
Thanks, should work now. Locally it did. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @tgy |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff