-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: conversion of empty DataFrame to SparseDtype (#33113) #33118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
jreback
merged 17 commits into
pandas-dev:master
from
choucavalier:sparsedtype-conversion-empty-df
Jun 25, 2020
Merged
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
6ac82f8
BUG: fix GH33113, conversion of empty DataFrame to SparseDtype
choucavalier e76f1fc
return empty dataframe when nothing to concatenate
choucavalier 8a96a47
revert black formatting
choucavalier a4aad14
restore-to-master
choucavalier d42709b
return instance of class to handle frame or series
choucavalier 708fdc7
add test
choucavalier 5b1fe9f
use type(self) instead of self.__class__
choucavalier 738739e
return self.copy()
choucavalier a9e7e15
move test to extension array tests
simonjayhawkins 9c7f06b
black fixup
simonjayhawkins 49280cb
add whatsnew entry
choucavalier 369735b
Merge branch 'sparsedtype-conversion-empty-df' of github.com:tgy/pand…
choucavalier f0f099c
address @simonjayhawkins's comments
choucavalier dcdf11f
revert back to previous solution + add test
choucavalier e2d4241
move test
choucavalier 7bc6255
fix test
choucavalier 95fd399
fix test again
choucavalier File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be in
elif is_extension_array_dtype(dtype) and self.ndim > 1:
block.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although then the issue mentioned in #33113 (comment) would not be fixed. maybe keep this where it is and a test for #33113 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, did not see your second comment. Ok I'll do that. Anyhow if we reach the "concat" part of the code and
results
is empty, then we have a problem.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, one could argue that
pd.concat
of an empty list could return an empty frame or series instead of raising an exception.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins Actually starting to think the other issue was not an issue but an error on the author's part. See #33113 (comment). Awaiting your answer to proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the other issue is relevant.
on master and in this PR (latest commit)...
this PR before moving #33118 (comment) ...
so it appears astype with an empty dict is a valid op with a copy returned. so this should also work on an empty DataFrame (even if created with empty dict).
these sort of circumstances can arise on data pipelines and having consistency reduces the complexity of downstream code.
so I think need to move the changes back and add a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will revert the changes back and add a test. I still find it very strange to call
.astype(T)
whereT
is an instance of an object and not a type (except of course for string-defined such as.astype("float64")
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame.astype can also accept a dict of column name -> data type as the dtype argument, https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.astype.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh that makes a lot of sense now.