BUG GH23744 ufuncs on DataFrame keeps dtype sparseness #23755
@@ -131,10 +131,11 @@ def get_result(self):

         # ufunc
         elif isinstance(self.f, np.ufunc):
-            with np.errstate(all='ignore'):
-                results = self.f(self.values)
-            return self.obj._constructor(data=results, index=self.index,
-                                         columns=self.columns, copy=False)
[Review comment] You probably need to pass
+            result = self.obj._constructor(index=self.index, copy=False)
+            for col in self.columns:
+                with np.errstate(all='ignore'):
[Review comment] This is going to be very inefficient. Use a list comprehension to iterate over the columns, then collect and construct the result. Something like [...] iterate through the series and construct the result.
[Review comment] Construct a dict instead with the results, then do the construction at the end.
[Review comment] When given a list of series, the
[Review comment] We need to be careful here. Previously, for a homogeneous DataFrame of non-extension array values, [...] If we go columnwise, we'll have [...] Should this be done block-wise and then the results stitched together?
+                    result[col] = self.f(self.obj[col].values)
+            return result

         # broadcasting
         if self.result_type == 'broadcast':
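A minimal sketch of what the review comments above seem to be asking for, not the code that was merged. It assumes a hypothetical standalone helper, `apply_ufunc_keep_dtype` (not pandas API), and uses current public pandas (`pd.arrays.SparseArray`, `pd.SparseDtype`) rather than `SparseDataFrame`. A homogeneous frame of plain NumPy dtypes keeps the single 2D ufunc call; anything else goes column by column, collecting results in a dict and constructing the frame once at the end. Unique column labels are assumed.

```python
import numpy as np
import pandas as pd


def apply_ufunc_keep_dtype(df, ufunc):
    """Hypothetical helper (not pandas API): apply a unary ufunc to a frame."""
    dtypes = list(df.dtypes)
    # True only when every column shares one plain NumPy dtype.
    homogeneous_numpy = (
        len(set(dtypes)) == 1 and isinstance(dtypes[0], np.dtype)
    )
    with np.errstate(all="ignore"):
        if homogeneous_numpy:
            # Fast path: a single ufunc call on the 2D values.
            values = ufunc(df.to_numpy())
            return pd.DataFrame(values, index=df.index, columns=df.columns)
        # Column-wise path: each Series dispatches the ufunc to its own
        # array, so extension dtypes such as sparse can be preserved.
        # Assumes unique column labels, since results are keyed by label.
        results = {col: ufunc(df[col]) for col in df.columns}
    # Construct the result once, after all columns are computed.
    return pd.DataFrame(results, index=df.index)


# Same values as the test in this PR, built from sparse columns.
sparse_df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0, 0, 0]),
    "b": pd.arrays.SparseArray([1, 0, 0]),
    "c": pd.arrays.SparseArray([0, 0, 1]),
})
result = apply_ufunc_keep_dtype(sparse_df, np.exp)
```

Building the dict first avoids inserting columns one at a time into an initially empty frame, which is the inefficiency flagged in review. Whether the fast path should instead be done block-wise is the open question raised in the last comment; this sketch does not attempt that.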
@@ -570,6 +570,16 @@ def test_apply_dup_names_multi_agg(self):

         tm.assert_frame_equal(result, expected)

+    def test_apply_keep_sparse_dtype(self):
[Review comment] This should be in the sparse frame tests.
+        # GH 23744
+        df = pd.SparseDataFrame(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]]),
+                                columns=['a', 'b', 'c'], default_fill_value=1)
+        df2 = pd.DataFrame(df)
+
+        df = df.apply(np.exp)
+        df2 = df2.apply(np.exp)
+        tm.assert_frame_equal(df, df2)
+

 class TestInferOutputShape(object):
     # the user has supplied an opaque UDF where
[Review comment] Move this to the Sparse section.