BUG: assign doesn't cast SparseDataFrame to DataFrame #19178

Merged
merged 13 commits into from
Feb 12, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -448,6 +448,7 @@ Reshaping
- Bug in :func:`cut` which fails when using readonly arrays (:issue:`18773`)
- Bug in :func:`Dataframe.pivot_table` which fails when the ``aggfunc`` arg is of type string. The behavior is now consistent with other methods like ``agg`` and ``apply`` (:issue:`18713`)
- Bug in :func:`DataFrame.merge` in which merging using ``Index`` objects as vectors raised an Exception (:issue:`19038`)
- Bug in :func:`DataFrame.assign` which doesn't cast ``SparseDataFrame`` as ``DataFrame``. (:issue:`19163`)
Contributor

Use :class:`DataFrame` and so on here.


Numeric
^^^^^^^
4 changes: 3 additions & 1 deletion pandas/core/frame.py
@@ -2713,7 +2713,9 @@ def assign(self, **kwargs):
8 9 0.549296 2.197225
9 10 -0.758542 2.302585
"""
- data = self.copy()
+
+ # See GH19163
+ data = self.copy().to_dense()
Contributor

@TomAugspurger, Jan 11, 2018

Could you define assign on SparseDataFrame and only densify if necessary?

Contributor

Actually, you don't want to densify; rather, you want to do something like this (in SparseDataFrame):

def assign(self, **kwargs):
    # coerce to a DataFrame
    self = DataFrame(self)
    return self.assign(**kwargs)

This actually ends up copying twice, though. So the real solution is to move the guts of DataFrame.assign into _assign (leaving the copy in .assign), then call ._assign from the sparse version.
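
The refactoring suggested above can be illustrated with a minimal sketch. It deliberately uses toy classes rather than pandas itself: `Frame` and `SparseFrame` are hypothetical stand-ins for DataFrame and SparseDataFrame, and the `columns` dict is an assumption made only for this illustration. The point is the shape of the change: `_assign` holds the logic and mutates in place, `assign` owns the single copy, and the sparse override coerces once and then calls `_assign` directly.

```python
class Frame:
    """Toy stand-in for DataFrame (hypothetical): column name -> list of values."""

    def __init__(self, columns):
        # Copy the mapping and every list-valued column so frames never share state.
        self.columns = {
            name: list(values) if isinstance(values, list) else values
            for name, values in dict(columns).items()
        }

    def copy(self):
        return type(self)(self.columns)

    def _assign(self, **kwargs):
        # Guts of assign: mutates self, so the caller must pass in a fresh object.
        # Do all calculations first, then attach the new columns.
        results = {
            name: value(self) if callable(value) else value
            for name, value in kwargs.items()
        }
        for name, value in results.items():
            self.columns[name] = value
        return self

    def assign(self, **kwargs):
        # The one and only copy lives here.
        return self.copy()._assign(**kwargs)


class SparseFrame(Frame):
    """Toy stand-in for SparseDataFrame (hypothetical)."""

    def assign(self, **kwargs):
        # Coercing to the base class already copies once, so call _assign
        # directly instead of Frame.assign, which would copy a second time.
        return Frame(self.columns)._assign(**kwargs)


sparse = SparseFrame({"A": [1, 2, 3]})
result = sparse.assign(B=lambda frame: [x * 2 for x in frame.columns["A"]])
```

Here `result` is a plain `Frame` and `sparse` is left unchanged, which mirrors the behaviour the test in this PR checks for (`type(result) is DataFrame`).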


# do all calculations first...
results = OrderedDict()
7 changes: 7 additions & 0 deletions pandas/tests/frame/test_mutate_columns.py
@@ -55,6 +55,13 @@ def test_assign(self):
result = df.assign(A=lambda x: x.A + x.B)
assert_frame_equal(result, expected)

# SparseDataFrame
Contributor

Make this a separate test.

# See GH 19163
result = df.to_sparse(fill_value=False).assign(newcol=False)
expected = df.assign(newcol=False)
assert type(result) is DataFrame
assert_frame_equal(expected, result)

def test_assign_multiple(self):
df = DataFrame([[1, 4], [2, 5], [3, 6]], columns=['A', 'B'])
result = df.assign(C=[7, 8, 9], D=df.A, E=lambda x: x.B)