Skip to content

Subclassed reshape #15564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed

Subclassed reshape #15564

wants to merge 15 commits into from

Conversation

delgadom
Copy link

@delgadom delgadom commented Mar 4, 2017

Reshape operations in pandas/core/reshape.py modified to use obj._constructor, obj._constructor_sliced, and obj._constructor_expanddim for Series and DataFrame. This preserves a subclass family on stack, unstack, and pivot operations.

@codecov-io
Copy link

codecov-io commented Mar 4, 2017

Codecov Report

Merging #15564 into master will decrease coverage by -0.02%.
The diff coverage is 92.85%.

@@            Coverage Diff             @@
##           master   #15564      +/-   ##
==========================================
- Coverage   91.06%   91.04%   -0.02%     
==========================================
  Files         137      136       -1     
  Lines       49307    49171     -136     
==========================================
- Hits        44899    44769     -130     
+ Misses       4408     4402       -6
Impacted Files Coverage Δ
pandas/core/reshape.py 99.08% <92.85%> (-0.2%)
pandas/io/gbq.py 25% <0%> (-58.34%)
pandas/io/json/json.py 89.22% <0%> (-1.05%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/indexes/frozen.py 93.24% <0%> (-0.31%)
pandas/core/frame.py 97.83% <0%> (-0.1%)
pandas/core/config_init.py 95.27% <0%> (-0.08%)
pandas/indexes/base.py 96.17% <0%> (-0.07%)
pandas/io/json/normalize.py 97.87% <0%> (-0.05%)
pandas/tools/pivot.py 95.03% <0%> (-0.04%)
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e097bf5...dacdf40. Read the comment docs.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 4, 2017
@jreback
Copy link
Contributor

jreback commented Mar 4, 2017

can you also have a look over: http://pandas-docs.github.io/pandas-docs-travis/internals.html#subclassing-pandas-data-structures

and see if anything needs amendment

@@ -105,6 +105,8 @@ New features
- DataFrame.drop now supports non-unique indexes (:issue:`2101`)
- Panel.shift now supports negative periods (:issue:`2164`)
- DataFrame now support unary ~ operator (:issue:`2110`)
- `stack`, `unstack`, and `pivot` operations now preserve subclass family
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.20.0

@@ -410,13 +413,24 @@ def unstack(obj, level, fill_value=None):

if isinstance(obj, DataFrame):
if isinstance(obj.index, MultiIndex):
return _unstack_frame(obj, level, fill_value=fill_value)
unstacked = _unstack_frame(obj, level, fill_value=fill_value)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this should be done inside the .stack and _unstack_frame functions no?

else:
unstacker = _Unstacker(obj.values, obj.index, level=level,
fill_value=fill_value)
return unstacker.get_result()
unstacked = unstacker.get_result()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, this now has logic in 2 places, the creation itself of the unstacked result, then the return class. should be a part of get_result (it certainly could be another method that is called by get_result) though

@@ -515,7 +529,7 @@ def factorize(index):
mask = notnull(new_values)
new_values = new_values[mask]
new_index = new_index[mask]
return Series(new_values, index=new_index)
return frame._constructor_sliced(new_values, index=new_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is good

@@ -125,6 +125,144 @@ def test_indexing_sliced(self):
tm.assert_series_equal(res, exp)
tm.assertIsInstance(res, tm.SubclassedSeries)

def test_subclass_stack(self):
df = tm.SubclassedDataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment

@jreback
Copy link
Contributor

jreback commented Mar 4, 2017

thanks @delgadom

reshape could certainly use some tlc!

@delgadom
Copy link
Author

delgadom commented Mar 8, 2017

Thanks @jreback - I'm working on your suggestions.

Keeping this clean is a bit tough, especially because unstacking is sometimes handled by _unstack_multiple and sometimes by _Unstacker. I'd modify _unstack_multiple and _Unstacker, except _Unstacker accepts a numpy array and returns a DataFrame, so there's no way to access the _constructor* properties from within this class.

To me this class seems really nasty and should be handled by functions rather than a class (would you ever use _Unstacker except to get an unstacked object?) but originally was tentative in making changes. Do you think this should be re-written? At the very least it seems as though _Unstacker should be able to handle DataFrame arguments.

Happy to take a stab if you agree.

@jreback
Copy link
Contributor

jreback commented Mar 8, 2017

@delgadom the reshaping code is certainly in need of refactoring to make things easier :>

certainly take a stab at it. Yes the _Unstacker class is merely a helper class, so you can change things if you need (not public).

You might want to do a refactor before this PR to make it easier to test / debug. Though if its relatively simple go ahead here.

@delgadom
Copy link
Author

delgadom commented Mar 8, 2017

Yeah I think I'll refactor and then create a new PR. Thanks for the help!

@delgadom delgadom closed this Mar 8, 2017
@jreback
Copy link
Contributor

jreback commented Mar 8, 2017

ok!

@delgadom delgadom mentioned this pull request Mar 11, 2017
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve subclass family on reshape operations
3 participants