Skip to content

WIP: Panel Transpose Cleanup #25958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Apr 2, 2019

Progress towards #25632

Prefacing this PR with the statement that this is pretty wonky, but I would consider that largely an artifact of cleaning up the existing code base.

Right now transpose is defined in NDFrame but is only used by DataFrame and Panel, as Series objects end up using a separate transpose defined in IndexOpsMixin.

There are duplicate calls to validate_transpose* with various signatures that aren't well defined, and everything is passed rather ambiguously via args and kwargs which makes validation rather difficult.

Existing validation is arguably rough. Here's a sample call and response on master:

>>> df = pd.DataFrame(np.ones((2, 2)))
>>> df.transpose('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/clones/pandas/pandas/core/frame.py", line 2643, in transpose
    nv.validate_transpose(args, dict())
  File "/Users/williamayd/clones/pandas/pandas/compat/numpy/function.py", line 55, in __call__
    self.defaults)
  File "/Users/williamayd/clones/pandas/pandas/util/_validators.py", line 218, in validate_args_and_kwargs
    validate_kwargs(fname, kwargs, compat_args)
  File "/Users/williamayd/clones/pandas/pandas/util/_validators.py", line 157, in validate_kwargs
    _check_for_default_values(fname, kwds, compat_args)
  File "/Users/williamayd/clones/pandas/pandas/util/_validators.py", line 69, in _check_for_default_values
    format(fname=fname, arg=key)))
ValueError: the 'axes' parameter is not supported in the pandas implementation of transpose()

As you can see the error message mentions the 'axes' parameter, but that isn't part of the signature of transpose and is instead ambiguously defined based off of args.

What I've done here is removed the Panel implementation and tried to make the NDFrame implementation more explicit. Because it's only used by DataFrame however, an alternate and arguably preferable implementation would be to raise NotImplementedError in NDFrame and handle everything in the DataFrame implementation once Panel is gone

@@ -651,32 +650,24 @@ def _set_axis(self, axis, labels):
self._data.set_axis(axis, labels)
self._clear_item_cache()

def transpose(self, *args, **kwargs):
def transpose(self, copy, valid_axes=[1, 0]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the default valid_axes=None and then handle as appropriate within the method? Mutable default arguments can lead to problems so probably best to avoid just to be safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great callout! Do you think better to define here or even just move this code to the DataFrame class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super familiar with this, so don't want my opinion to carry too much weight, but I think it'd be cleaner to push this back to the DataFrame class and remove it from here entirely. If we're removing Panel.transpose then I think DataFrame is the only thing that uses this method, so seems more direct that way? Looks like Series/Index inherit from IndexOpsMixin and sparse things have their own implementation, so should be safe to remove unless I'm forgetting something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably could move the "core" implementation into DataFrame, but I would think we should keep a skeleton implementation on transpose in NDFrame (even simple as raise NotImplemented, as transpose is arguably something that should be part of the implementation of anything that inherits from this base class).

@WillAyd
Copy link
Member Author

WillAyd commented Apr 2, 2019

Should also mention that this is somewhat of an "optional" cleanup. What brought me here was seeing Panel appear in the comment of validate_transpose_for_generic and then tracing that back to see that there were potentially duplicative calls to validate_transpose for DataFrame, whereas Panel solely relied on the former

At the end of the day Panel could simply be removed and no change, but I figure this is worth cleaning up to simplify is possible

@gfyoung
Copy link
Member

gfyoung commented Apr 2, 2019

@WillAyd : I understand what you're trying to do, but the motivation for having those "kwargs" and "args" in the signature was for numpy compatibility, which not ideal for us maintainers, was something that came up as an issue for users, hence why we implemented it.

If this is just cleanup, I would just make sure we can separate out this method so that we can cleanly take it out for Panel at the end of the day. Don't modify the signature at this point.

@WillAyd
Copy link
Member Author

WillAyd commented Apr 2, 2019

the motivation for having those "kwargs" and "args" in the signature was for numpy compatibility, which not ideal for us maintainers, was something that came up as an issue for users

Do you know if there was an issue attached to that? Without knowing the backstory just looking at the current implementation it seems like there's a lot of unnecessary code and strange error reporting with what's in place, so I think worth revisiting if maybe the compatibility piece was only applicable to older versions.

AFAICT the keyword arguments only ever make their way to the call to _construct_axes_from_arguments (which is pretty limited in scope) and enable copy to be passed, though that can be done with an explicit keyword argument as well

@gfyoung
Copy link
Member

gfyoung commented Apr 2, 2019

was only applicable to older versions.

@WillAyd : By compatibility, it means compatibility with numpy, period. This is not an issue regarding older versions. For reference, you can look at the current signature for transpose in 1.16, and you can see what I mean. That being said, happy to always look at this with fresh eyes and see how we can either improve or streamline.

Do you know if there was an issue attached to that?

It's been awhile, but if you'd like to dive into the history, you can go in a lot of directions from this PR I wrote then: #12810

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i understand the purpose of this PR. We can simply remove the use of Panel for transpose rather than actually change what you are doing.

@WillAyd
Copy link
Member Author

WillAyd commented Apr 2, 2019 via email

@jreback
Copy link
Contributor

jreback commented Apr 2, 2019

I am not sure this does anything except add confusion. would rather just remove Panel methods / whole section that do this.

@jreback jreback closed this Apr 2, 2019
@WillAyd WillAyd deleted the panel-transpose-cleanup branch January 16, 2020 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants