Skip to content

BUG: pivot_table losing tz #32558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 14, 2020

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Mar 9, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This gets rid of the last values_from_object usage (pending other PRs)

_tile_compat likely makes more sense as an Index method. I kept it here as a proof of concept because I think we actually need it in the EA interface too. Being able to broadcast a length=1 EA to a length=N EA will be necessary for some of the arithmetic perf going on.

@jbrockmendel
Copy link
Member Author

cc @jorisvandenbossche @TomAugspurger @jreback thoughts on adding something like tile or broadcast_to to EA interface?

@TomAugspurger
Copy link
Contributor

What are your proposed semantics of an EA.tile? If it's restricted to a single dimension, can to be emulated using _concat_same_type?

@jbrockmendel
Copy link
Member Author

What are your proposed semantics of an EA.tile? If it's restricted to a single dimension, can to be emulated using _concat_same_type?

The base class implementation would probably just use _concat_same_type. For ndarray-backed EAs I'd use broadcast_to which doesn't make a copy, but results in a non-writeable array (in that sense it would behave more like broadcast_to than np.tile)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should just have EA.tile; this is very messy otherwise.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 11, 2020
@jreback
Copy link
Contributor

jreback commented Mar 11, 2020

can you see if we have any issues about this (pivot and tz), I think we might.


def _broadcast_tile(arr: np.ndarray, num: int) -> np.ndarray:
"""
Emulate np.tile but using views instead of copies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an important optimization? Currently we also use np.tile ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large inputs yes

@jorisvandenbossche
Copy link
Member

The base class implementation would probably just use _concat_same_type.

We could also simplify this a lot by using _concat_same_type here (which basically repeats my inline question: is this optimization important?)

Regarding tile in the EA interface: numpy also has no tile method, so it's not that it would make it more compatible with numpy arrays (for that, we would need to implement a protocol to make np.tile work on EAs).
Personally, having the centralized "tile for EA function" here is fine for me.

@jbrockmendel
Copy link
Member Author

We could also simplify this a lot by using _concat_same_type here (which basically repeats my inline question: is this optimization important?)

This is actually close to orthogonal to the broadcast_to optimization. When I disable the broadcat_to optimization and just use np.tile, this implementation is about 2x faster than one that uses _concat_same_type (based on arr = pd.date_range('2016-01-01', periods=10**5, freq='S') and num=5)

@jbrockmendel
Copy link
Member Author

can you see if we have any issues about this (pivot and tz), I think we might.

searching on "pivot" didnt show anything tz-related

@jorisvandenbossche
Copy link
Member

This is actually close to orthogonal to the broadcast_to optimization.

Why? It seems the reason you are not using the slower/simpler np.tile / _concat_same_type is to be able to use this optimization?

this implementation is about 2x faster than one that uses _concat_same_type

This is only for the tile? What does it give in the full pivot operation?

@jbrockmendel
Copy link
Member Author

Why? It seems the reason you are not using the slower/simpler np.tile / _concat_same_type is to be able to use this optimization?

No, its also because the non-concat_same_type implementation is 2x faster, even without the broadcast_to optimization.

For arithmetic I'm soon going to need to broadcast/tile length=1 EAs to length=N, which is very wasteful without the broadcast_to; that is the forward-looking motivation for using the broadcast_to optimization.

This is only for the tile? What does it give in the full pivot operation?

No idea.

@jbrockmendel
Copy link
Member Author

Updated to use a simpler implementation that does not do any optimizations.

@jreback jreback added this to the 1.1 milestone Mar 14, 2020
@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

looks fine, can you add a release note, ping on green.

@jbrockmendel
Copy link
Member Author

ping

@jreback jreback merged commit 0ed6d53 into pandas-dev:master Mar 14, 2020
@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

thxs

@jbrockmendel
Copy link
Member Author

booyah! there goes our last usage of values_from_object

@jbrockmendel jbrockmendel deleted the no-values_from_object-reshape branch March 14, 2020 23:31
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants