Skip to content

BUG: Fix+test dataframe tranpose with datetimeTZ #23730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jbrockmendel
Copy link
Member

Reminder of the bug here:

>>> dti = pd.date_range('1977-04-15', periods=3, freq='MS', tz='US/Hawaii')
>>> df = pd.DataFrame(dti)
>>> df.T
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 2562, in transpose
    return super(DataFrame, self).transpose(1, 0, **kwargs)
  File "pandas/core/generic.py", line 686, in transpose
    new_values = self.values.transpose(axes_numbers)
  File "pandas/core/base.py", line 672, in transpose
    nv.validate_transpose(args, kwargs)
  File "pandas/compat/numpy/function.py", line 56, in __call__
    self.defaults)
  File "pandas/util/_validators.py", line 218, in validate_args_and_kwargs
    validate_kwargs(fname, kwargs, compat_args)
  File "pandas/util/_validators.py", line 157, in validate_kwargs
    _check_for_default_values(fname, kwds, compat_args)
  File "pandas/util/_validators.py", line 69, in _check_for_default_values
    format(fname=fname, arg=key)))
ValueError: the 'axes' parameter is not supported in the pandas implementation of transpose()

The implementation is an unholy mess. It can be made a little bit prettier, but to actually get a nice implementation we need to allow 2D EAs.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@@ -939,3 +939,34 @@ def test_unstack_fill_frame_object():
index=list('xyz')
)
assert_frame_equal(result, expected)


def test_transpose_dt64tz():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test transposing a mixed dtype dataframe if we aren't already (e.g. datetimetz and int)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Only real difference is that it doesn't round-trip (though I think that behavior is correct)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense. Just good to ensure that there's no aggressive coercion of other data dtypes in the DataFrame once transposed i.e. the columns in df.T end up as object.

@codecov
Copy link

codecov bot commented Nov 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@2af56d4). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #23730   +/-   ##
=========================================
  Coverage          ?   92.29%           
=========================================
  Files             ?      161           
  Lines             ?    51478           
  Branches          ?        0           
=========================================
  Hits              ?    47511           
  Misses            ?     3967           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.68% <100%> (?)
#single 42.31% <74.07%> (?)
Impacted Files Coverage Δ
pandas/core/internals/managers.py 95.74% <100%> (ø)
pandas/core/frame.py 97.02% <100%> (ø)
pandas/core/generic.py 96.87% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2af56d4...0f38a69. Read the comment docs.

utc_values = utc_values.copy()
result = self._constructor(utc_values, **new_axes)
if self.ndim > 2:
# We're assuming DataFrame from here on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you should do this here at all. Move this to internals and just create new blocks, rather than hacking it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that it is going to be a hack regardless unless we allow 2D EAs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not going to happen any time soon if ever

as i said this is way too hacky and needs to be pushed to internals

@gfyoung gfyoung added Testing pandas testing functions or related to the test suite Datetime Datetime data dtype Bug and removed Testing pandas testing functions or related to the test suite labels Nov 16, 2018
@jbrockmendel
Copy link
Member Author

Updated to be a little prettier, but still fundamentally a kludge.

@TomAugspurger @jorisvandenbossche thoughts on this? I maintain the non-kludge way to fix this is to allow 2D DatetimeArray.

Also, will need its own issue, but adding to the "let's not layer on more kludges" pile:

>>> ts = pd.Timestamp.now().tz_localize('US/Pacific')
>>> data = np.array([[ts, ts, ts], [ts, ts, ts]])
>>> pd.DataFrame(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 416, in __init__
    copy=copy)
  File "pandas/core/frame.py", line 574, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "pandas/core/internals/managers.py", line 1665, in create_block_manager_from_blocks
    mgr = BlockManager(blocks, axes)
  File "pandas/core/internals/managers.py", line 118, in __init__
    self=self.ndim))
AssertionError: Number of Block dimensions (1) must equal number of axes (2)

@jreback
Copy link
Contributor

jreback commented Nov 18, 2018

@jbrockmendel I would rather you set this while transpose bizness aside. This needs to be pushed into internals, and does not need EA 2-D support to make this work. This kludge won't be accepted in its current form. Much rather get datetime EA fnished. This is not neceessary for that.

@TomAugspurger
Copy link
Contributor

Will look closely later, but I think the transpose stuff will be handled automatically by internals when DatetimeArray is an extension array.

@jbrockmendel
Copy link
Member Author

Travis fail is the test_rank segfault.

An argument against 2D DatetimeArray: freq wouldn't make much sense.

When other things clear up I'll take a look at pushing this into internals. This bug is high on my list of pet peeves at the moment, so I'm not likely to let it go.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2018

@jbrockmendel I would put this aside. This is just a hack and not worth the time / effort right now. Lots more fish to fry. This 'bug' has existed for quite a long time, and to be honest this is just a not-that interesting edge case.

@jbrockmendel
Copy link
Member Author

Closing as not actionable.

@jbrockmendel jbrockmendel deleted the transpose branch April 5, 2020 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants