BUG: fix tzaware dataframe transpose bug #26825

Merged: 23 commits, Jun 27, 2019
Diff shown: changes from the first 9 commits.

Commits:
c9130f8  BUG: fix tzaware dataframe transpose bug (jbrockmendel, Jun 13, 2019)
908465a  move TestTranspose (jbrockmendel, Jun 13, 2019)
2b89d35  actually save (jbrockmendel, Jun 13, 2019)
7bcdf16  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 13, 2019)
f5759e6  troubleshoot windows fails (jbrockmendel, Jun 13, 2019)
3419983  Fix one more FIXME (jbrockmendel, Jun 13, 2019)
528015e  separate out _recast_datetimelike_Result (jbrockmendel, Jun 13, 2019)
508f8ae  Add GH references to tests (jbrockmendel, Jun 13, 2019)
c64d31f  add whatsnew (jbrockmendel, Jun 13, 2019)
6bd1a0a  annotation, typo fixup (jbrockmendel, Jun 13, 2019)
baacaaf  dont alter inplace (jbrockmendel, Jun 13, 2019)
c23edcc  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 16, 2019)
b559753  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 17, 2019)
e39370c  use maybe_convert_objects (jbrockmendel, Jun 17, 2019)
00b31e4  xfail tests where possible (jbrockmendel, Jun 17, 2019)
0a9a886  simplify list comprehension (jbrockmendel, Jun 17, 2019)
e88bc00  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 19, 2019)
3c49874  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 24, 2019)
5c38a76  single assignment (jbrockmendel, Jun 24, 2019)
657aa0c  fall through to create_block_manager_from_blocks (jbrockmendel, Jun 24, 2019)
be106cc  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 25, 2019)
820c4e4  Fix assignment error (jbrockmendel, Jun 25, 2019)
8b2372e  Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Jun 26, 2019)
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -720,6 +720,7 @@ Reshaping
- Bug in :func:`pandas.cut` where large bins could incorrectly raise an error due to an integer overflow (:issue:`26045`)
- Bug in :func:`DataFrame.sort_index` where an error is thrown when a multi-indexed ``DataFrame`` is sorted on all levels with the initial level sorted last (:issue:`26053`)
- Bug in :meth:`Series.nlargest` treats ``True`` as smaller than ``False`` (:issue:`26154`)
- Bug in :meth:`DataFrame.transpose` where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ``ValueError`` (:issue:`26825`)

Sparse
^^^^^^
@@ -735,6 +736,7 @@ Other
- Removed unused C functions from vendored UltraJSON implementation (:issue:`26198`)
- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
- Allow :class:`Index` and :class:`RangeIndex` to be passed to numpy ``min`` and ``max`` functions (:issue:`26125`)
- Bug in :class:`DataFrame` where passing an object array of timezone-aware `datetime` objects would incorrectly raise ``ValueError`` (:issue:`13287`)

.. _whatsnew_0.250.contributors:

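The whatsnew entries above describe the two fixed bugs; a minimal sketch of the transpose behavior on a pandas build that includes this fix (before it, this path raised ``ValueError``):

```python
import pandas as pd

# DataFrame with a timezone-aware datetime column (GH 26825).
# Before this fix, transposing such a frame raised ValueError.
df = pd.DataFrame(
    {"ts": pd.date_range("2016-01-01", periods=3, tz="US/Eastern")}
)

transposed = df.T          # 1 row, 3 columns
roundtrip = transposed.T   # back to the original shape

# The tz-aware values survive the round trip
print((roundtrip["ts"] == df["ts"]).all())
```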
46 changes: 35 additions & 11 deletions pandas/core/groupby/generic.py
@@ -288,7 +288,6 @@ def _decide_output_index(self, output, labels):

def _wrap_applied_output(self, keys, values, not_indexed_same=False):
from pandas.core.index import _all_indexes_same
from pandas.core.tools.numeric import to_numeric

if len(keys) == 0:
return DataFrame(index=keys)
@@ -360,7 +359,6 @@ def first_not_none(values):
# provide a reduction (Frame -> Series) if groups are
# unique
if self.squeeze:

# assign the name to this series
if singular_series:
values[0].name = keys[0]
@@ -434,15 +432,8 @@ def first_not_none(values):
# if we have date/time like in the original, then coerce dates
# as we are stacking can easily have object dtypes here
so = self._selected_obj
if (so.ndim == 2 and so.dtypes.apply(is_datetimelike).any()):
result = result.apply(
lambda x: to_numeric(x, errors='ignore'))
date_cols = self._selected_obj.select_dtypes(
include=['datetime', 'timedelta']).columns
date_cols = date_cols.intersection(result.columns)
result[date_cols] = (result[date_cols]
._convert(datetime=True,
coerce=True))
if so.ndim == 2 and so.dtypes.apply(is_datetimelike).any():
_recast_datetimelike_result(result)
else:
result = result._convert(datetime=True)

@@ -1664,3 +1655,36 @@ def _normalize_keyword_aggregation(kwargs):
order.append((column,
com.get_callable_name(aggfunc) or aggfunc))
return aggspec, columns, order


def _recast_datetimelike_result(result):
Contributor: I would put this in pandas.core.dtypes.cast, our dumping ground for casting, right after/before maybe_infer_to_datetimelike (and you can de-privatize).

jbrockmendel (author): This is really kludgy code (and is replacing equally kludgy code; it's turtles all the way down). I'd rather keep it close to its only usage and hope it is ripped out by someone who knows this part of the code better.

Contributor: As I said, this looks a lot like maybe_convert_objects; try to use that here.

jbrockmendel (author): Note also: we only have 5 tests that go through this path.

"""
If we have date/time like in the original, then coerce dates
as we are stacking can easily have object dtypes here.

Parameters
----------
result : DataFrame

Notes
-----
- Assumes Groupby._selected_obj has ndim==2 and at least one
  datetimelike column
- Modifies `result` inplace
"""

Contributor: This note doesn't seem relevant, as you are passing in a frame, right?

jbrockmendel (author): That wasn't obvious to me, because we're talking about the dimensions of two separate objects. Are they necessarily the same?

ocols = [idx for idx in range(len(result.columns))
if result.dtypes[idx] == object]

for cidx in ocols:
# TODO: get maybe_convert_objects working here
cvals = result.iloc[:, cidx]

# TODO: should we use skipna=True?
exdtype = lib.infer_dtype(cvals.values, skipna=False)
if exdtype == 'integer':
result.iloc[:, cidx] = cvals.astype(np.int64)
if exdtype in ['float', 'mixed-integer-float']:
result.iloc[:, cidx] = cvals.astype(np.float64)
if exdtype == 'datetime':
# TODO: what about z-aware?
result.iloc[:, cidx] = cvals.astype('M8[ns]')
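The helper above branches on the string returned by ``lib.infer_dtype``; the same inference is exposed publicly as ``pandas.api.types.infer_dtype``. A small sketch of the values it keys on:

```python
from datetime import datetime

import numpy as np
from pandas.api.types import infer_dtype

# infer_dtype reports the inferred kind of an object array; the helper
# above uses these strings to pick a concrete dtype to cast to.
print(infer_dtype(np.array([1, 2, 3], dtype=object), skipna=False))
# -> 'integer'
print(infer_dtype(np.array([1.0, 2], dtype=object), skipna=False))
# -> 'mixed-integer-float'
print(infer_dtype(np.array([datetime(2016, 1, 1)], dtype=object), skipna=False))
# -> 'datetime'
```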
24 changes: 23 additions & 1 deletion pandas/core/internals/construction.py
@@ -160,7 +160,29 @@ def init_ndarray(values, index, columns, dtype=None, copy=False):
# on the entire block; this is to convert if we have datetimelike's
# embedded in an object type
if dtype is None and is_object_dtype(values):
values = maybe_infer_to_datetimelike(values)

if values.ndim == 2 and values.shape[0] != 1:
Contributor: This is much messier; can we change something else to make this nicer?

jbrockmendel (author): Not really. I'm looking into the other places where maybe_infer_to_datetimelike is used to see if some of this can go into that. We could separate this whole block into a dedicated function, but one way or another we need to bite the bullet.

Contributor: So the inside of the list loop should be in pandas.core.dtypes.cast, no? (Obviously up until you make the blocks themselves.)

jbrockmendel (author): I'd like to leave this for the next pass, when I'm taking a more systematic look at maybe_infer_to_datetimelike.

# kludge to transpose and separate blocks
# unnecessary if we ever allow 2D DatetimeArray

dvals_list = [maybe_infer_to_datetimelike(values[n, :])
for n in range(len(values))]
Contributor: Make this a list comprehension and use enumerate if you must; this is very hard to read.

for n in range(len(dvals_list)):
if isinstance(dvals_list[n], np.ndarray):
dvals_list[n] = dvals_list[n].reshape(1, -1)

from pandas.core.internals.blocks import make_block

# TODO: What about re-joining object columns?
Contributor: Please reuse the block creation routines below.

jbrockmendel (author): Attempts so far have broken everything. Do you have a particular routine in mind?

Contributor: What I mean is you can remove the create_block_manager_from_blocks call and let it fall through to line 184 with, I think, a very small change, e.g.:

if ...:
    blocks = bvals
else:
    dvals = ...
    blocks = [dvals]

Of course, please use a longer name than dvals.

bdvals = [make_block(dvals_list[n], placement=[n])
for n in range(len(dvals_list))]
return create_block_manager_from_blocks(bdvals,
[columns, index])

else:
dvals = maybe_infer_to_datetimelike(values)

values = dvals

return create_block_manager_from_blocks([values], [columns, index])

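The per-row kludge above is what lets the constructor handle a 2D object array of timezone-aware datetimes (GH 13287). A sketch of the call this branch fixes, on a build that includes the change:

```python
import numpy as np
import pandas as pd

# 2D object ndarray holding timezone-aware Timestamps (GH 13287);
# before this change, DataFrame construction raised ValueError here.
arr = np.array(
    [[pd.Timestamp("2016-01-01", tz="US/Eastern"),
      pd.Timestamp("2016-01-02", tz="US/Eastern")],
     [pd.Timestamp("2016-01-03", tz="US/Eastern"),
      pd.Timestamp("2016-01-04", tz="US/Eastern")]],
    dtype=object,
)

df = pd.DataFrame(arr)
print(df.shape)   # (2, 2)
print(df.dtypes)  # datetimelike inference happens column by column
```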
93 changes: 35 additions & 58 deletions pandas/tests/arithmetic/test_datetime64.py
@@ -438,8 +438,7 @@ def test_dti_cmp_null_scalar_inequality(self, tz_naive_fixture, other,
# GH#19301
tz = tz_naive_fixture
dti = pd.date_range('2016-01-01', periods=2, tz=tz)
# FIXME: ValueError with transpose
dtarr = tm.box_expected(dti, box_with_array, transpose=False)
dtarr = tm.box_expected(dti, box_with_array)
msg = 'Invalid comparison between'
with pytest.raises(TypeError, match=msg):
dtarr < other
@@ -592,15 +591,15 @@ def test_comparison_tzawareness_compat(self, op, box_with_array):
dr = pd.date_range('2016-01-01', periods=6)
dz = dr.tz_localize('US/Pacific')

# FIXME: ValueError with transpose
dr = tm.box_expected(dr, box_with_array, transpose=False)
dz = tm.box_expected(dz, box_with_array, transpose=False)
dr = tm.box_expected(dr, box_with_array)
dz = tm.box_expected(dz, box_with_array)

msg = 'Cannot compare tz-naive and tz-aware'
with pytest.raises(TypeError, match=msg):
op(dr, dz)
if box_with_array is not pd.DataFrame:
# DataFrame op is invalid until transpose bug is fixed
# FIXME: DataFrame case fails to raise for == and !=, wrong
# message for inequalities
with pytest.raises(TypeError, match=msg):
op(dr, list(dz))
with pytest.raises(TypeError, match=msg):
@@ -609,7 +608,8 @@ def test_comparison_tzawareness_compat(self, op, box_with_array):
with pytest.raises(TypeError, match=msg):
op(dz, dr)
if box_with_array is not pd.DataFrame:
# DataFrame op is invalid until transpose bug is fixed
# FIXME: DataFrame case fails to raise for == and !=, wrong
# message for inequalities
with pytest.raises(TypeError, match=msg):
op(dz, list(dr))
with pytest.raises(TypeError, match=msg):
@@ -620,8 +620,8 @@ def test_comparison_tzawareness_compat(self, op, box_with_array):
assert_all(dr == dr)
assert_all(dz == dz)
if box_with_array is not pd.DataFrame:
# DataFrame doesn't align the lists correctly unless we transpose,
# which we cannot do at the moment
# FIXME: DataFrame case fails to raise for == and !=, wrong
# message for inequalities
assert (dr == list(dr)).all()
assert (dz == list(dz)).all()
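The tests in this hunk box the indexes with ``tm.box_expected`` before comparing; the underlying rule they exercise can be sketched with plain indexes:

```python
import pandas as pd

dr = pd.date_range("2016-01-01", periods=6)   # tz-naive
dz = dr.tz_localize("US/Pacific")             # tz-aware

# Comparing tz-naive with tz-aware datetimes raises TypeError
try:
    dr < dz
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Comparisons within the same awareness are fine
print((dz == dz).all())  # True
```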

@@ -652,8 +652,7 @@ def test_scalar_comparison_tzawareness(self, op, other, tz_aware_fixture,
tz = tz_aware_fixture
dti = pd.date_range('2016-01-01', periods=2, tz=tz)

# FIXME: ValueError with transpose
dtarr = tm.box_expected(dti, box_with_array, transpose=False)
dtarr = tm.box_expected(dti, box_with_array)
msg = 'Cannot compare tz-naive and tz-aware'
with pytest.raises(TypeError, match=msg):
op(dtarr, other)
@@ -715,17 +714,16 @@ def test_dt64arr_cmp_scalar_invalid(self, other, tz_naive_fixture,
xbox = box_with_array if box_with_array is not pd.Index else np.ndarray

rng = date_range('1/1/2000', periods=10, tz=tz)
# FIXME: ValueError with transpose
rng = tm.box_expected(rng, box_with_array, transpose=False)
rng = tm.box_expected(rng, box_with_array)

result = rng == other
expected = np.array([False] * 10)
expected = tm.box_expected(expected, xbox, transpose=False)
expected = tm.box_expected(expected, xbox)
tm.assert_equal(result, expected)

result = rng != other
expected = np.array([True] * 10)
expected = tm.box_expected(expected, xbox, transpose=False)
expected = tm.box_expected(expected, xbox)
tm.assert_equal(result, expected)
msg = 'Invalid comparison between'
with pytest.raises(TypeError, match=msg):
@@ -816,9 +814,8 @@ def test_dt64arr_add_timedeltalike_scalar(self, tz_naive_fixture,
expected = pd.date_range('2000-01-01 02:00',
'2000-02-01 02:00', tz=tz)

# FIXME: calling with transpose=True raises ValueError
rng = tm.box_expected(rng, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array, transpose=False)
rng = tm.box_expected(rng, box_with_array)
expected = tm.box_expected(expected, box_with_array)

result = rng + two_hours
tm.assert_equal(result, expected)
@@ -831,9 +828,8 @@ def test_dt64arr_iadd_timedeltalike_scalar(self, tz_naive_fixture,
expected = pd.date_range('2000-01-01 02:00',
'2000-02-01 02:00', tz=tz)

# FIXME: calling with transpose=True raises ValueError
rng = tm.box_expected(rng, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array, transpose=False)
rng = tm.box_expected(rng, box_with_array)
expected = tm.box_expected(expected, box_with_array)

rng += two_hours
tm.assert_equal(rng, expected)
@@ -846,9 +842,8 @@ def test_dt64arr_sub_timedeltalike_scalar(self, tz_naive_fixture,
expected = pd.date_range('1999-12-31 22:00',
'2000-01-31 22:00', tz=tz)

# FIXME: calling with transpose=True raises ValueError
rng = tm.box_expected(rng, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array, transpose=False)
rng = tm.box_expected(rng, box_with_array)
expected = tm.box_expected(expected, box_with_array)

result = rng - two_hours
tm.assert_equal(result, expected)
@@ -861,9 +856,8 @@ def test_dt64arr_isub_timedeltalike_scalar(self, tz_naive_fixture,
expected = pd.date_range('1999-12-31 22:00',
'2000-01-31 22:00', tz=tz)

# FIXME: calling with transpose=True raises ValueError
rng = tm.box_expected(rng, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array, transpose=False)
rng = tm.box_expected(rng, box_with_array)
expected = tm.box_expected(expected, box_with_array)

rng -= two_hours
tm.assert_equal(rng, expected)
@@ -918,9 +912,6 @@ def test_dt64arr_add_sub_td64_nat(self, box_with_array, tz_naive_fixture):

def test_dt64arr_add_sub_td64ndarray(self, tz_naive_fixture,
box_with_array):
if box_with_array is pd.DataFrame:
pytest.xfail("FIXME: ValueError with transpose; "
"alignment error without")

tz = tz_naive_fixture
dti = pd.date_range('2016-01-01', periods=3, tz=tz)
@@ -942,7 +933,7 @@ def test_dt64arr_add_sub_td64ndarray(self, tz_naive_fixture,

result = dtarr - tdarr
tm.assert_equal(result, expected)
msg = 'cannot subtract'
msg = 'cannot subtract|bad operand type for unary -'
with pytest.raises(TypeError, match=msg):
tdarr - dtarr

@@ -987,34 +978,31 @@ def test_dt64arr_sub_timestamp(self, box_with_array):
tz='US/Eastern')
ts = ser[0]

# FIXME: transpose raises ValueError
ser = tm.box_expected(ser, box_with_array, transpose=False)
ser = tm.box_expected(ser, box_with_array)

delta_series = pd.Series([np.timedelta64(0, 'D'),
np.timedelta64(1, 'D')])
expected = tm.box_expected(delta_series, box_with_array,
transpose=False)
expected = tm.box_expected(delta_series, box_with_array)

tm.assert_equal(ser - ts, expected)
tm.assert_equal(ts - ser, -expected)

def test_dt64arr_sub_NaT(self, box_with_array):
# GH#18808
dti = pd.DatetimeIndex([pd.NaT, pd.Timestamp('19900315')])
ser = tm.box_expected(dti, box_with_array, transpose=False)
ser = tm.box_expected(dti, box_with_array)

result = ser - pd.NaT
expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]')
# FIXME: raises ValueError with transpose
expected = tm.box_expected(expected, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array)
tm.assert_equal(result, expected)

dti_tz = dti.tz_localize('Asia/Tokyo')
ser_tz = tm.box_expected(dti_tz, box_with_array, transpose=False)
ser_tz = tm.box_expected(dti_tz, box_with_array)

result = ser_tz - pd.NaT
expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]')
expected = tm.box_expected(expected, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array)
tm.assert_equal(result, expected)

# -------------------------------------------------------------
@@ -1034,16 +1022,13 @@ def test_dt64arr_naive_sub_dt64ndarray(self, box_with_array):

def test_dt64arr_aware_sub_dt64ndarray_raises(self, tz_aware_fixture,
box_with_array):
if box_with_array is pd.DataFrame:
pytest.xfail("FIXME: ValueError with transpose; "
"alignment error without")

tz = tz_aware_fixture
dti = pd.date_range('2016-01-01', periods=3, tz=tz)
dt64vals = dti.values

dtarr = tm.box_expected(dti, box_with_array)
msg = 'DatetimeArray subtraction must have the same timezones or'
msg = 'subtraction must have the same timezones or'
with pytest.raises(TypeError, match=msg):
dtarr - dt64vals
with pytest.raises(TypeError, match=msg):
@@ -1054,9 +1039,6 @@ def test_dt64arr_aware_sub_dt64ndarray_raises(self, tz_aware_fixture,

def test_dt64arr_add_dt64ndarray_raises(self, tz_naive_fixture,
box_with_array):
if box_with_array is pd.DataFrame:
pytest.xfail("FIXME: ValueError with transpose; "
"alignment error without")

tz = tz_naive_fixture
dti = pd.date_range('2016-01-01', periods=3, tz=tz)
@@ -1204,9 +1186,8 @@ def test_dti_add_tick_tzaware(self, tz_aware_fixture, box_with_array):
expected = DatetimeIndex(['2010-11-01 05:00', '2010-11-01 06:00',
'2010-11-01 07:00'], freq='H', tz=tz)

# FIXME: these raise ValueError with transpose=True
dates = tm.box_expected(dates, box_with_array, transpose=False)
expected = tm.box_expected(expected, box_with_array, transpose=False)
dates = tm.box_expected(dates, box_with_array)
expected = tm.box_expected(expected, box_with_array)

# TODO: parametrize over the scalar being added? radd? sub?
offset = dates + pd.offsets.Hour(5)
@@ -1359,26 +1340,25 @@ def test_dt64arr_add_sub_DateOffset(self, box_with_array):

s = DatetimeIndex([Timestamp('2000-01-15 00:15:00', tz='US/Central'),
Timestamp('2000-02-15', tz='US/Central')], name='a')
# FIXME: ValueError with tzaware DataFrame transpose
s = tm.box_expected(s, box_with_array, transpose=False)
s = tm.box_expected(s, box_with_array)
result = s + pd.offsets.Day()
result2 = pd.offsets.Day() + s
exp = DatetimeIndex([Timestamp('2000-01-16 00:15:00', tz='US/Central'),
Timestamp('2000-02-16', tz='US/Central')],
name='a')
exp = tm.box_expected(exp, box_with_array, transpose=False)
exp = tm.box_expected(exp, box_with_array)
tm.assert_equal(result, exp)
tm.assert_equal(result2, exp)

s = DatetimeIndex([Timestamp('2000-01-15 00:15:00', tz='US/Central'),
Timestamp('2000-02-15', tz='US/Central')], name='a')
s = tm.box_expected(s, box_with_array, transpose=False)
s = tm.box_expected(s, box_with_array)
result = s + pd.offsets.MonthEnd()
result2 = pd.offsets.MonthEnd() + s
exp = DatetimeIndex([Timestamp('2000-01-31 00:15:00', tz='US/Central'),
Timestamp('2000-02-29', tz='US/Central')],
name='a')
exp = tm.box_expected(exp, box_with_array, transpose=False)
exp = tm.box_expected(exp, box_with_array)
tm.assert_equal(result, exp)
tm.assert_equal(result2, exp)

@@ -1415,9 +1395,6 @@ def test_dt64arr_add_mixed_offset_array(self, box_with_array):
def test_dt64arr_add_sub_offset_ndarray(self, tz_naive_fixture,
box_with_array):
# GH#18849
if box_with_array is pd.DataFrame:
pytest.xfail("FIXME: ValueError with transpose; "
"alignment error without")

tz = tz_naive_fixture
dti = pd.date_range('2017-01-01', periods=2, tz=tz)