BUG: fix tzaware dataframe transpose bug #26825

Merged Jun 27, 2019: 23 commits (changes shown from 17 commits)

Commits
c9130f8
BUG: fix tzaware dataframe transpose bug
jbrockmendel Jun 13, 2019
908465a
move TestTranspose
jbrockmendel Jun 13, 2019
2b89d35
actually save
jbrockmendel Jun 13, 2019
7bcdf16
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 13, 2019
f5759e6
troubleshoot windows fails
jbrockmendel Jun 13, 2019
3419983
Fix one more FIXME
jbrockmendel Jun 13, 2019
528015e
separate out _recast_datetimelike_Result
jbrockmendel Jun 13, 2019
508f8ae
Add GH references to tests
jbrockmendel Jun 13, 2019
c64d31f
add whatsnew
jbrockmendel Jun 13, 2019
6bd1a0a
annotation, typo fixup
jbrockmendel Jun 13, 2019
baacaaf
dont alter inplace
jbrockmendel Jun 13, 2019
c23edcc
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 16, 2019
b559753
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 17, 2019
e39370c
use maybe_convert_objects
jbrockmendel Jun 17, 2019
00b31e4
xfail tests where possible
jbrockmendel Jun 17, 2019
0a9a886
simplify list comprehension
jbrockmendel Jun 17, 2019
e88bc00
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 19, 2019
3c49874
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 24, 2019
5c38a76
single assignment
jbrockmendel Jun 24, 2019
657aa0c
fall through to create_block_manager_from_blocks
jbrockmendel Jun 24, 2019
be106cc
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 25, 2019
820c4e4
Fix assignment error
jbrockmendel Jun 25, 2019
8b2372e
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Jun 26, 2019
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -761,6 +761,7 @@ Reshaping
- Bug in :func:`DataFrame.sort_index` where an error is thrown when a multi-indexed ``DataFrame`` is sorted on all levels with the initial level sorted last (:issue:`26053`)
- Bug in :meth:`Series.nlargest` treats ``True`` as smaller than ``False`` (:issue:`26154`)
- Bug in :func:`DataFrame.pivot_table` with a :class:`IntervalIndex` as pivot index would raise ``TypeError`` (:issue:`25814`)
- Bug in :meth:`DataFrame.transpose` where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ``ValueError`` (:issue:`26825`)

Sparse
^^^^^^
@@ -776,6 +777,7 @@ Other
- Removed unused C functions from vendored UltraJSON implementation (:issue:`26198`)
- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
- Allow :class:`Index` and :class:`RangeIndex` to be passed to numpy ``min`` and ``max`` functions (:issue:`26125`)
- Bug in :class:`DataFrame` where passing an object array of timezone-aware ``datetime`` objects would incorrectly raise ``ValueError`` (:issue:`13287`)

.. _whatsnew_0.250.contributors:

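The first whatsnew entry above can be reproduced with a short example. This is a sketch against a pandas version that includes the fix; per the note, transposing a frame with a timezone-aware column raised ``ValueError`` before GH#26825 was resolved.

```python
import pandas as pd

# A DataFrame with one timezone-aware datetime column and one integer
# column; before this fix, df.T raised ValueError (GH#26825).
df = pd.DataFrame({"ts": pd.date_range("2019-06-13", periods=3,
                                       tz="US/Eastern"),
                   "n": [1, 2, 3]})

transposed = df.T          # 2x3 frame holding mixed values as object
roundtrip = transposed.T   # transposing back recovers the original shape
```

Round-tripping through ``.T`` recovers the original values; the timezone-aware column is re-inferred from the intermediate object values.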
50 changes: 37 additions & 13 deletions pandas/core/groupby/generic.py
@@ -21,10 +21,12 @@
from pandas.errors import AbstractMethodError
from pandas.util._decorators import Appender, Substitution

from pandas.core.dtypes.cast import maybe_downcast_to_dtype
from pandas.core.dtypes.cast import (
maybe_convert_objects, maybe_downcast_to_dtype)
from pandas.core.dtypes.common import (
ensure_int64, ensure_platform_int, is_bool, is_datetimelike,
is_integer_dtype, is_interval_dtype, is_numeric_dtype, is_scalar)
is_integer_dtype, is_interval_dtype, is_numeric_dtype, is_object_dtype,
is_scalar)
from pandas.core.dtypes.missing import isna, notna

from pandas._typing import FrameOrSeries
@@ -334,7 +336,6 @@ def _decide_output_index(self, output, labels):

def _wrap_applied_output(self, keys, values, not_indexed_same=False):
from pandas.core.index import _all_indexes_same
from pandas.core.tools.numeric import to_numeric

if len(keys) == 0:
return DataFrame(index=keys)
@@ -406,7 +407,6 @@ def first_not_none(values):
# provide a reduction (Frame -> Series) if groups are
# unique
if self.squeeze:

# assign the name to this series
if singular_series:
values[0].name = keys[0]
@@ -480,15 +480,8 @@ def first_not_none(values):
# if we have date/time like in the original, then coerce dates
# as we are stacking can easily have object dtypes here
so = self._selected_obj
if (so.ndim == 2 and so.dtypes.apply(is_datetimelike).any()):
result = result.apply(
lambda x: to_numeric(x, errors='ignore'))
date_cols = self._selected_obj.select_dtypes(
include=['datetime', 'timedelta']).columns
date_cols = date_cols.intersection(result.columns)
result[date_cols] = (result[date_cols]
._convert(datetime=True,
coerce=True))
if so.ndim == 2 and so.dtypes.apply(is_datetimelike).any():
result = _recast_datetimelike_result(result)
else:
result = result._convert(datetime=True)

@@ -1710,3 +1703,34 @@ def _normalize_keyword_aggregation(kwargs):
order.append((column,
com.get_callable_name(aggfunc) or aggfunc))
return aggspec, columns, order


def _recast_datetimelike_result(result: DataFrame) -> DataFrame:
"""
If we have date/time like in the original, then coerce dates
as we are stacking can easily have object dtypes here.

Parameters
----------
result : DataFrame

Returns
-------
DataFrame

Notes
-----
- Assumes Groupby._selected_obj has ndim==2 and at least one
  datetimelike column
"""

Review comment (Contributor): this note doesn't seem relevant as you are passing in a frame, right?

Reply (Member Author): That wasn't obvious to me because we're talking about the dimensions of two separate objects. Are they necessarily the same?
result = result.copy()

ocols = [idx for idx in range(len(result.columns))
if is_object_dtype(result.dtypes[idx])]

for cidx in ocols:
cvals = result.iloc[:, cidx].values
result.iloc[:, cidx] = maybe_convert_objects(cvals,
convert_numeric=False)

return result
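A rough public-API sketch of what this helper does, using ``Series.infer_objects`` in place of the internal ``maybe_convert_objects``. The helper name ``recast_object_columns`` is hypothetical, and ``infer_objects`` is broader than the call above: it will also convert numeric object columns, which the ``convert_numeric=False`` argument deliberately avoids.

```python
import pandas as pd

def recast_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Re-infer dtypes for object columns, leaving other columns alone.

    Hypothetical stand-in for the internal helper; infer_objects also
    converts numeric object columns, unlike maybe_convert_objects with
    convert_numeric=False.
    """
    out = df.copy()  # like the helper, do not alter the input in place
    for col in out.columns[out.dtypes == object]:
        out[col] = out[col].infer_objects()
    return out

mixed = pd.DataFrame({
    # datetimes stored as object, as happens when stacking group results
    "when": pd.Series([pd.Timestamp("2019-06-13"),
                       pd.Timestamp("2019-06-14")], dtype=object),
    "name": ["a", "b"],  # genuinely object; left untouched
})
recast = recast_object_columns(mixed)
```

The datetime column comes back with a proper ``datetime64`` dtype while the string column stays object, and the input frame is not mutated.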
22 changes: 21 additions & 1 deletion pandas/core/internals/construction.py
@@ -159,7 +159,27 @@ def init_ndarray(values, index, columns, dtype=None, copy=False):
# on the entire block; this is to convert if we have datetimelike's
# embedded in an object type
if dtype is None and is_object_dtype(values):
values = maybe_infer_to_datetimelike(values)

if values.ndim == 2 and values.shape[0] != 1:

Review comment (Contributor): this is much more messy, can we change something else to make this nicer?

Reply (Member Author): Not really. I'm looking into the other places where maybe_infer_to_datetimelike is used to see if some of this can go into that. We could separate this whole block into a dedicated function. But one way or another we need to bite the bullet.

Reply (Contributor): so the inside of the list loop should be in pandas.core.dtypes.cast, no? (obviously up until you make the blocks themselves)

Reply (Member Author): I'd like to leave this for the next pass when I'm taking a more systematic look at maybe_infer_to_datetimelike

# transpose and separate blocks

dvals_list = [maybe_infer_to_datetimelike(row) for row in values]
for n in range(len(dvals_list)):
if isinstance(dvals_list[n], np.ndarray):
dvals_list[n] = dvals_list[n].reshape(1, -1)

from pandas.core.internals.blocks import make_block

# TODO: What about re-joining object columns?

Review comment (Contributor): pls reuse the block creation routines below

Reply (Member Author): attempts so far have broken everything. do you have a particular routine in mind?

Reply (Contributor): what I mean is you can remove the create_block_manager_from_blocks and let it fall through to line 184 with, I think, a very small change, e.g.

    if .....
        blocks = bvals
    else:
        dvals = .......
        blocks = [dvals]

of course pls use a longer name than dvals

bdvals = [make_block(dvals_list[n], placement=[n])
for n in range(len(dvals_list))]
return create_block_manager_from_blocks(bdvals,
[columns, index])

else:
dvals = maybe_infer_to_datetimelike(values)

values = dvals

return create_block_manager_from_blocks([values], [columns, index])
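The new branch above can be exercised end to end through the public constructor. Before this change, building a DataFrame from a 2-D object ndarray containing timezone-aware datetimes raised ``ValueError`` (GH#13287); with it, each row of the transposed values is inferred into its own block. A sketch:

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2019-06-13", tz="US/Eastern")

# 2-D object array: one tz-aware datetime column, one integer column.
# Constructing a DataFrame from this previously raised ValueError
# (GH#13287); now the tz-aware column is inferred separately.
arr = np.array([[ts, 1],
                [ts + pd.Timedelta(days=1), 2]], dtype=object)

df = pd.DataFrame(arr, columns=["when", "n"])
```

The construction no longer raises, and the timezone-aware values survive intact in their own column.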
