BUG-22796 Concat multicolumn tz-aware DataFrame #23036

Merged
merged 15 commits into from
Oct 9, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -883,6 +883,7 @@ Reshaping
- Bug in :func:`pandas.wide_to_long` when a string is passed to the stubnames argument and a column name is a substring of that stubname (:issue:`22468`)
- Bug in :func:`merge` when merging ``datetime64[ns, tz]`` data that contained a DST transition (:issue:`18885`)
- Bug in :func:`merge_asof` when merging on float values within defined tolerance (:issue:`22981`)
- Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)

Build Changes
^^^^^^^^^^^^^
7 changes: 7 additions & 0 deletions pandas/core/internals/concat.py
@@ -18,6 +18,8 @@
from pandas.core.dtypes.cast import maybe_promote
import pandas.core.dtypes.concat as _concat

from pandas.core.indexes.datetimes import DatetimeIndex

import pandas.core.algorithms as algos


@@ -186,6 +188,11 @@ def get_reindexed_values(self, empty_dtype, upcasted_na):

                if getattr(self.block, 'is_datetimetz', False) or \
                        is_datetimetz(empty_dtype):
                    if self.block is None:
                        missing_arr = np.full(self.shape, fill_value)
                        missing_time = DatetimeIndex(missing_arr[0],
Contributor

Can you walk back up the stack and see exactly where block is None? IIRC this function is guaranteed to receive a block that is not None, so this might be an error higher up.

Contributor Author

block is None when the two DataFrames being concatenated have a mismatched number of columns. I assumed this was correct behavior?
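
For reference, a minimal sketch of the mismatched-column case described in this comment, adapted from the test added in this PR. The variable names are illustrative, and the behavior noted in the comments assumes a pandas version that already includes the fix:

```python
import pandas as pd

ts = pd.Timestamp('2015-01-01', tz='UTC')

# df1 has two tz-aware columns, df2 has only one.
df1 = pd.DataFrame([[ts, ts]])
df2 = pd.DataFrame([[ts]])

# Column 1 does not exist in df2, so concat has no data to copy for
# that position and must fill it with a missing value (NaT) instead.
result = pd.concat([df1, df2])
print(result)
```

The second row of column 1 is the position that has no source data in df2, which is the situation the comment refers to.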

Contributor

When I last changed this, I was really trying NOT to use DTI directly here, as we are trying to isolate things like that to higher-level methods.

See if you can revise, otherwise I will take a look.

Contributor Author

@tonytao2012 Oct 8, 2018

Hmm, in that case, I'm not sure what to do. Clearly, block is supposed to be None in cases where the columns are mismatched:

    for blkno, placements in libinternals.get_blkno_placements(blknos,
                                                               mgr.nblocks,
                                                               group=False):

        assert placements.is_slice_like

        join_unit_indexers = indexers.copy()

        shape = list(mgr_shape)
        shape[0] = len(placements)
        shape = tuple(shape)

        if blkno == -1:
            unit = JoinUnit(None, shape)

When block is None, we have to create and return some array in get_reindexed_values, but np arrays can't have a tz dtype. I apologize if I'm missing something obvious. Feel free to take over the issue if you'd like, as I'm unsure of how to continue from here. I'll also keep thinking about it.
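
The constraint raised in this comment can be illustrated with a small sketch. It is not presented as the resolution of the thread: the shape `(1, 2)` and the UTC timezone are assumptions chosen for illustration. NumPy only offers a timezone-naive `datetime64` dtype, so the timezone has to be attached afterwards through a pandas container:

```python
import numpy as np
import pandas as pd

# NumPy has no tz-aware dtype; the closest it offers is naive datetime64[ns].
missing_arr = np.full((1, 2), np.datetime64('NaT'), dtype='datetime64[ns]')

# Attaching a timezone requires a pandas object such as DatetimeIndex.
missing_time = pd.DatetimeIndex(missing_arr[0]).tz_localize('UTC')

print(missing_arr.dtype)   # the NumPy dtype carries no timezone
print(missing_time.dtype)  # the pandas dtype does
```

This is why filling the missing positions with a tz-aware dtype pulls in a pandas-level type at this point in the code, which is the tension the reviewer describes.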

                                                     dtype=empty_dtype)
                        return missing_time
                    pass
                elif getattr(self.block, 'is_categorical', False):
                    pass
20 changes: 20 additions & 0 deletions pandas/tests/frame/test_combine_concat.py
@@ -17,6 +17,8 @@
import pandas.util.testing as tm
from pandas.util.testing import assert_frame_equal, assert_series_equal

import pytest


class TestDataFrameConcatCommon(TestData):

@@ -53,6 +55,24 @@ def test_concat_multiple_tzs(self):
        expected = DataFrame(dict(time=[ts2, ts3]))
        assert_frame_equal(results, expected)

    @pytest.mark.parametrize('t1', ['2015-01-01',
                                    pytest.param(pd.NaT, marks=pytest.mark.xfail(
                                        reason='GH23037 incorrect dtype when concatenating'))])
    def test_concat_tz_NaT(self, t1):
        # GH 22796
        # Concating tz-aware multicolumn DataFrames
        ts1 = Timestamp(t1, tz='UTC')
        ts2 = Timestamp('2015-01-01', tz='UTC')
        ts3 = Timestamp('2015-01-01', tz='UTC')

        df1 = DataFrame([[ts1, ts2]])
        df2 = DataFrame([[ts3]])

        result = pd.concat([df1, df2])
        expected = DataFrame([[ts1, ts2], [ts3, pd.NaT]], index=[0, 0])

        assert_frame_equal(result, expected)

    def test_concat_tuple_keys(self):
        # GH 14438
        df1 = pd.DataFrame(np.ones((2, 2)), columns=list('AB'))