BUG-22796 Concat multicolumn tz-aware DataFrame #23036
Changes from 7 commits
9e760a0
5ef5dd2
c640f69
5e57d77
0730adf
2e89848
53c85aa
51ef6e5
54c153d
d6ab866
a0570bd
644b349
ce31274
93e97dc
de4c427
@@ -18,6 +18,8 @@
from pandas.core.dtypes.cast import maybe_promote
import pandas.core.dtypes.concat as _concat

from pandas.core.indexes.datetimes import DatetimeIndex

import pandas.core.algorithms as algos

@@ -186,6 +188,11 @@ def get_reindexed_values(self, empty_dtype, upcasted_na):

if getattr(self.block, 'is_datetimetz', False) or \
        is_datetimetz(empty_dtype):
    if self.block is None:
        missing_arr = np.full(self.shape, fill_value)
        missing_time = DatetimeIndex(missing_arr[0],

Can you walk back up the stack and see exactly where block is None? IIRC this function guarantees that block is not None, so this might be an error higher up.

Block is None when the two DataFrames being concatenated have a mismatched number of columns. I assumed this was correct behavior?

When I last changed this, I was really trying NOT to use DTI directly here, as we are trying to isolate things like that to higher-level methods. See if you can revise; otherwise I will take a look.

Hmm, in that case, I'm not sure what to do. Clearly, block is supposed to be None in cases where the columns are mismatched:

When block is None, we have to create and return some array in get_reindexed_values, but np arrays can't have a tz dtype. I apologize if I'm missing something obvious. Feel free to take over the issue if you'd like, as I'm unsure of how to continue from here. I'll also keep thinking about it.

                                     dtype=empty_dtype)
        return missing_time
    pass
elif getattr(self.block, 'is_categorical', False):
    pass

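The thread above hinges on the point that a plain NumPy array cannot carry a timezone. The following is a minimal sketch of that limitation, assuming a recent pandas release; `make_missing_tz_column` is a hypothetical helper written for illustration only. It is not part of pandas or of this PR, and it does not address the reviewer's concern about using `DatetimeIndex` at this level.

```python
import numpy as np
import pandas as pd


def make_missing_tz_column(length, tz="UTC"):
    """Hypothetical helper: build an all-NaT, tz-aware DatetimeIndex."""
    # NumPy's datetime64 has no timezone concept, so this array is tz-naive.
    naive = np.full(length, np.datetime64("NaT"), dtype="datetime64[ns]")
    # The timezone can only be attached on the pandas side; NaT stays NaT.
    return pd.DatetimeIndex(naive).tz_localize(tz)


missing = make_missing_tz_column(2)
print(missing.dtype)
print(missing.isna().all())
```

The NumPy array is only an intermediate here; the tz-aware dtype exists solely on the pandas object returned by the helper.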
@@ -17,6 +17,8 @@
import pandas.util.testing as tm
from pandas.util.testing import assert_frame_equal, assert_series_equal

import pytest


class TestDataFrameConcatCommon(TestData):

@@ -53,6 +55,24 @@ def test_concat_multiple_tzs(self):
        expected = DataFrame(dict(time=[ts2, ts3]))
        assert_frame_equal(results, expected)

    @pytest.mark.parametrize('t1', ['2015-01-01',
                                    pytest.param('pd.NaT', marks=pytest.mark.xfail(

Why is

                                        reason='GH23037 incorrect dtype when concatenating'))])
    def test_concat_tz_NaT(self, t1):
        # GH 22796
        # Concating tz-aware multicolumn DataFrames
        ts1 = Timestamp(t1, tz='UTC')
        ts2 = Timestamp('2015-01-01', tz='UTC')
        ts3 = Timestamp('2015-01-01', tz='UTC')

        df1 = pd.DataFrame([[ts1, ts2]])
        df2 = pd.DataFrame([[ts3]])

        result = pd.concat([df1, df2])
        expected = DataFrame([[ts1, ts2], [ts3, pd.NaT]], index=[0, 0])

        assert_frame_equal(result, expected)

    def test_concat_tuple_keys(self):
        # GH 14438
        df1 = pd.DataFrame(np.ones((2, 2)), columns=list('AB'))

Can you be more specific here? This isn't an issue with all multi-column DataFrames with tz-aware data. What are the exact conditions needed?
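One reading of the conditions, based only on the test in this diff and the earlier reply about mismatched columns, is that the frames hold tz-aware timestamps and differ in their number of columns, so one frame contributes no data for a column. The sketch below reproduces that setup as a standalone script, assuming a recent pandas release. It prints the result instead of asserting on dtypes, since the behavior described in this PR may differ between pandas versions.

```python
import pandas as pd

ts = pd.Timestamp("2015-01-01", tz="UTC")

# df1 has two tz-aware columns and df2 has only one, so column 1
# has no data coming from df2 and must be filled with a missing value.
df1 = pd.DataFrame([[ts, ts]])
df2 = pd.DataFrame([[ts]])

result = pd.concat([df1, df2])

# Inspect the outcome; the dtype of column 1 is the point of interest.
print(result)
print(result.dtypes)
```

Giving both frames the same number of columns removes the missing block and is a quick way to check whether the mismatch is the trigger.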