Concat of dataframe with tz-aware datetime column against dataframe without, fails #22796

Closed
bchu opened this issue Sep 20, 2018 · 3 comments · Fixed by #23036
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Comments

@bchu

bchu commented Sep 20, 2018

a = pd.DataFrame([[1, 2]], dtype='datetime64[ns, UTC]')
b = pd.DataFrame([[3]], dtype='datetime64[ns, UTC]')
pd.concat([a, b])

Fails:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-457226d62f27> in <module>()
      1 a = pd.DataFrame([[1, 2]], dtype='datetime64[ns, UTC]')
      2 b = pd.DataFrame([[3]], dtype='datetime64[ns, UTC]')
----> 3 pd.concat([a, b])

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    224                        verify_integrity=verify_integrity,
    225                        copy=copy, sort=sort)
--> 226     return op.get_result()
    227 
    228 

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
    421             new_data = concatenate_block_managers(
    422                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 423                 copy=self.copy)
    424             if not self.copy:
    425                 new_data._consolidate_inplace()

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5419         else:
   5420             b = make_block(
-> 5421                 concatenate_join_units(join_units, concat_axis, copy=copy),
   5422                 placement=placement)
   5423         blocks.append(b)

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   5563     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   5564                                          upcasted_na=upcasted_na)
-> 5565                  for ju in join_units]
   5566 
   5567     if len(to_concat) == 1:

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0)
   5563     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   5564                                          upcasted_na=upcasted_na)
-> 5565                  for ju in join_units]
   5566 
   5567     if len(to_concat) == 1:

~/.pyenv/versions/3.6.2/envs/general/lib/python3.6/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   5849 
   5850             if not self.indexers:
-> 5851                 if not self.block._can_consolidate:
   5852                     # preserve these for validation in _concat_compat
   5853                     return self.block.values

AttributeError: 'NoneType' object has no attribute '_can_consolidate'

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.0.6
pip: 9.0.1
setuptools: 39.1.0
Cython: 0.28.3
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

@mroeschke
Member

Thanks for the report. This seems to be an issue only with axis=0 when concatenating multi-column DataFrames with tz-aware data.

I suspect it has something to do with the fact that the DatetimeTZBlock only stores data in a 1D array. Investigations welcome!

In [13]: pd.__version__
Out[13]: '0.24.0.dev0+624.g4612a8282'

In [14]: a = pd.DataFrame([[1, 2]], dtype='datetime64[ns, UTC]')
    ...: b = pd.DataFrame([[3]], dtype='datetime64[ns, UTC]')
    ...: pd.concat([a, b], axis=1)
    ...:
    ...:
Out[14]:
                                    0                 ...                                                   0
0 1970-01-01 00:00:00.000000001+00:00                 ...                 1970-01-01 00:00:00.000000003+00:00

[1 rows x 3 columns]

In [25]: a = pd.DataFrame([[1, 2]], dtype='datetime64[ns, UTC]')
    ...: b = pd.DataFrame([[3]], dtype='datetime64[ns, UTC]')
    ...: pd.concat([a.iloc[:, [0]], b])
    ...:
    ...:
Out[25]:
                                    0
0 1970-01-01 00:00:00.000000001+00:00
0 1970-01-01 00:00:00.000000003+00:00
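
For anyone stuck on an affected version, one possible workaround is to drop the timezone before concatenating and restore it afterwards. This is only a sketch: `concat_tz_aware` is a hypothetical helper, not a pandas function, it assumes every column holds tz-aware datetime data, and it has not been verified against 0.23.4. The dates are made up for illustration.

```python
import pandas as pd


def concat_tz_aware(frames, tz="UTC"):
    """Hypothetical helper: row-wise concat that goes through tz-naive values.

    Assumes every column of every frame is tz-aware datetime data.
    """
    # tz_convert(None) converts each column to UTC and drops the tz info.
    naive = [f.apply(lambda col: col.dt.tz_convert(None)) for f in frames]
    out = pd.concat(naive)  # plain datetime64 columns; gaps become NaT
    # The naive values are UTC, so localize as UTC, then convert to `tz`.
    return out.apply(lambda col: col.dt.tz_localize("UTC").dt.tz_convert(tz))


# Illustrative frames: `a` has two tz-aware columns, `b` has one.
a = pd.DataFrame({
    0: pd.to_datetime(["2018-09-20"], utc=True),
    1: pd.to_datetime(["2018-09-21"], utc=True),
})
b = pd.DataFrame({0: pd.to_datetime(["2018-09-22"], utc=True)})

combined = concat_tz_aware([a, b])
print(combined)
```

The round trip through UTC keeps the instants unchanged, at the cost of an extra copy of the data.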

@mroeschke mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype labels Sep 20, 2018
@mroeschke mroeschke added this to the Contributions Welcome milestone Sep 20, 2018
@loneosama

loneosama commented Sep 25, 2018

@mroeschke @bchu I think it is logical that axis=0 is not allowed in this case.
Let's say that instead of datetimes we had numeric data, like:

import pandas as pd

print(pd.__version__)
a = pd.DataFrame([[1, 2]])
b = pd.DataFrame([[3]])
c = pd.concat([a, b], axis=0)
print(a)
print(b)
print(c)

The output would be something like:
a
   0  1
0  1  2
b
   0
0  3
c
   0    1
0  1  2.0
0  3  NaN


The thing is that where a number was missing, pandas defaulted to NaN.
But if we were to do the same thing with datetimes, there is no NaD (Not a Date), and 1970-01-01 00:00:00 would be the wrong value to put there.
So the current behavior is already correct, which is why I think this operation with axis=0 should not be permitted.

@mroeschke
Member

The missing value should be filled with NaT (Not a Time), pandas' missing-value marker for datetimes.
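
As a minimal sketch of that expected behavior, assuming a pandas version that includes the fix linked to this issue (#23036). The dates and the names `result` and `missing` are made up for illustration, and the frames are built with `pd.to_datetime` rather than the integer constructor from the report:

```python
import pandas as pd

# Illustrative tz-aware frames: `a` has two columns, `b` has only one.
a = pd.DataFrame({
    0: pd.to_datetime(["2018-09-20"], utc=True),
    1: pd.to_datetime(["2018-09-21"], utc=True),
})
b = pd.DataFrame({0: pd.to_datetime(["2018-09-22"], utc=True)})

# On affected versions such as 0.23.4, this is the call where the
# reported AttributeError would be expected.
result = pd.concat([a, b])  # axis=0 is the default

# The row coming from `b` has no column 1, so that cell should be NaT.
missing = result.iloc[1, 1]
print(result)
print(pd.isna(missing))
```

`pd.isna` treats `NaT` the same way it treats `NaN`, so missing datetimes can be detected with the usual tools.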

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Oct 9, 2018

4 participants