DataFrame with datetime column cannot concat with non-identical columns #26288

Open
ruiann opened this issue May 6, 2019 · 9 comments
Labels: Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Sparse (Sparse Data Type)

Comments

@ruiann
Contributor

ruiann commented May 6, 2019

Code Sample, a copy-pastable example if possible

In [12]: df1 = pd.DataFrame({"A": pd.core.arrays.SparseArray(pd.date_range("2000", periods=2)), "B": [1, 2]})

In [13]: df2 = pd.DataFrame({"B": [1, 2]})

In [14]: pd.concat([df1, df2])
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-14-6197998bc0b3> in <module>
----> 1 pd.concat([df1, df2])

~/sandbox/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253     )
    254
--> 255     return op.get_result()
    256
    257

~/sandbox/pandas/pandas/core/reshape/concat.py in get_result(self)
    468
    469             new_data = concatenate_block_managers(
--> 470                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
    471             )
    472             if not self.copy:

~/sandbox/pandas/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   1992         else:
   1993             b = make_block(
-> 1994                 concatenate_join_units(join_units, concat_axis, copy=copy),
   1995                 placement=placement,
   1996             )

~/sandbox/pandas/pandas/core/internals/concat.py in concatenate_join_units(join_units, concat_axis, copy)
    262                 concat_values = concat_values.copy()
    263     else:
--> 264         concat_values = concat_compat(to_concat, axis=concat_axis)
    265
    266     return concat_values

~/sandbox/pandas/pandas/core/dtypes/concat.py in concat_compat(to_concat, axis)
    111
    112     elif _contains_datetime or "timedelta" in typs or _contains_period:
--> 113         return concat_datetime(to_concat, axis=axis, typs=typs)
    114
    115     # these are mandated to handle empties as well

~/sandbox/pandas/pandas/core/dtypes/concat.py in concat_datetime(to_concat, axis, typs)
    387     if len(typs) != 1:
    388         return _concatenate_2d(
--> 389             [_convert_datetimelike_to_object(x) for x in to_concat], axis=axis
    390         )
    391

~/sandbox/pandas/pandas/core/dtypes/concat.py in <listcomp>(.0)
    387     if len(typs) != 1:
    388         return _concatenate_2d(
--> 389             [_convert_datetimelike_to_object(x) for x in to_concat], axis=axis
    390         )
    391

~/sandbox/pandas/pandas/core/dtypes/concat.py in _convert_datetimelike_to_object(x)
    422         else:
    423             shape = x.shape
--> 424             x = tslib.ints_to_pydatetime(x.view(np.int64).ravel(), box="timestamp")
    425             x = x.reshape(shape)
    426

~/sandbox/pandas/pandas/core/arrays/base.py in view(self, dtype)
    907         #   giving a view with the same dtype as self.
    908         if dtype is not None:
--> 909             raise NotImplementedError(dtype)
    910         return self[:]
    911

NotImplementedError: <class 'numpy.int64'>
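The last frame is where things break: `_convert_datetimelike_to_object` reinterprets the values with `x.view(np.int64)`, but the base `ExtensionArray.view` shown above only supports `dtype=None`. A minimal stdlib-only sketch of that contract follows; `ToyExtensionArray` and `toy_convert_datetimelike_to_object` are hypothetical stand-ins, not pandas classes, and they mimic only the behavior visible in the traceback.

```python
class ToyExtensionArray:
    """Hypothetical stand-in copying the view() contract from the last traceback frame."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, key):
        return ToyExtensionArray(self._values[key])

    def view(self, dtype=None):
        # Only dtype=None is supported: a "view" with the same dtype as self.
        if dtype is not None:
            raise NotImplementedError(dtype)
        return self[:]


def toy_convert_datetimelike_to_object(x):
    """Toy version of the failing step: reinterpret the values as int64 nanoseconds."""
    return x.view("int64")


arr = ToyExtensionArray([1, 2])
print(arr.view()._values)  # same-dtype view works
try:
    toy_convert_datetimelike_to_object(arr)
except NotImplementedError as exc:
    print("NotImplementedError:", exc)
```

A dense `datetime64[ns]` ndarray can be viewed as int64, which is why the same code path works for non-sparse datetime columns.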
@gfyoung added the Sparse, Reshaping, and Needs Info labels on May 6, 2019
@gfyoung
Member

gfyoung commented May 6, 2019

cc @jreback

@ruiann: Thanks for reporting this! Can you provide a bit more code so we can run this?

@ruiann
Contributor Author

ruiann commented May 7, 2019

@gfyoung

import pandas as pd


a = pd.DataFrame([{'t': '1970-01-01', 'id': 1, 'v': 2}])
a.t = pd.to_datetime(a.t)
b = pd.DataFrame([{'id': 2}])

a = a.to_sparse()
b = b.to_sparse()
pd.concat([a, b])

Then it fails.

@gfyoung added the Bug label and removed the Needs Info label on May 7, 2019
@gfyoung
Member

gfyoung commented May 7, 2019

Thanks! I can indeed confirm that this is a bug. Investigation and PR are welcome!

@TomAugspurger
Contributor

We get the wrong dtype in concatenate_join_units:

    229 def concatenate_join_units(join_units, concat_axis, copy):
    230     """
    231     Concatenate values from several join units along selected axis.
    232     """
    233     if concat_axis == 0 and len(join_units) > 1:
    234         # Concatenating join units along ax0 is handled in _merge_blocks.
    235         raise AssertionError("Concatenating join units along axis0")
    236
    237     empty_dtype, upcasted_na = get_empty_dtype_and_na(join_units)
    238
    239     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
    240                                          upcasted_na=upcasted_na)
    241                  for ju in join_units]
    242
    243     if len(to_concat) == 1:
    244         # Only one block, nothing to concatenate.
    245         concat_values = to_concat[0]
    246         if copy:
    247             if isinstance(concat_values, np.ndarray):
    248                 # non-reindexed (=not yet copied) arrays are made into a view
    249                 # in JoinUnit.get_reindexed_values
    250                 if concat_values.base is not None:
    251                     concat_values = concat_values.copy()
    252             else:
    253                 concat_values = concat_values.copy()
    254     else:
--> 255         concat_values = _concat._concat_compat(to_concat, axis=concat_axis)
    256
    257     return concat_values
    258

ipdb> empty_dtype, upcasted_na
(dtype('<M8[ns]'), -9223372036854775808)

That should be Sparse[datetime64[ns], nan].
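The ipdb output shows the empty dtype coming back as plain `datetime64[ns]` with `-9223372036854775808` as the NA value (the int64 minimum, which pandas uses internally as the NaT sentinel), so the sparseness of column A is lost before concatenation. The toy model below, with hypothetical names `ToyDtype` and `toy_empty_dtype_and_na`, sketches the behavior the comment asks for: when any present join unit is sparse, the chosen dtype stays sparse. It assumes simplified inputs (string subtypes, all present units sharing one subtype) and is not pandas' implementation.

```python
from dataclasses import dataclass

# int64 minimum; pandas stores NaT internally as this value (iNaT)
NAT_SENTINEL = -(2 ** 63)


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical dtype stand-in: a subtype name plus a sparse flag."""
    subtype: str
    sparse: bool = False

    def __str__(self):
        # Mirrors the string form quoted in the comment
        return f"Sparse[{self.subtype}, nan]" if self.sparse else self.subtype


def toy_empty_dtype_and_na(unit_dtypes):
    """Pick dtype and NA value for one column; None marks a frame missing that column."""
    present = [d for d in unit_dtypes if d is not None]
    subtype = present[0].subtype  # simplification: present units share one subtype
    na = NAT_SENTINEL if subtype.startswith("datetime64") else float("nan")
    # Desired behavior: keep the result sparse if any input unit is sparse
    keep_sparse = any(d.sparse for d in present)
    return ToyDtype(subtype, keep_sparse), na


# Column "A": sparse datetime in df1, missing from df2
dtype, na = toy_empty_dtype_and_na([ToyDtype("datetime64[ns]", sparse=True), None])
print(dtype, na)
```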

@TomAugspurger
Contributor

I should say, this isn't implemented at all yet. #22994 will fix this.

@gfyoung added the Enhancement label and removed the Bug label on May 8, 2019
@gfyoung
Copy link
Member

gfyoung commented May 8, 2019

Marking as an enhancement in light of @TomAugspurger's comment.

@jbrockmendel
Member

@TomAugspurger does the removal of SparseDataFrame make this closeable?

@TomAugspurger changed the title from "Sparse DF with time column cannot concat" to "DataFrame with datetime column cannot concat" on Dec 2, 2019
@TomAugspurger changed the title from "DataFrame with datetime column cannot concat" to "DataFrame with datetime column cannot concat with non-identical columns" on Dec 2, 2019
@TomAugspurger
Contributor

No. I updated the original example to use DataFrame[sparse]. Different error message, but I think the same underlying issue.

@s-scherrer
Contributor

I investigated this a bit, and I believe the problem is in concat_compat in pandas/core/dtypes/concat.py:

With the provided example, the value of typs at line 103 is {'sparse', 'datetime'}.

Further down, the function checks whether typs contains 'datetime' before checking for 'sparse', so the datetime path is taken. The provided example runs successfully if I swap the order of the elif clauses, but I haven't run any other tests to see whether this breaks anything else.
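The ordering problem can be reproduced in isolation. The sketch below is a hypothetical, stdlib-only model of the elif chain in `concat_compat`: the two dispatch functions and the returned labels are illustrative names rather than pandas internals, and only the branch order mirrors the description above.

```python
DATETIMELIKE = {"datetime", "timedelta", "period"}


def dispatch_datetime_first(typs):
    """Order as described: datetime-like checks run before the sparse check."""
    if typs & DATETIMELIKE:
        return "concat_datetime"
    elif "sparse" in typs:
        return "concat_sparse"
    return "concat_default"


def dispatch_sparse_first(typs):
    """Swapped order: any sparse input is routed to the sparse path first."""
    if "sparse" in typs:
        return "concat_sparse"
    elif typs & DATETIMELIKE:
        return "concat_datetime"
    return "concat_default"


typs = {"sparse", "datetime"}  # value of typs reported for the example
print(dispatch_datetime_first(typs))
print(dispatch_sparse_first(typs))
```

Swapping the branches changes the routing for every mixed sparse/datetime-like input, not just this example, which is why the rest of the test suite would need to pass before relying on the change.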
