BUG: Concatenating categorical datetime columns raises a ValueError since v1.2 #39443
Comments
Thanks @aiudirog for the excellent report! If I've done git bisect correctly,
cc @arw2019 |
looking |
moving to 1.2.3 |
This example works on 1.2.2 and current 1.2.x. If I've bisected correctly, it was fixed by #39615 (which went into 1.2.2). A test should be added. Is a whatsnew entry also needed? |
take |
I've provided the above PR adding a test for this case. However, I'm running into some trouble getting the PR to pass the CI checks. The initial version of the test I wrote looked like this:

# GH 39443
df1 = DataFrame(
    {"x": Series(datetime(2021, 1, 1), index=[0], dtype="category")}
)
df2 = DataFrame(
    {"x": Series(datetime(2021, 1, 2), index=[1], dtype="category")}
)
result = pd.concat([df1, df2])
expected = DataFrame(
    {
        "x": Series(
            [datetime(2021, 1, 1), datetime(2021, 1, 2)], dtype="category"
        )
    }
)
tm.assert_equal(result, expected)

However, this test fails because the categorical dtype is not preserved in the result:

isinstance(df1.dtypes['x'], CategoricalDtype)  # True
isinstance(df2.dtypes['x'], CategoricalDtype)  # True
isinstance(result.dtypes['x'], CategoricalDtype)  # False
result.dtypes['x']  # dtype('<M8[ns]')

So I rewrote the above test to pass by removing the categorical dtype from expected:

expected = DataFrame({"x": Series([datetime(2021, 1, 1), datetime(2021, 1, 2)])})

This passed locally with
I then modified my test to mimic the original (preserving the categorical dtype), committed, and pushed to the PR. But this time, another check failed:
Basically, the opposite error as before. This second time, as I had seen locally, the categorical dtype was not being preserved after concatenation. In the first case, a GitHub Actions test was failing, whereas in the second, it was the Azure Pipelines tests (all of them). I have no idea what's going on, but it seems the tests are running different pandas versions or, as @MarcoGorelli pointed out on Gitter, more likely different dependencies. |
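To see which behaviour a given environment produces, a small check along these lines may help. It is only a sketch: `make_frame` is a hypothetical helper, not part of the PR, and on v1.2.0 or v1.2.1 the `pd.concat` call would be expected to raise the `ValueError` from this issue rather than return a result. The final comparison looks at values only, so it does not depend on whether the categorical dtype survives concatenation.

```python
import pandas as pd


def make_frame(day, index):
    # Hypothetical helper: a one-row frame with a categorical datetime column.
    column = pd.Series([pd.Timestamp(day)], index=[index], dtype="category")
    return pd.DataFrame({"x": column})


result = pd.concat([make_frame("2021-01-01", 0), make_frame("2021-01-02", 1)])

# Report the installed version and what dtype concatenation produced.
is_categorical = isinstance(result.dtypes["x"], pd.CategoricalDtype)
print(pd.__version__, result.dtypes["x"], is_categorical)

# Value-only comparison, independent of the resulting dtype.
values = result["x"].tolist()
expected_values = [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-02")]
print(values == expected_values)
```

Running this in each CI environment would show whether the environments disagree on the resulting dtype.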
Hi, |
You could try this as well.
take |
take |
take |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Since v1.2, concatenating two or more dataframes with at least one categorical datetime column with different categories raises this error:
Traceback
I have confirmed that:
- the bug occurs on v1.2.0, v1.2.1, and v1.3.0.dev0+553.gd0cfa0303 on both Windows 10 and Linux
- the bug does not occur before v1.2 or if the categories are the same

Expected Output
A single, properly concatenated dataframe:
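A minimal sketch of the scenario in the problem description, assuming two one-row frames whose categorical datetime columns have different categories. This is not the reporter's original sample, and the column name `x` and the dates are illustrative. On an affected version the `ValueError` branch would be expected to run. On other versions the frames concatenate into two rows.

```python
import pandas as pd

# Two frames whose categorical datetime columns have different categories.
df1 = pd.DataFrame(
    {"x": pd.Series(pd.to_datetime(["2021-01-01"]), index=[0]).astype("category")}
)
df2 = pd.DataFrame(
    {"x": pd.Series(pd.to_datetime(["2021-01-02"]), index=[1]).astype("category")}
)

try:
    result = pd.concat([df1, df2])
except ValueError as exc:
    # The failure mode reported for v1.2.0 and v1.2.1.
    result = None
    print("ValueError:", exc)
else:
    print(result)
    print(result.dtypes)
```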
Output of pd.show_versions()