Why is pd.BooleanDtype() cast to Float64 by groupby/last? #33071
Comments
Thanks. Looks like the bug exists at least for min and max:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"a": [1, 2], "b": pd.array([True, False])})
In [3]: df.dtypes
Out[3]:
a      int64
b    boolean
dtype: object
In [4]: df.groupby("a")["b"].min()
Out[4]:
a
1    1.0
2    0.0
Name: b, dtype: float64
In [5]: df.groupby("a")["b"].max()
Out[5]:
a
1    1.0
2    0.0
Name: b, dtype: float64
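To check which reductions are affected on a given installation, something like the following might help. It is only a sketch: the helper name `reduction_dtypes` and the choice of reductions are my own assumptions, not part of pandas, and the dtypes it prints depend on the installed version.

```python
import pandas as pd


def reduction_dtypes(values, ops=("min", "max", "first", "last")):
    """Hypothetical helper: map each groupby reduction to its result dtype name."""
    # One row per group, so every reduction just returns the input values.
    df = pd.DataFrame({"a": range(len(values)), "b": values})
    grouped = df.groupby("a")["b"]
    return {op: str(getattr(grouped, op)().dtype) for op in ops}


# pd.array infers the nullable "boolean" dtype from Python bools.
report = reduction_dtypes(pd.array([True, False]))
for op, dtype in report.items():
    print(f"{op}: {dtype}")
```

Any entry that prints float64 instead of boolean indicates the cast described in this issue.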
This also occurs for Int64Dtype, see #32194:
>>> import pandas as pd
>>> pd.__version__
'1.1.0.dev0+999.gc47e9ca8b'
>>>
>>> df = pd.DataFrame(
...     {"a": ["x", "x", "y", "y"], "b": ["x", "x", "y", "y"], "c": [0, 1, 2, 3]}
... )
>>> df["d"] = df.c.astype(pd.Int64Dtype())
>>>
>>> df.dtypes
a    object
b    object
c     int64
d     Int64
dtype: object
>>>
>>> df.groupby(["a", "b"]).c.last()
a  b
x  x    1
y  y    3
Name: c, dtype: int64
>>>
>>> df.groupby(["a", "b"]).d.last()
a  b
x  x    1.0
y  y    3.0
Name: d, dtype: float64
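Until the cast is fixed, one possible workaround is to cast the aggregated result back to the dtype of the input column. This is a minimal sketch and an assumption on my part, not an official recommendation; it relies on the float results being whole numbers, which holds for last() on an integer column.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["x", "x", "y", "y"], "b": ["x", "x", "y", "y"], "c": [0, 1, 2, 3]}
)
df["d"] = df["c"].astype(pd.Int64Dtype())

result = df.groupby(["a", "b"])["d"].last()

# Cast back to the input dtype. On versions that already preserve
# Int64 this changes nothing; on affected versions it undoes the float64 cast.
restored = result.astype(df["d"].dtype)
print(restored.dtype)  # Int64
```

A missing value in the float result should come back as <NA> after the cast, so the nullable semantics are kept.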
You are welcome. I am glad that I can participate (at least by testing) in the development of such a marvel as pandas. Best regards.
In the dataframe below, lag has Int64 dtype, which I had to cast to int in order to make it work:
Regards.
@ghasemnaddaf Can you provide a small reproducible example for this problem?
Here you go:
Thanks @ghuname, I've opened up a separate issue / PR for this bug.
You are welcome.
@ghuname We can leave this issue open, it should be closed automatically if / when the associated PR is merged |
Code Sample, a copy-pastable example if possible
Problem description
df.groupby(['a', 'b']).c.last() returns False, but df.groupby(['a', 'b']).d.last() returns Float64.
Why the difference?
Expected Output
I expect that both values should be False
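The code sample itself is not shown here, so the following is only a guess at a minimal reproduction, modeled on the Int64 example in the comments. The frame contents and the all-False column are assumptions; only the two groupby calls come from the problem description.

```python
import pandas as pd

# Hypothetical data: c is a plain bool column, d is its nullable counterpart.
df = pd.DataFrame(
    {"a": ["x", "x", "y", "y"], "b": ["x", "x", "y", "y"], "c": [False] * 4}
)
df["d"] = df["c"].astype(pd.BooleanDtype())

c_last = df.groupby(["a", "b"])["c"].last()
d_last = df.groupby(["a", "b"])["d"].last()

# On affected versions the second dtype is reported as float64.
print(c_last.dtype, d_last.dtype)

# The values agree once both are viewed as plain booleans,
# whatever dtype the nullable column came back as.
same_values = c_last.astype(bool).tolist() == d_last.astype(bool).tolist()
print(same_values)  # True
```

The values come out the same either way. The only difference is the result dtype, and that difference is what this report is about.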
Output of
pd.show_versions()
python : 3.7.4.final.0
pandas : 1.0.3