Missing Values and Categoricals - inconsistent dtypes #23242
Comments
You can with

In [24]: pd.api.types.CategoricalDtype(categories=pd.Index([], dtype=int)).categories
Out[24]: Int64Index([], dtype='int64')

`CategoricalDtype.categories` is just an index. Would you want to accept a dtype parameter in `CategoricalDtype` that's passed through?
I think that's the root issue.
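Yes, I think that would help. The other thing that would help is if `union_categoricals` would accept the union of two categories where the dtype was different, and one of the categories was empty.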
I'd prefer to avoid special casing empty / all-NaN columns.
I think adding a `dtype` keyword to the CategoricalDtype constructor would
be fine, with a default of float for backwards compatibility.
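Until such a keyword exists, the idea can be approximated by passing a typed `Index`. The following is a minimal sketch, not a pandas API: `categorical_dtype_of` is a hypothetical helper name, and it takes `dtype` as a required keyword so that it makes no assumption about what the default should be.

```python
import numpy as np
import pandas as pd


def categorical_dtype_of(categories=(), *, dtype, ordered=False):
    """Hypothetical helper (not part of pandas): build a CategoricalDtype
    whose categories Index has an explicitly chosen dtype."""
    index = pd.Index(list(categories), dtype=dtype)
    return pd.CategoricalDtype(categories=index, ordered=ordered)


# Empty categories, but with a dtype picked by the caller.
empty_float = categorical_dtype_of(dtype="float64")
empty_obj = categorical_dtype_of(dtype=object)

# An all-NaN Series converted with the explicit dtype.
s = pd.Series([np.nan, np.nan]).astype(empty_obj)
print(empty_float.categories.dtype, s.cat.categories.dtype)
```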
On Mon, Oct 22, 2018 at 4:45 PM Dr. Irv wrote:
Would you want to accept a dtype parameter in CategoricalDtype that's
passed through?
Yes, I think that would help.
The other thing that would help is if union_categoricals would accept the
union of two categories where the dtype was different, and one of the
categories was empty. Then the result could have the dtype of the
category that had items in it. The reason I need this is that I'm reading a
large file in chunks, and I know which columns are category columns, and
want to keep doing union_categoricals as new categories are discovered,
and if a chunk was all missing values, have the types correctly inferred.
(See my comment here: #14177 (comment))
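The behavior requested above can be sketched as a wrapper around `union_categoricals`. This is only an illustration under stated assumptions: `union_ignoring_empty` is a hypothetical helper, not a pandas function, and it simply rebuilds any categorical that has no categories so that its empty categories take the dtype of the first categorical that does have some.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals


def union_ignoring_empty(values):
    """Hypothetical helper (not part of pandas): union categoricals,
    ignoring the categories dtype of any input with no categories."""
    cats = [v.array if isinstance(v, pd.Series) else v for v in values]
    # dtype of the first input that actually has categories, if any
    target = next(
        (c.categories.dtype for c in cats if len(c.categories) > 0), None
    )
    if target is None:
        return union_categoricals(cats)
    fixed = []
    for c in cats:
        if len(c.categories) == 0:
            # every code is -1 (missing), so the codes stay valid
            empty = pd.Index([], dtype=target)
            c = pd.Categorical.from_codes(
                c.codes, categories=empty, ordered=c.ordered
            )
        fixed.append(c)
    return union_categoricals(fixed)


# One chunk with only missing values, one chunk with strings.
a = pd.Series([np.nan, np.nan]).astype("category")
b = pd.Series(["x", np.nan, "y"]).astype("category")

try:
    union_categoricals([a, b])
except TypeError as exc:
    print("plain union failed:", exc)

merged = union_ignoring_empty([a, b])
print(list(merged.categories))
```

When reading a file in chunks, the same helper could be applied to the running result and each new chunk's column in turn.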
I think the default would have to be …
Code Sample, a copy-pastable example if possible
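A minimal sketch of the behavior this issue describes, assuming pandas is imported as `pd`. The name `s1` follows the problem description; `s2` and the two-element all-`NaN` input are assumptions made for illustration. The dtypes in the comments are the ones reported in this issue for pandas 0.23.4 and may differ in other versions.

```python
import numpy as np
import pandas as pd

# s1: let pandas infer the categories from an all-NaN Series
s1 = pd.Series([np.nan, np.nan]).astype("category")

# s2 (assumed name): pass an explicit CategoricalDtype with no categories
s2 = pd.Series([np.nan, np.nan]).astype(pd.CategoricalDtype([]))

# Both have zero categories, but the categories Index dtype differs.
print(s1.cat.categories.dtype)  # reported as float in this issue
print(s2.cat.categories.dtype)  # reported as object in this issue
```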
Problem description
In the above, if you convert a `Series` using `astype('category')`, and the `Series` has all `NaN` values, the underlying dtype is `float`, while if you pass `CategoricalDtype([])`, the underlying dtype is `object`.

There are a couple of issues that I don't know how to deal with:
- … `s1` to be `object`)
- … `CategoricalDtype` constructor

Now, you might ask, why does this matter? Let's suppose I have data that I know to be categorical, and I have missing values, and I want to use `union_categoricals()` to merge the categories of two different Series that are both category dtype, and each Series was constructed using `astype('category')`. Let's say that one had all missing values and has underlying dtype `float` and the second one had strings and missing values, so it ends up with dtype `O`, then I can't do `union_categoricals()` on them.

I know there are various workarounds for this, but I still think there should be some way to manage the underlying dtype of the categories of a `CategoricalDtype`.

Alternatively, maybe `union_categoricals()` should be smart that when you are doing a union of categories and one of the categories has no choices, then it ignores the `dtype` when doing the union.

Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.8.1
pip: 10.0.1
setuptools: 40.4.3
Cython: 0.28.5
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: 0.10.9
IPython: 7.0.1
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None