Setting with enlargement on categorical data #25383
Comments
Not sure I see the issue here. From the code posted it looks like you are trying to mix tuples with categorical data, which should be an object. Do you mean to be using what is described at http://pandas.pydata.org/pandas-docs/stable//user_guide/categorical.html#appending-new-categories ?
It seems the core issue should be addressed before this. Note that indexing expansion is pretty inefficient and might be removed in the future; it is better to append explicitly (which is also inefficient if done many times, but it's more obvious what is happening).
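As a rough illustration of the explicit append suggested above, here is a minimal sketch using pd.concat. The Series, its categories, and the index label 3 are made up for the example and are not taken from the original report.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed example data: three rows, four declared categories.
dtype = CategoricalDtype(categories=["a", "b", "c", "d"])
s = pd.Series(["a", "b", "c"], dtype=dtype)

# Build the new row with the same dtype so concat has nothing to reconcile.
new_row = pd.Series(["d"], index=[3], dtype=dtype)
appended = pd.concat([s, new_row])

print(appended.dtype)
```

Because both pieces share the same categorical dtype, the result stays categorical. Doing this in a loop copies the data each time, so collecting the rows first and concatenating once is usually the cheaper route.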
You can use
I don't know enough of the pandas internals, but it seems kind of logical. I think overall support for these kinds of merging operations with categoricals is lacking in pandas.
I thought it was just some sugar coating on top of
Code Sample, a copy-pastable example if possible
Problem description
There is no warning whatsoever, but the dtype still changes. In this dummy example this means we lose all information about the fact that 'd' is also a possible value. (So simply doing astype('category') wouldn't work here.)
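The original code sample is not reproduced on this page. The following is only a sketch of the kind of situation described, assuming a Series whose dtype declares 'd' as a category that no row uses yet. It records the dtype before and after the enlargement instead of asserting a result, since the outcome depends on the pandas version.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed example data: 'd' is a valid category but not present in any row.
dtype = CategoricalDtype(categories=["a", "b", "c", "d"])
s = pd.Series(["a", "b", "c"], dtype=dtype)
dtype_before = s.dtype

# Setting with enlargement: label 3 does not exist yet, so .loc adds a row.
s.loc[3] = "d"
dtype_after = s.dtype

print("before:", dtype_before)
print("after: ", dtype_after)
print("still categorical:", isinstance(dtype_after, CategoricalDtype))
```

If the last line prints False, the list of categories (including the unused 'd') is gone from the Series, which is the loss of information described above.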
Expected Output
Keep the categorical dtype if the added value is in the list of categories, throw an error/warning otherwise.
If people don't care about the categorical, they can always call .astype('object') before adding the row? I think this solution is also in the spirit of 'explicit is better than implicit'?
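The expected behaviour could be approximated in user code along the following lines. This is a sketch only: append_category_value is a hypothetical helper written for this example, not a pandas function, and it covers a single-row append to a Series.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def append_category_value(s, label, value):
    """Hypothetical helper: add one row to a categorical Series, keeping its dtype."""
    if not isinstance(s.dtype, CategoricalDtype):
        raise TypeError("expected a Series with categorical dtype")
    if value not in s.cat.categories:
        # Refuse values outside the declared categories instead of changing dtype.
        raise ValueError(
            f"{value!r} is not one of the categories {list(s.cat.categories)}"
        )
    new_row = pd.Series([value], index=[label], dtype=s.dtype)
    return pd.concat([s, new_row])


s = pd.Series(["a", "b", "c"], dtype=CategoricalDtype(["a", "b", "c", "d"]))
s = append_category_value(s, 3, "d")  # 'd' is a declared category
print(s.dtype)

try:
    append_category_value(s, 4, "e")  # 'e' is not a declared category
except ValueError as exc:
    print("rejected:", exc)
```

Anyone who does not care about the categorical can still opt out explicitly with .astype('object') before adding the row.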
Output of pd.show_versions()
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0
pytest: 4.1.1
pip: 18.1
setuptools: 40.2.0
Cython: 0.29.2
numpy: 1.16.0
scipy: 1.1.0
pyarrow: 0.12.0
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: 0.4.0
matplotlib: 2.2.3
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None