Merge on categorical type columns gives wrong results #19551
Comments
Thanks for the bug report. Should be fixed by #19553 if you want to take a look. |
Just took a look; the changes make sense given what I was seeing, so hopefully that covers what I was seeing on my larger dataset too 👍 |
Side note: should we consider doing a bugfix release for the 0.22 series? (I don't know how many issues we would have to backport, and it depends on how long 0.23.0 will still take) |
Other side note: is there actually a good use case for having differently ordered categories for an unordered categorical? (although always sorting the categories upon construction might also cause trouble down the road in other places) |
I think we should just release 0.23 soon |
Not sure... These are pretty serious bugs, but I don't have a good feeling
for how common differently-ordered unordered categories are.
On Tue, Feb 6, 2018 at 3:06 PM, Joris Van den Bossche wrote:
Other side note: is there actually a good usecase to have differently
ordered categories for an unordered categorical? (although always sorting
the categories upon construction might also give trouble down the road in
other places)
|
In my case they are pretty common, but this is because most of my data is coming from |
Code Sample, a copy-pastable example if possible
Problem description
Since upgrading from v0.20.3 to v0.22.0 I noticed data missing in my datasets. After a few hours of debugging I narrowed it down to an issue with merges that involve Categoricals. I downgraded to v0.20.3 to test my original code and didn't have the issue. I then tested on v0.21.0 and found that the issue was first introduced in that version.
While the example provided doesn't run in v0.20.3, it does highlight the bug and I think shows why it's happening. Notice that the merge should give a result of instead of
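For illustration only, here is a minimal sketch of the kind of merge described above. It is not the original code sample: the frames `left` and `right` and their column names are hypothetical. It assumes two unordered categorical key columns that hold the same values but list their categories in a different order, which is the situation discussed in the comments. On a pandas release that includes the fix, the rows should be matched on category values rather than on the underlying integer codes.

```python
import pandas as pd

# Hypothetical data: identical key values, but the categories are
# declared in a different order on each side.
left = pd.DataFrame({
    "key": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"]),
    "left_val": [1, 2, 3],
})
right = pd.DataFrame({
    "key": pd.Categorical(["a", "b", "c"], categories=["c", "b", "a"]),
    "right_val": [10, 20, 30],
})

# The integer codes behind the two columns differ even though the
# values they represent are the same.
print(left["key"].cat.codes.tolist())   # [0, 1, 2]
print(right["key"].cat.codes.tolist())  # [2, 1, 0]

# A correct merge pairs rows by value: "a" with "a", "b" with "b", and so on.
merged = pd.merge(left, right, on="key", how="inner")
print(merged)
```

If `merged` pairs `left_val` 1 with `right_val` 30, the keys were compared by code instead of by value, which would be consistent with the missing or mismatched rows reported here.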
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: 0.7.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None