BUG: union_categoricals can't handle NaN #13759

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Member

@sinhrks sinhrks commented Jul 22, 2016

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew not needed

union_categoricals doesn't handle NaN properly.

On current master:

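import numpy as np
import pandas as pd
from pandas.types.concat import union_categoricals

# NaN entries are silently replaced by existing categories
union_categoricals([pd.Categorical([np.nan, 1]), pd.Categorical([2, np.nan])])
# [1, 1, 2, 2]
# Categories (2, int64): [1, 2]

# all-NaN inputs raise instead of returning an all-NaN result
union_categoricals([pd.Categorical([np.nan]), pd.Categorical([np.nan])])
# IndexError: cannot do a non-empty take from an empty axes.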
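Both symptoms above are consistent with the missing-value code -1 being used as an ordinary position when codes are remapped to the unified categories. The thread does not spell out the mechanism, so the following is only a pure-Python sketch under that assumption; `naive_recode` and the `indexer` lists are hypothetical names, not pandas internals.

```python
def naive_recode(codes, indexer):
    """Map one categorical's codes to unified codes, ignoring the -1 sentinel.

    `indexer[i]` is the position of this categorical's i-th category
    in the unified category list (hypothetical helper for illustration).
    """
    # A code of -1 is treated as "last element" instead of "missing".
    return [indexer[code] for code in codes]


unified = [1, 2]  # unified categories

# Categorical([nan, 1]): own categories [1], codes [-1, 0]
first = naive_recode([-1, 0], indexer=[0])
# Categorical([2, nan]): own categories [2], codes [0, -1]
second = naive_recode([0, -1], indexer=[1])

values = [unified[code] for code in first + second]
print(values)  # [1, 1, 2, 2]

# All-NaN input: no categories at all, so the indexer is empty.
try:
    naive_recode([-1], indexer=[])
except IndexError as exc:
    print("IndexError:", exc)
```

Under this assumption the NaN in each input wraps around to the last category, which reproduces the `[1, 1, 2, 2]` result, and the all-NaN case fails because there is nothing to index into.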

@sinhrks sinhrks added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels Jul 22, 2016
@sinhrks sinhrks added this to the 0.19.0 milestone Jul 22, 2016
@codecov-io

codecov-io commented Jul 22, 2016

Current coverage is 84.57% (diff: 100%)

Merging #13759 into master will increase coverage by <.01%

@@             master     #13759   diff @@
==========================================
  Files           141        141          
  Lines         51195      51200     +5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43295      43300     +5   
  Misses         7900       7900          
  Partials          0          0          

Powered by Codecov. Last update 9f94e6a...ab99976

@jreback
Contributor

jreback commented Jul 22, 2016

lgtm.

cc @chris-b1

@chris-b1
Contributor

Yeah, I had actually fixed this in #13406 too, but this PR fixes some other corner cases I missed. I used take_1d with a fill value, which seems to be a little faster?

In [13]: cats = [pd.Categorical(list('abcdefghijkl') * 100) for _ in range(100)]

In [14]: union_categoricals_pr = pd.types.concat.union_categoricals # from 13406

In [15]: def union_categoricals_sinhrks(to_union):
 <snip - this pr>

In [16]: %timeit union_categoricals_pr(cats)
100 loops, best of 3: 2.78 ms per loop

In [17]: %timeit union_categoricals_sinhrks(cats)
100 loops, best of 3: 3.45 ms per loop
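The "take with a fill value" idea can be sketched with the public `pandas.api.extensions.take` function, which treats -1 in the indices as missing when `allow_fill=True`. This is not the code from either PR (`take_1d` is an internal helper, and the PR's implementation is snipped above); `recode` and the indexer arrays are hypothetical names used only for illustration.

```python
import numpy as np
from pandas.api.extensions import take


def recode(codes, indexer):
    """Map codes to unified codes, keeping -1 as the missing-value code.

    Hypothetical helper: `indexer[i]` is the position of the i-th
    original category in the unified category list.
    """
    # allow_fill=True makes -1 mean "missing"; those slots get fill_value.
    return take(indexer, codes, allow_fill=True, fill_value=-1)


# Categorical([nan, 1]): own categories [1] -> position 0 in unified [1, 2]
first = recode(np.array([-1, 0]), np.array([0]))
# Categorical([2, nan]): own categories [2] -> position 1 in unified [1, 2]
second = recode(np.array([0, -1]), np.array([1]))

combined = np.concatenate([first, second])
print(combined.tolist())  # [-1, 0, 1, -1]
```

The combined codes keep -1 in the positions that were NaN in the inputs, so the unified result would read `[NaN, 1, 2, NaN]` rather than `[1, 1, 2, 2]`.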

@sinhrks sinhrks force-pushed the union_categoricals_nan branch from ab99976 to 58e91ff Compare July 22, 2016 22:26
@sinhrks
Member Author

sinhrks commented Jul 22, 2016

@chris-b1 thx, changed to use take_1d.

@sinhrks sinhrks force-pushed the union_categoricals_nan branch from 58e91ff to 8195acf Compare July 22, 2016 22:30
@sinhrks sinhrks force-pushed the union_categoricals_nan branch from 8195acf to 4312a32 Compare July 22, 2016 22:35
@jorisvandenbossche
Member

Looks good to me, go ahead with merging if ready!

@sinhrks sinhrks closed this in 5a3b071 Jul 23, 2016
@sinhrks sinhrks deleted the union_categoricals_nan branch July 23, 2016 13:12
Labels
Bug Categorical Categorical Data Type Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
5 participants