Change in behaviour of unique method regarding None values #20866
Comments
This may have subsequently been "fixed" on master:
>>> pd.__version__
'0.23.0.dev0+813.gd274d0b22'
>>> s = pd.Series(['test', None])
>>> s.unique()
array(['test', None], dtype=object)
@WillAyd can you see if there is a test (and if not, put one up)?
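A regression test along these lines might look like the sketch below. It assumes that preserving None in the result is the desired behaviour, which this thread has not settled. The test name is hypothetical, and dtype=object is passed explicitly so the example does not depend on how strings are inferred at construction.

```python
import pandas as pd


def test_unique_preserves_none():
    # Hypothetical test name; assumes None should survive unique() unchanged.
    ser = pd.Series(["test", None], dtype=object)
    result = ser.unique()

    assert result.dtype == object
    assert len(result) == 2
    assert result[0] == "test"
    # Identity check: a NaN here would indicate the coercion reported in this issue.
    assert result[1] is None


test_unique_preserves_none()
```

The identity check (`is None`) matters here, because an equality check against NaN or None would not distinguish the two outcomes as clearly.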
Hmm, strange, I did a full clean rebuild of master, and this is still giving nan for me:
I'm also getting the same output as @jorisvandenbossche.
It's very strange that the tests are passing though (in the PR that @WillAyd did: #20893). I was testing on Linux / Python 3.5 / NumPy 1.13.3 (not sure what else might influence this); the failing tests on geopandas were using the Linux Travis build with Python 3.6 and pandas installed with pip from GitHub master.
I suppose this might be the cause (though not of the fact that we are seeing different behaviour): 7f7f3d4
@jorisvandenbossche : see my comment on the PR.
@jorisvandenbossche after reviewing some more, I am able to reproduce the error you reported after doing a full clean / rebuild. I believe you are correct that 7f7f3d4 is causing the regression, as a clean / rebuild on the commit prior still returned
Looking at this further I noticed that the behavior is somewhat inconsistent with the commit prior to the regression. Note below that
>>> pd.__version__
'0.23.0.dev0+752.g0bd8a5a62'
>>> pd.Series(['foo', None]).unique()
array(['foo', None], dtype=object)
>>> pd.Series(['foo', pd.NaT]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', np.nan]).unique()
array(['foo', nan], dtype=object)
On the commit with the regression, everything is
>>> pd.__version__
'0.23.0.dev0+753.g7f7f3d49b'
>>> pd.Series(['foo', None]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', pd.NaT]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', np.nan]).unique()
array(['foo', nan], dtype=object)
The difference in the treatment of
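To compare builds without reading array reprs by eye, the three cases above could be wrapped in a small helper like the following sketch. The function name is hypothetical, and dtype=object is passed explicitly as an assumption so that construction does not infer another dtype. No expected output is shown, since what comes back for None and pd.NaT is exactly what differs between the commits discussed here.

```python
import numpy as np
import pandas as pd


def missing_marker_after_unique(marker):
    """Return the type name of the missing value that unique() hands back.

    Hypothetical helper for comparing builds; not part of pandas.
    """
    ser = pd.Series(["foo", marker], dtype=object)
    result = ser.unique()
    # 'foo' and the marker are distinct, so the marker sits at position 1.
    return type(result[1]).__name__


# Print one line per marker; run on each commit and compare the output.
for label, marker in [("None", None), ("pd.NaT", pd.NaT), ("np.nan", np.nan)]:
    print(label, "->", missing_marker_after_unique(marker))
```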
The behaviour of unique regarding None values recently changed on master (somewhere in the last month):
vs
Not sure what the desired result is (we typically treat None as missing, like NaN, but on the other hand we don't coerce None to NaN on construction, i.e. we do allow None values to be stored), but it is a breaking change (discovered because the geopandas tests started failing; geopandas/geopandas#711).
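The two sides of that trade-off can be seen in a short sketch: an object-dtype Series stores the None object itself, while missing-value detection still treats it like NaN. The variable names are illustrative, and dtype=object is passed explicitly as an assumption so the example does not rely on dtype inference.

```python
import pandas as pd

ser = pd.Series(["test", None], dtype=object)

# Construction stores the None object itself rather than coercing it to NaN.
stored_as_none = ser.iloc[1] is None

# Missing-value detection nevertheless treats None like NaN.
missing_mask = ser.isna().tolist()

print(stored_as_none, missing_mask)
```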