Change in behaviour of unique method regarding None values #20866

jorisvandenbossche · 2018-04-29T12:58:02Z

The behaviour of unique regarding None values changed recently on master (sometime in the last month):

In [1]: pd.__version__
Out[1]: '0.22.0'

In [2]: s = pd.Series(['test', None])

In [3]: s.unique()
Out[3]: array(['test', None], dtype=object)

vs

In [1]: pd.__version__
Out[1]: '0.23.0.dev0+807.g563a6ad'

In [2]: s = pd.Series(['test', None])

In [3]: s.unique()
Out[3]: array(['test', nan], dtype=object)

I'm not sure what the desired result is: we typically treat None as missing, like NaN, but on the other hand we don't coerce it to NaN on construction, i.e. we do allow None values to be stored. Either way, it is a breaking change (I discovered it because the geopandas tests started failing; geopandas/geopandas#711).
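To make the tension above concrete, here is a minimal sketch of the two facts being weighed: an object-dtype Series stores None as-is, while the missing-value checks still report it as missing. The explicit dtype=object is an assumption added here so the example does not depend on how a given pandas version infers string data.

```python
import pandas as pd

# Explicit object dtype (an assumption for this sketch) keeps None as-is.
s = pd.Series(['test', None], dtype=object)

# None is stored unchanged, not coerced to NaN on construction ...
stored_is_none = s.iloc[1] is None

# ... yet pandas still treats it as a missing value.
missing_mask = s.isna().tolist()
scalar_is_missing = pd.isna(None)

print(stored_is_none, missing_mask, scalar_is_missing)
```

So the question in this issue is only which of these two views unique should follow.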


WillAyd · 2018-04-30T18:37:39Z

This may have subsequently been "fixed" on master:

>>> pd.__version__
'0.23.0.dev0+813.gd274d0b22'
>>> s = pd.Series(['test', None])
>>> s.unique()
array(['test', None], dtype=object)

jreback · 2018-05-01T00:34:16Z

@WillAyd can you see if there is a test for this (and if not, put one up)?
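A regression test for this could look roughly like the sketch below. The function name is hypothetical and this is not the test that was added to pandas; it only checks that None survives unique on an object-dtype Series (dtype=object is passed explicitly as an assumption, to avoid version-dependent dtype inference).

```python
import pandas as pd


def check_unique_preserves_none():
    # Hypothetical test name; not taken from the pandas test suite.
    s = pd.Series(['test', None], dtype=object)
    result = s.unique()

    # Two distinct values are expected, in order of appearance.
    assert len(result) == 2
    assert result[0] == 'test'
    # Identity check: the missing entry must be None itself, not NaN.
    assert result[1] is None
    return result


result = check_unique_preserves_none()
```

The identity check matters here: comparing with == would not distinguish the cases reliably, since NaN compares unequal to everything, including itself.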

jorisvandenbossche · 2018-05-01T06:50:02Z

Hmm, strange, I did a full clean rebuild of master, and this is still giving nan for me:

In [1]: s = pd.Series(['test', None])

In [2]: s.unique()
Out[2]: array(['test', nan], dtype=object)

In [3]: pd.__version__
Out[3]: '0.23.0.dev0+815.gf799916'

jschendel · 2018-05-01T08:06:03Z

I'm also getting the same output as @jorisvandenbossche.

jorisvandenbossche · 2018-05-01T08:11:13Z

It's very strange that the tests are passing, though (in the PR that @WillAyd did: #20893).

I was testing on Linux / Python 3.5 / NumPy 1.13.3 (not sure what else might influence this). The failing geopandas tests ran on the Linux Travis build with Python 3.6 and pandas installed with pip from GitHub master.

jorisvandenbossche · 2018-05-01T08:19:39Z

I suppose this might be the cause (not of the fact we are seeing different behaviour though): 7f7f3d4

jschendel · 2018-05-01T08:33:00Z

@jorisvandenbossche : see my comment on the PR.

WillAyd · 2018-05-01T17:32:55Z

@jorisvandenbossche after reviewing some more, I am able to reproduce the behaviour you reported after doing a full clean / rebuild. I believe you are correct that 7f7f3d4 is causing the regression: a full clean / rebuild on the commit prior still returns None, but a full clean / rebuild on that commit returns NaN.

WillAyd · 2018-05-01T19:39:14Z

Looking at this further, I noticed that the behavior was already somewhat inconsistent on the commit prior to the regression. Note below that None is preserved but pd.NaT becomes np.nan:

>>> pd.__version__
'0.23.0.dev0+752.g0bd8a5a62'
>>> pd.Series(['foo', None]).unique()
array(['foo', None], dtype=object)
>>> pd.Series(['foo', pd.NaT]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', np.nan]).unique()
array(['foo', nan], dtype=object)

On the commit with the regression, everything is np.nan:

>>> pd.__version__
'0.23.0.dev0+753.g7f7f3d49b'
>>> pd.Series(['foo', None]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', pd.NaT]).unique()
array(['foo', nan], dtype=object)
>>> pd.Series(['foo', np.nan]).unique()
array(['foo', nan], dtype=object)

The difference in the treatment of None comes from the replacement of _checknan with checknull in the referenced PR. Do we care to revert this? IMO this is a pretty dark corner, and if unique wasn't preserving pd.NaT before, I don't take it as a guarantee that it should have preserved None either.
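As a rough illustration of why swapping the null check changes the result, here is a pure-Python sketch. It is not the pandas hashtable code: self_unequal, any_null and unique_sketch are hypothetical names, and the assumption that the narrower check behaves like a self-inequality test (value != value) is inferred only from the outputs shown above.

```python
import math


def self_unequal(value):
    # Narrow check (assumed): only values unequal to themselves, e.g. float NaN.
    return value != value


def any_null(value):
    # Broad check (assumed): additionally treats None as missing.
    return value is None or value != value


def unique_sketch(values, is_missing):
    # Order-preserving unique that collapses every "missing" value into one NaN.
    seen = set()
    out = []
    missing_emitted = False
    for value in values:
        if is_missing(value):
            if not missing_emitted:
                out.append(math.nan)
                missing_emitted = True
        elif value not in seen:
            seen.add(value)
            out.append(value)
    return out


data = ['foo', None, math.nan]
narrow = unique_sketch(data, self_unequal)  # None survives as its own value
broad = unique_sketch(data, any_null)       # None is folded into NaN

print(narrow)
print(broad)
```

Under the narrow check None is hashed like any other object and kept; under the broad check it is folded into the single NaN slot, which matches the change in output reported in this issue.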

jorisvandenbossche mentioned this issue Apr 29, 2018

Failing to_file test on pandas master geopandas/geopandas#711

Closed

TomAugspurger added this to the 0.23.0 milestone Apr 29, 2018

WillAyd mentioned this issue May 1, 2018

Preserve None in Series unique #20893

Merged

4 tasks

jreback closed this as completed in #20893 May 11, 2018

realead mentioned this issue Aug 15, 2018

BUG: don't mangle NaN-float-values and pd.NaT (GH 22295) #22296

Merged

5 tasks
