v0.15.1 breaks index set operations #9095

Closed
jorisvandenbossche opened this issue Dec 16, 2014 · 12 comments · Fixed by #9630
Labels: Deprecate (functionality to remove in pandas), Testing (pandas testing functions or related to the test suite)
Comments

@jorisvandenbossche
Member

In 0.15.0, the - index set operation raises a warning as is explained in the whatsnew docs: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecations

In [1]: pd.__version__
Out[1]: '0.15.0'

In [2]: idx = pd.Index([1,2])

In [3]: idx
Out[3]: Int64Index([1, 2], dtype='int64')

In [4]: idx - idx
C:\Anaconda\envs\pandas0150\lib\site-packages\pandas\core\index.py:1162: FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
  "use .difference()",FutureWarning)
Out[4]: Index([], dtype='object')

In [6]: idx = pd.date_range('2012-01-01', periods=2)

In [7]: idx - idx
Out[7]: Index([], dtype='object')

In [8]: idx = pd.Index([1.1,2])

In [10]: idx
Out[10]: Float64Index([1.1, 2.0], dtype='float64')

In [11]: idx - idx
Out[11]: Index([], dtype='object')

In [12]: idx = pd.Index(['a', 'b'])

In [13]: idx - idx
Out[13]: Index([], dtype='object')

But in 0.15.1:

In [1]: pd.__version__
Out[1]: '0.15.1'

In [2]: idx = pd.Index([1,2])

In [3]: idx
Out[3]: Int64Index([1, 2], dtype='int64')

In [4]: idx - idx
Out[4]: Int64Index([0, 0], dtype='int64')

In [5]: idx = pd.date_range('2012-01-01', periods=2)

In [6]: idx - idx
Out[6]: Index([], dtype='object')

In [7]: idx = pd.Index([1.1,2])

In [8]: idx - idx
Out[8]: Float64Index([0.0, 0.0], dtype='float64')

In [9]: idx = pd.Index(['a', 'b'])

In [10]: idx - idx
pandas\core\index.py:1172: FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
  "use .difference()",FutureWarning)
Out[10]: Index([], dtype='object')

So

  • for int/float index, it is a numeric operation and not set -> api break
  • for datetime it is set operation, but does not raise a warning
  • only for object (Index object itself) it raises the warning

And the same on master (and 0.15.2)
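For code that has to give the same answer on both sides of this change, one option is to avoid `-` between two indexes and spell out the intended operation. A minimal sketch, assuming only `Index.difference` and `Index.values` (the example values are made up for illustration):

```python
import pandas as pd

idx = pd.Index([1, 2, 3])
other = pd.Index([2, 3, 4])

# Set difference, spelled explicitly: labels in idx that are not in other.
set_diff = idx.difference(other)

# Element-wise subtraction, done on the underlying NumPy arrays so that
# the result does not depend on how Index.__sub__ is defined.
elementwise = idx.values - other.values

print(list(set_diff))     # labels only present in idx
print(list(elementwise))  # position-by-position differences
```

Written this way, neither line relies on what `idx - other` happens to mean in a given release.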

@jorisvandenbossche
Member Author

@jreback @shoyer @immerrr

@jreback jreback added the Deprecate Functionality to remove in pandas label Dec 16, 2014
@jreback
Contributor

jreback commented Dec 16, 2014

@jorisvandenbossche these warnings are correct (though I admit a bit confusing). Not sure how they got changed, but lots of moving parts.

Numeric operations are fully operable (though not well tested). Index will warn on -/+. DatetimeIndex gives no warning at the moment (I guess there should have been one).

@jreback
Contributor

jreback commented Dec 16, 2014

Further, the numeric were ALWAYS actual numeric operations (int/float index). So this is not an API break at all.

@jorisvandenbossche
Member Author

Maybe I wasn't really clear (it is a large code chunk above), but:

In [1]: pd.__version__
Out[1]: '0.15.0'

In [2]: idx = pd.Index([1, 2])

In [3]: idx
Out[3]: Int64Index([1, 2], dtype='int64')

In [4]: idx - idx
C:\Anaconda\envs\pandas0150\lib\site-packages\pandas\core\index.py:1162: FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
  "use .difference()",FutureWarning)
Out[4]: Index([], dtype='object')

In [1]: pd.__version__
Out[1]: '0.15.1'

In [2]: idx = pd.Index([1,2])

In [4]: idx - idx
Out[4]: Int64Index([0, 0], dtype='int64')

This is a huge API change that we missed, in a minor release, without any mention in the release notes. Or am I missing something?

@jreback
Contributor

jreback commented Dec 16, 2014

@jorisvandenbossche

it was always like this (that's one reason I changed it), the set operations were just confusing. (with +/-)

@jorisvandenbossche
Member Author

It is the other way around, no? It has always been set operation (see example above in 0.15.0, and I also checked in 0.14.1 and 0.13.1).

That is why we put a deprecation in place.

@jorisvandenbossche
Member Author

Possibly related to #8634?

@jreback
Contributor

jreback commented Dec 17, 2014

All of the ops with a scalar were actually numeric ops (and always have been), e.g. +,-,*,/. I suppose the set ops for Int64Index (for +/-) were there.

No tests existed for anything else (Numeric Index, and there still aren't any today). If people were using it, it was undocumented behavior.

In [1]: Index([1,2])+1
Out[1]: Int64Index([2, 3], dtype='int64')

Not sure what you are proposing. I don't think this warrants anything.

@shoyer
Member

shoyer commented Dec 17, 2014

@jreback It may not have been explicitly listed, but I do think it is strongly implied by even the current documentation. I do think this is a regression -- albeit untested and deprecated behavior.

If we are going to release another bugfix release 0.15.3 in the near future, I think we should restore this. On the other hand, if it's going to be a few months until 0.16, we might just move on with the deprecation (though that is faster than usually desired). The good news is that at least 0.15.0 does include some warnings.

If we are going to revert this and do a more gradual deprecation, we should not wait until February -- then we'll end up with users relying on the new behavior (in fact, this may already be an issue).

@jorisvandenbossche
Member Author

@jreback I am not talking about index - scalar operations; these indeed are and have always been numeric. I am talking about index - index operations (idx - idx), which have been explicitly documented as set operations. This was also tested, but those tests were adapted when we deprecated it (#8227).

We also used this feature within the pandas codebase (those uses have been changed to union/difference). I myself, for example, used it in code for my daily work, but I have not run that code since 0.15.1, so I had not noticed it yet.

@jreback
Contributor

jreback commented Dec 17, 2014

ok, so an untested, partially documented behavior that was deprecated anyhow is a break. This is the reason for tests (and it was specifically mentioned that this was untested).

Keep in mind that an IntIndex, and this is what is referred to here, is Numeric in all other ways, e.g. multiplication with another index/ndarray (the fix for which 'caused' this), and ops with scalars.

No 0.15.3. @jorisvandenbossche if you really want to put a note in prior documentation, then go for it.

So will make this issue for testing of these features. One of you pls put in place some tests for the non-Index, non-datetimelike indexes.

I am going to make the #9094 issue into the change for DatetimeIndex/PeriodIndex, as TimedeltaIndex already does this (only for subtraction; + will have to raise a TypeError).

@jreback jreback added the Testing pandas testing functions or related to the test suite label Dec 17, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 17, 2014
@jorisvandenbossche
Member Author

ok, so an untested, partially documented behavior that was deprecated anyhow is a break. This is the reason for tests (and it was specifically mentioned that this was untested).

It is true that the tests were not thorough enough, so we didn't catch this change in 0.15.1 (eg no real test for the deprecation itself, and only test for Index itself and not for all subclasses).
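A check along the following lines might have surfaced the difference between index types. It is only a rough sketch: `probe_subtraction` and the `examples` dict are hypothetical names, not part of the pandas test suite, and what it reports depends on the installed pandas version.

```python
import warnings

import pandas as pd


def probe_subtraction(idx):
    """Describe what `idx - idx` does: (kind of result, FutureWarning seen?)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            result = idx - idx
        except Exception as exc:
            # Deliberately broad: the probe reports a failure instead of raising.
            return "raises " + type(exc).__name__, False
    warned = any(issubclass(w.category, FutureWarning) for w in caught)
    # A set difference of an index with itself is empty; an element-wise
    # subtraction keeps the original length.
    kind = "set difference" if len(result) == 0 else "element-wise"
    return kind, warned


examples = {
    "int": pd.Index([1, 2]),
    "float": pd.Index([1.1, 2.0]),
    "datetime": pd.date_range("2012-01-01", periods=2),
    "object": pd.Index(["a", "b"]),
}

report = {name: probe_subtraction(idx) for name, idx in examples.items()}
for name, (kind, warned) in report.items():
    print(f"{name:8s} -> {kind}, FutureWarning={warned}")
```

Looping over every index flavour, rather than testing only the base Index, is the point of the sketch.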

Keep in mind that an IntIndex, and this is what is referred to here, is Numeric in all other ways

That is not really the point. I think we all agree that the set operations are a bit strange and confusing, as other operations (index with scalar/array, multiplication, ..) are all the 'normal' element-wise operations. We discussed that thoroughly in #8226, and we all agreed to change this behaviour, but decided to first give it a deprecation cycle.
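To make "a deprecation cycle" concrete: the operator keeps its old meaning for a while but warns, and an explicit method offers the replacement. The sketch below uses a hypothetical `ToyIndex` class written for illustration only; it is not the pandas implementation.

```python
import warnings


class ToyIndex:
    """Hypothetical stand-in for an index type; not pandas code."""

    def __init__(self, values):
        self.values = list(values)

    def difference(self, other):
        # Explicit set difference, keeping the order of self.values.
        exclude = set(other.values)
        return ToyIndex(v for v in self.values if v not in exclude)

    def __sub__(self, other):
        # During the deprecation cycle the old set semantics are kept,
        # but every use of '-' points the caller at the replacement.
        warnings.warn(
            "using '-' to provide set differences is deprecated, "
            "use .difference()",
            FutureWarning,
            stacklevel=2,
        )
        return self.difference(other)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyIndex([1, 2, 3]) - ToyIndex([2])

print(result.values)
print([w.category.__name__ for w in caught])
```

Only after the warning has been in place for a release or more would `__sub__` switch to element-wise behaviour.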

The question is a bit how 'severe' this break is, and I must say I can't really judge it. We did not yet receive any reports (which could suggest it is not used a lot), but on the other hand, I think a lot of people still have to update to >= 0.15.1.
