BUG: Fix numpy boolean subtraction error in Series.diff #27755

Closed
wants to merge 193 commits into from

Conversation

Unprocessable
Contributor

@Unprocessable Unprocessable commented Aug 5, 2019

This fixes #27755. For more than three years now, NumPy has not allowed the subtraction of boolean arrays, which is what Series.diff attempts on a boolean Series.

NumPy no longer allows subtraction of boolean values. I'm not sure whether a NumPy version check should be done before this code.
Note: I haven't actually run this code, so feel free to check it, but it should be correct.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-3da3b949c6bd> in <module>
      1 data = pd.Series([0,-1,-2,-3,-4,-3,-2,-1,0,-1,-1,0,-1,-2,-3,-2,0])
      2 filtered = data.between(-2,0, inclusive = True)
----> 3 filtered.diff()
      4 print(filtered)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in diff(self, periods)
   2191         dtype: float64
   2192         """
-> 2193         result = algorithms.diff(com.values_from_object(self), periods)
   2194         return self._constructor(result, index=self.index).__finalize__(self)
   2195 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\algorithms.py in diff(arr, n, axis)
   1817             out_arr[res_indexer] = result
   1818         else:
-> 1819             out_arr[res_indexer] = arr[res_indexer] - arr[lag_indexer]
   1820 
   1821     if is_timedelta:

TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
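
For reference, the failure in the traceback can be reproduced with NumPy alone. The snippet below is a minimal sketch, not the change made in this PR: the array `arr` and the variable names are illustrative assumptions. It shows that subtracting two boolean arrays raises, while `np.logical_xor` (one of the alternatives named in the error message) marks the positions where consecutive values differ.

```python
import numpy as np

# Illustrative data (an assumption, not taken from the PR).
arr = np.array([True, True, False, True])

# Subtracting boolean arrays raises TypeError in NumPy releases that
# enforce the deprecation quoted in the traceback.
try:
    arr[1:] - arr[:-1]
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# XOR of each element with its predecessor: True where the value changed.
changed = np.logical_xor(arr[1:], arr[:-1])
print(subtraction_failed, changed.tolist())
```

Note that XOR only records whether a value changed, not the direction of the change, so it is not a drop-in equivalent of a numeric difference.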
Contributor

@jreback jreback left a comment

tests are always the first thing we write, pls add them

@pep8speaks

pep8speaks commented Aug 5, 2019

Hello @Unprocessable! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-10 23:56:48 UTC

Contributor

@TomAugspurger TomAugspurger left a comment

Needs a release note. What version of NumPy will / does this show up in?

@jbrockmendel
Member

Are there other tests for diff? If so they should be collected here or this should join them.

@Unprocessable Unprocessable changed the title from "Fix numpy boolean subtraction error in Series.diff" to "BUG: Fix numpy boolean subtraction error in Series.diff" Aug 6, 2019
I copied the data over to the other file. The diff tests should be in one file, and the boolean test cannot be added to this one because it is not a time series.
result, Series(TimedeltaIndex(["NaT"] + ["1 days"] * 4), name="foo")
)

# boolean series
Contributor

Can you add a test for a mix of bool & NA? pd.Series([True, False, np.nan, False], dtype="object")
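
To illustrate why the mixed bool and NA case is worth a separate test, here is a hedged sketch using plain NumPy on an object-dtype array. It does not show what `Series.diff` returns in pandas. It only shows that with `dtype=object` subtraction falls back to Python semantics, so it does not raise and instead yields integers and NaN.

```python
import math

import numpy as np

# Same values as in the suggested test, held in an object-dtype array.
values = np.array([True, False, np.nan, False], dtype=object)

# Elementwise Python subtraction: bools act as ints, NaN propagates.
diffs = values[1:] - values[:-1]

for current, previous, delta in zip(values[1:], values[:-1], diffs):
    print(f"{current!r} - {previous!r} = {delta!r}")
```

The first difference is `False - True`, which Python evaluates to `-1`. The two differences that involve `np.nan` are NaN.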

Contributor Author

Done

jbrockmendel and others added 26 commits August 26, 2019 16:39
pandas-dev#28158)

* CLN: Use ABC classes for isinstance checks, remove unnecessary imports

* Formatting repairs
* REGR: Fix to_csv with IntervalIndex
* add isort:skip to "from .pandas_vb_common import setup"

* add isort:skip to noqa: E402 marked lines

* run black

* add noqa: E402 isort:skip where needed

* run pre-commit filters on asv_bench/benchmarks/

* parse the isort config when using pre-commit

* run isort on pandas/core/api.py

* run pre-commit filters and commit trivial import sorting changes

* specify flake8 errors in pandas/io/msgpack/__init__.py

* fix imports for doc/source/conf.py

* fix the [isort] skip entry in setup.cfg

Also I removed the files for which I have fixed the problems.
* CLN: catch less inside try/except
@simonjayhawkins
Member

superseded by #28251

@Unprocessable Unprocessable deleted the patch-1 branch September 30, 2019 11:25
Development

Successfully merging this pull request may close these issues.

pd.Series.diff() on boolean values