ENH: Exclude nuisance columns from result of window functions #27044

ihsansecer · 2019-06-25T22:21:11Z

closes ENH: window functions need to exclude nuiscance columns #12537
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2019-06-26T13:26:17Z

Codecov Report

Merging #27044 into master will decrease coverage by 0.01%.
The diff coverage is 61.29%.

@@            Coverage Diff             @@
##           master   #27044      +/-   ##
==========================================
- Coverage      92%   91.99%   -0.02%     
==========================================
  Files         180      180              
  Lines       50754    50774      +20     
==========================================
+ Hits        46698    46708      +10     
- Misses       4056     4066      +10

Flag	Coverage Δ
#multiple	`90.63% <61.29%> (-0.01%)`	⬇️
#single	`41.81% <3.22%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/window.py	`96.01% <61.29%> (-0.63%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️
pandas/compat/_optional.py	`100% <0%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b4aa1d6...035a697.

codecov · 2019-06-26T13:26:19Z

Codecov Report

❗ No coverage uploaded for pull request base (master@8b48f5c).
The diff coverage is 65.51%.

@@            Coverage Diff            @@
##             master   #27044   +/-   ##
=========================================
  Coverage          ?   92.03%           
=========================================
  Files             ?      180           
  Lines             ?    50735           
  Branches          ?        0           
=========================================
  Hits              ?    46692           
  Misses            ?     4043           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.67% <65.51%> (?)`
#single	`41.86% <6.89%> (?)`

Impacted Files	Coverage Δ
pandas/core/window.py	`96.23% <65.51%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8b48f5c...fd678a5.

TomAugspurger · 2019-06-26T13:58:17Z

How does this interact with #23002, which would like to apply (certain) operations to non-numeric columns?

WillAyd

I don't think returning an empty series is correct here as it is not consistent with GroupBy ops with a similar limitation. For instance:

>>> pd.Series(['foo']).groupby([0]).mean()
DataError: No numeric types to aggregate

Granted I don't think this is even consistent internally:

>>> pd.Series(['foo']).groupby([0]).rank()
TypeError: 'NoneType' object is not callable

But the empty Series definitely seems wrong to me.

Any objection to the DataError as shown in the above?

ihsansecer · 2019-06-26T16:00:16Z

@WillAyd DataError seems reasonable. I just checked how DataFrame functions behave and it led me to returning an empty Series like:

>>> pd.DataFrame({"A": ['foo']}).mean()
Series([], dtype: float64)

@TomAugspurger this doesn't resolve it. Instead of an error, an empty Series will be returned (as a result of using Rolling.apply with a string column), but that seems to be wrong, as mentioned above.
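For illustration, the behaviour under discussion can be sketched without pandas: apply a rolling function column by column, skip columns where it raises TypeError, and raise an error if no column is left. This is a minimal sketch under stated assumptions, not the code in pandas/core/window.py; `rolling_sum`, `apply_excluding_nuisance`, and the local `DataError` class are hypothetical names introduced only for this example.

```python
class DataError(Exception):
    """Local stand-in for an aggregation error (hypothetical, not imported from pandas)."""


def rolling_sum(values, window):
    """Sum each full window of `values`; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]))
    return out


def apply_excluding_nuisance(columns, window):
    """Apply rolling_sum per column, dropping columns that cannot be aggregated."""
    results = {}
    for name, values in columns.items():
        try:
            results[name] = rolling_sum(values, window)
        except TypeError:
            # Non-numeric ("nuisance") column: leave it out of the result.
            continue
    if not results:
        # Nothing could be aggregated: raise rather than return an empty result.
        raise DataError("No numeric types to aggregate")
    return results


frame = {"A": [1, 2, 3], "B": ["x", "y", "z"]}
result = apply_excluding_nuisance(frame, window=2)
```

Here `result` contains only column "A"; passing a frame with only string columns raises the stand-in `DataError` instead of returning an empty mapping.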

ihsansecer · 2019-06-27T17:27:38Z

Not sure why some tests on py35_macos are failing. Will try to reproduce it.

jreback · 2019-06-27T20:44:21Z

this looks good. can you merge master and ping on green.

jreback · 2019-06-27T20:44:37Z

@WillAyd if any other comments.

WillAyd

Looks nice just need to update whatsnew

doc/source/whatsnew/v0.25.0.rst

jreback · 2019-06-27T23:23:00Z

@ihsansecer see the failure https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=13461

I think this might be an ordering issue (as the dict is not ordered on 3.5).

jreback · 2019-06-28T20:00:19Z

pandas/core/window.py

        results = []
-        for b in blocks:
+        exclude = []
+        for dtype in list(dtypes):


best just to

for b in blocks_dict.values(): .....

then don't need anything else

ihsansecer · 2019-06-28

Since my solution to the unordered dict issue requires deleting nuisance blocks (a block holds the columns of one dtype), I needed a shallow copy of the keys so that I could remove entries while iterating.
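The point about needing a shallow copy of the keys can be sketched with a plain dict. This is a minimal illustration, not the PR's code: `drop_nuisance` and `has_strings` are hypothetical helpers, and the lists stand in for blocks.

```python
def has_strings(block):
    """Treat a block as a nuisance block if it holds any string value."""
    return any(isinstance(v, str) for v in block)


def drop_nuisance(blocks_dict, is_nuisance):
    """Remove flagged entries while iterating over a snapshot of the keys."""
    for dtype in list(blocks_dict):  # list(...) copies the keys up front
        if is_nuisance(blocks_dict[dtype]):
            del blocks_dict[dtype]
    return blocks_dict


blocks = {"object": ["a", "b"], "int64": [1, 2]}
remaining = drop_nuisance(dict(blocks), has_strings)

# Without the snapshot, deleting inside the loop raises RuntimeError
# because the dict changes size during iteration.
unsafe = dict(blocks)
try:
    for dtype in unsafe:
        if has_strings(unsafe[dtype]):
            del unsafe[dtype]
    raised = False
except RuntimeError:
    raised = True
```

Iterating over `list(blocks_dict)` is what makes the deletion safe; iterating over the dict directly fails as soon as an entry is removed.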

jreback · 2019-06-28T20:00:44Z

pandas/core/window.py

+            except (TypeError, NotImplementedError):
+                if isinstance(obj, ABCDataFrame):
+                    exclude.extend(b.columns)
+                    del blocks_dict[dtype]


why are you del here?

ihsansecer · 2019-06-28

As you noted, dictionary order differs between runs, so the iteration order of the blocks differs too.

Take a DataFrame with columns ["A", "B"] of types [int, str]. When iteration starts with "B":

results = [values_of_A] but blocks = [block_for_B, block_for_A]. The two no longer line up (values_of_A and block_for_B are paired when iterated together). So I removed block_for_B, which was the first thing that came to mind.
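The mismatch described above can be reproduced with plain lists. This is a sketch under assumptions: `compute` is a hypothetical stand-in for the per-block window computation, and the list of pairs fixes the iteration order that an unordered dict could produce.

```python
def compute(values):
    """Toy block computation: raises TypeError for non-numeric values."""
    return [v + 0.0 for v in values]


# Iteration order in which the string block happens to come first.
blocks = [("B", ["x", "y"]), ("A", [1, 2])]

# Skipping the failing block leaves `results` shorter than `blocks`.
results = []
for name, values in blocks:
    try:
        results.append(compute(values))
    except TypeError:
        pass  # nuisance block skipped, but it is still present in `blocks`

# Pairing by position now attaches A's values to B's label.
mismatched = [(name, res) for (name, _), res in zip(blocks, results)]

# Tracking only the blocks that succeeded keeps both sequences aligned.
kept = []
aligned_results = []
for name, values in blocks:
    try:
        aligned_results.append(compute(values))
    except TypeError:
        continue
    kept.append(name)
aligned = list(zip(kept, aligned_results))
```

Dropping the failed block from the collection being paired, as in the second loop, is one way to keep labels and results in step regardless of iteration order.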

jreback · 2019-06-28T23:30:12Z

lgtm. @TomAugspurger @jorisvandenbossche if any comments.

WillAyd

lgtm

WillAyd · 2019-07-01T16:41:14Z

Thanks @ihsansecer !

ihsansecer added 4 commits June 26, 2019 00:50

Exclude nuisance columns from result of window functions

e06d307

Edit existing tests

0d3f912

Exclude nuisance columns from result of _apply_window

e812bf9

Add whatsnew

12a012b

gfyoung added the Dtype Conversions (Unexpected or buggy dtype conversions) and Enhancement labels Jun 26, 2019

Merge remote-tracking branch 'upstream/master' into exclude-nuisance-…

035a697

…window

WillAyd requested changes Jun 26, 2019

WillAyd added the Window (rolling, ewma, expanding) label Jun 26, 2019

ihsansecer added 2 commits June 27, 2019 11:49

Raise DataError instead of returning empty Series

392e7e6

Sort imports

76adb2c

jreback added this to the 0.25.0 milestone Jun 27, 2019

WillAyd requested changes Jun 27, 2019

ihsansecer added 2 commits June 28, 2019 00:34

Update whatsnew

eaeca8e

Merge remote-tracking branch 'upstream/master' into exclude-nuisance-…

fd678a5

…window

jreback removed this from the 0.25.0 milestone Jun 27, 2019

Reimplement to fix issue with python 3.5 dict

ad9b5e2

jreback requested changes Jun 28, 2019

Change in favour of a cleaner implementation

6d7602e

jreback added this to the 0.25.0 milestone Jun 28, 2019

jreback approved these changes Jun 28, 2019

WillAyd approved these changes Jun 29, 2019

WillAyd merged commit 355e322 into pandas-dev:master Jul 1, 2019

ihsansecer mentioned this pull request Jul 6, 2019

ERR: Rolling().Apply() on Object-Series should raise internally caught errors #15085

Closed

jreback mentioned this pull request Jul 6, 2019

Confusing - rolling min( ) function "accepting" objects #20244

Closed

ihsansecer deleted the exclude-nuisance-window branch July 11, 2019 15:46
