Implementing rolling min/max functions that can retain the original type #12595

joshuastorck · 2016-03-11T15:31:51Z

Enhances previous bugfix to BUG: rolling functions raise ValueError on float32 data #12373
Added test_rolling_min_max_test_types to test_windows
passes git diff upstream/master | flake8 --diff
Added documentation to computation.rst for the new as_float argument and updated whatsnew for v0.18.0
- Changed the rolling min/max functions in algos.pyx to take
  a Cython fused type as input instead of float64, so that the
  functions can accept arrays of any numeric type
- Merged the functionality of rolling min/max into a common
  function with branches based on whether or not it's running
  min/max
- When running rolling min/max for integral types and there
  are not enough minimum periods, the output values returned
  are zero
- Added a unit test to test_moments to make sure that rolling
  min/max works for all integral types and float32/64
- Updated computations and whatsnew doc
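A minimal pure-Python sketch of the behaviour described above, assuming NumPy: one routine handles both min and max via an `is_max` flag, keeps the input dtype, skips NaNs in float input, and writes 0 for integer input when a window has fewer than `minp` observations. The name `roll_min_max` is hypothetical, and the monotonic-deque technique is a swapped-in O(n) approach, not necessarily what the Cython code in algos.pyx does.

```python
from collections import deque

import numpy as np


def roll_min_max(a, window, minp, is_max):
    """Rolling min or max of a 1-D numeric array, keeping a's dtype.

    Hypothetical pure-Python stand-in for a fused-type Cython routine.
    Positions with fewer than ``minp`` non-NaN observations get NaN for
    float input and 0 for integer input (integers have no NaN).
    """
    a = np.asarray(a)
    n = len(a)
    out = np.empty(n, dtype=a.dtype)
    is_float = np.issubdtype(a.dtype, np.floating)
    fill = np.nan if is_float else 0
    # Indices of candidate extremes; their values stay monotonic
    # (non-increasing for max, non-decreasing for min).
    candidates = deque()
    nobs = 0
    for i in range(n):
        ai = a[i]
        valid = not (is_float and np.isnan(ai))
        if valid:
            nobs += 1
            # Drop candidates that can never be the window extreme again.
            while candidates and (
                a[candidates[-1]] <= ai if is_max else a[candidates[-1]] >= ai
            ):
                candidates.pop()
            candidates.append(i)
        # Slide the window: forget the element that just fell out.
        j = i - window
        if j >= 0:
            if not (is_float and np.isnan(a[j])):
                nobs -= 1
            if candidates and candidates[0] <= j:
                candidates.popleft()
        out[i] = a[candidates[0]] if nobs >= minp and candidates else fill
    return out


print(roll_min_max(np.array([3, 1, 4, 1, 5], dtype=np.int32), 2, 2, True))
```

Writing 0 keeps the integer dtype, but it makes short windows indistinguishable from genuine zeros, which is the concern raised in the first review comment.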

kawochen · 2016-03-11T15:57:35Z

When running rolling min/max for integral types and there are not enough minimum periods, the output values returned are zero

I don't think 0 is sensible, but I'm not sure how else it can be done either without integral nulls.

kawochen · 2016-03-11T16:11:15Z

Merged the functionality of rolling min/max into a common function with branches based on whether or not it's running min/max

the branch can be decided early (your is_max), so you don't need to branch inside the loop

joshuastorck · 2016-03-11T17:18:28Z

If you branched earlier, you would effectively be passing around a function pointer. That doesn't give the compiler any opportunity to do something smarter, like a conditional move, which in this case would be pretty easy to generate.

Ideally, much of this code should be written in native C++ with templates using C++11 lambdas. That would make it possible to eliminate branching completely at compile time.
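The two loop shapes under discussion can be sketched in Python, purely to show where the branch sits; Python cannot show whether a C compiler would emit a conditional move. Both function names are hypothetical, and the example computes a plain (non-rolling) extreme to keep the structure visible.

```python
import operator


def running_extreme_branch_in_loop(values, is_max):
    """Test is_max at every element (the shape of the merged routine)."""
    best = values[0]
    for v in values[1:]:
        if is_max:
            if v > best:
                best = v
        else:
            if v < best:
                best = v
    return best


def running_extreme_branch_hoisted(values, is_max):
    """Pick the comparison once, before the loop.

    In C or Cython this roughly corresponds to choosing a function pointer
    up front; the loop then makes an indirect call per element.
    """
    better = operator.gt if is_max else operator.lt
    best = values[0]
    for v in values[1:]:
        if better(v, best):
            best = v
    return best
```

In the first shape the flag is loop-invariant, so a compiler is free to unswitch the loop or use a conditional move; in the second the indirect call is opaque to it, which is the trade-off described above.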

jreback · 2016-03-11T19:23:32Z

doc/source/computation.rst

@@ -250,6 +250,10 @@ accept the following arguments:
  result is NA)
 - ``center``: boolean, whether to set the labels at the center (default is False)

+.. note::
+
+   The ``min`` and ``max`` functions will by default return the result as a float. For integer inputs, integer outputs can be obtained by passing True as the ``as_float`` argument.


take this out, we won't be having an as_float argument

kawochen · 2016-03-12T01:54:36Z

@joshuastorck Ah! I see!

joshuastorck · 2016-03-17T00:11:19Z

Any additional work needed on this one?

jreback · 2016-03-17T00:27:51Z

will take a look - can you rebase on master?

joshuastorck · 2016-03-17T00:31:56Z

Just rebased and pushed

jreback · 2016-03-17T00:48:02Z

also pls git rebase -i master then push upstream (with -f).

jreback · 2016-03-17T00:48:41Z

nvm. you are only 1 or 2 commits behind.

jreback · 2016-03-17T00:49:57Z

pandas/algos.pyx

-    "Moving max of 1d array of dtype=float64 along axis=0 ignoring NaNs."
-    cdef np.float64_t ai, aold
+def roll_max(ndarray[numeric] a, int window, int minp):
+    "Moving max of 1d array of any numeric type along axis=0 ignoring NaNs."


bonus if we can document the params (even though it's an internal function)

jreback · 2016-03-17T01:02:18Z

I think you need to take out some of the _ensure_float64 that were implemented in the related PR.

I would expect this to return float32 no?

In [4]: Series([1,2,3,4],dtype='float32').rolling(window=2).max()
Out[4]: 
0    NaN
1    2.0
2    3.0
3    4.0
dtype: float64

pretty much everything else is going to be upcast though
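For comparison, a dtype-preserving rolling max is straightforward to sketch with NumPy's `sliding_window_view` (NumPy 1.20 or later). The helper name is hypothetical, and it assumes float input without NaNs, a full window (`minp == window`), and `window <= len(values)`; under those assumptions it returns float32 for float32 input, which is the result expected above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_max_keep_dtype(values, window):
    """Rolling max that keeps a float dtype; the first window-1 slots are NaN.

    np.max propagates NaN, so this sketch assumes NaN-free input.
    """
    values = np.asarray(values)
    out = np.full(values.shape, np.nan, dtype=values.dtype)
    out[window - 1:] = sliding_window_view(values, window).max(axis=-1)
    return out


print(rolling_max_keep_dtype(np.array([1, 2, 3, 4], dtype="float32"), 2).dtype)
```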

joshuastorck · 2016-03-17T01:10:03Z

That was why I originally had an as_float argument, which was in there for backwards compatibility. Are you saying that you are ok with breaking type compatibility?

jreback · 2016-03-17T01:11:52Z

@joshuastorck not sure what you mean. We are now breaking type compat by casting to float64. if you can preserve the type then by all means do it. You don't need a flag for that it should just work. Of course for integers ATM this won't work, but that's a different issue.

joshuastorck · 2016-03-17T01:13:40Z

So you want this code in window.py:

        # GH #12373 : rolling functions error on float32 data
        # make sure the data is coerced to float64
        if com.is_float_dtype(values.dtype):
            values = com._ensure_float64(values)
        elif com.is_integer_dtype(values.dtype):
            values = com._ensure_float64(values)

To change to this?

        if com.is_integer_dtype(values.dtype):
            values = com._ensure_float64(values)
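The proposed rule can be sketched with plain NumPy checks standing in for the pandas internals `com.is_integer_dtype` and `com._ensure_float64`. `prep_values` is a hypothetical name, and the rule only works once every rolling routine accepts float32, which is the concern raised in the next comment.

```python
import numpy as np


def prep_values(values):
    """Hypothetical stand-in for the coercion step discussed for window.py.

    Integer input is upcast to float64 (so NaN can mark short windows);
    float input is returned unchanged so its dtype can survive.
    """
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.integer):
        return values.astype(np.float64)
    return values
```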

joshuastorck · 2016-03-17T01:15:15Z

Also, I only changed the roll_min/max functions. Wouldn't that code change break all of the other rolling functions?

jreback · 2016-03-17T01:19:11Z

hmm, yes ideally that would be the change. But you are right the rest have not been converted. ok then.

pls squash / ping when green.

jreback · 2016-03-17T01:22:21Z

ok, I linked the master issue: #8659

so next up is fixing the remainder of the rolling functions, so we don't need to upcast floats. :)

…ype:
* Changed the rolling min/max functions in algos.pyx so that they use a cython fused type as input instead of a float64 so that the function can accept arrays of any numeric type
* Merged the functionality of rolling min/max into a common function with branches based on whether or not it's running min/max
* When running rolling min/max for integral types and there are not enough minimum periods, the output values returned are zero
* Added a unit test to test_moments to make sure that rolling min/max works for all integral types and float32/64
* Updated computations and whatsnew doc

joshuastorck · 2016-03-17T12:09:15Z

Rebased and Travis CI is green

jreback · 2016-03-17T13:34:11Z

pandas/algos.pyx

    return y

+cdef double_t _get_max(object skiplist, int nobs, int minp):


did you add this? I don't see this (or _get_min) used AT ALL in the codebase. any idea?

you don't need to fix, just lmk.

I think it was there before and was removed, but I accidentally re-added it in a merge. I won’t have a chance to get to this today. Can you accept the pull request and I’ll make a separate PR to remove it?


I am going to merge this and take it out. thanks

…ype:
* Changed the rolling min/max functions in algos.pyx so that they use a cython fused type as input instead of a float64 so that the function can accept arrays of any numeric type
* Merged the functionality of rolling min/max into a common function with branches based on whether or not it's running min/max
* When running rolling min/max for integral types and there are not enough minimum periods, the output values returned are zero
* Added a unit test to test_moments to make sure that rolling min/max works for all integral types and float32/64
* Updated computations and whatsnew doc

closes pandas-dev#12595

jreback · 2016-03-17T15:15:23Z

@joshuastorck thanks for the fixes!

#8659 (using fused types in the rest of the algos) is waiting for you!

jreback · 2016-06-07T12:40:26Z

@joshuastorck going to start going thru your changes for support on this

joshuastorck changed the title ~~Implementing rolling min/max functions that can retain the original t…~~ Implementing rolling min/max functions that can retain the original type Mar 11, 2016

jreback reviewed Mar 11, 2016

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Dtype Conversions (Unexpected or buggy dtype conversions) labels Mar 11, 2016

jreback reviewed Mar 17, 2016

jreback added this to the 0.18.1 milestone Mar 17, 2016

jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Mar 17, 2016

leeong05 mentioned this pull request Mar 17, 2016 in API/ENH: master issue for pd.rolling_apply #8659 (Closed, 14 tasks)

joshuastorck force-pushed the generic_rolling_min_max branch from 4cdde46 to b5a84cf March 17, 2016 03:39

jreback reviewed Mar 17, 2016

jreback closed this in bf89220 Mar 17, 2016

joshuastorck mentioned this pull request Apr 18, 2016 in BUG: support fused types in roll_min/max #12373 #12481 (Closed)
