Skip to content

Implementing rolling min/max functions that can retain the original type #12595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

joshuastorck
Copy link
Contributor

  • Enhances previous bugfix to BUG: rolling functions raise ValueError on float32 data #12373
  • Added test_rolling_min_max_test_types to test_windows
  • passes git diff upstream/master | flake8 --diff
  • Added documentation to computation.rst for the new as_float argument and updated whatsnew for v0.18.0
    • Changed the rolling min/max functions in algos.pyx so
      that they use a cython fused type as input instead of a float64
      so that the function can accept arrays of any numeric type
    • Merged the functionality of rolling min/max into a common
      function with branches based on whether or not it's running
      min/max
    • When running rolling min/max for intergral types and there
      are not enough minimum periods, the output values returned
      are zero
    • Added a unit test to test_moments to make sure that rolling
      min/max works for all integral types and float32/64
    • Updated computations and whatsnew doc

@joshuastorck joshuastorck changed the title Implementing rolling min/max functions that can retain the original t… Implementing rolling min/max functions that can retain the original type Mar 11, 2016
@kawochen
Copy link
Contributor

When running rolling min/max for intergral types and there are not enough minimum periods, the output values returned are zero

I don't think 0 is sensible, but I'm not sure how else it can be done either without integral nulls.

@kawochen
Copy link
Contributor

Merged the functionality of rolling min/max into a common function with branches based on whether or not it's running min/max

the branching can be decided very soon (your is_max), so you don't need to branch in the loop

@joshuastorck
Copy link
Contributor Author

If you branched earlier, you would effectively be passing around a function pointer. That doesn't give the compiler any opportunity to do something smarter, like a conditional move, which in this case would be pretty easy to generate.

Ideally, much of this code should be written in native C++ with templates using C++11 lambdas. That would make it possible to eliminate branching completely at compile time.

@@ -250,6 +250,10 @@ accept the following arguments:
result is NA)
- ``center``: boolean, whether to set the labels at the center (default is False)

.. note::

The ``min`` and ``max`` functions will by default return the result as a float. For integer inputs, integer outputs can be obtained by passing True as the ``as_float`` argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take this out, we won't be having an as_float argument

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Mar 11, 2016
@kawochen
Copy link
Contributor

@joshuastorck Ah! I see!

@joshuastorck
Copy link
Contributor Author

Any additional work needed on this one?

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

will take a look - can u rebase in master

@joshuastorck
Copy link
Contributor Author

Just rebased and pushed

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

also pls git rebase -i master then push upstream (with -f).

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

nvm. you are only 1 or 2 commits behind.

"Moving max of 1d array of dtype=float64 along axis=0 ignoring NaNs."
cdef np.float64_t ai, aold
def roll_max(ndarray[numeric] a, int window, int minp):
"Moving max of 1d array of any numeric type along axis=0 ignoring NaNs."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bonus if we can document the params (even though its an internal function)

@jreback jreback added this to the 0.18.1 milestone Mar 17, 2016
@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 17, 2016
@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

I think you need to take out some of the _ensure_float64 that were implemented in the related PR.

I would expect this to return float32 no?

In [4]: Series([1,2,3,4],dtype='float32').rolling(window=2).max()
Out[4]: 
0    NaN
1    2.0
2    3.0
3    4.0
dtype: float64

pretty much everything else is going to be upcast though

@joshuastorck
Copy link
Contributor Author

That was why I originally had an as_float argument, which was in there for backwards compatibility. Are you saying that you are ok with breaking type compatibility?

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

@joshuastorck not sure what you mean. We are now breaking type compat by casing to float64. if you can preserve the type then by all means do it. You don't need a flag for that it should just work. Of course for integers ATM this won't work, but that's a different issue.

@joshuastorck
Copy link
Contributor Author

So you want this code in window.py:

        # GH #12373 : rolling functions error on float32 data
        # make sure the data is coerced to float64
        if com.is_float_dtype(values.dtype):
            values = com._ensure_float64(values)
        elif com.is_integer_dtype(values.dtype):
            values = com._ensure_float64(values)

To change to this?

        if com.is_integer_dtype(values.dtype):
            values = com._ensure_float64(values)

@joshuastorck
Copy link
Contributor Author

Also, I only changed the roll_min/max functions. Wouldn't that code change break all of the other rolling functions?

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

hmm, yes ideally that would be the change. But you are right the rest have not been converted. ok then.

pls squash / ping when green.

@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

ok, I linked the master issue: #8659

so next up is fixing the remainder of the rolling functions, so we don't need to upcast floats. :)

…ype:

  * Changed the rolling min/max functions in algos.pyx so
    that they use a cython fused type as input instead of a float64
    so that the function can accept arrays of any numeric type
  * Merged the functionality of rolling min/max into a common
    function with branches based on whether or not it's running
    min/max
  * When running rolling min/max for intergral types and there
    are not enough minimum periods, the output values returned
    are zero
  * Added a unit test to test_moments to make sure that rolling
    min/max works for all integral types and float32/64
  * Updated computations and whatsnew doc
@joshuastorck joshuastorck force-pushed the generic_rolling_min_max branch from 4cdde46 to b5a84cf Compare March 17, 2016 03:39
@joshuastorck
Copy link
Contributor Author

Rebased and Travis CI is green

return y

cdef double_t _get_max(object skiplist, int nobs, int minp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you add this? I don't see this (or _get_min) used AT ALL in the codebase. any idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to fix, just lmk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was there before and was removed but I accidentally added it in a merge. I won’t have a chance to get to this today. Can you accept the pull request and I’ll make a separate PR to remove it?

From: Jeff Reback [mailto:[email protected]]
Sent: Thursday, March 17, 2016 9:36 AM
To: pydata/pandas
Cc: Joshua Storck
Subject: Re: [pandas] Implementing rolling min/max functions that can retain the original type (#12595)

In pandas/algos.pyxhttps://github.com//pull/12595#discussion_r56503823:

 return y

+cdef double_t _get_max(object skiplist, int nobs, int minp):

you don't need to fix, just lmk.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/12595/files/b5a84cf309ffc8cbb20cb6b6534d2ad2083aa5bf#r56503823

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to merge this and take it out. thxnks

jreback pushed a commit to jreback/pandas that referenced this pull request Mar 17, 2016
…ype:

  * Changed the rolling min/max functions in algos.pyx so
    that they use a cython fused type as input instead of a float64
    so that the function can accept arrays of any numeric type
  * Merged the functionality of rolling min/max into a common
    function with branches based on whether or not it's running
    min/max
  * When running rolling min/max for intergral types and there
    are not enough minimum periods, the output values returned
    are zero
  * Added a unit test to test_moments to make sure that rolling
    min/max works for all integral types and float32/64
  * Updated computations and whatsnew doc

closes pandas-dev#12595
@jreback jreback closed this in bf89220 Mar 17, 2016
@jreback
Copy link
Contributor

jreback commented Mar 17, 2016

@joshuastorck thanks for the fixes!

#8659 (using fused types in the rest of the algos) is waiting for you!

@jreback
Copy link
Contributor

jreback commented Jun 7, 2016

@joshuastorck going to start going thru your changes for support on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants