Skip to content

BUG: Use Series.where rather than np.where in clip #3998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2013
Merged

BUG: Use Series.where rather than np.where in clip #3998

merged 1 commit into from
Jun 25, 2013

Conversation

miketkelly
Copy link

Closes #3996

@jreback
Copy link
Contributor

jreback commented Jun 22, 2013

can u add release notes mention

and another test that has NaT in the series?

@miketkelly
Copy link
Author

Good catch on NaT. I'll add some tests. Just to confirm, what behavior do you want in that case? This commit is currently consistent with 11.0 behavior, which is itself consistent with the fact that NaT is ordered strictly lower than other timestamps.

>>> pd.__version__
'0.11.0'

>>> s
0                   NaT
1   2000-01-02 00:00:00
2   2000-01-03 00:00:00
dtype: datetime64[ns]
>>> s.clip_lower(s[1].value)
0   2000-01-02 00:00:00
1   2000-01-02 00:00:00
2   2000-01-03 00:00:00
dtype: datetime64[ns]
>>> s.clip_upper(s[1].value)
0                   NaT
1   2000-01-02 00:00:00
2   2000-01-02 00:00:00
dtype: datetime64[ns]
>>> s.clip_lower(pd.NaT.value)
0                   NaT
1   2000-01-02 00:00:00
2   2000-01-03 00:00:00
dtype: datetime64[ns]
>>> s.clip_upper(pd.NaT.value)
0   NaT
1   NaT
2   NaT
dtype: datetime64[ns]

At the expense of explicit null checking and some backwards incompatibility, we could instead make the behavior more nan-like. Thoughts?

@jreback
Copy link
Contributor

jreback commented Jun 23, 2013

these should work with out u specifying value directly
eg s[1]

as far as NaT go they should be completely excluded
so this might be a bit more complicated

u need to compute the mask first, then do the where using that mask (so that those values not in the mask are only ones included), then put the Nan's back in place

this should work for datetimes and other dtypes

I'll give u a reference to look at

@jreback
Copy link
Contributor

jreback commented Jun 23, 2013

actually I think this easier that I said

just make the cond = (s > 0) & notnull(s)

I think will do the right thing

@miketkelly
Copy link
Author

I have it coded like this:


    def clip_upper(self, threshold):
        cond = (self < threshold) | isnull(self) | isnull(threshold)
        return self.where(cond, threshold)

    def clip_lower(self, threshold):
        cond = (self > threshold) | isnull(self) | isnull(threshold)
        return self.where(cond, threshold)

I'll finish up the PR tomorrow.

@jreback
Copy link
Contributor

jreback commented Jun 23, 2013

ok that looks right

throw in some test cases including some with nan (and different dtypes) I u can
thanks!

@jreback
Copy link
Contributor

jreback commented Jun 23, 2013

see if you can finish this up...want to get it itn

@miketkelly
Copy link
Author

Build failed in Python 3 for the case when the threshold is None. What behavior do you want in that case? Throw an exception? Return a copy of the series? Or just take it out of the test as a nonsensical condition?

@jreback
Copy link
Contributor

jreback commented Jun 24, 2013

if the threshold is None (which btw ONLY exists in an object series, otherwise its converted to nan), or nan, then I think the operation is undefined. Can't really compare to nan; so either treat it as inf, which means do nothing or raise. It think raising a ValueError would be fine.

@hayd
Copy link
Contributor

hayd commented Jun 24, 2013

Python 3 fails test, I think because of (self >= threshold), but then this worked before hmm:

======================================================================
ERROR: test_clip_types_and_nulls (pandas.tests.test_series.TestSeries)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.2_with_system_site_packages/lib/python3.2/site-packages/pandas-0.11.1.dev_f4ed17b-py3.2-linux-x86_64.egg/pandas/tests/test_series.py", line 2851, in test_clip_types_and_nulls
    l = s.clip_lower(thresh)
  File "/home/travis/virtualenv/python3.2_with_system_site_packages/lib/python3.2/site-packages/pandas-0.11.1.dev_f4ed17b-py3.2-linux-x86_64.egg/pandas/core/series.py", line 2002, in clip_lower
    cond = (self >= threshold) | isnull(self) | isnull(threshold)
  File "/home/travis/virtualenv/python3.2_with_system_site_packages/lib/python3.2/site-packages/pandas-0.11.1.dev_f4ed17b-py3.2-linux-x86_64.egg/pandas/core/series.py", line 236, in wrapper
    res = na_op(values, other)
  File "/home/travis/virtualenv/python3.2_with_system_site_packages/lib/python3.2/site-packages/pandas-0.11.1.dev_f4ed17b-py3.2-linux-x86_64.egg/pandas/core/series.py", line 206, in na_op
    result = lib.scalar_compare(x, y, op)
  File "lib.pyx", line 610, in pandas.lib.scalar_compare (pandas/lib.c:10608)
TypeError: unorderable types: str() >= NoneType()
----------------------------------------------------------------------

@miketkelly
Copy link
Author

Yep. I don't have a Python 3 environment yet. Building one now...

@miketkelly
Copy link
Author

Is that really true, @jreback? It short-circuits at the element level?

@jreback
Copy link
Contributor

jreback commented Jun 24, 2013

fyi you also might be interestd in a vagrantup.com environment....pretty easy to setup

@jreback
Copy link
Contributor

jreback commented Jun 24, 2013

yes

In [6]: s = Series([None,'a','b','c'])

In [7]: pd.notnull(s) & (s>'aaa')
Out[7]: 
0    False
1    False
2     True
3     True
dtype: bool

In [8]: s>'aaa'
Out[8]: 
0    False
1    False
2     True
3     True
dtype: bool

@jreback
Copy link
Contributor

jreback commented Jun 24, 2013

@mtki you resolved the python3 issue?

@miketkelly
Copy link
Author

Builds are now passing.

@hayd
Copy link
Contributor

hayd commented Jun 25, 2013

@mtkni needs a rebase (probably due to conflict in release.rst)

@miketkelly
Copy link
Author

Rebased.

hayd added a commit that referenced this pull request Jun 25, 2013
BUG: Use Series.where rather than np.where in clip
@hayd hayd merged commit e12d6f7 into pandas-dev:master Jun 25, 2013
@hayd
Copy link
Contributor

hayd commented Jun 25, 2013

Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem with Series.clip with Timestamp threshold
3 participants