passing np.std to resample does not actually call the numpy function #3844

Closed
jsudheer opened this issue Jun 11, 2013 · 15 comments

Comments

@jsudheer

Hi,
This is the std issue again.
When I call numpy.std outside pandas I get the population std, whereas when I pass it to resample inside pandas I get the sample std. How can I ask pandas to let np.std behave as it originally does?
With best regards,
Sudheer

In [13]: ts[53153:53155]
Out[13]:
2007-09-15 15:30:00 30.1
2007-09-15 15:40:00 31.6
Freq: 10T

In [14]: np.std(ts[53153:53155])
Out[14]: 0.75

In [15]: ts[53153:53155].resample('D',how=np.std)
Out[15]:
2007-09-15 1.06066
Freq: D
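
The two numbers above differ only in the divisor: 0.75 is the population standard deviation (divide by N, i.e. ddof=0, the numpy default) and 1.06066 is the sample standard deviation (divide by N - 1, i.e. ddof=1, the pandas default). A minimal sketch that reproduces both values with the Python standard library only, using the two readings from the session above:

```python
import statistics

values = [30.1, 31.6]  # the two readings shown in the session above

# Population standard deviation: divide by N (ddof=0), numpy's default.
population_std = statistics.pstdev(values)

# Sample standard deviation: divide by N - 1 (ddof=1), pandas' default.
sample_std = statistics.stdev(values)

print(round(population_std, 5))  # 0.75
print(round(sample_std, 5))      # 1.06066
```

With only two points the two results differ by a factor of sqrt(2), which is why the discrepancy is so visible here.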
@cpcloud
Member

cpcloud commented Jun 11, 2013

i believe the method name is looked up and so it calls the pandas version...marking as a bug

@cpcloud
Member

cpcloud commented Jun 11, 2013

@jreback i think it's a bit magical to look up np.std.__name__ and then call a cythonized method based on that. i'm assuming this is for perf reasons; what do u think about just using string for the aforementioned purpose and just calling callables...
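
The name-based lookup described above can be illustrated with a small pure-Python sketch. This is not the pandas source: the registry `_INTERNAL_BY_NAME`, the helper `resolve`, and the stand-in function `std` are all hypothetical names invented for the illustration. It only shows why a function whose `__name__` is "std" can end up replaced by an internal implementation, while a lambda (whose `__name__` is "<lambda>") is called as given.

```python
import statistics

# Hypothetical registry: well-known reducer names mapped to an
# internal implementation (here a sample std, ddof=1).
_INTERNAL_BY_NAME = {"std": statistics.stdev}


def std(values):
    """Stand-in for np.std: population std (ddof=0); its __name__ is "std"."""
    return statistics.pstdev(values)


def resolve(how):
    """Return the function that would actually run for `how` (sketch only)."""
    name = how if isinstance(how, str) else getattr(how, "__name__", None)
    # A known name is swapped for the internal version; anything else passes through.
    return _INTERNAL_BY_NAME.get(name, how)


values = [30.1, 31.6]
swapped = resolve(std)(values)             # internal sample std runs instead
direct = resolve(lambda x: std(x))(values)  # the lambda is not swapped

print(round(swapped, 5))  # 1.06066
print(round(direct, 5))   # 0.75
```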

@jreback
Contributor

jreback commented Jun 11, 2013

Do it this way. std is defined as it is; to get different behavior, you need to provide a custom function:

In [18]: s.groupby(pd.TimeGrouper('D')).apply(lambda x: np.std(x))
Out[18]: 
2007-09-15    0.75

@jreback jreback closed this as completed Jun 11, 2013
@jreback
Contributor

jreback commented Jun 11, 2013

@cpcloud this is not a bug

@cpcloud
Member

cpcloud commented Jun 11, 2013

oh ok my bad...

@jreback
Contributor

jreback commented Jun 11, 2013

@cpcloud well, there is a somewhat related bug

This should work, I think; something is screwy here
(but looking up np.std and translating it to the pandas std is done everywhere, for good reason:
numpy doesn't handle nans)

In [19]: s.resample('D',how=lambda x: np.std(x))
Out[19]: 
2007-09-15   NaN
Freq: D, dtype: float64

@cpcloud
Member

cpcloud commented Jun 11, 2013

yep this is the thing i brought up in @jsudheer's last issue. right, i always forget about nans because pandas handles them so gracefully most of the time and so i just don't have to think about it, or i have to use the stupid nan* functions (most only in scipy) or use masked arrays :'(
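
For readers unfamiliar with the NaN problem mentioned above, here is a minimal sketch of the three behaviors. It assumes a NumPy release that ships `np.nanstd` (it was not yet available in NumPy when this thread was written); the sample array is made up for the illustration.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])  # made-up data with one missing value

# The plain reduction propagates the NaN.
plain = np.std(values)

# The nan-aware variant ignores it: population std of [1.0, 3.0].
nan_aware = np.nanstd(values)

# A masked array gives the same answer by masking the invalid entry.
masked = np.ma.masked_invalid(values).std()

print(plain)      # nan
print(nan_aware)  # 1.0
print(masked)     # 1.0
```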

@jreback
Contributor

jreback commented Jun 11, 2013

@cpcloud actually this is the only case where it fails (when it's a single day), so it's a bug, but an edge case (e.g. when crossing multiple days it works)

@jsudheer
Author

So even if we call how=np.std, it does the pandas std? That is, how=std and how=np.std are treated the same by pandas? I had the impression that I could get the population standard deviation by passing np.std, which has now turned out to be wrong.
With best regards,
Sudheer

with best regards

Sudheer


Dr. Sudheer Joseph

Scientist

INDIAN NATIONAL CENTRE FOR OCEAN INFORMATION SERVICES (INCOIS)
MINISTRY OF EARTH SCIENCES, GOVERNMENT OF INDIA
"OCEAN VALLEY" PRAGATHI NAGAR (BO)
OPP.JNTU, NIZAMPET SO
Andhra Pradesh, India. PIN- 500 090.
TEl:+91-40-23044600(R),Tel:+91-9440832534(Mobile)
Tel:+91-40-23886047(O),Fax:+91-40-23892910(O)
E-mail: [email protected]; [email protected].
Web- http://oppamthadathil.tripod.com
--------------* ---------------
"The ultimate measure of a man is
not where he stands in moments of
comfort and convenience, but where
he stands at times of challenge and
controversy."
Martin Luther King, Jr.
"What we have done for ourselves alone dies with us.
What we have done for others and the world remains and is immortal."

  • Albert Pines

@jreback
Contributor

jreback commented Jun 11, 2013

that is a feature: most numpy routines are NOT nan-safe, and they are slower than the pandas routines

you can also get the results you want by doing this. As I have said many times, you should just define a function of your own (which is what this lambda is); then you can control exactly what you get

In [12]: s.groupby(pd.TimeGrouper('D')).apply(lambda x: x.std(ddof=0))
Out[12]: 
2007-09-15    0.75
dtype: float64
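
For anyone trying this on a recent pandas, where `pd.TimeGrouper` and the `how=` argument of `resample` are no longer available, a rough equivalent of the workaround above might look like the following. This is a sketch under that assumption, not the maintainers' recommendation; the two-point series is rebuilt from the values in the original report.

```python
import pandas as pd

# Rebuild the two readings from the original report (10-minute spacing).
index = pd.date_range("2007-09-15 15:30", periods=2, freq="10min")
s = pd.Series([30.1, 31.6], index=index)

# pd.Grouper(freq="D") plays the role that pd.TimeGrouper('D') had.
daily = s.groupby(pd.Grouper(freq="D"))

# Custom function: population std (ddof=0), matching numpy's default.
population = daily.apply(lambda x: x.std(ddof=0))

# Built-in reduction: sample std (ddof=1), the pandas default.
sample = daily.std()

print(population)
print(sample)
```

Passing an explicit function with `ddof=0` keeps the choice of divisor visible at the call site instead of relying on which `std` ends up being dispatched.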

@jsudheer
Author

Thank you,
I missed one of your mails, which was sent an hour ago and which I am only seeing now. In fact, I did not understand what you mentioned in past mails regarding the function, which is clear now.
With best regards,
Sudheer


@jreback
Contributor

jreback commented Jun 11, 2013

No problem. If you can think of places in the docs where things are not clear, let us know!

@jsudheer
Author

Is this the doc, or are there others?
http://pandas.pydata.org/pandas-docs/dev/pandas.pdf

Also, from the statement below I was expecting to get the std over the 15 days on either side (1 - 15 - 31), with the result printed against the 15th of each month.
Is this the correct behaviour, or am I wrong?

ts.groupby(pd.TimeGrouper('M',closed='center',label='center')).apply(lambda x: x.std(ddof=0))

Even with this statement I get results labelled as

2006-08-31 28.765038
2006-09-30 16.037042

I expected
2006-08-15
2006-09-15
etc.

@jreback
Contributor

jreback commented Jun 11, 2013

http://pandas.pydata.org/pandas-docs/dev/timeseries.html

you are doing something subtle, so you really need to read and understand the docs

@jsudheer
Author

Thank you,
This is a really good one.
With best regards,
Sudheer
