
apply changes the datatypes: strange behaviour/bug? #8660


Closed
denadai2 opened this issue Oct 28, 2014 · 8 comments

Comments

@denadai2

Hello. I tried to apply a function to a DataFrame and it changes the column dtypes. See this example: http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/1118905/Calls-bug.ipynb

Am I doing something wrong? Is it a bug?

pandas 0.15.0

@jreback
Contributor

jreback commented Oct 28, 2014

.apply coerces object dtypes; datetimes with a tz are held as object dtype (Timestamps carrying their timezone), so they will be coerced to naive datetime64[ns]. Using apply with a mixed-dtype frame like this is not a very intuitive or normal thing to do. You should simply operate on the columns you need separately.
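A minimal sketch of the column-wise approach, assuming a recent pandas (0.17 or later, where a tz-aware column has its own datetime64[..., tz] dtype instead of object). The frame, its `datetime` and `calls` column names, and the Europe/Rome timezone are hypothetical. The `.dt` accessor works on the whole column at once, so the timezone is kept:

```python
import pandas as pd

# Hypothetical frame: a tz-aware datetime column next to a numeric column.
# Column names and timezone are assumptions for illustration only.
df = pd.DataFrame({
    "datetime": pd.date_range("2014-10-10 22:00", periods=3, freq="D", tz="Europe/Rome"),
    "calls": [3, 1, 4],
})

# Operate on the single column you need instead of df.apply(..., axis=1):
# .dt works on the whole column and leaves its tz-aware dtype untouched.
df["weekDay"] = df["datetime"].dt.weekday

print(df.dtypes)
```

The `datetime` column should still report a tz-aware dtype afterwards, and `weekDay` holds 4, 5, 6 (Friday to Sunday).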

In the future, please post a simple, self-contained reproducible example. It makes it easier to see what the problem is.

@denadai2
Author

Hmm, so what if I need an operation that checks/sets the holidays? I have to use apply somehow...

(Sorry for the usage question, but if it's not documented it seems almost like a bug to me.)

@jreback
Contributor

jreback commented Oct 29, 2014

Why/how are you using apply?

@denadai2
Author

Hmm, because I need to compare all the dates one by one.


@jreback
Contributor

jreback commented Oct 29, 2014

And exactly how are you doing that?

@denadai2
Copy link
Author

Something naive like this:

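holidays = [...]  # holiday dates as 'YYYY-MM-DD' strings (elided here)

def mergeWithHolidays(s):
    # s is one row of df: apply(..., axis=1) passes each row as a Series
    dateString = s['datetime'].strftime('%Y-%m-%d')
    weekday = 6  # holidays are treated like Sundays
    if dateString not in holidays:
        weekday = s['datetime'].weekday()  # weekday is a method, so call it
    s['weekDay'] = weekday
    return s

# df is assumed to have a 'datetime' column of Timestamps
df = df.apply(mergeWithHolidays, axis=1)
df.head()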

But of course pandas converts the dates to UTC, so everything is wrong. http://stackoverflow.com/questions/26593908/how-to-set-holidays-with-pandas-in-a-timezone-aware-way

@jreback
Contributor

jreback commented Oct 30, 2014

Well, I'm not sure what you are expecting as output, but that is not going to work (and it will be as slow as can be).

Try something like this to set the holidays to NaT (or you can keep a boolean array or whatever):

s[s.isin(list_of_holidays)] = pd.NaT
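A vectorized sketch of the same rule as `mergeWithHolidays`, keeping a boolean array as suggested. It assumes a recent pandas and hypothetical data (the holiday list, the `datetime` column and the Europe/Rome timezone are illustrative). Normalizing first matters because `isin` compares full timestamps, so 09:00 on a holiday would never match the midnight holiday date. The 2014-11-01 00:30 row is the kind that goes wrong in UTC, where it still falls on 31 October:

```python
import numpy as np
import pandas as pd

# Hypothetical holidays and tz-aware timestamps (assumptions for illustration).
holidays = ["2014-11-01", "2014-12-25"]
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2014-10-31 23:30", "2014-11-01 00:30", "2014-12-25 12:00", "2014-12-29 09:00"]
    ).tz_localize("Europe/Rome"),
})

# Local calendar day: normalize() drops the time of day but keeps the tz,
# so the comparison happens in local time, not UTC.
local_day = df["datetime"].dt.normalize()
holiday_days = pd.to_datetime(holidays).tz_localize("Europe/Rome")
is_holiday = local_day.isin(holiday_days)

# Holidays count as 6 (Sunday); other days keep their real weekday.
df["weekDay"] = np.where(is_holiday, 6, df["datetime"].dt.weekday)
```

No row-wise apply is involved, so the `datetime` column keeps its tz-aware dtype.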

See here:
http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components
http://pandas.pydata.org/pandas-docs/stable/timeseries.html#holidays-holiday-calendars
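For the holiday-calendar link, a minimal sketch of a custom calendar built with `AbstractHolidayCalendar` and `Holiday` from `pandas.tseries.holiday`. The calendar class and its two fixed-date rules are hypothetical examples; substitute the holidays that apply to your data:

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

# Hypothetical calendar with two fixed-date holidays (illustrative only).
class ExampleHolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("Christmas Day", month=12, day=25),
        Holiday("St. Stephen's Day", month=12, day=26),
    ]

cal = ExampleHolidayCalendar()
# DatetimeIndex of the holiday dates falling inside the requested range.
holidays = cal.holidays(start="2014-01-01", end="2014-12-31")
```

The resulting `holidays` index can be fed straight into `isin` comparisons on normalized dates.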

You might want to create a custom business index with holidays to handle this.
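One way to build such a custom business index, assuming a pandas recent enough for `bdate_range` to accept `holidays` together with `freq="C"`; the holiday list is hypothetical. `CustomBusinessDay` is the matching offset when you need to step from one business day to the next:

```python
import pandas as pd

# Hypothetical holiday list (any list of dates works here).
holidays = ["2014-12-25", "2014-12-26"]

# freq="C" means custom business days: weekends and the listed holidays are skipped.
business_days = pd.bdate_range("2014-12-22", "2014-12-31", freq="C", holidays=holidays)

# The equivalent offset for date arithmetic: next business day after Dec 24.
cbd = pd.offsets.CustomBusinessDay(holidays=holidays)
next_bd = pd.Timestamp("2014-12-24") + cbd
```

Here `next_bd` lands on Monday 2014-12-29, since the 25th and 26th are holidays and the 27th and 28th fall on a weekend.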

Apply on object dtype by definition tries to convert. Timezone-aware data will be supported fully at some point, but for now the timezone cannot be preserved; see #8260.

As a further comment: using DST-aware dates is really odd when dealing with DAILY holidays. I have never seen this done, though I guess conceptually it should work. It is probably better to attach the tz afterwards, again because it doesn't matter for the holiday determination.
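A sketch of the "attach the tz after" suggestion, assuming the raw timestamps are naive local wall-clock times; the data, holiday list and Europe/Rome zone are hypothetical. Holidays are decided on the naive dates, and the timezone is added only at the end:

```python
import pandas as pd

# Hypothetical naive local timestamps and holiday dates (illustrative only).
holidays = pd.to_datetime(["2014-11-01", "2014-12-25"])
local = pd.Series(pd.to_datetime(["2014-10-31 23:30", "2014-11-01 00:30"]))

# Holiday membership depends only on the calendar date, so check it on naive dates...
is_holiday = local.dt.normalize().isin(holidays)

# ...then attach the timezone once the date logic is done.
aware = local.dt.tz_localize("Europe/Rome")
```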

@jreback jreback closed this as completed Oct 30, 2014
@denadai2
Copy link
Author

thanks!!
