Strange and possibly buggy behaviour when using series.dt with series.apply(strftime) #12793

Closed
phil20686 opened this issue Apr 4, 2016 · 4 comments
Labels
Duplicate Report Duplicate issue or pull request Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype

Comments

@phil20686

It seems that the Series.dt timezone properties do not interact well with series.apply(lambda x: x.strftime("..")), leading to some odd behaviour. Here is a script demonstrating the problem:

import pandas as pd
import logging
import pytz
from pytz import timezone

date_format_string = "%a %b %d %Y %H:%M"

def convert_datetime(x) :
    return x.strftime(date_format_string)

def longer_convert(x, tz = timezone('Europe/London')) :
    return convert_datetime(pytz.utc.localize(x).astimezone(tz))


if __name__=="__main__" :
    rng = pd.date_range('6/6/2011', periods=6, freq='H')
    rng2 = pd.date_range('6/6/2011', periods=6, freq='D')

    series = pd.Series(data=rng, index=rng2)
    series.name = "Original"

    converted_series = series.dt.tz_localize('UTC').dt.tz_convert('Europe/London')
    converted_series.name = "Converted"

    formatted_series = converted_series.apply(convert_datetime)
    formatted_series.name = "Formatted"

    longer_converted = series.apply(longer_convert)
    longer_converted.name = "Using pytz"

    df=pd.concat([series,converted_series,formatted_series, longer_converted], axis=1)
    print(df)

Which yields the output

                  Original                 Converted  
2011-06-06 2011-06-06 00:00:00     2011-06-06 01:00:00+01:00   
2011-06-07 2011-06-06 01:00:00     2011-06-06 02:00:00+01:00   
2011-06-08 2011-06-06 02:00:00     2011-06-06 03:00:00+01:00   
2011-06-09 2011-06-06 03:00:00     2011-06-06 04:00:00+01:00   
2011-06-10 2011-06-06 04:00:00     2011-06-06 05:00:00+01:00   
2011-06-11 2011-06-06 05:00:00     2011-06-06 06:00:00+01:00   

                        Formatted             Using pytz  
2011-06-06  Mon Jun 06 2011 00:00      Mon Jun 06 2011 01:00  
2011-06-07  Mon Jun 06 2011 01:00      Mon Jun 06 2011 02:00  
2011-06-08  Mon Jun 06 2011 02:00      Mon Jun 06 2011 03:00  
2011-06-09  Mon Jun 06 2011 03:00      Mon Jun 06 2011 04:00  
2011-06-10  Mon Jun 06 2011 04:00      Mon Jun 06 2011 05:00  
2011-06-11  Mon Jun 06 2011 05:00      Mon Jun 06 2011 06:00  

Here, applying the format string via apply caused the times to revert to UTC (compare the hours in Formatted vs Converted). I expected output similar to the "Using pytz" column, which shows the time as local time with no timezone offset appended.
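The comparison between the Formatted and Converted columns can be reduced to a small check. This is only a sketch: the names `FMT`, `naive`, `via_apply` and `apply_keeps_timezone` are made up for illustration, and `freq` is passed as a `pd.Timedelta` purely to avoid depending on a particular frequency alias. It assumes that `.dt.hour` reports local wall-clock hours on the installed pandas version.

```python
import pandas as pd

FMT = "%a %b %d %Y %H:%M"

# Naive hourly timestamps, localized to UTC and then converted to London time.
naive = pd.Series(
    pd.date_range("2011-06-06", periods=6, freq=pd.Timedelta(hours=1))
)
converted = naive.dt.tz_localize("UTC").dt.tz_convert("Europe/London")

# Format element by element, as in the report.
via_apply = converted.apply(lambda ts: ts.strftime(FMT))

# Local wall-clock hours taken from the .dt accessor, used as the reference.
expected_hours = converted.dt.hour

# The format string ends in "%H:%M", so the hour is the slice [-5:-3].
apply_hours = via_apply.str[-5:-3].astype(int)

# True when apply() formatted local time, False when it fell back to UTC.
apply_keeps_timezone = bool((apply_hours == expected_hours).all())
print(apply_keeps_timezone)
```

If this prints False, the apply path is dropping the timezone in the way described above.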

Is this expected behaviour? Can I format to a string directly through Series.dt?

@phil20686 phil20686 changed the title from "Strange and possibly bugging behaviour when using series.dt with series.apply(strftime)" to "Strange and possibly buggy behaviour when using series.dt with series.apply(strftime)" Apr 4, 2016
@jreback
Contributor

jreback commented Apr 4, 2016

pls pd.show_versions()

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype Duplicate Report Duplicate issue or pull request labels Apr 4, 2016
@jreback
Contributor

jreback commented Apr 4, 2016

This is a dupe of #11757, which is not yet fixed. Pull requests are welcome.

@jreback jreback closed this as completed Apr 4, 2016
@phil20686
Author

pd.show_versions()

@phil20686
Author

pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.4
pip: 8.1.1
setuptools: 20.2.2
Cython: None
numpy: 1.9.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 2.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: 2.3.1
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None

converted_series.dt.strftime("...") works correctly so I can use that. Didn't really think of it till just now.
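For anyone landing here, the workaround mentioned above might look roughly like this. It is a minimal sketch rather than code from the thread: it reuses the format string and timezone from the original script, the variable names are illustrative, and `freq` is given as a `pd.Timedelta` only to sidestep frequency alias differences between pandas versions.

```python
import pandas as pd

date_format_string = "%a %b %d %Y %H:%M"

# Naive hourly timestamps, treated as UTC and converted to London time.
series = pd.Series(
    pd.date_range("2011-06-06", periods=6, freq=pd.Timedelta(hours=1))
)
converted_series = series.dt.tz_localize("UTC").dt.tz_convert("Europe/London")

# Format through the .dt accessor instead of apply(); the result is a
# Series of strings built from the local (London) wall-clock time.
formatted_series = converted_series.dt.strftime(date_format_string)
print(formatted_series)
```

The hours in the result should match the Converted column (01:00 through 06:00) rather than the UTC hours.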

@jreback jreback added this to the No action milestone Apr 4, 2016