
Bug: Timestamp removes timezone localization #15777


Closed
mroeschke opened this issue Mar 22, 2017 · 7 comments
Labels: Bug, Datetime, Timezones

Comments

@mroeschke
Member

In [2]: tz = pd.DatetimeIndex(['2013-01-01 06:00'], tz='US/Pacific').tz

In [3]: tz
Out[3]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

In [4]: pd.Timestamp('2013-01-01 06:00', tz=tz).tz
Out[4]: <DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>

Problem description

Localization of the timezone should be maintained when creating a Timestamp. I also believe this is the root cause of #13238, since resample() calls groupby(), which (I think) reconstructs the index with Timestamps.

In [10]: s = pd.Series(index=pd.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00'],
                                     tz='America/Los_Angeles'),
              data=[1, 2, 3])
In [13]: s.index.tz
Out[13]: <DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>

In [14]: s.resample('D').max().index.tz
Out[14]: <DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>

Expected Output

<DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>
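Both reprs in the resample example name the same zone; only the transition entry carried by the pytz tzinfo instance differs. When checking results like these, one option is to compare zones by name instead of by tzinfo object. A minimal sketch, assuming pytz is installed; `same_zone` is a hypothetical helper (not a pandas or pytz API) that relies on pytz tzinfo objects converting to their zone name via `str()`:

```python
from datetime import datetime

import pytz


def same_zone(a, b):
    """Hypothetical helper: True if two pytz tzinfo objects name the same zone.

    str() on a pytz tzinfo returns its zone name (e.g. 'America/Los_Angeles'),
    regardless of which transition entry (LMT, PST, PDT) the instance carries.
    """
    return str(a) == str(b)


tz = pytz.timezone('America/Los_Angeles')                # default instance (LMT entry)
localized = tz.localize(datetime(2013, 1, 1, 6)).tzinfo  # instance for PST

print(repr(tz))                  # repr mentions LMT
print(repr(localized))           # repr mentions PST
print(same_zone(tz, localized))  # True
```

Note that this treats aliases as different zones: `'US/Pacific'` and `'America/Los_Angeles'` compare unequal by name even though they describe the same rules.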

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 92239f5
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0+644.g92239f5.dirty
pytest: 3.0.6
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 22, 2017

xref #7825
xref #11481

That lists a bunch of related issues and cases, all stemming from this. The constructor is off when doing this; it should be changed to work like DTI.

Fixing this would be great!

jreback added the Datetime, Timezones, Bug, and Difficulty Intermediate labels Mar 22, 2017
jreback modified the milestones: 0.20.0, Next Major Release Mar 22, 2017
@jreback
Contributor

jreback commented Mar 28, 2017

@mroeschke some additional DST crossing examples in #15823

@mroeschke
Member Author

Thanks @jreback.

I believe I found the issue here: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslib.pyx#L220. In tz_convert_single there's a check (_is_tzlocal) and a code path for localized timezones. The check seems to be specifically for dateutil objects, while pytz objects can pass through here as well. Including a check for localized pytz objects (e.g. tz._tzname == 'LMT'; there might be a better check than this) seemed to fix this issue. Will investigate further tonight.
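The check floated above can be sketched as a small predicate. This is illustrative only: `is_lmt_tzinfo` is a hypothetical name, it reads pytz's private `_tzname` attribute (an implementation detail that may change), and not every pytz zone necessarily starts with an LMT entry, so it is a heuristic rather than a general test.

```python
from datetime import datetime

import pytz


def is_lmt_tzinfo(tz):
    """Hypothetical helper: True if tz is a pytz tzinfo still carrying the
    LMT entry that pytz.timezone() returns by default for many zones.

    Relies on pytz's private ``_tzname`` attribute; objects without it
    (e.g. non-pytz tzinfos) simply return False.
    """
    return getattr(tz, '_tzname', None) == 'LMT'


tz = pytz.timezone('US/Pacific')
pst = tz.localize(datetime(2017, 1, 1)).tzinfo  # entry chosen for January: PST

print(is_lmt_tzinfo(tz))         # True
print(is_lmt_tzinfo(pst))        # False
print(is_lmt_tzinfo(pytz.utc))   # False
```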

@jreback
Contributor

jreback commented Mar 28, 2017

Yeah, the treatment is probably a bit off. See also _localize_tso, which does handle this correctly.

Note that _is_fixed_offset picks this up correctly, so maybe this is only a problem when _is_tzlocal is called and NOT _is_fixed_offset.

Happy to have this restructured, btw. Odd that more things are not failing.

@mroeschke
Copy link
Member Author

mroeschke commented Apr 4, 2017

Unfortunately my initial suggestion did not fix the issue after playing around with it.

On a related note, it seems like localizing a naive Timestamp (which should be the same as constructing a Timestamp with a tz) operates the same as pytz.

In [3]: d = datetime(2017, 1, 1)

In [4]: tz = pytz.timezone('US/Pacific')

In [5]: tz
Out[5]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

# localize with pytz
In [6]: tz.localize(d).tzinfo
Out[6]: <DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>

# localize with pd.Timestamp (should be same as Timestamp(d, tz=tz))
In [8]: pd.Timestamp(d).tz_localize(tz).tz
Out[8]: <DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>

# localize with pd.DatetimeIndex
In [9]: pd.DatetimeIndex([d]).tz_localize(tz).tz
Out[9]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

'US/Pacific' LMT and 'US/Pacific' PST have different UTC offsets (UTC-7:53 vs. UTC-8:00, i.e. roughly 7.9 hours vs. 8 hours, respectively). I am not a timezone expert, so I am not sure whether these three examples should fundamentally agree or whether pandas has a different strategy for storing time zones. It would be nice if they all agreed, though.
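The pytz documentation warns against attaching its zone objects directly (via the `tzinfo` argument of the standard datetime constructors, or equivalently `replace(tzinfo=...)`) and recommends `localize()` instead; a zone attached directly keeps its default first (LMT) entry. A minimal sketch, assuming pytz is installed, that puts the two offsets side by side:

```python
from datetime import datetime, timedelta

import pytz

tz = pytz.timezone('US/Pacific')
d = datetime(2017, 1, 1)

# Attaching the zone object directly keeps its default (LMT) entry.
attached = d.replace(tzinfo=tz)
# localize() looks up the transition in effect on d (PST in January).
localized = tz.localize(d)

print(attached.tzname(), attached.utcoffset())    # LMT -1 day, 16:07:00
print(localized.tzname(), localized.utcoffset())  # PST -1 day, 16:00:00

# The offsets differ by 7 minutes, so the UTC instants differ by 7 minutes too.
print(localized - attached == timedelta(minutes=7))  # True
```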

@jreback
Contributor

jreback commented Apr 4, 2017

These conceptually HAVE to be different. The timezone on a single localized timestamp is defined exactly. However, the tz on a DatetimeIndex is a zone string like 'US/Pacific'; it cannot itself be localized because it doesn't have a single reference date. It actually has many reference dates (one for each point in the index), so which one should you pick?

so the tzinfo on a DTI is just today's I think.

Note that in practice this doesn't actually make any difference; it's just a display thing.
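A quick way to see that the index-level repr doesn't affect the values: each element of a tz-aware DatetimeIndex is still converted with the offset in effect on its own date, which is also why no single localized tzinfo could represent the whole index. A sketch assuming a recent pandas; behavior in the 0.19/0.20 releases discussed here may differ:

```python
from datetime import timedelta

import pandas as pd

# Two wall times on either side of the DST switch, same zone string.
idx = pd.DatetimeIndex(['2013-01-01 06:00', '2013-07-01 06:00'], tz='US/Pacific')

# Whatever idx.tz displays, each element carries the offset in effect on its
# own date: PST (UTC-8) in January, PDT (UTC-7) in July.
offsets = [ts.utcoffset() for ts in idx]
print(offsets == [timedelta(hours=-8), timedelta(hours=-7)])  # True

# The underlying UTC instants reflect those per-element offsets.
utc_hours = [int(h) for h in idx.tz_convert('UTC').hour]
print(utc_hours)  # [14, 13]
```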

jreback modified the milestones: Next Minor Release, Next Major Release Apr 4, 2017
@mroeschke
Member Author

Ah okay that makes sense that a DatetimeIndex has multiple reference dates while a Timestamp has a definitive reference date, and that this is a display thing.
