Confusing exception message when re-sampling index containing NaT #16356

Closed
pwaller opened this issue May 15, 2017 · 5 comments
Labels
Datetime Datetime data dtype Error Reporting Incorrect or improved errors from pandas Resample resample method

Comments

@pwaller
Contributor

pwaller commented May 15, 2017

Example

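from datetime import datetime  # pd.datetime was an alias of this; it no longer exists in recent pandas
import pandas as pd
from pytz import UTC

# A NaT followed by a tz-aware timestamp; on pandas 0.20.1 this raised
# "ValueError: Passed item and index have different timezone"
pd.DataFrame([1, 2], index=[pd.NaT, datetime(2017, 5, 15, tzinfo=UTC)]).resample("W").sum()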

Problem description

If a DataFrame's index contains NaT, resampling it raises the error: ValueError: Passed item and index have different timezone. This is confusing because, while technically correct, it leads the user to investigate timezones rather than the presence of missing timestamps.

The error message led me down a rabbit hole of trying to understand how a wrong timezone had crept in, but the timezones were consistent; the NaT was the problem. I was not expecting my large index (more than 500k entries) to contain NaT. A single NaT entry was buried right in the middle of it, and I had to bisect the dataset to figure out why resampling failed.
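
Instead of bisecting, the NaT entries in a DatetimeIndex can be located directly with a boolean mask. A minimal sketch, assuming a tz-aware index like the one in the report; the sample timestamps and the "value" column are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative tz-aware index with one NaT buried in the middle
idx = pd.to_datetime(["2017-05-13", "2017-05-14", None, "2017-05-15"], utc=True)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Boolean mask: True wherever the index entry is NaT
nat_mask = df.index.isna()

# Integer positions of the NaT entries, and how many there are
nat_positions = np.flatnonzero(nat_mask)
nat_count = int(nat_mask.sum())

print(nat_count, nat_positions.tolist())
```

On a large index, inspecting `df[nat_mask]` shows the offending rows immediately.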

Searching for the error message returns very few hits (currently 3, mostly source code).

Expected Output

Either the resample should work and omit the NaT entry, or the error message should make it clear that the problem is a NaT value, not a timezone issue.
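
Until either behaviour is available, both expected outcomes can be approximated on the user side. The helper below is hypothetical (not a pandas API): it either drops NaT rows before resampling or raises an error that names the NaT problem. It is a sketch, assuming the DataFrame has a DatetimeIndex:

```python
import numpy as np
import pandas as pd


def resample_without_nat(df, rule, on_nat="drop"):
    """Hypothetical helper: resample df while handling NaT index entries explicitly.

    on_nat="drop"  -> discard rows whose index is NaT, then resample
    on_nat="raise" -> raise a ValueError that points at the NaT entries
    Returns a pandas Resampler, so callers still choose the aggregation.
    """
    if on_nat not in ("drop", "raise"):
        raise ValueError(f"unknown on_nat option: {on_nat!r}")
    nat_mask = df.index.isna()
    if nat_mask.any():
        if on_nat == "raise":
            positions = np.flatnonzero(nat_mask).tolist()
            raise ValueError(
                f"index contains {len(positions)} NaT value(s) "
                f"at positions {positions[:10]}; drop or fill them before resampling"
            )
        df = df[~nat_mask]
    return df.resample(rule)


# Same shape as the report: one NaT plus one tz-aware timestamp
idx = pd.to_datetime([None, "2017-05-15"], utc=True)
df = pd.DataFrame({"value": [1, 2]}, index=idx)
weekly = resample_without_nat(df, "W").sum()
print(weekly)
```

With `on_nat="drop"` the single remaining row lands in the week ending Sunday 2017-05-21; with `on_nat="raise"` the message mentions NaT rather than timezones.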

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1013-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.2
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: 0.0.9
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 15, 2017

pls follow the issue submission format, IOW include pd.show_versions()

@pwaller
Contributor Author

pwaller commented May 15, 2017

Apologies @jreback, and thanks for your (and everyone else's!) incredible efforts to develop and maintain this library. I have rewritten the issue body in the standard format.

@jreback
Contributor

jreback commented May 15, 2017

This is fundamentally caused by #16357, which is a bug. The error should not actually occur at all.

@jreback jreback added Difficulty Intermediate Error Reporting Incorrect or improved errors from pandas Resample resample method Datetime Datetime data dtype labels May 17, 2017
@jreback jreback added this to the Next Major Release milestone May 17, 2017
@ron819

ron819 commented Dec 10, 2018

@jreback
#16357 was resolved? So this one could be closed as well?

@jreback jreback closed this as completed Dec 11, 2018
@jreback
Contributor

jreback commented Dec 11, 2018

yep this is fixed.
