BUG or DOC: pd.read_csv with parse_dates does not recognize timezone #22256
Comments
I agree with your first impression. Patch and PR are welcome!

Can you try this on master? @mroeschke had a PR recently that I think should have fixed this.

@jbrockmendel : Unfortunately, no luck. I can reproduce this on

On master, if
I think we use On that note, what should be the expected behavior?

This change fixed this issue specifically but not sure how it will affect other tests.

@mroeschke Thanks for pointing

@swyoon I look forward to seeing you! ;)
- use box=True for to_datetime(), and adjust downstream processing to the change. - resolve pandas-dev#22256
When parsing a timezone-aware datetime in a CSV file with pd.read_csv + parse_dates, it returns naive timestamps converted to UTC, which was a surprise to me.
Example
Consider reading the following data from a file named pandas_read_csv_bug.csv. It is a simple time series with the timezone (UTC+09:00) specified.
I want to read it with pd.read_csv using the parse_dates keyword argument. If it worked properly, this would seem to be the most elegant solution.
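The original data and call did not survive in this copy of the issue, so the following is only a sketch of the kind of call being described. The column names (dt, val), the sample values, and the helper as_naive_utc are hypothetical; only the +09:00 offset and the use of parse_dates come from the report. The printed dtype and tz depend on the installed pandas version, so no expected output is shown.

```python
import io

import pandas as pd

# Hypothetical stand-in for pandas_read_csv_bug.csv: the column names and
# values are invented; only the +09:00 offset comes from the report.
CSV_TEXT = """dt,val
2018-01-04 09:01:00+09:00,1
2018-01-04 09:02:00+09:00,2
"""

# Ask read_csv to parse the "dt" column as datetimes.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["dt"])

first = pd.Timestamp(df["dt"].iloc[0])
print(df["dt"].dtype)    # version-dependent
print(first, first.tz)   # tz is None on versions that drop the offset


def as_naive_utc(ts):
    """Return ts as a naive UTC timestamp, whether or not it is tz-aware.

    Assumption: a naive value has already been converted to UTC, which is
    the behavior described in the report.
    """
    if ts.tz is None:
        return ts
    return ts.tz_convert("UTC").tz_localize(None)


print(as_naive_utc(first))
```

Comparing through as_naive_utc checks that the same instant was parsed, independent of whether the result carries timezone information.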
However, the result is a data frame df with strange timestamps.
Problem description
What surprised me was that df['dt'].iloc[0].tz is None evaluates to True.
My first impression was that this cannot be the best possible behavior.
However, since a UTC offset does not uniquely correspond to a single timezone, this could be the safest and most reasonable behavior.
In that case, the documentation should mention this behavior.
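The point about offsets can be illustrated with the standard library alone. This is a sketch, not pandas' internal logic: the sample timestamp and the labels "KST" and "JST" are assumptions chosen for illustration. It shows that parsing keeps only a fixed offset, what "naive timestamps converted to UTC" means for one value, and that two differently named zones can share the same offset.

```python
from datetime import datetime, timedelta, timezone

# Parsing the string keeps only a fixed +09:00 offset, not a named zone.
# The timestamp value itself is a made-up example.
aware = datetime.fromisoformat("2018-01-04 09:01:00+09:00")

# The behavior described in the report, reproduced by hand:
# shift to UTC, then drop the timezone information.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

# Two fixed-offset zones with different (illustrative) names share the
# same offset, so the offset alone cannot identify the source timezone.
kst = timezone(timedelta(hours=9), "KST")
jst = timezone(timedelta(hours=9), "JST")

print(aware.utcoffset())
print(naive_utc)
print(kst.utcoffset(None) == jst.utcoffset(None), kst.tzname(None), jst.tzname(None))
```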
Output of pd.show_versions()
pandas: 0.23.4
pytest: 3.3.1
pip: 9.0.3
setuptools: 38.5.1
Cython: None
numpy: 1.15.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None