Inconsistent parsing for timestamp with timezone with read_sql_query #15119
Comments
Can you create a test using the existing test infrastructure (and either push a commit or paste it here)?
Working on it. I'll try to come up with something today.
Turns out there is already a test for it, but it's too lenient on purpose: someone obviously thought about this before. Should I try to fix this? I think it should always return datetime64tz, but maybe it's the best that can be done at the moment.

```diff
diff --git a/pandas/io/tests/test_sql.py b/pandas/io/tests/test_sql.py
index cb08944e8..d8eff4ccf 100644
--- a/pandas/io/tests/test_sql.py
+++ b/pandas/io/tests/test_sql.py
@@ -1310,9 +1310,7 @@ class _TestSQLAlchemy(SQLAlchemyMixIn, PandasSQLTest):
         # even with the same versions of psycopg2 & sqlalchemy, possibly a
         # Postgrsql server version difference
         col = df.DateColWithTz
-        self.assertTrue(is_object_dtype(col.dtype) or
-                        is_datetime64_dtype(col.dtype) or
-                        is_datetime64tz_dtype(col.dtype),
+        self.assertTrue(is_datetime64tz_dtype(col.dtype),
                         "DateCol loaded with incorrect type -> {0}"
                         .format(col.dtype))
```
@thrasibule Well, sometimes we seemed to get object dtypes back and sometimes it was correctly returned as datetime64, so this is lenient on purpose. We never could narrow down why it sometimes didn't work.
@thrasibule See also the discussion in #11216 (comment). This is a complex topic, so I should dive into it again, but I think we want to coerce all tz-aware datetimes to UTC (so we keep the tz-aware part, but get a consistent output). Based on the discussions in #7364 and #11216, I would also have thought it already worked this way, but apparently it does not (at least not in your case). You are certainly welcome to look into it!
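For illustration, a minimal sketch of what "coerce to UTC" would mean on the resulting column; the column name `DateColWithTz` and the sample values are assumptions, not from the original report:

```python
import pandas as pd

# Hypothetical column read back as object dtype, holding tz-aware
# datetimes with different UTC offsets (one on each side of a DST
# transition).
col = pd.Series([
    pd.Timestamp("2016-11-05 12:00:00-04:00"),
    pd.Timestamp("2016-11-07 12:00:00-05:00"),
], name="DateColWithTz")

# Coercing to UTC keeps the tz-aware values but yields a single
# consistent datetime64[ns, UTC] dtype instead of object.
col_utc = pd.to_datetime(col, utc=True)
print(col_utc.dtype)  # datetime64[ns, UTC]
```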
I encountered a similar problem, but on the to_sql() side. Both sections of the above code will raise an exception if a timezone-aware index or data is used.
Based on #7364, I also agree the correct behavior is to coerce to UTC.
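A hedged sketch of the to_sql() side of the problem; the connection string and table name are placeholders, and whether this actually raises depends on the pandas version (the comment above reports that it did on the versions in use):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust for your own database.
engine = create_engine("postgresql://user:password@localhost/testdb")

# A frame with a timezone-aware DatetimeIndex.
idx = pd.date_range("2016-11-05", periods=4, freq="D", tz="US/Eastern")
df = pd.DataFrame({"value": range(4)}, index=idx)

# On pandas 0.19.x this write path could raise for tz-aware
# index/data, which is the behavior described in the comment above.
df.to_sql("tz_test", engine, if_exists="replace")
```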
Code Sample, a copy-pastable example if possible
I'm creating a test table in PostgreSQL as follows. Notice that the dates span the DST transition, so they have different offsets from UTC.
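The original code sample is not reproduced above, so here is a hedged sketch of the described setup; the table name, column names, and dates are assumptions, and the raw SQL strings assume SQLAlchemy 1.x (the version listed below):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/testdb")

# Create a small table with a timestamptz column whose values fall on
# both sides of a US DST transition, so they have different UTC offsets.
with engine.connect() as conn:
    conn.execute("DROP TABLE IF EXISTS tz_test")
    conn.execute("CREATE TABLE tz_test (id int, ts timestamptz)")
    conn.execute(
        "INSERT INTO tz_test VALUES "
        "(1, '2016-11-05 12:00:00-04'), "
        "(2, '2016-11-07 12:00:00-05')"
    )

# Per the report, with mixed offsets the column comes back without
# being parsed to a tz-aware datetime64 dtype.
df = pd.read_sql_query("SELECT * FROM tz_test", engine)
print(df.dtypes)
```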
Problem description
The current behavior, if I understand correctly, is that if there is only one timezone offset, timestamps are parsed automatically to timezone-aware timestamps. If the dates span different offsets, they are not parsed automatically. We can force parsing using parse_dates, but then they are not timezone aware. I can make the conversion with tz_localize, but it's not very consistent (a sketch of this workaround follows below).
Expected Output
I would expect everything to be parsed automatically to timezone-aware timestamps.
This is related to issue #7364. If there is a consensus on what should be done, I can try to make the required changes to make the behavior more regular.
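A hedged sketch of the workaround described above (parse_dates plus an explicit localization), reusing the hypothetical `tz_test` table and `ts` column from the sketch above:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/testdb")

# Force parsing of the mixed-offset column; per the report, the parsed
# values come back tz-naive, losing the timezone information.
df = pd.read_sql_query("SELECT * FROM tz_test", engine,
                       parse_dates=["ts"])

# Manually re-attach a time zone; this is the inconsistent extra step
# the report would like read_sql_query to make unnecessary.
df["ts"] = df["ts"].dt.tz_localize("UTC")
print(df["ts"].dtype)  # datetime64[ns, UTC]
```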
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 33.0.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0rc2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 1.1.4
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None