BUG: formatting integers datetimes using sql GH17855 #17882

drorata · 2017-10-15T20:24:02Z

- closes #17855 (Failing to parse date given as integers from a (MS)SQL query)
- tests passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

codecov · 2017-10-15T20:54:53Z

Codecov Report

Merging #17882 into master will decrease coverage by 0.01%.
The diff coverage is 66.66%.

```diff
@@            Coverage Diff             @@
##           master   #17882      +/-   ##
==========================================
- Coverage   91.23%   91.21%   -0.02%
==========================================
  Files         163      163
  Lines       50102    50104       +2
==========================================
- Hits        45712    45704       -8
- Misses       4390     4400      +10
```

| Flag | Coverage Δ |
|---|---|
| #multiple | `89.02% <0%> (-0.01%)` ⬇️ |
| #single | `40.31% <66.66%> (-0.06%)` ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/sql.py | `94.66% <66.66%> (-0.15%)` ⬇️ |
| pandas/io/gbq.py | `25% <0%> (-58.34%)` ⬇️ |
| pandas/core/frame.py | `97.75% <0%> (-0.1%)` ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aed9b92...0498dd1.
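The report's numbers are consistent with line coverage computed as hits divided by tracked lines, with misses being lines minus hits. A minimal sketch that checks this arithmetic for the first report, assuming that definition (Codecov can also count partially covered lines, but none appear here) and allowing for the report's two-decimal display:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Figures from the first Codecov report above.
master = {"lines": 50102, "hits": 45712}
pr = {"lines": 50104, "hits": 45704}

for name, stats in (("master", master), ("#17882", pr)):
    misses = stats["lines"] - stats["hits"]
    pct = coverage_pct(stats["hits"], stats["lines"])
    print(f"{name}: {pct:.3f}% covered, {misses} misses")

# Change in coverage introduced by the PR, in percentage points.
delta = coverage_pct(pr["hits"], pr["lines"]) - coverage_pct(master["hits"], master["lines"])
print(f"delta: {delta:.2f}")
```

Rounded to two decimals, the delta matches the `-0.02%` shown in the coverage diff.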

codecov · 2017-10-15T20:54:56Z

Codecov Report

Merging #17882 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

```diff
@@            Coverage Diff             @@
##           master   #17882      +/-   ##
==========================================
- Coverage   91.36%   91.34%   -0.02%
==========================================
  Files         164      164
  Lines       49730    49732       +2
==========================================
- Hits        45435    45429       -6
- Misses       4295     4303       +8
```

| Flag | Coverage Δ |
|---|---|
| #multiple | `89.14% <0%> (ø)` ⬆️ |
| #single | `39.65% <100%> (-0.04%)` ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/sql.py | `94.79% <100%> (-0.01%)` ⬇️ |
| pandas/io/gbq.py | `25% <0%> (-58.34%)` ⬇️ |
| pandas/core/frame.py | `97.8% <0%> (-0.1%)` ⬇️ |
| pandas/core/resample.py | `96.34% <0%> (+0.18%)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 509e03c...987f588.

gfyoung · 2017-10-16T00:43:22Z

@drorata : Will need to add tests for the new behavior.

gfyoung · 2017-10-16T00:44:16Z

pandas/io/sql.py

```diff
@@ -109,7 +109,11 @@ def _handle_date_column(col, utc=None, format=None):
              issubclass(col.dtype.type, np.integer)):
            # parse dates as timestamp
            format = 's' if format is None else format
-            return to_datetime(col, errors='coerce', unit=format, utc=utc)
+            if '%' in format:
```


Add a comment as to why you are doing this logic branching (and reference issue number).

On second thought, I think we can do this a bit more cleanly:

```python
if format is None and (issubclass(col.dtype.type, np.floating) or
                       issubclass(col.dtype.type, np.integer)):
    format = 's'
# note the comma between 'm' and 's': without it, Python concatenates
# the two literals into 'ms' and plain seconds never match
if format in ['D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns']:
    return to_datetime(col, errors='coerce', unit=format, utc=utc)
elif is_datetime64tz_dtype(col):
    ...
else:
    return to_datetime(col, errors='coerce', format=format, utc=utc)
```

So first check for the specific case of numeric values with no format, and parse them as seconds. Then check the format argument against all possible values for `unit`. If it is not one of them, we no longer need to check whether '%' is in the format, since such a string can never be a valid unit (that case has already been handled).
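The branching described above can be sketched as a standalone function. This is a minimal, hypothetical re-implementation for illustration only: the name `handle_date_column` and the module-level `UNIT_CODES` list are assumptions, not pandas' private `_handle_date_column`, and the timezone-aware branch of the suggestion is omitted.

```python
import numpy as np
import pandas as pd

# Codes routed to to_datetime(unit=...), taken from the suggestion above.
UNIT_CODES = ['D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns']


def handle_date_column(col, utc=False, format=None):
    """Hypothetical sketch: parse a SQL result column as datetimes."""
    is_numeric = (issubclass(col.dtype.type, np.floating) or
                  issubclass(col.dtype.type, np.integer))
    # Numeric column and no format given: treat values as epoch seconds.
    if format is None and is_numeric:
        format = 's'
    if format in UNIT_CODES:
        # A recognized unit: the numbers are offsets from the epoch.
        return pd.to_datetime(col, errors='coerce', unit=format, utc=utc)
    # Anything else (e.g. '%Y%m%d', or None for string columns) is passed
    # through as a strftime-style format, so no explicit '%' check is needed.
    return pd.to_datetime(col, errors='coerce', format=format, utc=utc)


epoch = handle_date_column(pd.Series([0, 86400]))
ymd = handle_date_column(pd.Series([20170101, 20171122]), format='%Y%m%d')
print(epoch.tolist())
print(ymd.tolist())
```

Because the unit check comes first, an integer column with `format='%Y%m%d'` falls through to the `format=` branch instead of being handed to `unit=`.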

@jorisvandenbossche But what about a column of integers in a format like YYYYMMDD? That is not a valid unit and has to be parsed with a %-style format string (e.g. %Y%m%d).

If the format string contains %, the user knows something about the data, and that knowledge has to be used.

If you specify format="%Y%m%d", the column will be parsed with that format in the snippet above (only the recognized unit specifiers are passed to `unit`; anything else is passed as `format`).
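The distinction matters because the same integer means different things under the two parameters of `pandas.to_datetime`. A short comparison (the value 20170101 is an arbitrary example):

```python
import pandas as pd

col = pd.Series([20170101])

# Parsed with a strftime-style format: the digits are a calendar date.
as_date = pd.to_datetime(col, format='%Y%m%d')

# Parsed with a unit: the same integer is a count of seconds since 1970.
as_seconds = pd.to_datetime(col, unit='s')

print(as_date[0])     # 2017-01-01 00:00:00
print(as_seconds[0])  # 1970-08-22 10:48:21
```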

jreback · 2017-10-28T00:30:48Z

Can you rebase and update according to the comments?

drorata · 2017-10-28T18:12:46Z

@jreback It's on my agenda and I'll do it as soon as I have time.

jorisvandenbossche · 2017-11-21T09:04:33Z

@drorata something went wrong when updating the branch (see all the commits now included here on GitHub).
To fix this, doing exactly the following should normally work (assuming 'upstream' is pandas-dev/pandas and 'origin' is drorata/pandas):

```shell
git pull upstream master
git push origin fix-17855 -f
```

jorisvandenbossche · 2017-11-21T12:15:22Z

That worked!
Could you also add some tests for this (and update the code per my comment above, #17882 (comment))?

Cleaned the fix and implemented tests

jorisvandenbossche

Looks good!

drorata · 2017-11-21T12:23:01Z

pandas/tests/io/test_sql.py

```diff
        assert issubclass(df.IntDateCol.dtype.type, np.datetime64)

+        df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn,
+                                parse_dates={'IntDateOnlyCol': '%Y%m%d'})
+        assert issubclass(df.IntDateOnlyCol.dtype.type, np.datetime64)
```


@jreback I included a test. I also noticed that the existing tests are not sufficient: the parsing can return NaT, which still satisfies the dtype (`issubclass`) check even though the values are wrong. That is why I added an explicit test that checks that the resulting values are correct. In fact, while implementing the improvement suggested by @jorisvandenbossche, the tests passed but the returned values were NaT (the cause was a missing `,` in the code @jorisvandenbossche suggested).
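Both pitfalls described here can be reproduced in a few lines: Python concatenates adjacent string literals, so a list written with `'m' 's'` contains `'ms'` but neither `'m'` nor `'s'`; and a parse that fails under `errors='coerce'` returns all-`NaT` values that still carry a `datetime64` dtype, so a dtype-only assertion passes. A minimal illustration (the sample string and the mismatched format are arbitrary):

```python
import numpy as np
import pandas as pd

# Missing comma: Python joins the adjacent literals 'm' 's' into 'ms'.
units = ['D', 'd', 'h', 'm' 's', 'ms', 'us', 'ns']
print('s' in units)  # False

# A format that does not match the data, with errors='coerce',
# yields NaT for every value instead of raising.
parsed = pd.to_datetime(pd.Series(['2017-11-21']), format='%d/%m/%Y',
                        errors='coerce')

# The dtype-only check still passes...
assert issubclass(parsed.dtype.type, np.datetime64)
# ...while looking at the values exposes the problem.
print(parsed.isna().all())  # True
```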

Would you agree there's a problem with the tests?

Yes, those are indeed not very thorough. Do you want to add a similar check to the others as well?

@jorisvandenbossche Within this same PR?

Yes, I think that would be good (because, as you said, your changes could otherwise silently 'break' them).

@jorisvandenbossche I have included more explicit tests.
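A value-level check of this kind can be sketched end to end with an in-memory SQLite database and `pandas.read_sql_query`. The table and column names mirror the test diff above, but the two rows of data are made up for illustration, and SQLite stands in for the database backends the real test suite uses:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the test fixture; the data values are made up.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE types_test_data ("IntDateOnlyCol" INTEGER)')
conn.execute('INSERT INTO types_test_data VALUES (20101010), (20101212)')

df = pd.read_sql_query('SELECT * FROM types_test_data', conn,
                       parse_dates={'IntDateOnlyCol': '%Y%m%d'})
conn.close()

# Compare the actual values, not just the dtype: an all-NaT column
# would satisfy a dtype-only assertion.
assert df['IntDateOnlyCol'].tolist() == [pd.Timestamp('2010-10-10'),
                                         pd.Timestamp('2010-12-12')]
print(df)
```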

drorata · 2017-11-21T12:24:54Z

I guess before merging I still have to squash all commits, is that correct? What is the canonical way to do so?

jorisvandenbossche · 2017-11-21T12:25:44Z

> I guess before merging I still have to squash all commits, is that correct? What is the canonical way to do so?

No need to squash in the PR. We will squash when merging (GitHub provides that), so you can keep the history in the PR branch as it is.

drorata · 2017-11-21T12:49:29Z

Some tests seem to fail on ci/circleci. Why is that? Locally everything passes (or is skipped, which I assume is OK).

Included explicit values comparison

jorisvandenbossche · 2017-11-21T13:43:07Z

The error on CircleCI seems relevant, as something in the setup of the SQL tests is failing.

jorisvandenbossche · 2017-11-21T13:44:34Z

The log says there is an SQL syntax error:

```
E       sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'"IntDateOnlyCol" INTEGER,\n                    `FloatCol` DOUBLE,\n               \' at line 5') [SQL: 'CREATE TABLE types_test_data (\n                    `TextCol` TEXT,\n                    `DateCol` DATETIME,\n                    `IntDateCol` INTEGER,\n                    "IntDateOnlyCol" INTEGER,\n                    `FloatCol` DOUBLE,\n                    `IntCol` INTEGER,\n                    `BoolCol` BOOLEAN,\n                    `IntColWithNull` INTEGER,\n                    `BoolColWithNull` BOOLEAN\n                )']

../miniconda3/envs/pandas/lib/python3.6/site-packages/pymysql/err.py:107: ProgrammingError
```

jorisvandenbossche · 2017-11-21T13:45:02Z

pandas/tests/io/test_sql.py

```diff
@@ -98,6 +99,7 @@
                    `TextCol` TEXT,
                    `DateCol` DATETIME,
                    `IntDateCol` INTEGER,
+                    "IntDateOnlyCol" INTEGER,
```


You are using different quoting here (double quotes instead of backticks); that's probably the reason for the failure.

Thanks! I'll fix it.
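Unless the ANSI_QUOTES SQL mode is enabled, MySQL reads double-quoted text as a string literal rather than an identifier, which fits the syntax error in the log above. SQLite accepts both double quotes and backticks for identifiers (the latter for MySQL compatibility). A minimal sketch of building the DDL with a single quoting rule, using hypothetical helpers `quote_ident` and `mysql_create_table` (not part of pandas or its test suite); embedded backticks are escaped by doubling, as MySQL requires:

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return '`' + name.replace('`', '``') + '`'


def mysql_create_table(table: str, columns: list[tuple[str, str]]) -> str:
    """Hypothetical helper: CREATE TABLE with uniform identifier quoting."""
    cols = ',\n    '.join(f'{quote_ident(n)} {t}' for n, t in columns)
    return f'CREATE TABLE {table} (\n    {cols}\n)'


ddl = mysql_create_table('types_test_data', [
    ('IntDateCol', 'INTEGER'),
    ('IntDateOnlyCol', 'INTEGER'),
    ('FloatCol', 'DOUBLE'),
])
print(ddl)

# SQLite accepts backtick-quoted identifiers too, so the same DDL runs there.
sqlite3.connect(':memory:').execute(ddl)
```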

drorata · 2017-11-21T15:56:44Z

The Travis build failed due to a timeout (link). Any idea? @jorisvandenbossche @jreback

jorisvandenbossche · 2017-11-21T16:05:45Z

I restarted the build. All builds seemed rather slow. I don't see anything in the diff that could cause a slowdown, so maybe it is a Travis issue. Let's see when it finishes.

drorata · 2017-11-21T18:28:42Z

Can someone help me review the failure of the AppVeyor build?

jorisvandenbossche · 2017-11-22T10:58:58Z

I think it was also something unrelated (it couldn't reach GitHub), so I restarted the build.

drorata · 2017-11-22T18:57:03Z

@gfyoung are you comfortable with this PR?

jorisvandenbossche · 2017-11-22T19:24:25Z

It seems that finally everything is green! (not sure why the CIs were so flaky here)

jorisvandenbossche · 2017-11-22T19:24:56Z

@drorata Thanks for the fix!

jorisvandenbossche · 2017-11-22T19:32:19Z

Ah, I forgot to ask you to add a whatsnew entry; I did that in cf90995.

BUG: formating integers datetimes using sql GH17855

0498dd1

gfyoung added the IO SQL (to_sql, read_sql, read_sql_query) and Datetime (Datetime data dtype) labels Oct 16, 2017

gfyoung reviewed Oct 16, 2017

jorisvandenbossche added the Bug label Oct 16, 2017

drorata added 2 commits November 20, 2017 15:27

BUG: formating integers datetimes using sql GH17855

385ad78

Merge branch 'fix-17855' of github.com:drorata/pandas into fix-17855

e3a438b

Merge branch 'master' of git://github.com/pandas-dev/pandas into fix-17855

26454e7

Improve fix as per discussion

102ab25

Cleaned the fix and implemented tests

jorisvandenbossche approved these changes Nov 21, 2017

drorata commented Nov 21, 2017

Improved testing

5b56686

Included explicit values comparison

jorisvandenbossche reviewed Nov 21, 2017

Fixed typo in SQL data setting

987f588

jorisvandenbossche merged commit bc95629 into pandas-dev:master Nov 22, 2017

jorisvandenbossche added this to the 0.21.1 milestone Nov 22, 2017

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Nov 22, 2017

DOC: add whatsnew for pandas-dev#17882

d602f86

jorisvandenbossche mentioned this pull request Nov 22, 2017

DOC: add whatsnew for #17882 #18433

Merged

jorisvandenbossche added a commit that referenced this pull request Nov 22, 2017

DOC: add whatsnew for #17882 (#18433)

cf90995

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

DOC: add whatsnew for pandas-dev#17882 (pandas-dev#18433)

542533a

(cherry picked from commit cf90995)

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

BUG: formating integers datetimes using sql GH17855 (pandas-dev#17882)

be26b7e

(cherry picked from commit bc95629)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

DOC: add whatsnew for #17882 (#18433)

47b7e6a

(cherry picked from commit cf90995)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

BUG: formating integers datetimes using sql GH17855 (#17882)

7128cfa

(cherry picked from commit bc95629)

drorata deleted the fix-17855 branch January 19, 2018 19:39
