BUG: formatting integer datetimes using SQL GH17855 #17882

Merged (7 commits) on Nov 22, 2017
pandas/io/sql.py: 6 changes (5 additions, 1 deletion)
```diff
@@ -109,7 +109,11 @@ def _handle_date_column(col, utc=None, format=None):
                 issubclass(col.dtype.type, np.integer)):
             # parse dates as timestamp
             format = 's' if format is None else format
-            return to_datetime(col, errors='coerce', unit=format, utc=utc)
+            if '%' in format:
```
Member

Add a comment explaining why you are doing this logic branching (and reference the issue number).

Member

On second thought, I think we can do this a bit more cleanly, like this:

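```python
# Numeric column and no explicit format: interpret values as epoch seconds
if format is None and (issubclass(col.dtype.type, np.floating) or
                       issubclass(col.dtype.type, np.integer)):
    format = 's'

# Recognised unit specifiers are passed as unit=
if format in ['D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns']:
    return to_datetime(col, errors='coerce', unit=format, utc=utc)
elif is_datetime64tz_dtype(col):
    ...  # existing tz-aware branch (coerce to UTC timezone, GH11216), elided
else:
    # Anything else, e.g. '%Y%m%d', is passed as a strptime-style format
    return to_datetime(col, errors='coerce', format=format, utc=utc)
```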

So: first handle the specific case of numeric values with no format, which are parsed as seconds. Then check the format argument against all possible unit values. Anything that gets past this check is passed on as format=, so there is no need to test whether '%' is in format: a string containing '%' can never be a valid unit, and the unit case has already been handled.
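The order of checks described above can be sketched without pandas. `choose_branch` and `UNIT_SPECIFIERS` are hypothetical names for illustration only; the function merely reports which `to_datetime` keyword the suggested logic would use and does not parse anything.

```python
# Unit specifiers from the suggestion. Note the comma between 'm' and 's':
# without it Python joins adjacent string literals, so ['m' 's'] == ['ms']
# and neither 'm' nor 's' would be recognised as a unit.
UNIT_SPECIFIERS = ('D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns')


def choose_branch(is_numeric, is_tz_aware, fmt=None):
    """Report which to_datetime branch the suggested logic would take.

    Hypothetical helper for illustration: it returns a (branch, value) pair
    instead of calling pandas. branch is 'unit', 'tz' or 'format'.
    """
    # Step 1: numeric column with no format -> treat values as epoch seconds.
    if fmt is None and is_numeric:
        fmt = 's'
    # Step 2: a recognised unit specifier is passed as unit=.
    if fmt in UNIT_SPECIFIERS:
        return ('unit', fmt)
    # Step 3: tz-aware columns keep their existing branch.
    if is_tz_aware:
        return ('tz', fmt)
    # Step 4: everything else (e.g. '%Y%m%d') is passed as format=.
    # No separate '%' test is needed: no unit specifier contains '%'.
    return ('format', fmt)


print(choose_branch(True, False))            # ('unit', 's')
print(choose_branch(True, False, '%Y%m%d'))  # ('format', '%Y%m%d')
```

Because the unit check comes first, an integer column with `format='%Y%m%d'` falls through to the `format=` branch without any explicit `'%'` test.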

@drorata (Contributor, Author), Nov 21, 2017

@jorisvandenbossche But what about the case where the column consists of integers in a format like YYYYMMDD? That is not a valid unit and has to be parsed with a %-directive format (e.g. %Y%m%d).

If the format string contains %, the user knows something about the data, and that knowledge has to be used.
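To see why the distinction matters, here is a stdlib-only sketch, with Python's `datetime` standing in for `to_datetime` purely for illustration: the same integer read as seconds since the epoch (the `unit='s'` reading) versus read with `%Y%m%d`.

```python
from datetime import datetime, timezone

value = 20171122  # a date stored as a YYYYMMDD integer

# Interpreted as seconds since the epoch (the unit='s' reading): a 1970 date.
as_seconds = datetime.fromtimestamp(value, tz=timezone.utc)

# Interpreted with a %-directive format (the format='%Y%m%d' reading).
as_yyyymmdd = datetime.strptime(str(value), '%Y%m%d')

print(as_seconds)   # 1970-08-22 11:05:22+00:00
print(as_yyyymmdd)  # 2017-11-22 00:00:00
```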

Member

If you specify format="%Y%m%d", the snippet above parses the column with that format: only the recognized unit specifiers are passed as unit=, and anything else is passed as format=.
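From the user's side, this code path is reached through the `parse_dates` argument of `read_sql`. A minimal sketch, assuming a pandas version that includes this fix and an in-memory SQLite table (`events` and `day` are made-up names for illustration):

```python
import sqlite3

import pandas as pd

# In-memory SQLite table with dates stored as YYYYMMDD integers.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, day INTEGER)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 20171121), (2, 20171122)])

# Integer column with a strftime-style format: parsed via format=.
by_format = pd.read_sql("SELECT * FROM events", con,
                        parse_dates={"day": "%Y%m%d"})

# Integer column with no format: treated as seconds since the epoch.
by_default = pd.read_sql("SELECT * FROM events", con, parse_dates=["day"])
con.close()

print(by_format["day"].tolist())   # dates on 2017-11-21 and 2017-11-22
print(by_default["day"].tolist())  # timestamps in August 1970
```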

```diff
+                return to_datetime(
+                    col, errors='coerce', format=format, utc=utc)
+            else:
+                return to_datetime(col, errors='coerce', unit=format, utc=utc)
         elif is_datetime64tz_dtype(col):
             # coerce to UTC timezone
             # GH11216
```