ENH: ISO8601-compliant datetime string conversion in iterrows() and Series construction. #19762


Merged
merged 44 commits into from
Feb 25, 2018
Changes from 6 commits
Commits
518ab47
add require_iso8601 parameter and documentation in dataframe method i…
minggli Feb 19, 2018
156adbb
remove blank line
minggli Feb 19, 2018
6d06cf1
expose require_iso8601 parameter
minggli Feb 19, 2018
f2617dd
expose require_iso8601 parameter
minggli Feb 19, 2018
09ae4e5
expose require_8601 parameter
minggli Feb 19, 2018
7ea24ec
remove redundant TODO
minggli Feb 19, 2018
fac665b
revert pandas.core.frame
minggli Feb 20, 2018
068fde2
revert pandas.core.series
minggli Feb 20, 2018
8ceeb62
update documentation for typo and versionadded tag
minggli Feb 20, 2018
d105732
change default behaviour to require iso8601 and revert unnecessary ch…
minggli Feb 20, 2018
26fd14f
add whatsnew documentation for `require_iso8601` parameter in to_date…
minggli Feb 20, 2018
ab5214a
new test case in test_maybe_infer_to_datetimelike for non-iso8601 str…
minggli Feb 20, 2018
37aa8dd
comment with issue number
minggli Feb 20, 2018
7d9b27d
Merge branch 'master' into bugfixs/19671
minggli Feb 20, 2018
389a9d9
example for to_datetime api
minggli Feb 21, 2018
959ae62
reference to iso8601 standard
minggli Feb 22, 2018
700fa38
blank line before issue comment
minggli Feb 22, 2018
f8159c2
test datetime require iso8601 parameter
minggli Feb 22, 2018
3708f4b
add wikipedia reference to ISO 8601 standard
minggli Feb 22, 2018
cb798d2
add wikipedia reference to ISO 8601 standard
minggli Feb 22, 2018
2e27f22
fix url
minggli Feb 22, 2018
75268a8
Merge branch 'master' into bugfixs/19671
minggli Feb 22, 2018
8384d5e
private argument _require_iso8601 and remove example and param doc
minggli Feb 23, 2018
1665922
remove whatsnew entry
minggli Feb 23, 2018
21f7c15
modified kwarg
minggli Feb 23, 2018
5dc7a37
modified kwarg
minggli Feb 23, 2018
f9240b5
use kwargs to hide require_iso8601
minggli Feb 23, 2018
6e67070
revert core.tools.datetimes
minggli Feb 24, 2018
9e6e2a7
remove test case
minggli Feb 24, 2018
27fdfac
replace to_datetime call with internal conversion func
minggli Feb 24, 2018
6aea33d
revert test_tools
minggli Feb 24, 2018
998920c
Merge branch 'master' into PR_TOOL_MERGE_PR_19762
jreback Feb 24, 2018
14946f8
use DTI constructor
jreback Feb 24, 2018
9e11b43
test case for issue 19671, iterrows
minggli Feb 24, 2018
2fe7057
using klass for construction
minggli Feb 25, 2018
910f759
test DataFrame only
minggli Feb 25, 2018
0b72b72
fix a typo
minggli Feb 25, 2018
acdec06
Merge branch 'master' into PR_TOOL_MERGE_PR_19762
jreback Feb 25, 2018
e69f4ab
correction
jreback Feb 25, 2018
5b12cfc
fix test_iterrows
minggli Feb 25, 2018
a9d85ae
Merge remote-tracking branch 'upstream/master' into bugfixs/19671
minggli Feb 25, 2018
a5a1f57
whatsnew entry
minggli Feb 25, 2018
793ea23
imperative xfail in test
minggli Feb 25, 2018
08d2718
doc
jreback Feb 25, 2018
22 changes: 15 additions & 7 deletions pandas/core/dtypes/cast.py
@@ -860,7 +860,9 @@ def maybe_castable(arr):
return arr.dtype.name not in _POSSIBLY_CAST_DTYPES


def maybe_infer_to_datetimelike(value, convert_dates=False):
def maybe_infer_to_datetimelike(value,
convert_dates=False,
require_iso8601=False):
"""
we might have a array (or single object) that is datetime like,
and no dtype is passed don't change the value unless we find a
@@ -875,6 +877,8 @@ def maybe_infer_to_datetimelike(value, convert_dates=False):
convert_dates : boolean, default False
if True try really hard to convert dates (such as datetime.date), other
leave inferred dtype 'date' alone
require_iso8601 : boolean, default False
If True, only try to infer ISO8601-compliant datetime string.

"""

@@ -901,18 +905,19 @@ def maybe_infer_to_datetimelike(value, convert_dates=False):
if not len(v):
return value

def try_datetime(v):
def try_datetime(v, require_iso8601=require_iso8601):
# safe coerce to datetime64
try:
v = tslib.array_to_datetime(v, errors='raise')
Contributor: You don't need to add require_iso8601 anywhere here, except in the actual to_datetime() call, where it should be True.

Contributor Author: Done.

v = tslib.array_to_datetime(v,
require_iso8601=require_iso8601,
errors='raise')
except ValueError:

# we might have a sequence of the same-datetimes with tz's
# if so coerce to a DatetimeIndex; if they are not the same,
# then these stay as object dtype
try:
from pandas import to_datetime
return to_datetime(v)
return to_datetime(v, require_iso8601=require_iso8601)
except Exception:
pass

@@ -957,7 +962,8 @@ def try_timedelta(v):
return value


def maybe_cast_to_datetime(value, dtype, errors='raise'):
def maybe_cast_to_datetime(value, dtype, require_iso8601=False,
errors='raise'):
""" try to cast the array/value to a datetimelike dtype, converting float
nan to iNaT
"""
@@ -1074,7 +1080,9 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
# conversion
elif not (is_array and not (issubclass(value.dtype.type, np.integer) or
value.dtype == np.object_)):
value = maybe_infer_to_datetimelike(value)
value = \
maybe_infer_to_datetimelike(value,
require_iso8601=require_iso8601)

return value
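
The two-stage pattern in `try_datetime` above (attempt a conversion, and on `ValueError` leave the values alone) can be illustrated without pandas. The following is only a minimal sketch of what a `require_iso8601` switch might mean. The names `parse_one` and `maybe_infer`, the narrow regular expression, and the list of lenient formats are all assumptions made for illustration and are not part of this PR or of pandas.

```python
import re
from datetime import datetime

# Hypothetical, deliberately narrow ISO 8601 pattern: YYYY-MM-DD with an
# optional "T"- or space-separated HH:MM:SS part. Real ISO 8601 allows more.
_ISO_RE = re.compile(r"^\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}:\d{2})?$")

# Non-ISO formats that only the lenient mode tries (an assumption).
_LENIENT_FORMATS = ("%m/%d/%Y", "%d.%m.%Y")


def parse_one(text, require_iso8601):
    """Parse a single string into a datetime or raise ValueError."""
    if _ISO_RE.match(text):
        return datetime.fromisoformat(text)
    if require_iso8601:
        raise ValueError(f"not an ISO 8601 string: {text!r}")
    for fmt in _LENIENT_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unparseable: {text!r}")


def maybe_infer(values, require_iso8601=False):
    """Return datetimes only if every value parses; otherwise the input."""
    try:
        return [parse_one(v, require_iso8601) for v in values]
    except (TypeError, ValueError):
        # Mirror the "leave the value unchanged" behaviour on failure.
        return values


mixed = ["2018-02-25", "02/19/2018"]
print(maybe_infer(mixed))                        # lenient: both values parse
print(maybe_infer(mixed, require_iso8601=True))  # strict: input returned as-is
```

In this sketch the strict mode never raises to the caller. It simply declines to convert, which is the behaviour an inference helper needs when it is called implicitly during construction.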

13 changes: 11 additions & 2 deletions pandas/core/frame.py
@@ -710,10 +710,16 @@ def iteritems(self):
for i, k in enumerate(self.columns):
yield k, self._ixs(i, axis=1)

def iterrows(self):
def iterrows(self, require_iso8601=False):
"""
Iterate over DataFrame rows as (index, Series) pairs.

Parameters
----------
require_iso8601 : boolean, default False
If True, only try to infer ISO8601-compliant datetime string in
iterated rows.

Notes
-----

@@ -755,7 +761,10 @@ def iterrows(self):
columns = self.columns
klass = self._constructor_sliced
for k, v in zip(self.index, self.values):
s = klass(v, index=columns, name=k)
s = klass(v,
Contributor: You don't need to add this here.

Contributor Author: Reverted.

index=columns,
name=k,
require_iso8601=require_iso8601)
yield k, s

def itertuples(self, index=True, name="Pandas"):
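
Because `iterrows` builds a fresh object for every row (`klass(v, index=columns, name=k)` in the hunk above), any datetime inference runs once per row rather than once per column. The sketch below imitates that shape in plain Python to show the consequence. `iterrows_sketch` and `strict_iso_or_unchanged` are hypothetical names, rows are modelled as dicts instead of Series, and nothing here reproduces the actual pandas inference rules.

```python
from datetime import datetime


def strict_iso_or_unchanged(values):
    """Hypothetical per-row inference: convert only if every value is ISO."""
    try:
        return [datetime.fromisoformat(v) for v in values]
    except (TypeError, ValueError):
        return list(values)


def iterrows_sketch(index, columns, rows, infer=strict_iso_or_unchanged):
    """Yield (label, row) pairs, constructing each row independently.

    Each row is a dict keyed by column name. Inference is applied to one
    row at a time, so the result depends on the other values in that row.
    """
    for label, values in zip(index, rows):
        yield label, dict(zip(columns, infer(values)))


rows = [
    ["2018-02-19", "2018-02-25"],  # every value is an ISO 8601 date
    ["2018-02-19", "M1701"],       # one value is not a date at all
]
for label, row in iterrows_sketch(["a", "b"], ["start", "end"], rows):
    print(label, row)
```

Note that the same `start` string ends up as a `datetime` in row `a` and as a plain string in row `b`. Per-row inference makes a stricter rule attractive for the same reason: it reduces the chance that a non-date string sitting next to a real date gets coerced.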
21 changes: 14 additions & 7 deletions pandas/core/series.py
@@ -146,7 +146,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
'from_csv', 'valid'])

def __init__(self, data=None, index=None, dtype=None, name=None,
copy=False, fastpath=False):
copy=False, fastpath=False, require_iso8601=False):

# we are called internally, so short-circuit
Contributor: You don't need this anywhere here.

Contributor Author: Reverted.

if fastpath:
@@ -236,7 +236,8 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
data = data.copy()
else:
data = _sanitize_array(data, index, dtype, copy,
raise_cast_failure=True)
raise_cast_failure=True,
require_iso8601=require_iso8601)

data = SingleBlockManager(data, index, fastpath=True)

@@ -3129,7 +3130,7 @@ def _sanitize_index(data, index, copy=False):


def _sanitize_array(data, index, dtype=None, copy=False,
raise_cast_failure=False):
raise_cast_failure=False, require_iso8601=False):
""" sanitize input data to an ndarray, copy if specified, coerce to the
dtype if specified
"""
@@ -3145,15 +3146,17 @@ def _sanitize_array(data, index, dtype=None, copy=False,
else:
data = data.copy()

def _try_cast(arr, take_fast_path):
def _try_cast(arr, take_fast_path, require_iso8601=require_iso8601):

# perf shortcut as this is the most common case
if take_fast_path:
if maybe_castable(arr) and not copy and dtype is None:
return arr

try:
subarr = maybe_cast_to_datetime(arr, dtype)
subarr = maybe_cast_to_datetime(arr,
dtype,
require_iso8601=require_iso8601)
if not is_extension_type(subarr):
subarr = np.array(subarr, dtype=dtype, copy=copy)
except (ValueError, TypeError):
@@ -3211,7 +3214,9 @@ def _try_cast(arr, take_fast_path):
else:
subarr = maybe_convert_platform(data)

subarr = maybe_cast_to_datetime(subarr, dtype)
subarr = maybe_cast_to_datetime(subarr,
dtype,
require_iso8601=require_iso8601)

elif isinstance(data, range):
# GH 16804
@@ -3233,7 +3238,9 @@ def _try_cast(arr, take_fast_path):
dtype, value = infer_dtype_from_scalar(value)
else:
# need to possibly convert the value here
value = maybe_cast_to_datetime(value, dtype)
value = maybe_cast_to_datetime(value,
dtype,
require_iso8601=require_iso8601)

subarr = construct_1d_arraylike_from_scalar(
value, len(index), dtype)
12 changes: 6 additions & 6 deletions pandas/core/tools/datetimes.py
@@ -105,8 +105,8 @@ def _convert_and_box_cache(arg, cache_array, box, errors, name=None):

def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
utc=None, box=True, format=None, exact=True,
unit=None, infer_datetime_format=False, origin='unix',
cache=False):
unit=None, infer_datetime_format=False, require_iso8601=False,
origin='unix', cache=False):
"""
Convert argument to datetime.

@@ -167,6 +167,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
datetime strings, and if it can be inferred, switch to a faster
method of parsing them. In some cases this can increase the parsing
speed by ~5-10x.
require_iso8601 : boolean, default False
If True, only try to infer ISO8601-compliant datetime string.
Contributor: Add a versionadded tag (0.23.0).

Contributor: string -> strings

Contributor Author: Done.

origin : scalar, default is 'unix'
Define the reference date. The numeric values would be parsed as number
of units (defined by `unit`) since this reference date.
@@ -273,7 +275,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

tz = 'utc' if utc else None

def _convert_listlike(arg, box, format, name=None, tz=tz):
def _convert_listlike(arg, box, format, name=None, tz=tz,
require_iso8601=require_iso8601):

if isinstance(arg, (list, tuple)):
arg = np.array(arg, dtype='O')
@@ -313,11 +316,8 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
'1-d array, or Series')

arg = _ensure_object(arg)
require_iso8601 = False

if infer_datetime_format and format is None:
format = _guess_datetime_format_for_array(arg, dayfirst=dayfirst)

if format is not None:
# There is a special fast-path for iso8601 formatted
# datetime strings, so in those cases don't use the inferred
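
The comment above refers to a fast path for ISO 8601 formatted strings, which implies some test of whether a guessed `strftime`-style format is ISO-like. One way such a test could look is sketched below. The function name `format_is_iso_like`, the separator lists, and the excluded formats are assumptions chosen for illustration and should not be read as the pandas implementation.

```python
from itertools import product

# Most detailed layout this sketch recognises; separators are filled in below.
_TEMPLATE = "%Y{d}%m{d}%d{t}%H:%M:%S.%f"
_DATE_SEPS = ("-", "/", ".", " ", "")
_TIME_SEPS = ("T", " ")
# Bare compact formats are treated as too ambiguous here (an assumption).
_EXCLUDED = {"%Y", "%Y%m", "%Y%m%d"}


def format_is_iso_like(fmt):
    """Return True if fmt is a prefix of one of the ISO-like templates."""
    if not fmt or fmt in _EXCLUDED:
        return False
    return any(
        _TEMPLATE.format(d=d, t=t).startswith(fmt)
        for d, t in product(_DATE_SEPS, _TIME_SEPS)
    )


print(format_is_iso_like("%Y-%m-%d"))           # year-first, dash separated
print(format_is_iso_like("%Y-%m-%d %H:%M:%S"))  # with a time component
print(format_is_iso_like("%d/%m/%Y"))           # day-first, so not ISO-like
```

A prefix match keeps the check cheap and lets truncated formats such as a date without a time share the same code path, at the cost of also accepting a few odd prefixes that a stricter validator would reject.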