ENH: ISO8601-compliant datetime string conversion in `iterrows()` and Series construction. #19762

minggli · 2018-02-19T09:19:19Z

closes auto convert from string to datetime64 in iterrows. #19671
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-02-19T09:19:25Z

Hello @minggli! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 25, 2018 at 23:05 Hours UTC

jreback · 2018-02-19T15:46:20Z

pandas/core/frame.py

@@ -755,7 +761,10 @@ def iterrows(self):
        columns = self.columns
        klass = self._constructor_sliced
        for k, v in zip(self.index, self.values):
-            s = klass(v, index=columns, name=k)
+            s = klass(v,


you don't need to add this here

jreback · 2018-02-19T15:46:47Z

pandas/core/dtypes/cast.py

        # safe coerce to datetime64
        try:
-            v = tslib.array_to_datetime(v, errors='raise')


you don't need to add this require_iso8601 anywhere here, except in the actual to_datetime() call, where it should be True.
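A minimal sketch of the "safe coerce" idea discussed here, assuming the standard library's `datetime.fromisoformat` as a stand-in for pandas' internal parser (it is not the code in `cast.py`, and `coerce_if_all_iso8601` is a hypothetical helper name). It converts only when every string is ISO 8601-like and otherwise leaves the values untouched, which is the behaviour wanted for strings such as `'M1701'`:

```python
from datetime import datetime


def coerce_if_all_iso8601(values):
    """Return datetimes if every value parses as ISO 8601, else the input unchanged.

    Hypothetical helper for illustration only; pandas uses its own parser.
    """
    try:
        return [datetime.fromisoformat(v) for v in values]
    except (TypeError, ValueError):
        # At least one value is not an ISO 8601 string: leave the data alone.
        return list(values)


iso = coerce_if_all_iso8601(["2016-09-01", "2017-01-01"])
non_iso = coerce_if_all_iso8601(["M1701", "M1802"])
```

The all-or-nothing design mirrors the intent of the review: a column is either entirely datetime-like or it stays as strings.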

jreback · 2018-02-19T15:47:03Z

pandas/core/series.py

@@ -146,7 +146,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
         'from_csv', 'valid'])

    def __init__(self, data=None, index=None, dtype=None, name=None,
-                 copy=False, fastpath=False):
+                 copy=False, fastpath=False, require_iso8601=False):

        # we are called internally, so short-circuit


you don't need this anywhere here

jreback · 2018-02-19T15:47:29Z

pandas/core/tools/datetimes.py

@@ -167,6 +167,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
        datetime strings, and if it can be inferred, switch to a faster
        method of parsing them. In some cases this can increase the parsing
        speed by ~5-10x.
+    require_iso8601 : boolean, default False
+        If True, only try to infer ISO8601-compliant datetime string.


add a versionadded tag (0.23.0)

string -> strings

codecov · 2018-02-20T19:59:15Z

Codecov Report

Merging #19762 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19762      +/-   ##
==========================================
+ Coverage   91.67%   91.69%   +0.02%     
==========================================
  Files         150      150              
  Lines       48936    48938       +2     
==========================================
+ Hits        44860    44872      +12     
+ Misses       4076     4066      -10

Flag	Coverage Δ
#multiple	`90.06% <100%> (+0.02%)`	⬆️
#single	`41.81% <83.33%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/internals.py	`95.53% <ø> (ø)`	⬆️
pandas/core/dtypes/cast.py	`87.68% <100%> (-0.3%)`	⬇️
pandas/plotting/_converter.py	`66.95% <0%> (+1.73%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f1dfa7...08d2718. Read the comment docs.

minggli · 2018-02-20T20:57:24Z

@jreback changes implemented and added test and whatsnew doc.

jreback · 2018-02-22T00:23:29Z

pandas/tests/dtypes/test_cast.py

@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self):
        result = DataFrame(np.array([[NaT, 'a', 0],
                                     [NaT, 'b', 1]]))
        assert result.size == 6
+        # GH19671


can you make a new test in pandas/tests/indexes/datetimes/test_tools.py? there are a ton of tests already there, see if you can find a good place.

ok to leave this one as well.

jreback · 2018-02-22T00:23:40Z

pandas/tests/dtypes/test_cast.py

@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self):
        result = DataFrame(np.array([[NaT, 'a', 0],
                                     [NaT, 'b', 1]]))
        assert result.size == 6
+        # GH19671


add a blank line before the comment

jreback · 2018-02-22T00:24:15Z

pandas/core/tools/datetimes.py

+    0   1809-01-01
+    1   1701-01-01
+    2   2013-01-01
+    dtype: datetime64[ns]


you can basically add these examples as tests

jreback · 2018-02-22T00:25:13Z

pandas/core/tools/datetimes.py

@@ -167,6 +167,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
        datetime strings, and if it can be inferred, switch to a faster
        method of parsing them. In some cases this can increase the parsing
        speed by ~5-10x.
+    require_iso8601 : boolean, default False
+        If True, only try to infer ISO8601-compliant datetime strings.


can you add a reference to ISO8601 (prob from wikipedia)

jorisvandenbossche · 2018-02-22T14:04:42Z

General question: is this important enough to add to the public API? (the user can already specify the format, although that might not be as flexible as such a keyword)
Because if we need it internally, e.g. for iterrows, we can keep the keyword internal to solve that specific problem without exposing it in the public to_datetime?

Further, it does not seem to fully work:

In [15]: pd.to_datetime("2018/02/22 12:14:23", require_iso8601=True)
Out[15]: Timestamp('2018-02-22 12:14:23')

The above is not an ISO8601 format, so it should not be allowed. If we make the keyword public, the name should correctly reflect what it does (for internal usage the above might be OK?)
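For comparison, a strict check rejects the slash-separated string. This is a small sketch using the standard library's `datetime.fromisoformat`, which is an assumption chosen for illustration and not what `to_datetime` does internally; it also accepts a space in place of `T`, so it is only an approximation of the standard. `is_iso8601` is a hypothetical helper name.

```python
from datetime import datetime


def is_iso8601(text):
    """Report whether datetime.fromisoformat accepts the string."""
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True


dashed = is_iso8601("2018-02-22T12:14:23")   # hyphen-separated date
slashed = is_iso8601("2018/02/22 12:14:23")  # the string from the example above
```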

jreback · 2018-02-22T14:59:27Z

from @jorisvandenbossche comments: #19762 (comment)

ok to add this as a private variable instead, we don't really have a convention for this, maybe use
_require_iso8601=False as the arg

jreback · 2018-02-24T20:35:48Z

I changed the construction logic slightly. can you also add the original test (e.g. the iterrows example)?

minggli · 2018-02-24T22:00:18Z

@jreback done.

minggli · 2018-02-25T11:36:58Z

@jreback SparseDataFrame.values seems to have lost the dtype along the way, breaking test_iterrows(). The dense DataFrame (s.to_dense().values) doesn't have this problem. is this normal?

>>> s = SparseDataFrame(
...     {'non_iso8601': ['M1701', 'M1802', 'M1903'],
...      'iso8601': to_datetime(['2016-09-01', '2017-01-01', '2018-01-23'])})

>>> s.values
array([[1472688000000000000, 'M1701'],
       [1483228800000000000, 'M1802'],
       [1516665600000000000, 'M1903']], dtype=object)

>>> s.to_dense().values
array([[Timestamp('2016-09-01 00:00:00'), 'M1701'],
       [Timestamp('2017-01-01 00:00:00'), 'M1802'],
       [Timestamp('2018-01-23 00:00:00'), 'M1903']], dtype=object)

>>> s._data.as_array()
array([[1472688000000000000, 1483228800000000000, 1516665600000000000],
       ['M1701', 'M1802', 'M1903']], dtype=object)
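The integers in the sparse output are the same dates expressed as nanoseconds since the Unix epoch. A quick standard-library check (the helper `ns_to_datetime` is hypothetical, not a pandas function):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns):
    """Convert nanoseconds since the Unix epoch to a naive datetime."""
    # timedelta has microsecond resolution, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(microseconds=ns // 1000)


raw = [1472688000000000000, 1483228800000000000, 1516665600000000000]
decoded = [ns_to_datetime(ns) for ns in raw]
```

So the values themselves survive; only the datetime dtype is lost, leaving plain integers in the object array.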

minggli · 2018-02-25T14:51:47Z

Looked into the above: SparseDataFrame doesn't have the granularity that the dense DataFrame has in terms of handling different block types.

DatetimeLikeBlockMixin provides Timestamp and Timedelta construction in datetime-like Blocks.

This is a pre-existing condition which I think is outside the scope of this PR. Moved the test case to DataFrame only.

jreback · 2018-02-25T15:35:59Z

@minggli code/tests lgtm. I just pushed a tiny update. Can you add a whatsnew note (bug fix, reshaping). ping on green.

minggli · 2018-02-25T15:50:01Z

Hi @jreback, thanks for the update. I initially put the test in SharedWithSparse like you just did and it fails on tests.sparse.frame for the reasons mentioned above.

jreback · 2018-02-25T16:04:08Z

> Hi @jreback, thanks for the update. I initially put the test in SharedWithSparse like you just did and it fails on tests.sparse.frame for the reasons mentioned above.

ahh ok. can you put an xfail on it then for the Sparse (I think there are some examples like that)

jreback · 2018-02-25T21:17:38Z

pandas/tests/frame/test_api.py

@@ -214,6 +214,18 @@ def test_iterrows(self):
            exp = self.mixed_frame.loc[k]
            self._assert_series_equal(v, exp)

+        s = self.klass(


make this a separate test.

change logic to be

if isinstance(self.klass, SparseDataFrame): pytest.xfail("....give a nice message here")

you might also be able to use the decorator form (which is preferred), e.g.

@pytest.mark.xfail(isinstance(klass, SparseDataFrame), reason=''.....

I've tried xfail decorator but because this test on SparseDataFrame is inherited from SharedWithSparse, not sure how to specify the boolean condition in tests/frame/test_api.py unless it's located in tests/sparse/frame/test_frame.py where klass is declared as SparseDataFrame.

no problem using xfail imperatively inside test.
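A rough sketch of why the condition has to be evaluated inside the test, using hypothetical stand-in classes and no pandas or pytest imports. `klass` holds a class rather than an instance, so an `isinstance` check against it is False; comparing classes with `issubclass` is one way to branch, and the marked line is where a real test would call `pytest.xfail(reason)`.

```python
class DenseFrame:
    """Hypothetical stand-in for DataFrame."""


class SparseFrame:
    """Hypothetical stand-in for SparseDataFrame."""


class SharedTests:
    """Stand-in for a shared mixin; subclasses override klass."""

    klass = DenseFrame

    def xfail_reason(self):
        # self.klass is only known at run time, once the subclass has set it,
        # which is why a decorator on the shared method cannot see it.
        if issubclass(self.klass, SparseFrame):
            # A real test would call pytest.xfail(reason) here.
            return "values loses the datetime dtype for sparse frames"
        return None


class SparseTests(SharedTests):
    klass = SparseFrame


dense_reason = SharedTests().xfail_reason()
sparse_reason = SparseTests().xfail_reason()
# klass is the class itself, so an isinstance check against it is False.
instance_check = isinstance(SparseTests.klass, SparseFrame)
```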

jreback · 2018-02-25T23:05:34Z

thanks @minggli very nice! keep em coming!

minggli · 2018-02-25T23:17:37Z

happy to help!!

minggli added 5 commits February 19, 2018 16:12

add require_iso8601 parameter and documentation in dataframe method iterrows

518ab47

remove blank line

156adbb

expose require_iso8601 parameter

6d06cf1

expose require_iso8601 parameter

f2617dd

expose require_8601 parameter

09ae4e5

remove redundant TODO

7ea24ec

jreback requested changes Feb 19, 2018

View reviewed changes

jreback added the Datetime (Datetime data dtype) and Compat (pandas objects compatibility with NumPy or Python functions) labels Feb 19, 2018

minggli added 7 commits February 21, 2018 02:20

revert pandas.core.frame

fac665b

revert pandas.core.series

068fde2

update documentation for typo and versionadded tag

8ceeb62

change default behaviour to require iso8601 and revert unnecessary changes

d105732

add whatsnew documentation for require_iso8601 parameter in to_datetime

26fd14f

new test case in test_maybe_infer_to_datetimelike for non-iso8601 string conversion during construction

ab5214a

comment with issue number

37aa8dd

Merge branch 'master' into bugfixs/19671

7d9b27d

example for to_datetime api

389a9d9

jreback requested changes Feb 22, 2018

View reviewed changes

jreback reviewed Feb 22, 2018

View reviewed changes

jorisvandenbossche changed the title ~~[19671] Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion.~~ ENH: Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion Feb 22, 2018

reference to iso8601 standard

959ae62

minggli added 2 commits February 22, 2018 22:20

blank line before issue comment

700fa38

test datetime require iso8601 parameter

f8159c2

test case for issue 19671, iterrows

9e11b43

using klass for construction

2fe7057

minggli force-pushed the bugfixs/19671 branch from 42d0874 to 2fe7057 Compare February 25, 2018 10:37

minggli added 2 commits February 25, 2018 14:28

test DataFrame only

910f759

fix a typo

0b72b72

minggli changed the title ~~ENH: Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion~~ ENH: ISO8601-compliant datetime string conversion in iterrows() and Series construction. Feb 25, 2018

jreback added 2 commits February 25, 2018 10:33

Merge branch 'master' into PR_TOOL_MERGE_PR_19762

acdec06

correction

e69f4ab

jreback added this to the 0.23.0 milestone Feb 25, 2018

minggli added 2 commits February 25, 2018 19:38

fix test_iterrows

5b12cfc

Merge remote-tracking branch 'upstream/master' into bugfixs/19671

a9d85ae

minggli force-pushed the bugfixs/19671 branch 2 times, most recently from 7c7e87c to a9d85ae Compare February 25, 2018 19:43

whatsnew entry

a5a1f57

jreback requested changes Feb 25, 2018

View reviewed changes

minggli and others added 2 commits February 25, 2018 21:43

imperative xfail in test

793ea23

doc

08d2718

jreback approved these changes Feb 25, 2018

View reviewed changes

jreback merged commit d40fb54 into pandas-dev:master Feb 25, 2018

minggli deleted the bugfixs/19671 branch February 25, 2018 23:08

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

ENH: ISO8601-compliant datetime string conversion in iterrows() and Series construction. (pandas-dev#19762)

b9149b0
