Skip to content

ENH: ISO8601-compliant datetime string conversion in iterrows() and Series construction. #19762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Feb 25, 2018

Conversation

minggli
Copy link
Contributor

@minggli minggli commented Feb 19, 2018

@pep8speaks
Copy link

pep8speaks commented Feb 19, 2018

Hello @minggli! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 25, 2018 at 23:05 Hours UTC

@@ -755,7 +761,10 @@ def iterrows(self):
columns = self.columns
klass = self._constructor_sliced
for k, v in zip(self.index, self.values):
s = klass(v, index=columns, name=k)
s = klass(v,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to add this here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted.

# safe coerce to datetime64
try:
v = tslib.array_to_datetime(v, errors='raise')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you dont' need to add this require_iso8601 anywhere here, except for in the actual to_datetime() call, where it should be True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -146,7 +146,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
'from_csv', 'valid'])

def __init__(self, data=None, index=None, dtype=None, name=None,
copy=False, fastpath=False):
copy=False, fastpath=False, require_iso8601=False):

# we are called internally, so short-circuit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this anywhere here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted.

@@ -167,6 +167,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
datetime strings, and if it can be inferred, switch to a faster
method of parsing them. In some cases this can increase the parsing
speed by ~5-10x.
require_iso8601 : boolean, default False
If True, only try to infer ISO8601-compliant datetime string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded tag (0.23.0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string -> strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added Datetime Datetime data dtype Compat pandas objects compatability with Numpy or Python functions labels Feb 19, 2018
@codecov
Copy link

codecov bot commented Feb 20, 2018

Codecov Report

Merging #19762 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19762      +/-   ##
==========================================
+ Coverage   91.67%   91.69%   +0.02%     
==========================================
  Files         150      150              
  Lines       48936    48938       +2     
==========================================
+ Hits        44860    44872      +12     
+ Misses       4076     4066      -10
Flag Coverage Δ
#multiple 90.06% <100%> (+0.02%) ⬆️
#single 41.81% <83.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals.py 95.53% <ø> (ø) ⬆️
pandas/core/dtypes/cast.py 87.68% <100%> (-0.3%) ⬇️
pandas/plotting/_converter.py 66.95% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f1dfa7...08d2718. Read the comment docs.

@minggli
Copy link
Contributor Author

minggli commented Feb 20, 2018

@jreback changes implemented and added test and whatsnew doc.

@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self):
result = DataFrame(np.array([[NaT, 'a', 0],
[NaT, 'b', 1]]))
assert result.size == 6
# GH19671
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ican you make a new test in pandas/tests/indexes/datetimes/test_tools.py there are a ton of tests already there, see if you can find a good place.

ok to leave this on as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self):
result = DataFrame(np.array([[NaT, 'a', 0],
[NaT, 'b', 1]]))
assert result.size == 6
# GH19671
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a blank line before the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

0 1809-01-01
1 1701-01-01
2 2013-01-01
dtype: datetime64[ns]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can basically add these examples as tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -167,6 +167,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
datetime strings, and if it can be inferred, switch to a faster
method of parsing them. In some cases this can increase the parsing
speed by ~5-10x.
require_iso8601 : boolean, default False
If True, only try to infer ISO8601-compliant datetime strings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a reference to ISO8601 (prob from wikipedia)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jorisvandenbossche jorisvandenbossche changed the title [19671] Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion. ENH: Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion Feb 22, 2018
@jorisvandenbossche
Copy link
Member

General question: is this important enough to add this to the public API? (the user can already specify the format, although that might not be as flexible as such a keyword)
Because if we need it internally for eg iterrows, we can keep the keyword internal to solve that specific problem without exposing it in the public to_datetime ?

Further, it does not seem to fully work:

In [15]: pd.to_datetime("2018/02/22 12:14:23", require_iso8601=True)
Out[15]: Timestamp('2018-02-22 12:14:23')

The above is not a ISO8601 format, so should not be allowed. If we make the keyword public, the name should reflect what it does correctly (for internal usage the above might be OK?)

@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

from @jorisvandenbossche comments: #19762 (comment)

ok to add this as a private variable instead, we don't really have a convention for this, maybe use
_require_iso8601=False as the arg

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

I changed the construction logic slightly. can you also add the original test (e.g .the iterrows) example?

@minggli
Copy link
Contributor Author

minggli commented Feb 24, 2018

@jreback done.

@minggli
Copy link
Contributor Author

minggli commented Feb 25, 2018

@jreback SparseDataFrame.values seem to have lost dtype along the way, breaking test_iterrows(). Dense dataframe (s.to_dense().values) doesn't have this problem. is it normal?

s = SparseDataFrame(
... {'non_iso8601': ['M1701', 'M1802', 'M1903'],
... 'iso8601': to_datetime(['2016-09-01', '2017-01-01', '2018-01-23'])})

s.values
array([[1472688000000000000, 'M1701'],
[1483228800000000000, 'M1802'],
[1516665600000000000, 'M1903']], dtype=object)

s.to_dense().values
array([[Timestamp('2016-09-01 00:00:00'), 'M1701'],
[Timestamp('2017-01-01 00:00:00'), 'M1802'],
[Timestamp('2018-01-23 00:00:00'), 'M1903']], dtype=object)

s._data.as_array()
array([[1472688000000000000, 1483228800000000000, 1516665600000000000],
['M1701', 'M1802', 'M1903']], dtype=object)

@minggli minggli changed the title ENH: Expose require_iso8601 flag to specify ISO8601-compliant datetime string conversion ENH: ISO8601-compliant datetime string conversion in iterrows() and Series construction. Feb 25, 2018
@minggli
Copy link
Contributor Author

minggli commented Feb 25, 2018

looked into above, but SparseFrame doesn't have the granularity that dense DataFrame has in terms of handling different block types.

DatetimeLikeBlockMixin provides Timestamp, Timedelta constructions in Datetimelike Blocks.

This is a pre-existing condition which I think is outside the scope of this PR. moved test case for DataFrame only.

@jreback
Copy link
Contributor

jreback commented Feb 25, 2018

@minggli code/tests lgtm. I just pushed a tiny update. Can you add a whatsnew note (bug fix, reshaping). ping on green.

@jreback jreback added this to the 0.23.0 milestone Feb 25, 2018
@minggli
Copy link
Contributor Author

minggli commented Feb 25, 2018

Hi @jreback, thanks for the update. I initially put test in SharedwithSparse like you just did and it fails on tests.sparse.frame due to reasons mentioned above.

@jreback
Copy link
Contributor

jreback commented Feb 25, 2018

Hi @jreback, thanks for the update. I initially put test in SharedwithSparse like you just did and it fails on tests.sparse.frame due to reasons mentioned above.

ahh ok. can you put an xfail on it then for the Sparse (I think there are some examples like that)

@minggli minggli force-pushed the bugfixs/19671 branch 2 times, most recently from 7c7e87c to a9d85ae Compare February 25, 2018 19:43
@@ -214,6 +214,18 @@ def test_iterrows(self):
exp = self.mixed_frame.loc[k]
self._assert_series_equal(v, exp)

s = self.klass(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a separate test.

change logic to be

if isinstance(self.klass, SparseDataFrame):
    pytest.xfail("....give a nice message here")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might also be able to use the decorator form (which is preferred),e .g.

@pytest.mark.xfail(isinstance(klass, SparseDataFrame), reason=''.....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried xfail decorator but because this test on SparseDataFrame is inherited from SharedWithSparse, not sure how to specify the boolean condition in tests/frame/test_api.py unless it's located in tests/sparse/frame/test_frame.py where klass is declared as SparseDataFrame.

no problem using xfail imperatively inside test.

@jreback jreback merged commit d40fb54 into pandas-dev:master Feb 25, 2018
@jreback
Copy link
Contributor

jreback commented Feb 25, 2018

thanks @minggli very nice! keep em coming!

@minggli minggli deleted the bugfixs/19671 branch February 25, 2018 23:08
@minggli
Copy link
Contributor Author

minggli commented Feb 25, 2018

happy to help!!

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants