-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: ISO8601-compliant datetime string conversion in iterrows()
and Series construction.
#19762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @minggli! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 25, 2018 at 23:05 Hours UTC |
pandas/core/frame.py
Outdated
@@ -755,7 +761,10 @@ def iterrows(self): | |||
columns = self.columns | |||
klass = self._constructor_sliced | |||
for k, v in zip(self.index, self.values): | |||
s = klass(v, index=columns, name=k) | |||
s = klass(v, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to add this here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted.
pandas/core/dtypes/cast.py
Outdated
# safe coerce to datetime64 | ||
try: | ||
v = tslib.array_to_datetime(v, errors='raise') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you dont' need to add this require_iso8601 anywhere here, except for in the actual to_datetime()
call, where it should be True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pandas/core/series.py
Outdated
@@ -146,7 +146,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame): | |||
'from_csv', 'valid']) | |||
|
|||
def __init__(self, data=None, index=None, dtype=None, name=None, | |||
copy=False, fastpath=False): | |||
copy=False, fastpath=False, require_iso8601=False): | |||
|
|||
# we are called internally, so short-circuit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this anywhere here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted.
pandas/core/tools/datetimes.py
Outdated
@@ -167,6 +167,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, | |||
datetime strings, and if it can be inferred, switch to a faster | |||
method of parsing them. In some cases this can increase the parsing | |||
speed by ~5-10x. | |||
require_iso8601 : boolean, default False | |||
If True, only try to infer ISO8601-compliant datetime string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a versionadded tag (0.23.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string -> strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
…ing conversion during construction
Codecov Report
@@ Coverage Diff @@
## master #19762 +/- ##
==========================================
+ Coverage 91.67% 91.69% +0.02%
==========================================
Files 150 150
Lines 48936 48938 +2
==========================================
+ Hits 44860 44872 +12
+ Misses 4076 4066 -10
Continue to review full report at Codecov.
|
@jreback changes implemented and added test and whatsnew doc. |
pandas/tests/dtypes/test_cast.py
Outdated
@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self): | |||
result = DataFrame(np.array([[NaT, 'a', 0], | |||
[NaT, 'b', 1]])) | |||
assert result.size == 6 | |||
# GH19671 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ican you make a new test in pandas/tests/indexes/datetimes/test_tools.py there are a ton of tests already there, see if you can find a good place.
ok to leave this on as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/dtypes/test_cast.py
Outdated
@@ -299,6 +299,9 @@ def test_maybe_infer_to_datetimelike(self): | |||
result = DataFrame(np.array([[NaT, 'a', 0], | |||
[NaT, 'b', 1]])) | |||
assert result.size == 6 | |||
# GH19671 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a blank line before the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/tools/datetimes.py
Outdated
0 1809-01-01 | ||
1 1701-01-01 | ||
2 2013-01-01 | ||
dtype: datetime64[ns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can basically add these examples as tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/tools/datetimes.py
Outdated
@@ -167,6 +167,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, | |||
datetime strings, and if it can be inferred, switch to a faster | |||
method of parsing them. In some cases this can increase the parsing | |||
speed by ~5-10x. | |||
require_iso8601 : boolean, default False | |||
If True, only try to infer ISO8601-compliant datetime strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a reference to ISO8601 (prob from wikipedia)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
require_iso8601
flag to specify ISO8601-compliant datetime string conversion. require_iso8601
flag to specify ISO8601-compliant datetime string conversion
General question: is this important enough to add this to the public API? (the user can already specify the format, although that might not be as flexible as such a keyword) Further, it does not seem to fully work:
The above is not a ISO8601 format, so should not be allowed. If we make the keyword public, the name should reflect what it does correctly (for internal usage the above might be OK?) |
from @jorisvandenbossche comments: #19762 (comment) ok to add this as a private variable instead, we don't really have a convention for this, maybe use |
I changed the construction logic slightly. can you also add the original test (e.g .the iterrows) example? |
@jreback done. |
42d0874
to
2fe7057
Compare
@jreback SparseDataFrame.values seem to have lost dtype along the way, breaking test_iterrows(). Dense dataframe (s.to_dense().values) doesn't have this problem. is it normal?
s._data.as_array() |
require_iso8601
flag to specify ISO8601-compliant datetime string conversioniterrows()
and Series construction.
looked into above, but SparseFrame doesn't have the granularity that dense DataFrame has in terms of handling different block types. DatetimeLikeBlockMixin provides Timestamp, Timedelta constructions in Datetimelike Blocks. This is a pre-existing condition which I think is outside the scope of this PR. moved test case for DataFrame only. |
@minggli code/tests lgtm. I just pushed a tiny update. Can you add a whatsnew note (bug fix, reshaping). ping on green. |
Hi @jreback, thanks for the update. I initially put test in SharedwithSparse like you just did and it fails on tests.sparse.frame due to reasons mentioned above. |
ahh ok. can you put an xfail on it then for the Sparse (I think there are some examples like that) |
7c7e87c
to
a9d85ae
Compare
pandas/tests/frame/test_api.py
Outdated
@@ -214,6 +214,18 @@ def test_iterrows(self): | |||
exp = self.mixed_frame.loc[k] | |||
self._assert_series_equal(v, exp) | |||
|
|||
s = self.klass( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a separate test.
change logic to be
if isinstance(self.klass, SparseDataFrame):
pytest.xfail("....give a nice message here")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might also be able to use the decorator form (which is preferred),e .g.
@pytest.mark.xfail(isinstance(klass, SparseDataFrame), reason=''.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried xfail decorator but because this test on SparseDataFrame is inherited from SharedWithSparse, not sure how to specify the boolean condition in tests/frame/test_api.py unless it's located in tests/sparse/frame/test_frame.py where klass is declared as SparseDataFrame.
no problem using xfail imperatively inside test.
thanks @minggli very nice! keep em coming! |
happy to help!! |
… Series construction. (pandas-dev#19762)
git diff upstream/master -u -- "*.py" | flake8 --diff