Skip to content

BUG: fix datetimes.should_cache() error for deque (GH 29403) #29846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 15, 2019

Conversation

nathalier
Copy link
Contributor

@nathalier nathalier commented Nov 26, 2019

itertools.islice() should be used to get slice of a deque. itertools.islice() also can be used (and is efficient) for other collections. So unique(arg[:check_count]) was replaced with set(islice(arg, check_count)) which is also much faster on test data.

@pep8speaks
Copy link

pep8speaks commented Nov 26, 2019

Hello @nathalier! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-14 16:01:44 UTC

)
def test_no_slicing_errors_in_should_cache(listlike):
# GH 29403
assert tools.should_cache(listlike) is True
Copy link
Member

@mroeschke mroeschke Nov 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, could you use the example in the original issues as the test? #29403 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke For this particular bug the example from the original issue could easily stop being indicative if the empirically chosen constants for caching are changed. Let's say if it increased from 50 to 55 or 70 or 100 then the original issue example won't run this code pass, whereas these tests will show that something changed as they would fail. Also they are faster than the original example. And as we definitely know what was the reason of the failure I think it's quite reasonable to use small and fast tests for should_cache() instead.
The original example can be added to ensure the deque works for datetimes. Would be grateful if you could advise to which test file I should add that. Can it be pandas/tests/arrays/test_datetimes.py ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke For this particular bug the example from the original issue could easily stop being indicative if the empirically chosen constants for caching are changed. Let's say if it increased from 50 to 55 or 70 or 100 then the original issue example won't run this code pass, whereas these tests will show that something changed as they would fail. Also they are faster than the original example. And as we definitely know what was the reason of the failure I think it's quite reasonable to use small and fast tests for should_cache() instead.
The original example can be added to ensure deque works for timeseries. Would be grateful if you could advise to which test file I should add that. Can it be pandas/tests/arrays/test_datetimes.py ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the threshold; we can keep the test you have and add the example in the original issue. I think the pandas/tests/arrays/test_datetimes.py is a good place for this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current location is good for this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I've found a very relevant test test_to_datetime_cache() where I included deque in the list of constructors. Actually, with this test, tests I added before for the bug are not much needed any more. What do you think - should I drop them (test_no_slicing_errors_in_should_cache() and test_to_datetime_from_deque()) now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay keeping both tests. These tests run relatively quickly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, these do look quite duplicative, though these are hard-coded to the results which is fine.

can you co-located these tests near the above one.

@mroeschke mroeschke added the Datetime Datetime data dtype label Nov 26, 2019
@jbrockmendel
Copy link
Member

pls run black pandas/tests/indexes/datetimes/test_tools.py and isort pandas/tests/indexes/datetimes/test_tools.py

@jreback jreback added this to the 1.0 milestone Dec 4, 2019
@jreback jreback added the Bug label Dec 4, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note (bug fixes / datetimes), otherwise lgtm. ping on green.

@nathalier nathalier force-pushed the gh-29403 branch 2 times, most recently from b0a9658 to ca98e77 Compare December 7, 2019 17:43
@nathalier
Copy link
Contributor Author

pls run black pandas/tests/indexes/datetimes/test_tools.py and isort pandas/tests/indexes/datetimes/test_tools.py

@jbrockmendel Thanks a lot for help!

@nathalier
Copy link
Contributor Author

@jreback Now it's green, but possibly some of added tests are redundant.

@@ -679,6 +679,8 @@ Datetimelike
- Bug in :meth:`Series.var` failing to raise ``TypeError`` when called with ``timedelta64[ns]`` dtype (:issue:`28289`)
- Bug in :meth:`DatetimeIndex.strftime` and :meth:`Series.dt.strftime` where ``NaT`` was converted to the string ``'NaT'`` instead of ``np.nan`` (:issue:`29578`)
- Bug in :attr:`Timestamp.resolution` being a property instead of a class attribute (:issue:`29910`)
- Bug in :func:`pandas.core.tools.datetimes.should_cache` raising ``TypeError`` when trying to get a slice of a deque (:issue:`29403`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reword to make more user facing? i.e. to_datetime failing for deques when using the cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint. Reworded.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, lgtm. ping on green.

@@ -679,6 +679,7 @@ Datetimelike
- Bug in :meth:`Series.var` failing to raise ``TypeError`` when called with ``timedelta64[ns]`` dtype (:issue:`28289`)
- Bug in :meth:`DatetimeIndex.strftime` and :meth:`Series.dt.strftime` where ``NaT`` was converted to the string ``'NaT'`` instead of ``np.nan`` (:issue:`29578`)
- Bug in :attr:`Timestamp.resolution` being a property instead of a class attribute (:issue:`29910`)
- Bug in :func:`pandas.core.tools.datetimes.should_cache` failing for `deques` with ``TypeError`` when using the cache (:issue:`29403`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this to to_datetime which is the user visibile function
when using cache=True (the default)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Tests moved, whatsnew reworded. All tests have passed.

)
def test_no_slicing_errors_in_should_cache(listlike):
# GH 29403
assert tools.should_cache(listlike) is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, these do look quite duplicative, though these are hard-coded to the results which is fine.

can you co-located these tests near the above one.

itertools.islice() should be used to get slice of a deque. itertools.islice() also can be used (and is efficient) for other collections. So
unique(arg[:check_count])
was replaced with
set(islice(arg, check_count))
@jreback jreback merged commit 37dfcc1 into pandas-dev:master Dec 15, 2019
@jreback
Copy link
Contributor

jreback commented Dec 15, 2019

thanks @nathalier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.to_datetime should_cache errors with deque
5 participants