Skip to content

BUG: Use isinstance for is_iterator on Cygwin #46477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

DWesl
Copy link

@DWesl DWesl commented Mar 22, 2022

Fixes #45158

Keep original implementation if not on Cygwin, and choose the implementation at compile time to avoid performance hits from the extra branch.

Test results on Cygwin are available are available here

Fixes pandas-dev#45158 

Keep original implementation if not on Cygwin, and choose the implementation at compile time to avoid performance hits from the extra branch.

[Test results on Cygwin are available are available here](https://github.com/DWesl/pandas/runs/5648361139?check_suite_focus=true)
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not really sure what you are trying to fix here, and how this accomplishes it.

@DWesl
Copy link
Author

DWesl commented Mar 23, 2022

As noted in #45158, pandas._libs.lib.is_iterator doesn't agree with list, next, or collections.abs.Iterator on Cygwin.

>>> import pandas as pd
>>> stamp = pd.Timestamp("2010-01-03T07:23:48")
>>> pd._libs.lib.is_iterator(stamp)
True
>>> list(stamp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Timestamp' object is not iterable
>>> pd.__version__
'1.3.4'
>>> import collections.abc
>>> isinstance(stamp, collections.abc.Iterator)
False
>>> next(stamp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Timestamp' object is not an iterator

Since there are multiple places in the code-base that apply list to iterators (several of them in the indexing code), this causes rather a lot of crashes in code that works on other platforms. I would like those to stop crashing, so I can use recent pandas with DatetimeIndexes on my laptop.

pandas._libs.lib.is_instance relies on PyIter_Check, so this is likely a bug in python/cpython; however, I don't know how to test changes to CPython, but I can make a workaround for pandas and show that it works. To see what's broken in the current version of pandas, you can check the log outputs here as noted in the discussion on #45158. I can't figure out the bug-report process on the CPython issue tracker, and don't really feel like creating an account and learning the interface for a website that won't be active in two days when I can just wait two weeks and report it on their GitHub.

As for how this works, the

IF ...:
    ...
ELSE:
    ...

construct here is a Cython compile-time conditional that "works in a similar way to the #if preprocessor directive in C". The way I used it here means that, on Cygwin, this code will use collections.abc.Iterable's definition of an iterable, which agrees with that of list on Cygwin (see the first code block in this comment, above), which is most of what pandas uses it for. On other platforms, the PyIter_Check definition will remain, since it agrees with list there and presumably there's some speed benefit to a C macro call over a few Python calls (at least, I suspect this is why this function was added). That is to say, on platforms other than Cygwin (Windows, MSYS, WSL, OSX, Linux, BSD, ...) this should do nothing (besides add a little bit of compile time; the CI tests should check if the behavior changed), while on Cygwin it will change to an implementation that agrees with the functions assumed to agree with it (see the first code block in this comment, copied from #45158, and note that the error in the test logs from a test run without this patch does not occur in a test run with this patch).

@jreback
Copy link
Contributor

jreback commented Mar 23, 2022

@DWesl i get your explanation. We don't have any compile time defs like this for cygwin, nor do we test for it at all.

This really should be fixed upstream.

@DWesl
Copy link
Author

DWesl commented Mar 23, 2022

Alternately, you could just declare that Cygwin is an unsupported platform and close as WONTFIX, then perhaps suggest that anyone who needs pandas on Cygwin should use a version from before #31294 went in (probably pandas<1.0.0) which just happened to work, or fix the underlying CPython bug on Cygwin.

EDIT: you replied before I finished. I'll report it to CPython once the issue tracker migration is complete.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 23, 2022
@mroeschke
Copy link
Member

Thanks for the PR, but agreed that this should probably be fixed upstream. Since this is unlikely to move forward. Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd._libs.lib.is_iterator reports Timestamp as iterable on Cygwin
3 participants