Skip to content

BUG: fix skiprows callable infinite loop #45586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 6, 2022

Conversation

bram2000
Copy link
Contributor

@bram2000 bram2000 commented Jan 24, 2022

Not sure what to do about the whatsnew entry. Please advise (I've read the contributing doc but could not find it mentioned).

@bram2000
Copy link
Contributor Author

bram2000 commented Jan 24, 2022

I'm not sure why that "Test experimental data manager" test fails; it doesn't appear to be related to my code.

EDIT: Also I cannot run this test locally; it requires numba which is not compatible with my M1 CPU.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments

please add a whatsnew

@bram2000
Copy link
Contributor Author

Small comments

please add a whatsnew

Is the whatsnew process documented anywhere? I couldn't figure out what to do!

@jreback jreback added Bug IO CSV read_csv, to_csv IO Excel read_excel, to_excel labels Jan 24, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -227,6 +227,7 @@ MultiIndex
I/O
^^^
- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`)
- Bug in :meth:`read_excel` with `engine=python` results in an infinite loop with certain `skiprows` callables (:issue:`45585`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does engine=python mean? I do not think that python is valid. Please also add ticks around the string, e.g. engine="python"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it was suggested here #45586 (comment)

I've just removed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let the reviewer resolve conversations

I think I get what jeff meant. read_csv with engine="python" and read_excel.

Please also add a test for read_csv

@jreback jreback added this to the 1.5 milestone Jan 28, 2022
@jreback
Copy link
Contributor

jreback commented Jan 28, 2022

@phofl @rhshadrach ok here?

@phofl
Copy link
Member

phofl commented Jan 28, 2022

Tests are failing.

  • adjustment of whatsnew
  • adding a test for read csv

are open points

@bram2000
Copy link
Contributor Author

bram2000 commented Jan 28, 2022

The merge of main seems to have broken something, I can no longer run my test...

> python -m pytest pandas/tests/io/excel/test_readers.py -vv -s
ImportError while loading conftest '/Users/bram/Code/nca/pandas/pandas/conftest.py'.
pandas/conftest.py:608: in <module>
    params=[
pandas/conftest.py:611: in <listcomp>
    if not isinstance(indices_dict[key], MultiIndex) and indices_dict[key].is_unique
pandas/_libs/properties.pyx:37: in pandas._libs.properties.CachedProperty.__get__
    val = self.fget(obj)
pandas/core/indexes/base.py:2297: in is_unique
    return self._engine.is_unique
pandas/_libs/properties.pyx:37: in pandas._libs.properties.CachedProperty.__get__
    val = self.fget(obj)
pandas/core/indexes/base.py:873: in _engine
    return libindex.ExtensionEngine(target_values)
E   AttributeError: module 'pandas._libs.index' has no attribute 'ExtensionEngine'```

@phofl
Copy link
Member

phofl commented Jan 28, 2022

You have to build pandas again
python setup.py build_ext --inplace -j 4

@pep8speaks
Copy link

pep8speaks commented Jan 28, 2022

Hello @bram2000! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-31 10:21:49 UTC

@bram2000
Copy link
Contributor Author

I've added a similar test for csv... it passes but it doesn't appear to go through my new code anyway. Does it still make sense to mention read_csv in the whatsnew?

@phofl
Copy link
Member

phofl commented Jan 28, 2022

No, don't have to mention in this case

pre-commit is failing

@bram2000
Copy link
Contributor Author

precommit is failing with the following, but I'm not sure why as I've copied the form of the other notes in the file.

rst ``code`` is two backticks...................................................................................Failed
- hook id: rst-backticks
- exit code: 1

doc/source/whatsnew/v1.5.0.rst:287:- Bug in :meth:`read_excel` results in an infinite loop with certain `skiprows` callables (:issue:`45585`)

@phofl
Copy link
Member

phofl commented Jan 28, 2022

:func:read_excel please

pre-commit is failing because of
skiprows -> skiprows

Edit: Two backticks before and after skiprows

@rhshadrach
Copy link
Member

rhshadrach commented Jan 28, 2022

I've added a similar test for csv... it passes but it doesn't appear to go through my new code anyway. Does it still make sense to mention read_csv in the whatsnew?

You have to add engine="python". When I do this, I'm seeing _next_line being hit.

@phofl

Edit: Two backticks before and after skiprows

I struggled with this before. In GitHub (and maybe elsewhere?), you can use two different number of backticks to make backticks show up, e.g. `backtick` or ``backtick``

@jreback
Copy link
Contributor

jreback commented Jan 31, 2022

@bram2000 can you merge master and address comments

@bram2000
Copy link
Contributor Author

I've merged master.

Even if i manually add engine="python" to the read_excel call it still doesn't hit my debugger in the affected while loop. The test iterates through the four engine types anyway, so don't think it's necessary to explicitly set engine.

@rhshadrach
Copy link
Member

rhshadrach commented Jan 31, 2022

it still doesn't hit my debugger in the affected while loop

Ah, I see - yes, I'm seeing the same. However doesn't the same bug exist in L691 down below?

@bram2000
Copy link
Contributor Author

bram2000 commented Feb 1, 2022

it still doesn't hit my debugger in the affected while loop

Ah, I see - yes, I'm seeing the same. However doesn't the same bug exist in L691 down below?

No it doesn't seem to. I've tried various combinations of skiprow functions and I can't get it to fail.

@rhshadrach
Copy link
Member

@bram2000 - thanks for checking! Sorry for my confusion, it all makes sense now.

@rhshadrach
Copy link
Member

@jreback @phofl - good here?

@rhshadrach
Copy link
Member

@bram2000 - failure look unrelated. Can you merge main.

@mroeschke mroeschke merged commit de6b11d into pandas-dev:main Feb 6, 2022
@mroeschke
Copy link
Member

Thanks @bram2000

phofl pushed a commit to phofl/pandas that referenced this pull request Feb 14, 2022
* BUG: fix skiprows callable infinite loop

* BUG: seperate new test

* add whatsnew entry

* make note user-centric

* add engine

* move note to IO section

* modify comment

* add test for csv

* lint

* formatting

* fix whatsnew

* test not valid for pyxlsb

Co-authored-by: Jeff Reback <[email protected]>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* BUG: fix skiprows callable infinite loop

* BUG: seperate new test

* add whatsnew entry

* make note user-centric

* add engine

* move note to IO section

* modify comment

* add test for csv

* lint

* formatting

* fix whatsnew

* test not valid for pyxlsb

Co-authored-by: Jeff Reback <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_excel skiprows callable can result in infinite loop
6 participants