Skip to content

TST: Refactor slow tests #53891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 28, 2023
Merged

TST: Refactor slow tests #53891

merged 5 commits into from
Jun 28, 2023

Conversation

mroeschke
Copy link
Member

No description provided.

@mroeschke mroeschke added the Testing pandas testing functions or related to the test suite label Jun 27, 2023
@mroeschke mroeschke added this to the 2.1 milestone Jun 27, 2023
@mroeschke mroeschke requested a review from WillAyd as a code owner June 27, 2023 22:04
@@ -44,32 +44,6 @@ def test_buffer_overflow(c_parser_only, malformed):
parser.read_csv(StringIO(malformed))


def test_buffer_rd_bytes(c_parser_only):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the original issue, this sounded like a PY2 bug specifically so I don't think this needs testing anymore

@@ -118,6 +118,8 @@ cdef:
float64_t NEGINF = -INF
int64_t DEFAULT_CHUNKSIZE = 256 * 1024

DEFAULT_BUFFER_HEURISTIC = 2 ** 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be set as a property of the TextReader? It is a bit ambiguous in a pyx file but in the global namespace all caps I would expect this to be a compile time constant; attaching it as a property makes things clearer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if this is just for testing maybe you can patch buffer_lines directly? The naming here is a bit unclear when scoped outside of the initializer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the same concern about DEFAULT_CHUNKSIZE above too?

Ideally I think these magic numbers should at least be made obvious so they can be configured or removed #53781 and it being in the TextReader would make that less obvious?

Copy link
Member

@WillAyd WillAyd Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am mistaken with how Cython generates the code comparing this to DEFAULT_CHUNKSIZE is exactly the problem; that is a compile time constant whereas this value can be modified at runtime, but they both look the same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you suggest how I could make DEFAULT_BUFFER_HEURISTIC a property on the cdef TextReader class? I'm having trouble defining it in a way that I could monkeypatch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm does declaring it cpdef help at all? Not worth going down a rabbit hole over if its a hold up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my unsuccessful attempts so far just to double check

    cdef public:
        int64_t leading_cols, table_width, DEFAULT_BUFFER_HEURISTIC=2**20
                                          ^
------------------------------------------------------------

pandas/_libs/parsers.pyx:365:43: Cannot assign default value to fields in cdef classes, structs or unions
    cpdef DEFAULT_BUFFER_HEURISTIC=2**20
                                 ^
------------------------------------------------------------

pandas/_libs/parsers.pyx:377:34: Cannot assign default value to fields in cdef classes, structs or unions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks for checking. Let's not get hung up on it for now then - I think just a wart between the C/Python and how they get expressed in the Cython global namespace. Can always come back and refactor if we establish a better pattern

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming!

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke merged commit ee8e01e into pandas-dev:main Jun 28, 2023
@mroeschke mroeschke deleted the tst/slow branch June 28, 2023 18:26
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* Address more slow tests

* Parameterize slow test

* Reduce data size in multi_thread

* Use constant data for test_int64_overflow_groupby_large_df_shuffled
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants