TST: Refactor slow tests #53891

mroeschke · 2023-06-27T22:04:39Z

No description provided.

mroeschke · 2023-06-27T22:05:05Z

pandas/tests/io/parser/test_c_parser_only.py

@@ -44,32 +44,6 @@ def test_buffer_overflow(c_parser_only, malformed):
        parser.read_csv(StringIO(malformed))


-def test_buffer_rd_bytes(c_parser_only):


Based on the original issue, this sounded like a PY2-specific bug, so I don't think this needs testing anymore.

pandas/tests/io/parser/common/test_chunksize.py

WillAyd · 2023-06-27T22:30:58Z

pandas/_libs/parsers.pyx

@@ -118,6 +118,8 @@ cdef:
    float64_t NEGINF = -INF
    int64_t DEFAULT_CHUNKSIZE = 256 * 1024

+DEFAULT_BUFFER_HEURISTIC = 2 ** 20


Can this be set as a property of the TextReader? It is a bit ambiguous in a pyx file, but for an all-caps name in the global namespace I would expect a compile-time constant; attaching it as a property makes things clearer.

Or, if this is just for testing, maybe you can patch buffer_lines directly? The naming here is a bit unclear when scoped outside of the initializer.
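The two testing levers mentioned here can be sketched in pure Python. `ToyTextReader` is a hypothetical stand-in, not the Cython TextReader: it attaches the heuristic to the class (the "property of the TextReader" option) and, as the comment above implies for the real initializer, accepts an explicit `buffer_lines` that skips the heuristic entirely. The power-of-two loop is an assumed approximation of the real one. In plain Python both levers work; whether the class-attribute form carries over to a cdef class is what the rest of the thread probes.

```python
from typing import Optional
from unittest import mock


class ToyTextReader:
    """Hypothetical pure-Python stand-in for the Cython TextReader."""

    # Heuristic attached to the class instead of the module namespace.
    buffer_heuristic = 2 ** 20

    def __init__(self, table_width: int, buffer_lines: Optional[int] = None) -> None:
        self.table_width = table_width
        if buffer_lines is None:
            # Looked up through the instance, so a patched class attribute wins.
            heuristic = self.buffer_heuristic // table_width
            # Assumed heuristic: largest power of two strictly below the budget.
            buffer_lines = 1
            while buffer_lines * 2 < heuristic:
                buffer_lines *= 2
        self.buffer_lines = buffer_lines


# Lever 1: bypass the heuristic by passing buffer_lines explicitly.
print(ToyTextReader(table_width=1, buffer_lines=4).buffer_lines)  # 4

# Lever 2: patch the class attribute only for the duration of a test.
with mock.patch.object(ToyTextReader, "buffer_heuristic", 16):
    print(ToyTextReader(table_width=1).buffer_lines)  # 8
```

Passing `buffer_lines` keeps the test independent of the heuristic's value, while patching the attribute exercises the heuristic itself with a smaller budget.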

Do you have the same concern about DEFAULT_CHUNKSIZE above too?

Ideally, I think these magic numbers should at least be made obvious so they can be configured or removed (#53781). Wouldn't putting it in the TextReader make that less obvious?

Unless I am mistaken about how Cython generates the code, the comparison to DEFAULT_CHUNKSIZE is exactly the problem: that is a compile-time constant, whereas this value can be modified at runtime, yet they both look the same.

Could you suggest how I could make DEFAULT_BUFFER_HEURISTIC a property on the cdef TextReader class? I'm having trouble defining it in a way that I could monkeypatch.

Hmm, does declaring it cpdef help at all? It's not worth going down a rabbit hole over if it's a holdup.

Here are my unsuccessful attempts so far, just to double-check:

cdef public:
    int64_t leading_cols, table_width, DEFAULT_BUFFER_HEURISTIC=2**20

pandas/_libs/parsers.pyx:365:43: Cannot assign default value to fields in cdef classes, structs or unions

cpdef DEFAULT_BUFFER_HEURISTIC=2**20

pandas/_libs/parsers.pyx:377:34: Cannot assign default value to fields in cdef classes, structs or unions

Cool, thanks for checking. Let's not get hung up on it for now, then; I think it's just a wart in how C and Python variables get expressed in the Cython global namespace. We can always come back and refactor if we establish a better pattern.

Thanks for confirming!

WillAyd

lgtm

* Address more slow tests
* Parameterize slow test
* Reduce data size in multi_thread
* Use constant data for test_int64_overflow_groupby_large_df_shuffled

mroeschke added 4 commits June 26, 2023 19:13

Address more slow tests

dc154e2

Parameterize slow test

e1edffc

Reduce data size in multi_thread

cf9ea4e

Use constant data for test_int64_overflow_groupby_large_df_shuffled

c2d7132

mroeschke added the Testing pandas testing functions or related to the test suite label Jun 27, 2023

mroeschke added this to the 2.1 milestone Jun 27, 2023

mroeschke requested a review from WillAyd as a code owner June 27, 2023 22:04

mroeschke commented Jun 27, 2023

WillAyd reviewed Jun 27, 2023

Merge remote-tracking branch 'upstream/main' into tst/slow

2318e3b

WillAyd approved these changes Jun 28, 2023

mroeschke merged commit ee8e01e into pandas-dev:main Jun 28, 2023

mroeschke deleted the tst/slow branch June 28, 2023 18:26