-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Refactor slow tests #53891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Refactor slow tests #53891
Conversation
@@ -44,32 +44,6 @@ def test_buffer_overflow(c_parser_only, malformed): | |||
parser.read_csv(StringIO(malformed)) | |||
|
|||
|
|||
def test_buffer_rd_bytes(c_parser_only): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the original issue, this sounded like a PY2 bug specifically so I don't think this needs testing anymore
@@ -118,6 +118,8 @@ cdef: | |||
float64_t NEGINF = -INF | |||
int64_t DEFAULT_CHUNKSIZE = 256 * 1024 | |||
|
|||
DEFAULT_BUFFER_HEURISTIC = 2 ** 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be set as a property of the TextReader? It is a bit ambiguous in a pyx file but in the global namespace all caps I would expect this to be a compile time constant; attaching it as a property makes things clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if this is just for testing maybe you can patch buffer_lines
directly? The naming here is a bit unclear when scoped outside of the initializer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have the same concern about DEFAULT_CHUNKSIZE
above too?
Ideally I think these magic numbers should at least be made obvious so they can be configured or removed #53781 and it being in the TextReader would make that less obvious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am mistaken with how Cython generates the code comparing this to DEFAULT_CHUNKSIZE
is exactly the problem; that is a compile time constant whereas this value can be modified at runtime, but they both look the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you suggest how I could make DEFAULT_BUFFER_HEURISTIC
a property on the cdef TextReader class? I'm having trouble defining it in a way that I could monkeypatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm does declaring it cpdef
help at all? Not worth going down a rabbit hole over if its a hold up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my unsuccessful attempts so far just to double check
cdef public:
int64_t leading_cols, table_width, DEFAULT_BUFFER_HEURISTIC=2**20
^
------------------------------------------------------------
pandas/_libs/parsers.pyx:365:43: Cannot assign default value to fields in cdef classes, structs or unions
cpdef DEFAULT_BUFFER_HEURISTIC=2**20
^
------------------------------------------------------------
pandas/_libs/parsers.pyx:377:34: Cannot assign default value to fields in cdef classes, structs or unions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks for checking. Let's not get hung up on it for now then - I think just a wart between the C/Python and how they get expressed in the Cython global namespace. Can always come back and refactor if we establish a better pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* Address more slow tests * Parameterize slow test * Reduce data size in multi_thread * Use constant data for test_int64_overflow_groupby_large_df_shuffled
No description provided.