There are several places where pandas has hidden heuristics/thresholds that dictate behavior in ways that are neither obvious to the user nor configurable. IIRC, there have been bugs in rolling and to_datetime that only appeared when the data contained a particular value or reached a certain size, which makes them hard to diagnose.
Ideally we should:
Not change behavior due to some data characteristic introspection
At least expose an option so the user can control the heuristic
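As a rough illustration of the second point, here is a minimal sketch of a size-based caching heuristic whose threshold is an explicit argument instead of a hard-coded constant. Everything in it (`convert_all`, `cache_threshold`, `DEFAULT_CACHE_THRESHOLD`) is hypothetical and only stands in for the kind of logic discussed here; it is not pandas code.

```python
from typing import Callable, Hashable, Iterable, List, Optional

# Hypothetical default; stands in for a threshold that would otherwise be hidden.
DEFAULT_CACHE_THRESHOLD = 50


def convert_all(
    values: Iterable[Hashable],
    convert: Callable[[Hashable], object],
    cache_threshold: Optional[int] = DEFAULT_CACHE_THRESHOLD,
) -> List[object]:
    """Apply `convert` to every value.

    When the input has at least `cache_threshold` elements, each distinct
    value is converted only once. Passing None disables caching entirely.
    """
    values = list(values)
    use_cache = cache_threshold is not None and len(values) >= cache_threshold
    if not use_cache:
        return [convert(v) for v in values]

    cache = {}
    out = []
    for v in values:
        if v not in cache:
            cache[v] = convert(v)
        out.append(cache[v])
    return out
```

With the threshold exposed, a user who suspects the cached path can rerun the same call with `cache_threshold=None` and compare results, without having to change the size of the data.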
Is the idea that making these configurable will help in bug hunting? Or is it more of an "anything that can be configured should be configurable"? I'm wary of the latter.
Personally, more to help with bug hunting, but I also think it's a better user experience if behavior doesn't change based on a silent heuristic. Additionally, I've been diving into slow tests recently, and a lot of the slow tests have to generate large data to trip and test the heuristic path.
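To make the slow-test point concrete, here is a small sketch of how a test could exercise both code paths on tiny inputs once the threshold is reachable. The `settings` namespace, `MIN_ELEMENTS`, and `add` are hypothetical stand-ins for a library module and its hard-coded constant, not actual pandas names.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a library module holding a hard-coded threshold.
settings = SimpleNamespace(MIN_ELEMENTS=1_000_000)


def add(a, b):
    """Return (result, path), where path names the code path that was taken."""
    result = [x + y for x, y in zip(a, b)]
    if len(a) >= settings.MIN_ELEMENTS:
        # A real library would dispatch to an accelerated implementation here.
        return result, "accelerated"
    return result, "plain"


def check_both_paths():
    """Exercise both paths with three elements instead of a million."""
    small = [1, 2, 3]
    plain = add(small, small)
    # Temporarily lower the threshold so the small input trips the heuristic.
    with mock.patch.object(settings, "MIN_ELEMENTS", 1):
        fast = add(small, small)
    return plain, fast
```

The patch is undone when the `with` block exits, so other tests still see the original threshold.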
CSV reading tokenizer chunksize
pandas/pandas/_libs/parsers.pyx
Line 119 in bb0403b
CSV line buffer size
pandas/pandas/_libs/parsers.pyx
Line 587 in bb0403b
Number of elements at which numexpr is used automatically
pandas/pandas/core/computation/expressions.py
Line 42 in bb0403b
TimedeltaArray (TDA) iteration chunk size
pandas/pandas/core/arrays/timedeltas.py
Line 387 in bb0403b
Something pytables related
pandas/pandas/core/computation/pytables.py
Line 101 in bb0403b
pandas/pandas/io/pytables.py
Line 1887 in bb0403b
Number of elements at which caching is used automatically in to_datetime
pandas/pandas/core/tools/datetimes.py
Line 124 in bb0403b
Chunk size to use when writing csv
pandas/pandas/io/formats/csvs.py
Line 166 in bb0403b
Number of regexes to store when time parsing
pandas/pandas/_libs/tslibs/strptime.pyx
Line 576 in bb0403b
Rank tolerance
pandas/pandas/_libs/algos.pyx
Line 61 in bb0403b
isin algorithm selection
pandas/pandas/core/algorithms.py
Line 521 in bb0403b
Value formatting
pandas/pandas/io/formats/format.py
Line 1562 in bb0403b
Number of elements to populate hash table
pandas/pandas/_libs/index.pyx
Line 99 in bb0403b