Series construction: _try_cast typo, function often taking slow route? #28145
Comments
Is there something from a user-facing perspective that you see impacted by this? The linked code you have just points to the range section, which may or may not be what your question is limited to. It would be helpful to clarify that.
Hey @WillAyd thanks for your prompt response! It appears that
I would say there are 2 other related cases that will benefit
(If this isn't a bug, my apologies! :)
Cool, thanks. I'm personally not super familiar with this part of the code so can't say for certain, but it seems reasonable. You can try it and run the test suite to see if anything breaks and, if not, submit a PR. It would also be worth confirming with an ASV benchmark. I only see one for a datetime constructor in asv_bench/benchmarks/series_methods.py, which might not be applicable, but you could add one if this isn't covered.
Okay, thanks for the quick feedback--I'll try adding a On the other hand, it seems like this behavior is a couple years old, so I wouldn't be surprised if I'm missing something. |
Alright, I'm having a bit of trouble running the full suite of tests (stacktrace below)
(master)*$ pytest pandas --cov=pandas -r sxX --strict
============================================================= test session starts ==============================================================
platform darwin -- Python 3.6.7, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
hypothesis profile 'ci' -> deadline=timedelta(milliseconds=500), suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('/Users/machow/repo/pandas/.hypothesis/examples')
rootdir: /Users/machow/repo/pandas, inifile: setup.cfg, testpaths: pandas
plugins: xdist-1.29.0, forked-1.0.2, hypothesis-4.34.0, cov-2.7.1, mock-1.10.4
collecting 40023 items / 1 skipped / 40022 selected
Fatal Python error: Aborted
Current thread 0x00007fffabb85380 (most recent call first):
However, I was able to run a subset of them.
There were 7 failures, related to the change stopping pandas/pandas/core/construction.py Line 521 in 5d9fd7e
pandas/pandas/core/dtypes/cast.py Lines 932 to 934 in 5d9fd7e
I'll take another pass at the tests, hopefully in the next couple days!
Changing |
Code Sample, a copy-pastable example if possible
Problem description
During Series construction, the function sanitize_array attempts to use _try_cast to cast the input to a better type. _try_cast is fairly slow to run, so it tries to avoid casting in common cases. However, due to a missing not keyword, it appears _try_cast runs for the cases it wants to avoid (like the one above).
Here are the relevant lines of _try_cast:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/construction.py#L511-L513
Should this be not maybe_castable(arr)?
It is very surprising that lines like this would intentionally create an array and then try to cast it, even when the dtype option passed is None.
https://github.com/pandas-dev/pandas/blob/master/pandas/core/construction.py#L429-L432
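To make the suspected inversion concrete, here is a toy model of a fast-path guard. It is not the pandas source: might_need_cast, expensive_cast, count_casts and both convert functions are hypothetical names invented for illustration, and the predicate is only assumed to play the role the issue attributes to maybe_castable. The sketch shows nothing more than how a missing not can send the most common inputs down the slow route.

```python
# Toy model only: none of these names come from pandas.
calls = {"expensive_cast": 0}


def expensive_cast(values):
    """Hypothetical slow conversion step; counts how often it runs."""
    calls["expensive_cast"] += 1
    return list(values)


def might_need_cast(values):
    """Hypothetical predicate: True when some element is not already an int."""
    return any(not isinstance(v, int) for v in values)


def convert_as_suspected(values):
    # Guard without `not`: the shortcut is taken only when a cast might be
    # needed, so plain integer input falls through to the slow step.
    if might_need_cast(values):
        return values
    return expensive_cast(values)


def convert_with_not(values):
    # Guard with `not`: return early when nothing needs converting.
    if not might_need_cast(values):
        return values
    return expensive_cast(values)


def count_casts(convert, values):
    """Reset the counter, run one conversion, and report how many casts ran."""
    calls["expensive_cast"] = 0
    convert(values)
    return calls["expensive_cast"]
```

Under these assumptions, count_casts(convert_as_suspected, [1, 2, 3]) reports one cast while count_casts(convert_with_not, [1, 2, 3]) reports none. The thread mentions test failures after trying the change in pandas, so the real guard may be more subtle than this toy.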
Expected Output
_try_cast should not run during sanitize_array for common types (e.g. int64). However, from looking at it with %%prun and running pdb, I can see that functions like maybe_cast_to_datetime are called.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.9
dateutil : 2.8.0
pip : 19.1.1
setuptools : 39.0.1
Cython : None
pytest : 4.4.2
hypothesis : None
sphinx : 2.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
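The check described under Expected Output (profile a call, then look for a particular function name) can also be sketched with the standard-library cProfile and pstats modules instead of %%prun. The helper name called_function_names and the toy functions below are hypothetical and exist only to show the technique; nothing here depends on pandas.

```python
import cProfile
import pstats


def called_function_names(func, *args, **kwargs):
    """Run func under cProfile and return the names of all functions it called."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    # stats.stats is keyed by (filename, line number, function name).
    return {name for (_filename, _lineno, name) in stats.stats}


# Toy stand-ins for a fast route and a slow route (hypothetical names).
def slow_helper():
    return sum(range(100))


def fast_route(x):
    return x


def slow_route(x):
    slow_helper()
    return x
```

Assuming pandas is installed, an expression such as "maybe_cast_to_datetime" in called_function_names(pd.Series, [1, 2, 3]) would report whether that function ran during construction. The answer depends on the pandas version, so no result is claimed here.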