-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
pyarrow
uses is_sparse()
, which is deprecated in development code, causing warnings on I/O routines with pandas
#52899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Arrow is testing on our nightlies, so they should see this soonish. That said opening an issue there might make sense |
I opened apache/arrow#35329 That said, I am missing a bit of rationale of why we are deprecating all those helper functions. Has been this been discussed somewhere? There is the issue about not using them internally in hot paths for performance reasons (#27224), but that is not necessarily a reason to deprecate them for external usage? |
Wouldn't external users also run into the same performance issues? If so, sounds fine to have deprecated? OK to close and take this forwards in arrow? |
IMHO, I don't think you should close, but do something in the pandas code to suppress the warning if people are using the I/O routines that depend on arrow that use |
yeah makes sense, there's places where numpy warnings are silenced after all |
It depends a lot on what you are using this for. There is no performance issue at all if you don't use this in any perf sensitive code path (like an inner loop in groupby in or in array construction code). If you use this for example in a high-level workflow checking some data types of your data, that's not an issue. |
These functions also don't add much value. An actual isinstance check is clearer compared to some wrapper function? |
That also depends on the use case I think (and on how we intend them to work, didn't actually check in detail here). For example, an "is_integer_dtype" can be useful to abstract over different "types" of dtypes (numpy int, nullable Int, pyarrow int), if that is how we want this to work and be used. Now, |
We discussed this somewhere (not sure where right now) that we only want to deprecate functions that can be replaced by an easy isinstance check, the more complicated ones should stay and maybe get refactored if possible |
Is the solution here to change these to DeprecationWarning? |
Is this closeable? It looks like pyarrow has removed |
With version 2.1 and current pandas-stubs, we're not seeing the error any more, so I think it is good to close. |
See tests from pandas-stubs: https://github.com/pandas-dev/pandas-stubs/actions/runs/4789491663/jobs/8517432719
to_feather()
,to_orc()
andto_parquet()
call pyarrow, which is usingapi.is_sparse()
, which is now marked as deprecated.Created pandas-stubs issue:
pandas-dev/pandas-stubs#657
Comment here by @mroeschke in #52642 (comment) indicates it is reasonable for pyarrow to change their code - checking if we have created an issue there, and whether we should do something about the warning that a user would see if using those methods from pandas.
@jbrockmendel
The text was updated successfully, but these errors were encountered: