Skip to content

pyarrow uses is_sparse(), which is deprecated in development code, causing warnings on I/O routines with pandas #52899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Dr-Irv opened this issue Apr 24, 2023 · 12 comments
Labels
Arrow pyarrow functionality Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 24, 2023

See tests from pandas-stubs: https://github.com/pandas-dev/pandas-stubs/actions/runs/4789491663/jobs/8517432719

to_feather(), to_orc() and to_parquet() call pyarrow, which is using api.is_sparse(), which is now marked as deprecated.

Created pandas-stubs issue:
pandas-dev/pandas-stubs#657

Comment here by @mroeschke in #52642 (comment) indicates it is reasonable for pyarrow to change their code - checking if we have created an issue there, and whether we should do something about the warning that a user would see if using those methods from pandas.

@jbrockmendel

@phofl
Copy link
Member

phofl commented Apr 24, 2023

Arrow is testing on our nightlies, so they should see this soonish. That said opening an issue there might make sense

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas Arrow pyarrow functionality Needs Discussion Requires discussion from core team before further action labels Apr 25, 2023
@jorisvandenbossche
Copy link
Member

I opened apache/arrow#35329

That said, I am missing a bit of rationale of why we are deprecating all those helper functions. Has been this been discussed somewhere?

There is the issue about not using them internally in hot paths for performance reasons (#27224), but that is not necessarily a reason to deprecate them for external usage?

@MarcoGorelli
Copy link
Member

Wouldn't external users also run into the same performance issues? If so, sounds fine to have deprecated?

OK to close and take this forwards in arrow?

@MarcoGorelli MarcoGorelli added the Closing Candidate May be closeable, needs more eyeballs label Apr 25, 2023
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 25, 2023

OK to close and take this forwards in arrow?

IMHO, I don't think you should close, but do something in the pandas code to suppress the warning if people are using the I/O routines that depend on arrow that use is_sparse()

@MarcoGorelli MarcoGorelli removed the Closing Candidate May be closeable, needs more eyeballs label Apr 25, 2023
@MarcoGorelli
Copy link
Member

yeah makes sense, there's places where numpy warnings are silenced after all

@jorisvandenbossche
Copy link
Member

Wouldn't external users also run into the same performance issues? If so, sounds fine to have deprecated?

It depends a lot on what you are using this for. There is no performance issue at all if you don't use this in any perf sensitive code path (like an inner loop in groupby in or in array construction code). If you use this for example in a high-level workflow checking some data types of your data, that's not an issue.

@phofl
Copy link
Member

phofl commented Apr 25, 2023

These functions also don't add much value. An actual isinstance check is clearer compared to some wrapper function?

@jorisvandenbossche
Copy link
Member

That also depends on the use case I think (and on how we intend them to work, didn't actually check in detail here). For example, an "is_integer_dtype" can be useful to abstract over different "types" of dtypes (numpy int, nullable Int, pyarrow int), if that is how we want this to work and be used.

Now, is_integer_dtype is an example we didn't actually (yet?) deprecate. But this can be true for others as well.

@phofl
Copy link
Member

phofl commented Apr 25, 2023

We discussed this somewhere (not sure where right now) that we only want to deprecate functions that can be replaced by an easy isinstance check, the more complicated ones should stay and maybe get refactored if possible

@jbrockmendel
Copy link
Member

Is the solution here to change these to DeprecationWarning?

@lithomas1
Copy link
Member

Is this closeable?

It looks like pyarrow has removed is_sparse, and no one else from downstream seems to have complained about these removals.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 28, 2023

Is this closeable?

It looks like pyarrow has removed is_sparse, and no one else from downstream seems to have complained about these removals.

With version 2.1 and current pandas-stubs, we're not seeing the error any more, so I think it is good to close.

@phofl phofl closed this as completed Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

7 participants