Skip to content

CLN: clean-up internal sparse imports + restructure sparse submodule #28516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Sep 19, 2019

Updating our internal imports to use the path were they actually live, not the old core.sparse.api one.

While doing this, I was thinking now that for the tests, we could actually also use the top-level pd. imports.

@TomAugspurger in the core/sparse/ module, the only actual code living there is the scipy_sparse code. I moved that to arrays module (but could also be moved within the sparse.py file, or create a sparse subdirectory, ..).

Do we want to raise a deprecation warning on pandas.core.sparse.api?

@jorisvandenbossche jorisvandenbossche added Clean Sparse Sparse Data Type labels Sep 19, 2019
@TomAugspurger
Copy link
Contributor

Thanks! I had two other followup items:

  1. I forgot to add a note to 1.0.0.rst that DataFrame.to_sparse was removed
  2. There's a file pandas/tests/sparse/test_combine_concat.py that can be moved to pandas/tests/arrays/sparse

Do you want to do those here?

Do we want to raise a deprecation warning on pandas.core.sparse.api?

Either doing it now, or as part of a larger pandas.core deprecation sounds fine to me.

@jbrockmendel
Copy link
Member

Does scipy_sparse belong in core.arrays? I've been thinking that SparseAccessor might not belong in the array file, could go in core.sparse

@TomAugspurger
Copy link
Contributor

How about making pandas/core/arrays/sparse a directory? And then splitting into

  • /sparse/init.py
  • /sparse/dtype.py
  • /sparse/array_.py
  • /sparse/acessor.py
  • /sparse/scipy_.py

@jorisvandenbossche
Copy link
Member Author

It is correct that the accessor (which also is a DataFrame accessor) is not directly related to arrays, and more something on the "dataframe level". But since we do not have a proper home right now for such functionality, I think keeping the type-specific accessors close to their type implementations (array / dtype) makes somewhat sense.

Given that, having a sparse directory with several files seems fine to me.

@TomAugspurger
Copy link
Contributor

I think keeping the type-specific accessors close to their type implementations (array / dtype) makes somewhat sense.

FWIW, I slightly prefer having the accessor close to the array implementation. This more closely mirrors what 3rd party EA authors would do.

@jbrockmendel
Copy link
Member

FWIW, I slightly prefer having the accessor close to the array implementation. This more closely mirrors what 3rd party EA authors would do.

This isn't a particularly big deal, but I want to make sure my reasoning is written down somewhere:

For the most part, a reader can understand core.arrays without needing to understand the rest of core (except for core.dtypes). I view this as a desirable characteristic, the more de-coupling the better.

In the case of SparseAccessor, I know that nothing else in core.arrays depends on it so I can just pretend it isn't there when trying to reason about the arrays in isolation. But to a new reader that might not be immediately obvious. So that is both a) not a big deal and b) not a pattern I want to encourage.

@jorisvandenbossche
Copy link
Member Author

@jbrockmendel and is the alternative that you prefer having a pandas/core/sparse with only the accessor? Or gather them in pandas/core/accessors or something?

For having it in arrays/sparse, the only additional thing that it relies upon from the rest of core is PandasDelegate, so it is not that it brings in a lot of the rest of core. And on the other hand, to understand the SparseAccessor, you need to understand the SparseArray, as is mostly just delegates methods to the SparseArray.

@jbrockmendel
Copy link
Member

and is the alternative that you prefer having a pandas/core/sparse with only the accessor? Or gather them in pandas/core/accessors or something?

At the time I formed this preference, sparse.frame and sparse.series still existed, so moving to sparse.accessor was my preferred alternative. Haven't thought about an updated preference.

the only additional thing that it relies upon from the rest of core is PandasDelegate

There are runtime imports of DataFrame, Series, Index.

@jorisvandenbossche
Copy link
Member Author

I had two other followup items:

Done

How about making pandas/core/arrays/sparse a directory?

Went ahead with that then.

@jorisvandenbossche jorisvandenbossche changed the title CLN: clean-up internal sparse imports CLN: clean-up internal sparse imports + restructure sparse submodule Sep 26, 2019
@jreback jreback added this to the 1.0 milestone Sep 26, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. merge when ready.

@jorisvandenbossche
Copy link
Member Author

@TomAugspurger @jbrockmendel OK?

@jbrockmendel
Copy link
Member

lgtm

@jorisvandenbossche jorisvandenbossche merged commit cc7367c into pandas-dev:master Sep 27, 2019
@jorisvandenbossche jorisvandenbossche deleted the cleanup-sparse branch September 27, 2019 14:59
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants