REF: make dependency structure more DAG-like #25203

Closed
5 tasks
jbrockmendel opened this issue Feb 7, 2019 · 3 comments
Labels
Refactor Internal refactoring of code

@jbrockmendel
Member

There are a handful of places where the dependency structure is almost DAG-like. If these can be smoothed out, reasoning about the code can be made easier. I'll update this list as I encounter these places.

For reference, I'm treating the structure enforced by isort as the "default" dependency structure:

  • _libs, errors, compat, and util._* are considered "upstream" of everything else.
    • Within _libs, tslibs is considered upstream of the rest of _libs
  • Within core, core.dtypes is considered upstream of the rest of core.
  • plotting is considered downstream of core
  • io is a hodge-podge
  • tseries is mostly destined to be refactored into tslibs
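
As a rough illustration of the ordering above, here is a minimal sketch that assigns each placed package a rank and flags an import that points from an upstream module to a downstream one. The `LAYERS` table, the rank numbers, and the helper names are hypothetical and not part of pandas or isort; io and tseries are left unranked because the list does not place them.

```python
# Hypothetical ranks: a smaller number means further upstream.
# io and tseries are omitted because the list above does not place them.
LAYERS = [
    ("pandas._libs.tslibs", 0),
    ("pandas._libs", 1),
    ("pandas.errors", 1),
    ("pandas.compat", 1),
    ("pandas.util._", 1),  # stands for util._* (private util modules)
    ("pandas.core.dtypes", 2),
    ("pandas.core", 3),
    ("pandas.plotting", 4),
]


def _matches(module, prefix):
    """Check whether a dotted module name falls under a layer prefix."""
    if prefix.endswith("._"):
        # Wildcard-style prefix such as "pandas.util._"
        return module.startswith(prefix)
    return module == prefix or module.startswith(prefix + ".")


def rank(module):
    """Return the rank of the longest matching prefix, or None if unplaced."""
    best_len, best_rank = -1, None
    for prefix, layer_rank in LAYERS:
        if _matches(module, prefix) and len(prefix) > best_len:
            best_len, best_rank = len(prefix), layer_rank
    return best_rank


def violates(importer, imported):
    """True if an upstream module imports from a module further downstream."""
    r_from, r_to = rank(importer), rank(imported)
    if r_from is None or r_to is None:
        return False  # at least one side is not placed in the ordering
    return r_from < r_to


print(violates("pandas.compat.numpy.function", "pandas.core.dtypes.common"))
print(violates("pandas.core.frame", "pandas._libs.lib"))
```

Imports between modules of the same rank (for example compat importing from util._*) are not flagged by this sketch, since the list treats them as one upstream group.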

Some places where this can be made more coherent:

  • compat.pickle_compat imports pandas, so really isn't upstream of anything.
  • compat.numpy.function imports from util._validators. AFAICT the imported util._validators functions aren't used anywhere else, so these could be made self-contained. Both of these modules import from core.dtypes.common, though these imports could come directly from _libs.lib.
  • core.dtypes is a mix. base, generic, inference have no non-upstream imports. But common has a couple, cast and concat have a bunch, with dtypes and missing in between. It may be worth splitting this directory into two pieces, one of which is strictly upstream from the rest of core and one which is not.
  • Important parts of io could be made independent/upstream of core if config was made further upstream.
  • util is split between low-dependency modules like _decorators and testing, which depends on everything.
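
One way to find cases like these is to list the pandas imports in a module's source text. The following is a minimal sketch using the standard-library `ast` module; the `pandas_imports` helper and the example source string are made up for illustration and are not taken from the pandas code base.

```python
import ast


def pandas_imports(source):
    """Return the sorted names of pandas modules imported by the source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pandas" or alias.name.startswith("pandas."):
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, which this sketch ignores
            module = node.module or ""
            if node.level == 0 and (
                module == "pandas" or module.startswith("pandas.")
            ):
                found.add(module)
    return sorted(found)


# Made-up module source, used only to exercise the helper.
example = '''
import pickle
import pandas
from pandas.core.dtypes.common import is_bool
from numpy import ndarray
'''

print(pandas_imports(example))
```

Because it walks the whole syntax tree, the helper also picks up imports nested inside functions, which matters when deferred imports are used to work around circular dependencies.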
@gfyoung gfyoung added Refactor Internal refactoring of code Dependencies Required and optional dependencies labels Feb 7, 2019
@jorisvandenbossche jorisvandenbossche removed the Dependencies Required and optional dependencies label Feb 7, 2019
@TrigonaMinima

@jbrockmendel I'd like to work on this, but I don't understand what you mean by making them more coherent.

Just to clarify, by upstream you mean there are no more package (pandas) imports in the upstream modules and other downstream modules import from these top level (upstream) modules?

A few questions about each of the points you mentioned:

  • Where should compat.pickle_compat go?
  • Does making util._validators self-contained mean moving those functions to compat.numpy and using them directly in compat.numpy.function?
  • Should we create two directories, such as core.dtypes.core and core.dtypes.non_core, for lack of better names?
  • Where would config move to when made upstream? How do we decide which parts of io are important?
  • I don't understand what to do with _decorators and testing here.

@jbrockmendel
Member Author

@TrigonaMinima this topic needs discussion before being implemented. The most helpful thing to do here is offer a thoughtful opinion on whether this is a good or bad idea (e.g. how users or developers of other projects might be affected).

> Just to clarify, by upstream you mean there are no more package (pandas) imports in the upstream modules and other downstream modules import from these top level (upstream) modules?

Right. pandas.foo is upstream from pandas.bar if bar imports from foo and foo does not import from bar.
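
That definition can be written down as a small check over a table of direct imports. This is only a sketch: the `imports` mapping and the `is_upstream` helper are hypothetical, and pandas.foo and pandas.bar are the placeholder names from the sentence above.

```python
def is_upstream(imports, foo, bar):
    """foo is upstream of bar if bar imports from foo and foo does not import from bar."""
    bar_imports_foo = foo in imports.get(bar, set())
    foo_imports_bar = bar in imports.get(foo, set())
    return bar_imports_foo and not foo_imports_bar


# Hypothetical direct-import table: module name -> modules it imports from.
imports = {
    "pandas.foo": set(),
    "pandas.bar": {"pandas.foo"},
    "pandas.baz": {"pandas.qux"},
    "pandas.qux": {"pandas.baz"},  # baz and qux import each other
}

print(is_upstream(imports, "pandas.foo", "pandas.bar"))
print(is_upstream(imports, "pandas.baz", "pandas.qux"))
```

The check only looks at direct imports, as the definition does. A mutual import such as baz and qux means neither is upstream of the other, which is the kind of cycle that keeps the structure from being a DAG.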

@TrigonaMinima

I have only been a user of pandas so far, mostly in Jupyter notebooks. I am not familiar with the pandas code base or with how developers of other projects use it. From my very naive point of view, I think we should divide this into two parts:

  1. Parts which are completely internal and can be refactored. By internal I mean files that users will not use directly. For example, _decorators looks like an internal file.
  2. Parts where we have to think about not breaking things for the users.

We can work on the first part now, while the second part is discussed further.

But to me the bigger question is: why do this at all? Is spending time on this important? Forgive me if this is something obvious. Is it to simplify the pandas code base so that it's more intuitive?

4 participants