
DEV: CLI concept for Pandas #47700

Closed
noatamir opened this issue Jul 13, 2022 · 11 comments
Labels
Enhancement Needs Discussion Requires discussion from core team before further action

Comments

@noatamir
Member

CLI concept for Pandas

Inspired by the new SciPy CLI, I propose a do.it + rich.click CLI for pandas.

The goal is to create a one-stop tool for contributors that bundles useful developer commands and tools that are currently scattered across our documentation.

Maintainers have likely built their own local solutions over time, but new contributors need time to discover and get used to all of these tools. A CLI can give them a smoother ramp-up, along with in-terminal documentation via its help functionality.

I would be happy to keep maintaining the CLI as the included functionality changes over time.

In a series of PRs, I can introduce the do.it CLI. I propose that it include the following functionality (the workflows below are illustrations only; what is possible and maintainable will be determined during implementation):

  • build and install the package on the path
  • build the docs (full and partial)
    • illustration:
    $ python do.py docs
    Building the docs locally.
    entering directory /Users/username/pandas/docs
    Running Sphinx v4.5.0
    loading pickled environment... done
    ...
    entering directory /Users/username/pandas
    Building the docs locally is complete. Please check the build output for errors and warnings. All generated output files are located in /Users/username/pandas/docs/build.
    
  • run the test suite (full and partial, e.g. for specific modules, tests)
    • illustration:
    $ python do.py run-test-suite
    ================================================================================ test session starts ================================================================================
    platform darwin -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
    ...
    
    $ python do.py run-test-module test_multiindex.py
    
    • The latter example might need a full path. I will assess this during development.
  • install pre-commit
    • illustration:
    $ python do.py pre-commit
    Installing pre-commit.
    Collecting pre-commit
      Using cached pre_commit-2.19.0-py2.py3-none-any.whl (199 kB)
    ...
    pre-commit installed at .git/hooks/pre-commit
    Installing pre-commit completed. You can now run pre-commit checks. 
    
  • run pre-commit checks (on local files or against upstream/main)
    • illustration:
    $ python do.py pre-commit --file_name1 --file_name2
    running pre-commit on local files: file_name1, file_name2
    absolufy-imports........................................................................................Passed
    vulture.................................................................................................Passed
    black...................................................................................................Passed
    codespell...............................................................................................Passed
    ...
    
    $ python do.py pre-commit --upstream
    > pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files
    running pre-commit on all upstream-main files
    ...
    
  • rebuild the C extensions
    • illustration:
    $ python do.py c-extensions
    building the C extensions
    running build_ext
    copying build/lib.macosx-11.0-arm64-cpython-38/pandas/_libs/algos.cpython-38-darwin.so -> pandas/_libs
    copying build/lib.macosx-11.0-arm64-cpython-38/pandas/_libs/arrays.cpython-38-darwin.so -> pandas/_libs
    ...
    building the C extensions is complete.
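A minimal sketch of how a `do.py` entry point could dispatch the commands listed above, assuming the standard-library `argparse` in place of do.it + rich.click so that it runs without extra dependencies. The wrapped command lines (`pytest pandas`, `pre-commit run --files`, `pre-commit run --from-ref=upstream/main --to-ref=HEAD`, `setup.py build_ext --inplace`) are assumptions about typical invocations rather than a settled design, and the `--dry-run` and `--install` options are hypothetical. Python-based commands go through `sys.executable`, so they run in whichever environment (conda, pip, Docker, ...) is currently active.

```python
from __future__ import annotations

import argparse
import subprocess
import sys


def make_parser() -> argparse.ArgumentParser:
    """Build the (hypothetical) do.py parser; --help output comes for free."""
    parser = argparse.ArgumentParser(prog="do.py", description="pandas developer CLI (sketch)")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the command instead of running it")
    sub = parser.add_subparsers(dest="command")

    sub.add_parser("run-test-suite", help="run the full test suite")
    module = sub.add_parser("run-test-module", help="run the tests in one file")
    module.add_argument("path", help="path to a test file")

    pc = sub.add_parser("pre-commit", help="install or run pre-commit checks")
    pc.add_argument("files", nargs="*", help="files to check (default: staged files)")
    pc.add_argument("--upstream", action="store_true",
                    help="check files changed relative to upstream/main")
    pc.add_argument("--install", action="store_true", help="install the git hook")

    sub.add_parser("c-extensions", help="rebuild the C extensions in place")
    return parser


def build_command(args: argparse.Namespace) -> list[str]:
    """Translate parsed arguments into the command line to execute."""
    if args.command == "run-test-suite":
        return [sys.executable, "-m", "pytest", "pandas"]
    if args.command == "run-test-module":
        return [sys.executable, "-m", "pytest", args.path]
    if args.command == "pre-commit":
        if args.install:
            return ["pre-commit", "install"]
        if args.upstream:
            return ["pre-commit", "run", "--from-ref=upstream/main", "--to-ref=HEAD"]
        if args.files:
            return ["pre-commit", "run", "--files", *args.files]
        return ["pre-commit", "run"]  # pre-commit's default: staged files
    if args.command == "c-extensions":
        return [sys.executable, "setup.py", "build_ext", "--inplace"]
    raise ValueError(f"unknown command: {args.command!r}")


def main(argv: list[str] | None = None) -> list[str]:
    parser = make_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # No subcommand given: show the in-terminal documentation.
        parser.print_help()
        return []
    cmd = build_command(args)
    print("> " + " ".join(cmd))
    if not args.dry_run:
        # check=True makes a failing tool stop do.py with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    main()
```

For example, `python do.py --dry-run pre-commit --upstream` only prints `> pre-commit run --from-ref=upstream/main --to-ref=HEAD`, and `python do.py` with no subcommand prints the generated help text. A do.it version would declare the same mappings as tasks rather than subparsers.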
    

Clarifications:

  • Is there anything else from the SciPy example you would like to have, e.g. opening shells/IPython, creating release notes, or running benchmarks?
  • Why do.it?
    • I can explore other tools if you would like. I saw that SciPy had a good experience developing with this one, and given the similar requirements, I figured there was no need to spend too much time on tool selection. Also, using the same tools as others in the ecosystem might be useful in the long run if we need similar features, face similar bugs, etc.
  • The do.it CLI supports colors, which will let us highlight important takeaways and keywords.
@noatamir noatamir added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 13, 2022
@MarcoGorelli MarcoGorelli added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 13, 2022
@MarcoGorelli
Member

Hey @noatamir

⏱ Reduction in the time spent navigating documentation

Nice, I like the sound of that, the pandas docs are huge (and arguably should be cut down, but that's a separate issue)

I also like python do.py pre-commit --upstream - I'd suggested using the --from-ref --to-ref command here, but it was pointed out that it looked too complicated/intimidating
Having it as a simple command in this dev CLI (which could make its way into the PR template) might make it more appealing.

Maybe a command to address #35685 (bisecting) could make its way into it too?
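The bisecting idea above could map fairly directly onto `git bisect`. The sketch below is hypothetical (the `bisect_commands` helper, the `repro.py` script name, and the `v1.4.0`/`main` range are made up for illustration) and only builds the git command lines a `do.py bisect` subcommand might run: `git bisect start <bad> <good>` marks the range, and `git bisect run` executes a reproducer script at every step, treating exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad. For pandas, the reproducer would also need to rebuild the C extensions at each step before importing pandas; otherwise it would test stale binaries.

```python
from __future__ import annotations

import sys


def bisect_commands(good: str, bad: str, script: str) -> list[list[str]]:
    """Build the git commands a hypothetical `do.py bisect` subcommand could run.

    `good` and `bad` are commit-ish references (tags, hashes, branches);
    `script` is a reproducer that exits 0 when the behaviour is correct.
    """
    return [
        # `git bisect start <bad> <good>`: the bad revision comes first.
        ["git", "bisect", "start", bad, good],
        # Run the reproducer at each step with the active interpreter.
        # Exit code 0 = good, 125 = skip, 1-127 (other than 125) = bad.
        ["git", "bisect", "run", sys.executable, script],
        # Return to the originally checked-out branch when done.
        ["git", "bisect", "reset"],
    ]


# Print the commands instead of running them (e.g. with subprocess.run).
for cmd in bisect_commands(good="v1.4.0", bad="main", script="repro.py"):
    print(" ".join(cmd))
```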

@twoertwein
Member

twoertwein commented Jul 13, 2022

Potentially an alternative to do.it. pandas-stubs embraced the poetry universe: for the dependencies but also to run all the tests (through poe). Poe also makes it easy to quickly discover and run tests (I must admit I just use mypy/pyright/pytest/pre-commit directly):

$ poe --help
[...]
CONFIGURED TASKS
  test_all             Run all tests
    -c, --clean_cache  remove cache folders (mypy and pytest)
  test_src             Run local tests (includes 'mypy_src', 'pyright_src', 'pytest', and 'style')
    -c, --clean_cache  remove cache folders (mypy and pytest)
  test_dist            Run tests on the installed stubs (includes 'mypy_dist' and 'pyright_dist')
    -c, --clean_cache  remove cache folders (mypy and pytest)
  pytest               Run pytest
  style                Run pre-commit
  mypy_src             Run mypy on 'tests' (using the local stubs) and on the local stubs
  mypy_dist            Run mypy on 'tests' using the installed stubs
  pyright_src          Run pyright on 'tests' (using the local stubs) and on the local stubs
  pyright_dist         Run pyright on 'tests' using the installed stubs

@mroeschke
Member

IMO an ideal goal if adopting a CLI workflow for development is if our CI utilizes the same CLI workflow. I think making the jump between local development and CI testing small is a good goal because ultimately the CI system gates contributions.

There was a proposal to use docker-compose (#46532) as a "CLI" workflow too.

@jreback
Contributor

jreback commented Jul 13, 2022

+1 on cli generally

(-1 on using docker-compose generally) it just adds a whole layer of complication on top.

@jorisvandenbossche
Member

As I understand it, the scope of the proposal is a CLI that helps you manage some tasks within your development environment (within that environment, rebuild the Cython extensions, build the docs, ...), and (for now) not a tool to help you set up that development environment.
So whether you create a conda env, or installed all dependencies with pip, or are working in a docker container, ..., regardless of that you can use the CLI to perform some routine tasks.

So in that sense, I would leave out things like docker-compose for now (which is for setting up an environment), and I personally also don't care too much about exact consistency with CI (but we can of course start using the CLI to run certain things on CI as well).
I think one of the main problems when something fails on CI and you want to check / reproduce this locally, is to be able to recreate the same environment as on CI. For that, something like #46532 could indeed help. And once we have a way to better recreate a CI environment, then that would be great to integrate that in a CLI (that's actually exactly what Arrow (mentioned in #46532) does: the internal CLI tool is called archery and you can use it to interact with the docker-compose based images).

@jorisvandenbossche
Member

Potentially an alternative to do.it. pandas-stubs embraced the poetry universe: for the dependencies but also to run all the tests (through poe).

That would require first having the discussion whether we want to use poetry in the pandas repo, so right now that makes poe less directly usable.
(and I am personally not really interested in a poetry discussion)

@jorisvandenbossche
Member

Any more feedback? If not, I assume the best would be for @noatamir to open a first PR with an initial version (with subset of proposed functionality) to have a more concrete feedback on the proposed approach.

@phofl
Member

phofl commented Aug 4, 2022

+1 on starting this and gathering some feedback through an initial version. We don't have to be perfect from the beginning, especially since new contributors will probably be the best people to give us feedback on this

@noatamir
Member Author

noatamir commented Aug 4, 2022

Thanks for the feedback! I got started on this and hope to open the initial PR next week 🤞🏽

@noatamir
Member Author

Quick update: the Gitpod issue took me a bit longer than I originally estimated. I'm going on a short vacation and will keep working on that one when I'm back. I haven't forgotten about this, and I can't wait to get back into it! I'll ping back here when I pick this up again.

@MarcoGorelli
Member

It's been a couple of years, shall we close this issue for now to (slightly) reduce the issue queue?

Can always reopen if there's renewed interest

7 participants