
CI: add minimal requirements file #48828

Open
MarcoGorelli opened this issue Sep 28, 2022 · 11 comments

@MarcoGorelli
Member

A common complaint in contributor sprints is that setting up a development environment takes too long.

Since the docs have moved from recommending conda to mamba, this has improved, but I think it could still be better.

For most contributors (especially casual ones at sprints), most dependencies are irrelevant. We could have minimal environment and requirements files which contain only the bare minimum needed to build pandas locally, so people can get started quickly.

I think just cython, numpy, python-dateutil, pytz, pytest, and pytest-asyncio should be enough. We could have a script which creates this from environment.yml and takes version numbers from there.
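A minimal sketch of such a script, assuming environment.yml lists each dependency as a `- name` or `- name=version` entry. The sample text and its version pins are hypothetical stand-ins for the real file, and the function name is made up for illustration. It parses lines directly rather than using a YAML library, so it only handles this simple layout.

```python
import re

# Package names taken from the proposal above.
MINIMAL = ["cython", "numpy", "python-dateutil", "pytz", "pytest", "pytest-asyncio"]

# Hypothetical excerpt standing in for the real environment.yml;
# the version pins are placeholders, not the project's actual pins.
SAMPLE_ENV = """\
name: pandas-dev
channels:
  - conda-forge
dependencies:
  - python=3.8
  - cython=0.29.32
  - pytest>=6.0
  - pytest-asyncio>=0.17
  - python-dateutil
  - numpy
  - pytz
  - pytorch
"""


def minimal_requirements(env_text, wanted=MINIMAL):
    """Return the specs from env_text whose package name is in `wanted`."""
    specs = {}
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # skip keys, comments, and blank lines
        spec = stripped[2:].strip()
        # The package name is everything before the first version operator.
        name = re.split(r"[=<>!~ ]", spec, maxsplit=1)[0].lower()
        if name in wanted:
            specs[name] = spec
    # Keep the order of `wanted`; names missing from the file are skipped.
    return [specs[name] for name in wanted if name in specs]


if __name__ == "__main__":
    print("\n".join(minimal_requirements(SAMPLE_ENV)))
```

The specs come out in conda syntax (a single `=` for exact pins), so a pip requirements file would need those rewritten as `==`.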

@YvanCywan
Contributor

Would you mind if I took a look?

@phofl
Member

phofl commented Sep 28, 2022

This has the disadvantage that you can't run most of the tests.

@YvanCywan
Contributor

We could always add more test dependencies to the minimal_environment.yml, to at least get more of the tests working properly.

I am assuming that the packages under # test dependencies in environment.yml are all that is needed for the majority of them, though I could be wrong.
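As a rough sketch of how that group could be pulled out, assuming the file marks each group with a comment line such as `# test dependencies` followed by its entries. The excerpt, its version pins, and the function name are hypothetical.

```python
# Hypothetical excerpt; the real environment.yml may be laid out differently.
ENV_EXCERPT = """\
dependencies:
  - python=3.8
  - pip

  # test dependencies
  - pytest>=6.0
  - pytest-xdist>=1.31

  # required dependencies
  - numpy
"""


def section_packages(env_text, header):
    """Return the entries listed under the comment `# <header>`.

    A section runs from its comment line to the next comment line.
    """
    in_section = False
    found = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any comment line starts a new section (or ends the current one).
            in_section = stripped.lstrip("#").strip().lower() == header.lower()
        elif in_section and stripped.startswith("- "):
            found.append(stripped[2:].strip())
    return found


if __name__ == "__main__":
    print(section_packages(ENV_EXCERPT, "test dependencies"))
```

Whether that group alone is enough to run most of the tests is exactly the open question in this comment; the sketch only shows how the list could be extracted.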

@MarcoGorelli
Member Author

I'll try this out at the next contributor sprint. If it's enough for people to be productive, maybe we can consider adding it to the docs, or it can be something that's only ever part of the instructions for sprints.

@YvanCywan YvanCywan mentioned this issue Sep 28, 2022
1 task
@phofl
Member

phofl commented Sep 28, 2022

If we want to add this to the docs, we have to add a couple of clarifications: that this is not sufficient to pass all tests, and that some things might fail unexpectedly.

@mroeschke
Member

mroeschke commented Sep 28, 2022

This somewhat assumes "minimal" contributions will be bug fixes/enhancements, but doc changes might be common contributions too, so shouldn't the doc dependencies be available as well?

An alternative idea would be to provide conda lock files for a variety of platforms, so that users skip the slow solve step but still get all the dependencies needed to make any type of contribution: https://github.com/conda-incubator/conda-lock (these can be used in the CI too)

@asishm
Contributor

asishm commented Sep 28, 2022

This would be a great change to have. The current environment installs a lot of things. For example, pytorch is listed as a dependency (of a downstream package):

- pytorch

As a pandas user who might want to do some minor bugfixes, I find it a bit confusing that I also need to have pytorch installed.

I also ran into #47305 when using the -j 2 option to speed things up. It happened twice in a row, after which I gave up and switched back to -j 1.

@Dr-Irv
Contributor

Dr-Irv commented Sep 29, 2022

We'd probably have to add the tools used by pre-commit, like black, flake8, isort, and others.

@YvanCywan
Contributor

@Dr-Irv Well, that depends on whether someone installs pre-commit for the project or not. If they don't, the pre-commit CI should still run as normal when the pull request is made.

But to have some pre-PR checks, it might be worth adding it regardless.

@WillAyd
Member

WillAyd commented Oct 13, 2022

This could also be a use case for publishing a pandas-dev image on DockerHub.

@MarcoGorelli
Member Author

With regard to this particular issue, I've realised that the 311-dev job actually has exactly what I was looking for.

If we just move those requirements into their own file, then that gives a minimal installation with which you can build pandas and run the vast majority of tests.

This would be really useful when running tasks on Colab/Kaggle (for example, bisecting regressions).

#50339 would do this.

@MarcoGorelli MarcoGorelli changed the title DOC: add minimal_environment.yml and minimal_requirements.txt DOC: add minimal requirements file Dec 19, 2022
@MarcoGorelli MarcoGorelli changed the title DOC: add minimal requirements file CI: add minimal requirements file Dec 19, 2022

7 participants