Skip to content

Package project #714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Conversation

slarse
Copy link

@slarse slarse commented Feb 22, 2019

Hello! This turned into a bit of an essay, sorry :)

This PR fixes #708, and is mostly meant to give the maintainers of this project an idea of what we are doing on our end. We realize that this may prove too disruptive to the project as a whole, especially in such a short time, but think that this is the right direction to take it in the long run.

The intention of this PR is to move the project toward a proper Python package, which was the only way my group and I could find to make #309 possible. It does the following:

  • Move all files into a single top-level directory called thealgorithms (529dc79)
    • Also adds __init__.py files to all directories (some already had them), making them Python packages/subpackages
    • This is responsible for the 300+ file changes, you will find that all but a handful of pre-existing files have just been moved
  • Add a test that finds all modules in thealgorithms and tries to import them (5fb67c2)
  • Add a setup.py file such that the package can be installed, along with currently known dependencies (ebbb816)
    • This also allows Travis to run tests on the whole package
    • Note that setup.py currently requires Python 2.7 or 3.6+ to install. This is easily tuned by changing python_requires in setup.py.
  • Update the .travis.yml file to run every test in the tests directory.
  • Minimal changes to one of the graph algorithms just to pass Flake8 (had undefined names) (e7d121b)

Gathering everything in a package like this sacrifices some accessibility for absolute beginners, but has many benefits in our eyes. It allows for unified testing of the whole project in CI, making it much easier for maintainers to quickly check the quality of pull requests. Given that the project is well over 10000 lines of code by now, maintainability is important. It also makes the more complicated and dependency-heavy parts of the library easier to use, as the setup.py file allows one to install the whole package with dependencies using pip. Instructions for how to install should reasonably be added to README.md, but we kept it in setup.py for now.

Note that, as a package, this is currently broken. Some parts support Python 3 only, some parts Python 2 only, and a few parts successfully support both. Only about 2/3 of the modules can be imported "properly" from the root of the project (and thus, from the package) at this time. For the reasons mentioned in #710, my group and I will primarily work on getting Python 3 support up and running, and migrating any existing tests into the tests directory. Our goal is to have every module be importable on a package, and having at least some functional tests running on Travis, by Tuesday next week.

@poyea
Copy link
Member

poyea commented Feb 23, 2019

Is it possible to maintain the current directory structure (or with slight renaming of current directories) while having tests? Or is that logical to do so? The main concern is really the accessibility.

@slarse
Copy link
Author

slarse commented Feb 23, 2019

It should definitely be possible, but comes with two primary problems:

  1. Whenever a new directory is added, it will need to be added manually to coverage measures and import (essentially sanity check) tests. One could also just automatically go into all top-level directories, but then you run the risk of going into e.g. a local virtual environment (would not be a problem on Travis, but for local tests).
  2. Each top-level directory will be its own package, so name clashes could easily occur with the standard library. For example, if you have a directory called queue, and install the package, that will overshadow the standard library queue.

Also, we realized we must merge the directories that differ only by capitalization (e.g. Graphs and graphs), as they cause issues on file systems that don't differentiate files by capitalization alone.

I would say that, in general, having that many separate packages would be bad practice. However, since this is educational, I am inclined to agree with you that accessibility may come before engineering best practices here.

We will try to make it happen!

@slarse
Copy link
Author

slarse commented Feb 23, 2019

@poyea So, we have reverted back to the original directory structure and managed to get it to work well with Travis.

TL;DR: We can do either approach, both work. With multiple top-level directories, you essentially get multiple projects from a Pythonic point of view. I can't really judge if this is better or worse for beginners. But the unified test suite will work for either approach, and with the work I just put in, neither is more difficult for us.

  • Packages (i.e. top level directories) are dynamically discovered and all modules are collected for the import test.
  • A script using a similar approach discovers all top level packages and generates the argument to pytest needed to trace coverage on them (6b97f3c)
    • We need coverage for our university class, but it will be beneficial for you as well as you can see which parts of the project are not tested!

So point 1. I was worried about above is essentially a non-issue, as e.g. a virtual environment will not be a Python package. If you add a new package (top level directory), that will be discovered automatically as long as it has an __init__.py file in it.

The real hit to accessibility is that scripts that import stuff from other parts of the project must be run as modules. For example, if graphs/graph.py imports graphs/node.py with the import statement from . import node (explicit relative import) or from graphs import node (correct absolute import), then you need to run the module from the root directory with python -m graphs.graph.

Any standalone script will still be runnable by just cding into the directory and running it as usual.

That's the price to pay for being able to test everything with one test suite!

@slarse
Copy link
Author

slarse commented Feb 23, 2019

Oh, I can mention that we have almost finished work on making every module importable by the test suite (with Python 3). It essentially just consisted of:

  • Adding external dependencies.
  • Wrapping blocking code in an if __name__ == "__main__": block
  • Fixing mixed absolute/relative imports

But we thought that'd be better for a different PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce a proper package structure
4 participants