STY/TST: what standard to follow to import top-level objects in tests #33203

jorisvandenbossche · 2020-04-01T10:59:39Z

So this discussion regularly comes up (latest one being #33184), so let's try to decide which one of the two we prefer.

In test code, do we do:

from pandas import Series
...
Series(...)

or

import pandas as pd
...
pd.Series(..)

Personally, I prefer the second, and from recent discussions I have the impression @WillAyd and @datapythonista do as well (is that correct?)

To be clear: consistency within a single file is most important (so not mixing Series and pd.Series in a single file).
But when someone does a PR to fix such inconsistencies in a file, it would be good if there is agreement on whether such a PR should replace pd.Series with Series or rather replace Series with pd.Series.

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2020-04-01T14:30:01Z

I prefer the second.

jreback · 2020-04-01T14:33:14Z

1st is much better for testing & validation.

jbrockmendel · 2020-04-01T14:33:33Z

I'm going to refer to the two options as the specific approach and the concise approach, respectively.

I've come to prefer the more-specific-imports approach.

The test suite is big, and it can be difficult to know "what is tested where?". With specific imports, I can look at the top of a file and rule out a lot of possibilities. This is especially helpful for e.g. tests.tslibs or tests.scalar where we have a carefully-constructed dependency structure and would like the tests to reflect that.

Similarly if Series is imported and DataFrame is not, that is useful information.

If pd is in the namespace, all bets are off.

jorisvandenbossche · 2020-04-01T15:27:07Z

1st is much better for testing & validation.

@jreback Can you explain why? Testing works perfectly fine with both approaches I think.

If pd is in the namespace, all bets are off.

You can still search for pd. in a file / set of files? That will of course give more (duplicated) results, so certainly not as easy to process as the imports, but with a decent search UI it might still be OK?

I fully agree that a good and logical organization of the test suite is important, and you are doing a lot of work to clean this up. But personally, I think adding file docstrings explaining the purpose of a file + documentation explaining the general layout of how the tests are organized would be more useful / accessible for other contributors to understand this rather than expecting them to look at imports.

@WillAyd reasoning from #29275 (with which I agree):

This is getting into personal development preferences but as a counter argument I sometimes find these imports very annoying. When debugging the pd namespace is almost always available, yet when copy pasting blocks of code that use this import machinery you then have to go up and see what is imported from top of the module.

jbrockmendel · 2020-04-01T15:30:08Z

yet when copy pasting blocks of code that use this import machinery you then have to go up and see what is imported from top of the module.

I'm sympathetic to this argument, but if we're going to make copy/paste-ability a criterion, fixtures are a bigger hurdle than imports.

jreback · 2020-04-01T15:33:29Z

being more explicit is way better and provides stronger consistency. allowing one to do pd.* is fine for user code but we want as strict a check in tests as possible.

I don't see any reason to use pd.* in tests unless of course we are actually tesing the namespace or functions which only exist there.

jorisvandenbossche · 2020-04-01T15:35:56Z

I'm sympathetic to this argument, but if we're going to make copy/paste-ability a criterion, fixtures are a bigger hurdle than imports.

I fully agree on that ;) And we both argumented for that elsewhere. Now, for this specific issue, it's indeed an argument to say that the import is not the most important copy paste issue, but there are still a lot of tests that are not using fixtures (or not for the data being created).

being more explicit is way better and provides stronger consistency.

I don't think that any of both approaches is "more explicit" than the other.
(in fact, I would say that rather Series(..) is less explicit, as you don't see from which module it comes. Everybody contributing to pandas probably knows where it is coming from, though ;))

datapythonista · 2020-04-01T15:57:31Z

My preference is just importing the module, mainly to make the code explicit. With a silly example, if there is a line add(2, 2), I need to check the rest of the file to know whether there is a def add(...), or a from mylib import add (and what "mylib" is).

That being said, I think in pandas code having Series and DataFrame shouldn't be a problem. And if I understand correctly, @jbrockmendel and @jreback find useful seeing a from pandas import Series at the top of the module, since they know that only Series is being tested. Seems like a fair point. I don't find it useful for myself at this point, but if it makes things clearer for some people, I'm ok adding a check to avoid using the pattern import pandas in the tests directory, and getting the cases fixed, if that's agreed.

simonjayhawkins · 2020-04-01T16:34:46Z

Consistency in a single module may be more important than following a "foolish" consistency across the whole codebase.

However, recently in #32459 there was a similar discussion for imports in cython modules. The outcome seemed to be that the 2nd pattern above was preferred in that context.

for me, if there is no clear advantage of one method over the other for testing, I would prefer to adopt the 2nd pattern for consistency.

TomAugspurger · 2020-04-06T13:54:08Z

So is the agreement to

Import the objects (Series, DataFrame, etc.) in object-specific tests like tests/series/test*.py
import pandas as pd elsewhere?

jbrockmendel · 2020-04-06T16:30:30Z

Import the objects (Series, DataFrame, etc.) in object-specific tests like tests/series/test*.py

The fewer things are tested in a module, the more I care about specific imports (recall above Im calling the two options "specific"/"concise"). So in tests.tslibs or tests.scalar I care a lot, in tests.indexes i care medium, and in tests.frame and in tests.io i dont care at all.

@TomAugspurger is this anything like what you meant with your last comment?

TomAugspurger · 2020-04-06T19:03:15Z

Something like that, though your preferences may be a bit stronger. So the wording may be "prefer specific imports, unless you're testing general things"?

jbrockmendel · 2020-04-06T19:18:55Z

though your preferences may be a bit stronger

Yah, strength of preferences in my previous comment should all be normalized by "eh, not that big a deal".

So the wording may be "prefer specific imports, unless you're testing general things"?

Which form of imports to use depends on what classes/functions X are being tested
in a file and where those lie in the dependency structure.

If X is something like `pd.concat` or `pd.read_csv` where the implementation
depends on essentially all of `pandas.core`,  then use `import pandas as pd`.

If X is something like `pd.NaT` or `tslibs.ccalendar` that have very few intra-pandas
dependencies, use specific imports to make it clear to a reader that higher-level 
objects are not tested in this file.

In between is something like TimedeltaIndex, where some test files require
higher-level objects (Series, DataFrame) and others rely on only TimedeltaIndex.
For those, authors will have to use their best judgement.

mroeschke · 2020-04-14T05:07:20Z

xref #23018 as discussion points made on the same topic

jorisvandenbossche mentioned this issue Apr 1, 2020

REF/CLN: test_get_dummies #33184

Merged

simonjayhawkins added Needs Discussion Requires discussion from core team before further action Code Style Code style, linting, code_checks Testing pandas testing functions or related to the test suite labels Apr 1, 2020

mroeschke mentioned this issue Apr 14, 2020

CLN: Use consistent imports in tests #23018

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STY/TST: what standard to follow to import top-level objects in tests #33203

STY/TST: what standard to follow to import top-level objects in tests #33203

jorisvandenbossche commented Apr 1, 2020

TomAugspurger commented Apr 1, 2020

jreback commented Apr 1, 2020

jbrockmendel commented Apr 1, 2020

jorisvandenbossche commented Apr 1, 2020

jbrockmendel commented Apr 1, 2020

jreback commented Apr 1, 2020

jorisvandenbossche commented Apr 1, 2020

datapythonista commented Apr 1, 2020

simonjayhawkins commented Apr 1, 2020

TomAugspurger commented Apr 6, 2020

jbrockmendel commented Apr 6, 2020 •

edited

Loading

TomAugspurger commented Apr 6, 2020

jbrockmendel commented Apr 6, 2020

mroeschke commented Apr 14, 2020

STY/TST: what standard to follow to import top-level objects in tests #33203

STY/TST: what standard to follow to import top-level objects in tests #33203

Comments

jorisvandenbossche commented Apr 1, 2020

TomAugspurger commented Apr 1, 2020

jreback commented Apr 1, 2020

jbrockmendel commented Apr 1, 2020

jorisvandenbossche commented Apr 1, 2020

jbrockmendel commented Apr 1, 2020

jreback commented Apr 1, 2020

jorisvandenbossche commented Apr 1, 2020

datapythonista commented Apr 1, 2020

simonjayhawkins commented Apr 1, 2020

TomAugspurger commented Apr 6, 2020

jbrockmendel commented Apr 6, 2020 • edited Loading

TomAugspurger commented Apr 6, 2020

jbrockmendel commented Apr 6, 2020

mroeschke commented Apr 14, 2020

jbrockmendel commented Apr 6, 2020 •

edited

Loading