Skip to content

TST: use env variable instead of pytest option for testing ArrayManager #40222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

I noticed a problem with the current --array-manager pytest option to test the ArrayManager, while checking the window tests.

The problem with the option is that conftest.py files are read first, before handling the option in pytest_sessionstart. This means that any DataFrame or Series that gets constructed on import of a conftest.py file, has already been created before the setting is changed, and will thus be created with the wrong manager.
Example of this is a DataFrame that is not created inside a (fixture) function, but created top-level in the conftest.py file (or eg used in parametrization).

Luckily we don't have too many of those, but one example in window/conftest.py:

@pytest.fixture(
params=[
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 0]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 1]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=["C", "C"]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1.0, 0]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[0.0, 1]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=["C", 1]),
DataFrame([[2.0, 4.0], [1.0, 2.0], [5.0, 2.0], [8.0, 1.0]], columns=[1, 0.0]),
DataFrame([[2, 4.0], [1, 2.0], [5, 2.0], [8, 1.0]], columns=[0, 1.0]),
DataFrame([[2, 4], [1, 2], [5, 2], [8, 1.0]], columns=[1.0, "X"]),
]
)
def pairwise_frames(request):
"""Pairwise frames test_pairwise"""
return request.param

or one example in the toplevel conftest.py:

pandas/pandas/conftest.py

Lines 597 to 608 in 74a6898

def _create_series(index):
""" Helper for the _series dict """
size = len(index)
data = np.random.randn(size)
return pd.Series(data, index=index, name="a")
_series = {
f"series-with-{index_id}-index": _create_series(index)
for index_id, index in indices_dict.items()
}

So to avoid needing to rewrite those (and continue to ensure such cases don't get added), I don't see another way as using the environment variable approach instead of the pytest option (although the option is much easier to use interactively ..)

cc @jreback

@jorisvandenbossche jorisvandenbossche added the Testing pandas testing functions or related to the test suite label Mar 4, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Mar 4, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, looks fine to me

@@ -478,14 +479,19 @@ def use_inf_as_na_cb(key):
_use_inf_as_na(key)


# Get the default from an environment variable, if set, otherwise defaults to "block"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would just inline this on L494 (and add a 1-liner comment before) and/or update the option description with the text on how to control this

@jreback
Copy link
Contributor

jreback commented Mar 4, 2021

Luckily we don't have too many of those, but one example in window/conftest.py:

should we just change / check for these?

@jorisvandenbossche
Copy link
Member Author

should we just change / check for these?

I think that's a bit fragile to ensure that in the future (and in theory there is also nothing wrong with it, just not working with this option that influences the dataframe creation). Would there be an easy way to check this?

@jreback
Copy link
Contributor

jreback commented Mar 4, 2021

should we just change / check for these?

I think that's a bit fragile to ensure that in the future (and in theory there is also nothing wrong with it, just not working with this option that influences the dataframe creation). Would there be an easy way to check this?

agree its fragile, but would be good to check it, yeah not sure its easy.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jorisvandenbossche jorisvandenbossche merged commit fad96e1 into pandas-dev:master Mar 4, 2021
@jorisvandenbossche jorisvandenbossche deleted the am-fix-testing branch March 4, 2021 15:28
@jorisvandenbossche
Copy link
Member Author

If someone comes up with a good approach for keeping it with the pytest option, we can always change back. But for now I want to ensure I am correctly testing ArrayManager ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants