Skip to content

Random seed appears to also seed other random() calls #531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
medley56 opened this issue Mar 9, 2023 · 10 comments
Closed

Random seed appears to also seed other random() calls #531

medley56 opened this issue Mar 9, 2023 · 10 comments

Comments

@medley56
Copy link

medley56 commented Mar 9, 2023

Python Version

3.9.0

pytest Version

7.2.2

Package Version

3.12.0

Description

In our tests, we regularly create mocked AWS S3 buckets with random 16 character names. e.g.

bucket_name = ''.join(random.choice(string.ascii_letters) for i in range(16))

Those buckets are created via a fixture that yields a function called create_mock_bucket() and the scope of that fixture is set to the default (per function).

@pytest.fixture
def mock_s3_context():
    """Everything under/inherited by this runs in the mock_s3 context manager"""
    with mock_s3():
        # Yield the (mocked) s3 Resource object
        # (see boto3 docs: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html)
        yield boto3.resource('s3')


@pytest.fixture
def create_mock_bucket(mock_s3_context):
    """Returns a function that allows dynamic creation of s3 buckets with option to specify the name.

    Note: if the bucket already exists, this doesn't overwrite it. Previous contents will remain.
    Caution: If you create multiple objects at the same location, you may get conflicts"""
    s3 = mock_s3_context

    def _create_bucket(bucket_name: str = None):
        """Creates a mock bucket, optionally with a custom name.
        """
        if not bucket_name:
            bucket_name = ''.join(random.choice(string.ascii_letters) for i in range(16))
            print(f"Generated random bucket name: {bucket_name}")
        bucket = s3.Bucket(bucket_name)
        if not bucket.creation_date:  # If bucket doesn't already exist
            bucket.create()
            print(f"Created mock S3 bucket {bucket}.")
        return bucket
    yield _create_bucket

When pytest-randomly is enabled, those bucket names all end up being exactly the same string for a given pytest execution. When run with -p no:randomly, the bucket names appear random as expected and all the tests pass. I suspect that the random seed for pytest-randomly is somehow leaking over into seeding our call to random.choice but that is purely speculation.

NB: This has exposed another weakness in our testing, namely that our mocked buckets are persisting between tests. I'm running that down separately but the behavior described above seems unintentional.

@adamchainz
Copy link
Member

It's not "leaking over", setting the global random seed is one of pytest-randomly's features. README:

Resets the global random.seed() at the start of every test case and test to a fixed number - this defaults to time.time() from the start of your test run, but you can pass in --randomly-seed to repeat a randomness-induced failure.

You can disable this feature or use a local random.Random() instance.

@medley56
Copy link
Author

@adamchainz Thank you for the quick response. Understood that this is not a bug. I assumed that pytest-randomly would not affect other randomness but in hindsight I guess it's kind of obvious.

Thanks for the suggestion on the local random.Random() instance. It sounds like that's the correct solution for us.

@MichaelSoegaard
Copy link

I just got hit by this one as well. I didn't put two and two together, when I read the documentation, to realize that Randomly fixates random when we run our full test suite.

I get the solution of using random.Random(), but as we are many developers it's not a feasible solution to rely on every developer to remember not to use random in our code base.

Would it be possible for Randomly to base the seed on an instance of random: random.Random() instead or use another package for randomization?

@adamchainz
Copy link
Member

I get the solution of using random.Random(), but as we are many developers it's not a feasible solution to rely on every developer to remember not to use random in our code base.

There's also --randomly-dont-reset-seed to disable this behaviour altogether.

Would it be possible for Randomly to base the seed on an instance of random: random.Random() instead or use another package for randomization?

I don't understand what you're suggesting. The seed is already from an instance of random.Random():

default_seed = random.Random().getrandbits(32)
.

@medley56
Copy link
Author

@adamchainz I think the problem is that developer's probably don't expect the random seed generated by pytest to affect all other random calls in their test suite. @MichaelSoegaard, This might be unavoidable, since the whole point of pytest-randomly is to set a global random seed that can be reproduced. If pytest-randomly doesn't affect the global scope, then we lose that reproducibility, which is more important IMO.

@MichaelSoegaard
Copy link

Yeah, I get it now. -and you're right @medley56, it hadn't crossed my mind that it would actually also affect our calls to random inside our code base. In some (ours) cases it breaks the code because items would get the same names or ID's.

If I could choose, I would like to just have the tests randomized while I don't care about getting the global seed locked. It would be possible if all the tests were just randomized without locking the seed and then index the order of the tests for reproducibility.

I apparently just misunderstood the core purpose of the package.

@adamchainz
Copy link
Member

If I could choose, I would like to just have the tests randomized while I don't care about getting the global seed locked. It would be possible if all the tests were just randomized without locking the seed and then index the order of the tests for reproducibility.

Use --randomly-dont-reset-seed.

I apparently just misunderstood the core purpose of the package.

The tagline is "Pytest plugin to randomly order tests and control random.seed." Happy to take suggestions to improve that.

@MichaelSoegaard
Copy link

Thank you @adamchainz

I apparently just misunderstood the core purpose of the package.

The tagline is "Pytest plugin to randomly order tests and control random.seed." Happy to take suggestions to improve that.

-and my comment wasn't meant as being sarcastic. I meant it when I said I just misunderstood it. :-)

Keep up the great work.

@brycedrennan
Copy link

brycedrennan commented Jan 11, 2024

Just encountered this unexpectedly and found it surprising.

I wonder if we could change the behavior so that each test gets a unique (but deterministic) seed. The test node id could be used as part of the input to create the seed. This would preserve the reproducibility of test runs while also preventing scenarios like "my randomly generated test bucket was the same as the bucket used in another test"

@adamchainz
Copy link
Member

Please can you write a new issue rather than commenting on this old one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants