
gRPC: add more unit tests for Stream and Datastore #1935



Merged: 20 commits, merged Oct 14, 2018


var-const
Contributor

No description provided.

Shutdown();
datastore.reset();  // Destroys the last grpc::CompletionQueue, which shuts down gRPC core

EXPECT_NO_THROW(credentials.InvokeGetToken());
Contributor Author


This test currently fails; I'll fix it before merging.

Contributor Author


Hmm, this actually looks like a non-trivial issue.

The root of the problem is an implicit dependency between the lifetimes of grpc::CompletionQueue and grpc::ByteBuffer. The smallest repro is just:

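#include <grpcpp/completion_queue.h>
#include <grpcpp/support/byte_buffer.h>
#include <grpcpp/support/slice.h>

int main() {
  grpc::Slice slice{"foo"};
  grpc::ByteBuffer b{&slice, 1};  // Buffer must be non-empty
  grpc::CompletionQueue cq;       // Assuming it's the only gRPC-related object around
}  // Locals die in reverse order: cq (the last core reference) goes before b, so destroying b asserts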

Details:

In gRPC, the C core is initialized once and shut down once. All C++ classes that need the core to be initialized inherit from GrpcLibraryCodegen, which essentially makes the C core reference-counted: each constructor increments, and each destructor decrements, the number of references to the C core, and once the last reference is released, the C core is shut down.
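To make the reference counting concrete, here is a minimal C++ model of the scheme. `fake_core`, `LibraryRef`, and `FakeCompletionQueue` are hypothetical names, not gRPC's API; the sketch only assumes, as described above, that the first reference initializes the core and the last one shuts it down.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the C core's global state (not gRPC's real API).
namespace fake_core {
std::mutex mu;
int refs = 0;
bool alive = false;
int shutdowns = 0;

void Init() {
  std::lock_guard<std::mutex> lock(mu);
  if (refs++ == 0) alive = true;  // First reference initializes the core.
}

void Shutdown() {
  std::lock_guard<std::mutex> lock(mu);
  if (--refs == 0) {  // Last reference shuts the core down.
    alive = false;
    ++shutdowns;
  }
}
}  // namespace fake_core

// Sketch of the GrpcLibraryCodegen idea: an object that needs the core holds
// one reference for its entire lifetime.
class LibraryRef {
 public:
  LibraryRef() { fake_core::Init(); }
  ~LibraryRef() { fake_core::Shutdown(); }
  LibraryRef(const LibraryRef&) = delete;
  LibraryRef& operator=(const LibraryRef&) = delete;
};

// Hypothetical stand-in for grpc::CompletionQueue.
class FakeCompletionQueue : private LibraryRef {};
```

In this model, anything that does not derive from `LibraryRef` (like a buffer) does not keep the core alive, which is exactly the gap the repro above falls into.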

In this case, destroying Datastore destroys its grpc::CompletionQueue, which happens to hold the last reference to the C core, so line 154 (datastore.reset()) triggers a global shutdown. Among other things, the global shutdown shuts down ExecCtx.

When EmptyCredentialsProvider::GetToken is called, the TokenListener (a std::function) is passed by value, so the TokenListener's destructor runs at the end of the call. That destroys a lambda created by Datastore, which contains a grpc::ByteBuffer. When a grpc::ByteBuffer is destroyed, it creates an ExecCtx, which fails because global shutdown has already been called on ExecCtx; the result is an assertion failure and a crash.
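A minimal C++ sketch of the same lifetime chain, using hypothetical stand-ins: `core_alive` models whether gRPC core is initialized, `FakeByteBuffer` models a buffer whose destructor needs the core, and `GetToken` mirrors a provider that takes its `std::function` listener by value. None of these are the real gRPC or Firestore APIs.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical flag standing in for "gRPC core is still initialized".
bool core_alive = true;
// Counts destructions that would have tripped gRPC's assertion.
int failed_destructions = 0;

// Hypothetical stand-in for grpc::ByteBuffer: its destructor needs a live core.
struct FakeByteBuffer {
  ~FakeByteBuffer() {
    if (!core_alive) ++failed_destructions;  // Real code: ExecCtx assertion, crash.
  }
};

using TokenListener = std::function<void()>;

// Modeled on a GetToken that takes its listener by value: the parameter (and
// everything it captured) is destroyed when the call ends.
void GetToken(TokenListener listener) { listener(); }

// Hypothetical stand-in for the Datastore code that builds the listener.
// The lambda is the sole owner of the buffer.
TokenListener MakeListener() {
  return [buffer = std::make_shared<FakeByteBuffer>()] {
    (void)buffer;  // The real lambda would start a call with the buffer.
  };
}
```

For example, setting `core_alive = false` and then calling `GetToken(MakeListener())` leaves `failed_destructions` at 1 once the call completes: the buffer is destroyed together with the listener, after the core is already gone.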

(Note that GetToken taking its argument by value isn't really the issue here; if the argument were taken by reference, the problem would surface when the credentials provider is destroyed instead. The root of the problem seems to be that the lambda containing the ByteBuffer may outlive gRPC core.)

Contributor Author


@wilhuff Re. the above:

  1. From gRPC's point of view, do you think this might be a bug (and hence worth reporting)?
  2. I presume we care about this case (Auth outliving Firestore) and don't want a crash there; let me know if I misunderstand.

Contributor Author


Submitted an issue to the gRPC repo: grpc/grpc#16875

Contributor

@wilhuff Oct 15, 2018


Re (1): this could be a bug, but realistically I think it's one we'll have to work around. Possibly this means we should use ByteBuffers only for data sent directly into or received directly out of gRPC calls, so that the construction order you're describing never happens. However, it's also possible I'm misunderstanding, because it seems we really shouldn't get into a state where we've destroyed the completion queue before the last byte buffer we might have submitted into it.

Re (2): I don't think the issue is so much that we care about Auth outliving Firestore; rather, we want to handle races where Firestore may be asked to shut down while an auth request is pending. We should not crash in this circumstance.

In our public API, shutdown is asynchronous, so we could work around this by performing teardown in two passes: a first pass that quiesces the system (rejecting new requests and waiting for any outstanding ones), and a second pass that tears things down.
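A rough sketch of the two passes, assuming a hypothetical `RequestGate` that every pending request (such as an outstanding auth callback) registers with. This is one way to implement quiescing with the standard library, not Firestore's actual shutdown code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that tracks in-flight requests (not a Firestore class).
class RequestGate {
 public:
  // Admits a new request unless shutdown has begun.
  bool TryBegin() {
    std::lock_guard<std::mutex> lock(mu_);
    if (shutting_down_) return false;
    ++outstanding_;
    return true;
  }

  // Marks a previously admitted request (e.g. a pending auth callback) done.
  void End() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--outstanding_ == 0) cv_.notify_all();
  }

  // Pass 1: stop admitting requests, then wait for outstanding ones to finish.
  void Quiesce() {
    std::unique_lock<std::mutex> lock(mu_);
    shutting_down_ = true;
    cv_.wait(lock, [this] { return outstanding_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool shutting_down_ = false;
  int outstanding_ = 0;
};

// Two-pass shutdown: quiesce first, and only then tear things down.
template <typename TearDownFn>
void ShutDownInTwoPasses(RequestGate& gate, TearDownFn tear_down) {
  gate.Quiesce();  // Pass 1: no admitted request is still in flight after this.
  tear_down();     // Pass 2: e.g. destroy the Datastore and its CompletionQueue.
}
```

Because teardown only starts after `Quiesce()` returns, no admitted callback can run against a half-destroyed system; the cost is that shutdown may block until slow requests complete.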

Alternatively, for any request that might outlive the system, add some way to disconnect it, so that when it calls back it doesn't attempt any action on the already destroyed system.
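One common way to disconnect a callback is to have it capture a `std::weak_ptr` instead of a raw pointer or a `shared_ptr`. The sketch below assumes a hypothetical `FakeDatastore` that is always owned by a `std::shared_ptr`; it illustrates the technique and is not necessarily the approach the PR ultimately took.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for Datastore, always owned by a std::shared_ptr.
class FakeDatastore : public std::enable_shared_from_this<FakeDatastore> {
 public:
  using TokenCallback = std::function<bool(const std::string& token)>;

  // The callback holds only a weak reference, so a callback that arrives after
  // the datastore is destroyed does nothing instead of touching freed memory.
  // The callback returns whether the token was delivered.
  TokenCallback MakeTokenCallback() {
    std::weak_ptr<FakeDatastore> weak_self = shared_from_this();
    return [weak_self](const std::string& token) {
      if (std::shared_ptr<FakeDatastore> self = weak_self.lock()) {
        self->OnToken(token);
        return true;
      }
      return false;  // Disconnected: the datastore is already gone.
    };
  }

  int tokens_received() const { return tokens_received_; }

 private:
  void OnToken(const std::string&) { ++tokens_received_; }

  int tokens_received_ = 0;
};
```

This only stops the callback from acting on a destroyed object; anything the callback captures by value (such as a `grpc::ByteBuffer`) is still destroyed whenever the callback itself is, so on its own it does not fix the destruction-order problem described earlier.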

}

/*
TEST_F(DatastoreTest, AuthWhenDatastoreHasBeenShutDown) {
Contributor Author


This will currently fail; I left it mainly for discussion. I don't know whether it can be an issue in practice: it would depend on how likely it is for Datastore to be shut down but not destroyed, so that Auth has a chance to invoke its callback in between.

Member


Well, I don't know how likely that is, but I suppose you could have Datastore record the fact that it's shut down, and then check that in the callbacks that auth invokes? That way, if we ever end up in that situation, we can either (a) abort, or (b) do something intelligent, rather than just blindly proceeding as if the Datastore were still active.
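A sketch of this suggestion, assuming a hypothetical `FakeDatastore` with an atomic shut-down flag that auth callbacks consult before doing any work. It shows option (b), ignoring the late callback; the method names are illustrative, not the real Datastore interface.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical sketch: the datastore remembers that it was shut down, and
// callbacks from auth check that before doing any work.
class FakeDatastore {
 public:
  void Shutdown() { is_shut_down_ = true; }

  // Invoked by (hypothetical) auth when a token arrives. Returns whether the
  // token was acted upon.
  bool OnTokenReceived(const std::string& token) {
    if (is_shut_down_) {
      // Option (b): ignore the late callback. Option (a) would abort here.
      return false;
    }
    last_token_ = token;
    return true;
  }

  const std::string& last_token() const { return last_token_; }

 private:
  std::atomic<bool> is_shut_down_{false};
  std::string last_token_;
};
```

Checking an atomic flag does not stop `Shutdown()` from running concurrently with a callback that has already passed the check; if that can happen, the check and the work need to be guarded by the same mutex.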

Contributor

@zxu123 left a comment


Can you do a sync? Some of these changes I already saw in your other, merged PR, and they don't belong to this PR.

@var-const
Contributor Author


@zxu123 Hmm, it looks correct to me. Can you point me to the duplicate changes? (Perhaps you mean the very similar tests in grpc_stream_test.h and stream_test.h?)


@rsgowman assigned var-const and unassigned rsgowman on Oct 12, 2018
@var-const merged commit d8bb9b3 into master on Oct 14, 2018
var-const added a commit that referenced this pull request on Oct 17, 2018
…eBuffer` right until the call is started (#1949)

(see [here](#1935 (comment)) and [here](grpc/grpc#16875) for context)

The problem with serializing a domain object immediately is that the resulting `ByteBuffer` is stored in a `std::function` within Auth. `ByteBuffer`s become invalid once gRPC core shuts down, so if Auth happens to outlive Firestore, the app will crash once the `ByteBuffer`'s destructor is invoked.
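The idea behind this follow-up fix can be sketched as follows: the listener handed to Auth captures only the plain domain object and serializes it when the call is actually started. `Mutation`, `Serialize`, and `MakeCommitListener` are hypothetical, and a `std::string` stands in for `grpc::ByteBuffer`, so this illustrates the ordering rather than reproducing the code from #1949.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical domain object (stand-in for e.g. a write mutation).
struct Mutation {
  std::string path;
};

// Hypothetical serializer; in real code this would produce a grpc::ByteBuffer.
std::string Serialize(const Mutation& mutation) {
  return "mutation:" + mutation.path;
}

using TokenListener = std::function<void(const std::string& token)>;
using StartCall =
    std::function<void(const std::string& token, const std::string& payload)>;

// The listener stored by Auth captures only plain data, so destroying it after
// gRPC core has shut down is harmless. Serialization is deferred until the
// call is actually started.
TokenListener MakeCommitListener(Mutation mutation, StartCall start_call) {
  return [mutation = std::move(mutation),
          start_call = std::move(start_call)](const std::string& token) {
    start_call(token, Serialize(mutation));  // Serialize right before the call.
  };
}
```

The trade-off is that serialization now happens on the callback path, but no gRPC-owned object exists until gRPC is actually about to use it.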
@paulb777 deleted the varconst/grpc-unit-tests-domain branch on May 26, 2019 20:48
@firebase locked and limited conversation to collaborators on Oct 26, 2019

5 participants