gRPC: add more unit tests for Stream and Datastore #1935
Conversation
Shutdown();
datastore.reset();

EXPECT_NO_THROW(credentials.InvokeGetToken());
This test currently fails, I'll fix before merging.
Hmm, this actually looks like a non-trivial issue.
The root of the problem is that there's an implicit dependency between the lifetimes of `grpc::CompletionQueue` and `grpc::ByteBuffer`. The smallest repro is just:
{
  grpc::Slice slice{"foo"};
  grpc::ByteBuffer b{&slice, 1};  // Buffer must be non-empty
  grpc::CompletionQueue cq;  // Assuming it's the only gRPC-related object around
}  // Once the scope ends, assertion will be triggered, because cq was destroyed before b
Details:
In gRPC, C core is initialized once and shut down once. All C++ classes that need the core to be initialized inherit from `GrpcLibraryCodegen`, which essentially makes C core reference-counted: each constructor increments, and each destructor decrements, the number of references to C core, and once the last reference is destroyed, the C core is shut down.
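To make the mechanism concrete, here is a minimal sketch of that reference-counting pattern. The names (`CoreRef`, `CoreInit`, `CoreShutdown`) are made up for illustration and are not the actual gRPC source:

```cpp
#include <atomic>
#include <iostream>

// Hypothetical stand-ins for the core init/shutdown entry points.
void CoreInit() { std::cout << "C core initialized\n"; }
void CoreShutdown() { std::cout << "C core shut down\n"; }

// Rough model of what GrpcLibraryCodegen provides: every gRPC C++ object that
// needs the core conceptually holds one of these references.
class CoreRef {
 public:
  CoreRef() {
    if (ref_count_.fetch_add(1) == 0) CoreInit();      // first reference initializes the core
  }
  ~CoreRef() {
    if (ref_count_.fetch_sub(1) == 1) CoreShutdown();  // last reference shuts the core down
  }

 private:
  static std::atomic<int> ref_count_;
};

std::atomic<int> CoreRef::ref_count_{0};

int main() {
  CoreRef queue_ref;  // e.g. a grpc::CompletionQueue
  {
    CoreRef call_ref;  // e.g. a call-related object
  }  // destroying a non-last reference is a no-op
}  // destroying the last reference shuts the core down
```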
In this case, destroying `Datastore` destroys `grpc::CompletionQueue`, which happens to be the last reference to C core, so line 154 (`datastore.reset()`) leads to global shutdown. The global shutdown, among other things, shuts down `ExecCtx`.
When `EmptyCredentialsProvider::GetToken` is called, the `TokenListener` (a `std::function`) is passed by value, so at the end of the call the destructor of `TokenListener` is called, which leads to the destruction of a lambda created by `Datastore` that contains a `grpc::ByteBuffer`. When a `grpc::ByteBuffer` is destroyed, it creates an `ExecCtx`, which fails because global shutdown has already been called on `ExecCtx`, leading to an assertion failure and a crash.
(Note that the fact that `GetToken` takes its argument by value isn't really an issue here; if the argument were taken by reference, the problem would surface when the credentials provider is destroyed. The root of the problem seems to be that the `ByteBuffer`-containing lambda may outlive gRPC core.)
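For illustration, the same crash path can be sketched outside of Firestore. This is a hedged stand-in for the scenario above, not the actual code: `token_listener` plays the role of the `TokenListener`, and the lambda plays the role of the one `Datastore` creates.

```cpp
#include <functional>

#include <grpcpp/grpcpp.h>
#include <grpcpp/support/byte_buffer.h>
#include <grpcpp/support/slice.h>

int main() {
  // Stands in for the TokenListener that Auth keeps alive independently of Datastore.
  std::function<void()> token_listener;

  {
    // Stands in for Datastore's completion queue; assume it holds the only
    // other reference to gRPC core.
    grpc::CompletionQueue cq;

    grpc::Slice slice{"serialized request"};
    grpc::ByteBuffer request{&slice, 1};

    // Datastore's lambda captures the ByteBuffer by value; the copy lives on
    // inside the std::function handed to Auth.
    token_listener = [request]() {
      // Would hand `request` to the gRPC call here.
      (void)request;
    };
  }  // cq is destroyed -> the last core reference is dropped -> global shutdown

  // Auth eventually drops the listener; destroying the captured ByteBuffer
  // creates an ExecCtx after global shutdown and trips the assertion.
  token_listener = nullptr;
}
```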
@wilhuff Re. the above:
- As far as gRPC is concerned, do you feel it might be a bug? (and hence, worth reporting)
- I presume we care about this case (Auth outlives Firestore) and don't want a crash there -- let me know if I misunderstand.
Submitted an issue to gRPC repo: grpc/grpc#16875
Re (1): this could be a bug, but realistically I think it's one we'll have to work around. Possibly this means that we should not be using ByteBuffers for anything except directly sending into/out of gRPC calls such that the construction order you're describing never happens. However, it's also possible I'm misunderstanding, because it seems like we really shouldn't get into a state where we've destroyed the completion queue before the last byte buffer we might have submitted into it.
Re (2): I don't think the issue is that we care so much about Auth outliving Firestore as that we want to handle races where Firestore may be asked to shut down while an auth request is pending. We should not crash in this circumstance.
In our public API shutdown is asynchronous, so we could work around this by performing teardown in two passes: a first pass to quiesce the system, inhibiting new requests and waiting for any outstanding ones, and then a second pass to tear things down.
Alternatively, for any request that might outlive the system, add some way to disconnect it so that when it calls back it doesn't attempt any action on the already-destroyed system.
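A minimal sketch of that second option, assuming a hypothetical `Datastore`-like class (not the actual Firestore implementation): callbacks handed to Auth keep only a `std::weak_ptr` back to the system, so they become no-ops once the system is gone.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical class used only to illustrate the "disconnectable callback" pattern.
class Datastore : public std::enable_shared_from_this<Datastore> {
 public:
  // Wraps the token callback so that it does nothing if the Datastore has
  // already been destroyed by the time Auth invokes it.
  std::function<void(const std::string&)> MakeTokenCallback() {
    std::weak_ptr<Datastore> weak_this = shared_from_this();
    return [weak_this](const std::string& token) {
      auto strong_this = weak_this.lock();
      if (!strong_this) return;        // system already torn down; drop the callback
      strong_this->HandleToken(token);
    };
  }

 private:
  void HandleToken(const std::string& token) {
    // Start the gRPC call, etc.
    (void)token;
  }
};

int main() {
  auto datastore = std::make_shared<Datastore>();
  auto callback = datastore->MakeTokenCallback();  // handed to Auth

  datastore.reset();   // Firestore is shut down while the auth request is pending
  callback("token");   // safe no-op instead of acting on a destroyed system
}
```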
}

/*
TEST_F(DatastoreTest, AuthWhenDatastoreHasBeenShutDown) {
This will currently fail; I left it mainly for discussion. I don't know if it can be an issue -- it would depend on how likely it is for `Datastore` to be shut down but not destroyed, so that Auth has a chance to invoke its callback in between.
Well, I don't know how likely that is, but I suppose you could have Datastore record the fact that it's shut down, and then check that in the callbacks that auth invokes? That way, if we ever end up in that situation, we can either (a) abort, or (b) do something intelligent, rather than just blindly proceeding as if the Datastore is still active.
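A rough sketch of that idea, with hypothetical names rather than the real Firestore code: `Shutdown()` records the state, and the auth callback checks it before doing any work.

```cpp
#include <atomic>
#include <string>

class Datastore {
 public:
  void Shutdown() {
    is_shut_down_ = true;
    // ... existing teardown ...
  }

  // Invoked by Auth once the token arrives.
  void HandleToken(const std::string& token) {
    if (is_shut_down_) {
      // Shut down but not yet destroyed: abort with a clear message or drop
      // the token, rather than proceeding as if Datastore were still active.
      return;
    }
    // ... normal path: use `token` to start the stream/RPC ...
    (void)token;
  }

 private:
  std::atomic<bool> is_shut_down_{false};
};
```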
Can you do a sync? Some of the changes I see here are from your other merged PR and do not belong to this PR.
@zxu123 Hmm, looks correct to me. Can you point me to the duplicate changes? (perhaps you mean very similar tests between…
…eBuffer` right until the call is started (#1949) (see [here](#1935 (comment)) and [here](grpc/grpc#16875) for context)

The problem with serializing a domain object immediately is that the resulting `ByteBuffer` is stored in a `std::function` within Auth. `ByteBuffer`s become invalid once gRPC core shuts down, so if Auth happens to outlive Firestore, the app will crash once the `ByteBuffer`'s destructor is invoked.
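A hedged sketch of the approach that change describes, using illustrative names only (not the actual Firestore code): the callback keeps the payload as plain bytes and builds the `grpc::ByteBuffer` only once the call is actually started, while gRPC core is guaranteed to be alive.

```cpp
#include <functional>
#include <string>
#include <utility>

#include <grpcpp/grpcpp.h>
#include <grpcpp/support/byte_buffer.h>
#include <grpcpp/support/slice.h>

// Illustrative helper: the listener captures a std::string, which remains
// valid even if gRPC core shuts down before the listener is destroyed; a
// captured grpc::ByteBuffer would not survive that.
std::function<void()> MakeStartCallCallback(std::string serialized_request) {
  return [bytes = std::move(serialized_request)]() {
    grpc::Slice slice{bytes};
    grpc::ByteBuffer request{&slice, 1};  // created only when the call starts
    // ... hand `request` to the gRPC call here ...
    (void)request;
  };
}
```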