Firestore silently stops saving changes #2930

rogueg · 2020-04-17T02:03:40Z

[REQUIRED] Describe your environment

Operating System version: OSX Mojave
Browser version: Safari 13.1 & Brave 1.5 (Chromium 80)
Firebase SDK version: 7.8.2
Firebase Product: firestore (auth, database, storage, etc)

[REQUIRED] Describe the problem

A few times, I've seen Firestore in a state where changes stop getting written to the cloud, or even IndexDB. There are no errors in the console, and only a refresh seems to fix it. Upon refreshing the page, the user's changes are simply gone. 😢

Stepping through the code in this state, I can see that the write is enqueued in AsyncQueue, but never gets executed. The AsyncQueue has not failed, but I don't know how to determine what it's waiting on. I have found it useful to run:

  > (Date.now() - Firebase.firestore()._queue.delayedOperations[0].targetTimeMs) / 1000 / 60
  < 291.1877
  > Firebase.firestore()._queue.delayedOperations[0].timerId
  < "client_metadata_refresh"

I'm not sure that delayed operation is actually blocking, but it's a rough proxy for how long Firestore has been stuck.

Steps to reproduce:

Sadly, I haven't figured out a way to reliably reproduce. It only seems to happen when the page has been open for hours, and the machine has gone to sleep several times.

Relevant Code:

rogueg@12dbc5e

I've been using this fork for a few weeks, and haven't encountered the issue since. It's crude, and I'm pretty ignorant of Firestore's internals, so I wouldn't be surprised if it introduces some bugs...but it seems preferable to losing user data for hours on end.

I saw @schmidt-sebastian's #2879 go by (exciting! 🎉) and thought this might be worth raising while it's still relatively top-of-mind.

The text was updated successfully, but these errors were encountered:

schmidt-sebastian · 2020-04-17T02:54:15Z

Thanks for this report and all the hard research you put into this! Do you happen to have logging enabled and can provide the last log lines that Firestore printed? This is going to be quite difficult to debug, but if we knew what happened right before Firestore got stuck it might give us a clue.

Unfortunately, most of our Async operations are not idempotent, so retrying by default is not possible. I will go through the SDK for the next couple of weeks to see where we can safely retry.

rogueg · 2020-04-17T17:28:42Z

I don't, but I can modify the logging to write to a buffer and dump it whenever Firestore gets stuck.

I do record the body of op whenever a timeout happens, and most of them are this:

function () {
    return _this.syncEngine.write(mutations, deferred);
}

Are all indexDB operations retryable? Since indexDB is known to have transient failures, it seems like they would need to be, or risk losing data.

For operations that can't be retried, could the AsyncQueue skip over them and keep going? It's unfortunate that any error will halt firestore and require a restart, losing any changes that are queued up.

schmidt-sebastian · 2020-04-17T17:56:20Z

All IndexedDB transactions are retried up to three times:

firebase-js-sdk/packages/firestore/src/local/simple_db.ts

Line 278 in a789d74

while (true) {

It seems like there are a lot of cases where these retries are not helping, e.g. when a page is backgrounded and Safari blocks file system access. The retry we are building now sits on top of the IndexedDB retry and will retry IndexedDB plus its consuming code with backoff - but unlike the IndexedDB retry, it retries the entire AsyncQueue operation. We cannot add backoff only for IndexedDB as it would slow down all other client operations as we wait for the result.

rogueg · 2020-04-17T19:06:28Z

Nice! I didn't know about the retry in SimpleDB

In my fork, I have a fixed 5s wait between attempts. I log the attempt number when it fails, and can see that most operations only fail once, so I think your idea of adding a retry with backoff is a good one.

I think I'm asking: is it safe to use enqueueRetryable for every operation that touches indexDB? You said "most of our Async operations are not idempotent", does that include ones that touch indexDB? It seems like all of those operations would be susceptible to transient failures.

Also, what do you think about adding a timeout to all retryable operations? That would help in cases like mine where the underlying api hangs rather than throwing an error.

schmidt-sebastian · 2020-04-17T21:06:51Z

The goal is to use enqueueRetryable() for anything that can be delayed without compromising data integrity and to convey failures for operations that can't be delayed via the API (e.g. when we cannot add a user write to our local cache).

We can take a look at timeouts if we see more demand for those. I would think that the timeouts you are hitting are actual problems in code (or the browser). For now, I would like to prioritize finding the root cause.

google-oss-bot · 2020-04-24T01:05:08Z

Hey @rogueg. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot · 2020-04-30T01:14:33Z

Hey @rogueg. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot · 2020-04-30T01:14:34Z

Since there haven't been any recent updates here, I am going to close this issue.

@rogueg if you're still experiencing this problem and want to continue the discussion just leave a comment here and we are happy to re-open this.

google-oss-bot added the api: firestore label Apr 17, 2020

schmidt-sebastian self-assigned this Apr 17, 2020

schmidt-sebastian added the needs-info label Apr 22, 2020

google-oss-bot added the no-recent-activity label Apr 24, 2020

google-oss-bot added the closed-by-bot label Apr 30, 2020

google-oss-bot closed this as completed Apr 30, 2020

firebase locked and limited conversation to collaborators May 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Firestore silently stops saving changes #2930

Firestore silently stops saving changes #2930

rogueg commented Apr 17, 2020

schmidt-sebastian commented Apr 17, 2020

rogueg commented Apr 17, 2020

schmidt-sebastian commented Apr 17, 2020

rogueg commented Apr 17, 2020

schmidt-sebastian commented Apr 17, 2020

google-oss-bot commented Apr 24, 2020

google-oss-bot commented Apr 30, 2020

google-oss-bot commented Apr 30, 2020

Firestore silently stops saving changes #2930

Firestore silently stops saving changes #2930

Comments

rogueg commented Apr 17, 2020

[REQUIRED] Describe your environment

[REQUIRED] Describe the problem

Steps to reproduce:

Relevant Code:

schmidt-sebastian commented Apr 17, 2020

rogueg commented Apr 17, 2020

schmidt-sebastian commented Apr 17, 2020

rogueg commented Apr 17, 2020

schmidt-sebastian commented Apr 17, 2020

google-oss-bot commented Apr 24, 2020

google-oss-bot commented Apr 30, 2020

google-oss-bot commented Apr 30, 2020