Skip to content

Firestore silently stops saving changes #2930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rogueg opened this issue Apr 17, 2020 · 8 comments
Closed

Firestore silently stops saving changes #2930

rogueg opened this issue Apr 17, 2020 · 8 comments

Comments

@rogueg
Copy link

rogueg commented Apr 17, 2020

[REQUIRED] Describe your environment

  • Operating System version: OSX Mojave
  • Browser version: Safari 13.1 & Brave 1.5 (Chromium 80)
  • Firebase SDK version: 7.8.2
  • Firebase Product: firestore (auth, database, storage, etc)

[REQUIRED] Describe the problem

A few times, I've seen Firestore in a state where changes stop getting written to the cloud, or even IndexDB. There are no errors in the console, and only a refresh seems to fix it. Upon refreshing the page, the user's changes are simply gone. 😢

Stepping through the code in this state, I can see that the write is enqueued in AsyncQueue, but never gets executed. The AsyncQueue has not failed, but I don't know how to determine what it's waiting on. I have found it useful to run:

  > (Date.now() - Firebase.firestore()._queue.delayedOperations[0].targetTimeMs) / 1000 / 60
  < 291.1877
  > Firebase.firestore()._queue.delayedOperations[0].timerId
  < "client_metadata_refresh"

I'm not sure that delayed operation is actually blocking, but it's a rough proxy for how long Firestore has been stuck.

Steps to reproduce:

Sadly, I haven't figured out a way to reliably reproduce. It only seems to happen when the page has been open for hours, and the machine has gone to sleep several times.

Relevant Code:

rogueg@12dbc5e

I've been using this fork for a few weeks, and haven't encountered the issue since. It's crude, and I'm pretty ignorant of Firestore's internals, so I wouldn't be surprised if it introduces some bugs...but it seems preferable to losing user data for hours on end.

I saw @schmidt-sebastian's #2879 go by (exciting! 🎉) and thought this might be worth raising while it's still relatively top-of-mind.

@schmidt-sebastian
Copy link
Contributor

Thanks for this report and all the hard research you put into this! Do you happen to have logging enabled and can provide the last log lines that Firestore printed? This is going to be quite difficult to debug, but if we knew what happened right before Firestore got stuck it might give us a clue.

Unfortunately, most of our Async operations are not idempotent, so retrying by default is not possible. I will go through the SDK for the next couple of weeks to see where we can safely retry.

@rogueg
Copy link
Author

rogueg commented Apr 17, 2020

I don't, but I can modify the logging to write to a buffer and dump it whenever Firestore gets stuck.

I do record the body of op whenever a timeout happens, and most of them are this:

function () {
    return _this.syncEngine.write(mutations, deferred);
}

Are all indexDB operations retryable? Since indexDB is known to have transient failures, it seems like they would need to be, or risk losing data.

For operations that can't be retried, could the AsyncQueue skip over them and keep going? It's unfortunate that any error will halt firestore and require a restart, losing any changes that are queued up.

@schmidt-sebastian schmidt-sebastian self-assigned this Apr 17, 2020
@schmidt-sebastian
Copy link
Contributor

All IndexedDB transactions are retried up to three times:

It seems like there are a lot of cases where these retries are not helping, e.g. when a page is backgrounded and Safari blocks file system access. The retry we are building now sits on top of the IndexedDB retry and will retry IndexedDB plus its consuming code with backoff - but unlike the IndexedDB retry, it retries the entire AsyncQueue operation. We cannot add backoff only for IndexedDB as it would slow down all other client operations as we wait for the result.

@rogueg
Copy link
Author

rogueg commented Apr 17, 2020

Nice! I didn't know about the retry in SimpleDB

In my fork, I have a fixed 5s wait between attempts. I log the attempt number when it fails, and can see that most operations only fail once, so I think your idea of adding a retry with backoff is a good one.

I think I'm asking: is it safe to use enqueueRetryable for every operation that touches indexDB? You said "most of our Async operations are not idempotent", does that include ones that touch indexDB? It seems like all of those operations would be susceptible to transient failures.

Also, what do you think about adding a timeout to all retryable operations? That would help in cases like mine where the underlying api hangs rather than throwing an error.

@schmidt-sebastian
Copy link
Contributor

The goal is to use enqueueRetryable() for anything that can be delayed without compromising data integrity and to convey failures for operations that can't be delayed via the API (e.g. when we cannot add a user write to our local cache).

We can take a look at timeouts if we see more demand for those. I would think that the timeouts you are hitting are actual problems in code (or the browser). For now, I would like to prioritize finding the root cause.

@google-oss-bot
Copy link
Contributor

Hey @rogueg. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

1 similar comment
@google-oss-bot
Copy link
Contributor

Hey @rogueg. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

@google-oss-bot
Copy link
Contributor

Since there haven't been any recent updates here, I am going to close this issue.

@rogueg if you're still experiencing this problem and want to continue the discussion just leave a comment here and we are happy to re-open this.

@firebase firebase locked and limited conversation to collaborators May 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants