Skip to content

Hanging Firestore queries on version 10.4.0 #7652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
thomasdao opened this issue Sep 27, 2023 · 19 comments
Closed

Hanging Firestore queries on version 10.4.0 #7652

thomasdao opened this issue Sep 27, 2023 · 19 comments

Comments

@thomasdao
Copy link

Operating System

Both Mac and Windows

Browser Version

Electron 24.8.3

Firebase SDK Version

10.4.0

Firebase SDK Product:

Auth, Firestore, Functions, Storage

Describe your project's tooling

Electron 24.8.3

Describe the problem

Hi, since we updated Firebase to 10.4.0, we have received multiple reports from our users that they can't retrieve the data from Firestore server. We then downgraded to 10.3.1 and these users can retrieve data normally. From the release note, we believe this is related:

Fixed an issue where Cloud Firestore was incorrectly using XHR instead of fetch for streaming response.

Updated dependency grpc-js from version ~1.8.17 to ~1.9.0.

I also noticed that on 10.4.0, our query becomes a lot slower and frequently cannot complete. Switch back to 10.3.1 and the query becomes a lot faster.

Steps and code to reproduce issue

This issue only happens for 10.4.0.

@thomasdao thomasdao added new A new issue that hasn't be categoirzed as question, bug or feature request question labels Sep 27, 2023
@thomasdao
Copy link
Author

When I set the log level to debug, there're a lot of messages in the console as below, but I'm not sure if it's related:

[2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 10
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 42
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 44
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 46
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 48
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 50
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 52
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): WatchChangeAggregator Detected inactive target 54
index.esm2017.js:78 [2023-09-27T04:11:34.774Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Get last remote snapshot version
index.esm2017.js:78 [2023-09-27T04:11:34.775Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Apply remote event
index.esm2017.js:78 [2023-09-27T04:11:34.777Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: notifyLocalViewChanges
index.esm2017.js:78 [2023-09-27T04:11:34.777Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.780Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":30}
index.esm2017.js:78 [2023-09-27T04:11:34.785Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.788Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":32}
index.esm2017.js:78 [2023-09-27T04:11:34.793Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.796Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":34}
index.esm2017.js:78 [2023-09-27T04:11:34.802Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.805Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":36}
index.esm2017.js:78 [2023-09-27T04:11:34.809Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.812Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":38}
index.esm2017.js:78 [2023-09-27T04:11:34.817Z]  @firebase/firestore: Firestore (10.4.0): MemoryPersistence Starting transaction: Release target
index.esm2017.js:78 [2023-09-27T04:11:34.820Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 sending: {"database":"projects/xxxxxx/databases/(default)","removeTarget":40}
index.esm2017.js:78 [2023-09-27T04:11:34.881Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[42]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[44]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[46]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[48]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[50]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[52]}}
index.esm2017.js:78 [2023-09-27T04:11:34.882Z]  @firebase/firestore: Firestore (10.4.0): WebChannelConnection RPC 'Listen' stream 0x3e730da7 received: {"targetChange":{"targetChangeType":"ADD","targetIds":[54]}}

@jbalidiong jbalidiong added api: firestore needs-attention and removed api: auth new A new issue that hasn't be categoirzed as question, bug or feature request labels Sep 27, 2023
@dconeybe
Copy link
Contributor

Drive-by comment: I'm looking into another issue that has those strange "Detected inactive target" log messages: #7641. I wonder if that is related to this one.

@thomasdao: Can you provide a longer debug log than the one in your comment above? I'm quite interested to see what log messages appeared before the "Detected inactive target" log messaged.

@thomasdao
Copy link
Author

Hi @dconeybe, I can send you the whole debug log, however since it contains some private data, how do I send it to you? Thanks!

@dconeybe
Copy link
Contributor

Hi @dconeybe, I can send you the whole debug log, however since it contains some private data, how do I send it to you?

The best way would be to go to https://firebase.google.com/support, select "Bugs and Features", select "Bug Report", and open a bug. You can attach the log to that bug. Note that there is a disclaimer about PII at the bottom of that page so make sure that you're okay with that before uploading your logs. Make sure to include the URL of this GitHub issue in the issue's description as well as my username. The ticket should get routed to me, but it could take a day or two. I'm especially interested in the "addTarget" and "removeTarget" messages, in addition to the logs messages that you've already posted.

Alternately, feel free to redact the sensitive information from the log before posting it, such as the database ID, document names, and document contents, and post the logs publicly.

Could you tell me, are the documents that seem to be causing this problem "large", in that they have a lot of fields?

@thomasdao
Copy link
Author

thomasdao commented Sep 27, 2023

Hi @dconeybe, I added the log below, some data has been redacted with <<< Data received >>>. For easier comparison, I attach the log for both 10.3.1 and 10.4.0.

For 10.3.1, the query completed within a few seconds, but for 10.4.0, the query still stucks for more than 10 minutes at the time I took the log. Even if I waited longer, Firebase did not print any more log, so it looks like the query is stuck.

Could you tell me, are the documents that seem to be causing this problem "large", in that they have a lot of fields?

My document contains around 10 fields, usually less than 100KB. I used pagination to retrieve data, each page contains only 100 documents.

Log for 10.4.0: https://drive.google.com/file/d/1yHTwXPLw9FWi7M5ZWV4Ije-cNw8fnbVH/view?usp=sharing

Log for 10.3.1: https://drive.google.com/file/d/1Dq7ypEXYp1BLEwIXGOH_pYdHbMIBq3TE/view?usp=sharing

@MarkDuckworth MarkDuckworth self-assigned this Sep 27, 2023
@dconeybe
Copy link
Contributor

dconeybe commented Sep 27, 2023

Update: After comparing the logs for this issue with the logs from the potentially-related issue #7641, it appears that they are distinct issues with the same presenting problem (i.e. the callback registered with onSnapshot() is not being called).

Namely, in #7641 the root cause appears to be the server sending multiple targetChangeType=ADD messages for a single target, causing the client's internal state to become invalid and erroneously discard the document data that it is sent. In comparison, the logs for this issue do not show multiple targetChangeType=ADD messages for a single target (i.e. they show exactly 1 such message, as expected), leading me to conclude that it has a different root cause and needs to be investigated independently.

@MarkDuckworth
Copy link
Contributor

@thomasdao, This would be easiest to diagnose if we can reproduce it. Can you provide some example code that shows us what your application is doing, or preferably, can you provide a minimal reproducible example electron app?

Also, since an Electron app has both node.js and chromium processes, we inferred from the logs that the failing code is running in chromium/browser window. Can you confirm that your affected code is only running in the browser window? If so, this rules out the grpc-js version change.

@thomasdao
Copy link
Author

@MarkDuckworth I setup Firebase on Web Worker of a browser window, however the browser window was setup with extra options nodeIntegration: true andnodeIntegrationInWorker: true. In my code, I retrieve documents from server as below. This code works fine for all Firebase version except 10.4.0.

async function retrievePaginatedData (options) {
  let qry = getQuery(options)
  let snapshots = await getDocsFromServer(qry)
  const { namespace, batchSize } = options

  postBatchData(namespace, snapshots)
  if (batchSize) {
    let docs = snapshots.docs
    while (docs.length === batchSize) {
      const lastSnapshot = docs[docs.length-1]
      const newOpts = Object.assign({}, options, {
        lastSnapshot
      })
      qry = getQuery(newOpts)

      if (self.gc) {
        self.gc()
      }

      snapshots = await getDocsFromServer(qry)

      // Update result array
      postBatchData(namespace, snapshots)

      // Update docs length
      docs = snapshots.docs
    }
  }

  if (self.gc) {
    self.gc()
  }
}

function getQuery (options) {
  const user = getAuth().currentUser
  if (!user) return null

  const {
    namespace, 
    queryTime = null,
    lastQueryTime = null,
    batchSize = null, 
    lastSnapshot = null, 
    shouldQueryWithSyncedAt = false
  } = options

  const userId = user.uid

  const firestore = getFirestore()

  const colRef = collection(firestore, namespace)
  const args = [colRef]

  if (queryTime) {
    args.push(where('timestamp', '<', new Date(time)))
  }

  if (lastQueryTime) {
    args.push(where('timestamp', '>', new Date(time)))
  }

  if (batchSize) {
    args.push(orderBy('timestamp', 'desc'))
    args.push(orderBy('id'))

    if (lastSnapshot) {
      args.push(startAfter(lastSnapshot))
    }

    args.push(limit(batchSize))
  }

  return query(...args)
}

// Post data to host window
function postBatchData (namespace, querySnapshot) {
  const queryData = []

  querySnapshot.forEach(docData => {
    queryData.push(docData.data())
  })

  promWorker.postMessage({
    type: 'ON_SNAPSHOT',
    content: {
      namespace: namespace,
      result: queryData
    }
  })

  return queryData
}

@wu-hui wu-hui self-assigned this Oct 2, 2023
@MarkDuckworth MarkDuckworth removed their assignment Oct 2, 2023
@wu-hui
Copy link
Contributor

wu-hui commented Oct 3, 2023

Hi @thomasdao

This is because we introduced a fix for our memory usage earlier(#7569). That fix unfortunately changed how we use send requests from fetch to xhr which does not work with web workers.

Please avoid 10.4.0 and onwards until we fix this if your app run from web workers.

Apologies for the issue.

@wu-hui
Copy link
Contributor

wu-hui commented Oct 4, 2023

I might be wrong with my last comment, it's probably not about fetch vs xhr after all.

@thomasdao , is it possible if you could provide a full mini-reproduction? Electron and web workers are not usual environments we test our SDK with, and we need some help to reliably reproduce this. A github repo with instructions about how to reproduce would be greatly appreciated.

Please do remember to strip out your project configs (api key, project id, etc).

@thomasdao
Copy link
Author

@wu-hui hi, I've created a minimal sample project on GitHub and have invited you as collaborator. Please let me know if you can reproduce the issue on your side, thank you!

@thomasdao
Copy link
Author

Hi @MarkDuckworth @dconeybe, I just want to check if anyone is interested in the sample project so I can send an invitation? Thank you.

@dconeybe
Copy link
Contributor

@MarkDuckworth are you able to take a look?

@dconeybe
Copy link
Contributor

@thomasdao I just took a quick look at your reproduction app, without actually running it (thanks for providing that, by the way!). One thing I noticed is that your app is using very basic Firestore features, features which are present in the "lite" version of the Firestore SDK: https://firebase.google.com/docs/firestore/solutions/firestore-lite. I know it's not a "fix" per se, but if the lite sdk meets your needs then I'd recommend giving it a try since it has a much smaller bundle size and the networking implementation is much simpler (simple "REST" calls instead of complex WebChannel bi-directional streams).

To try it out, just make these changes:

  1. Replace 'firebase/firestore' with 'firebase/firestore/lite' in import statements
  2. Replace getDocsFromServer() with getDocs() (the lite sdk always goes straight to the server)

Your comments also mentioned that "...if I remove target field to choose the Firebase bundle for Node, then the query can complete normally" which suggests a problem with WebChannel. By using Firebase for Node it would be using grpc-js for the bi-directional network communication rather than WebChannel.

Anyways, I'll see if @MarkDuckworth can dig in further based on your response.

@MarkDuckworth MarkDuckworth self-assigned this Oct 18, 2023
@MarkDuckworth
Copy link
Contributor

This may be the same issue as #7641. I was able to use the repro from @thomasdao, and I saw duplicate target adds for a single target ID. This reproduction is very helpful and should help us resolve the issue faster.

@thomasdao
Copy link
Author

Hi @dconeybe, I know about firestore-lite, however in my project I need to use firestore because I need to listen to data changes from the server. I hope @MarkDuckworth can find out the issue :)

@MarkDuckworth
Copy link
Contributor

@thomasdao, the data in the collection used by your test app, is that test data that can be copied into our database? I'm having trouble reproducing using your reproduction app against my own data in my db.

@thomasdao
Copy link
Author

@MarkDuckworth yes you can clone to your database, please delete them when you're done, thank you!

@MarkDuckworth MarkDuckworth changed the title Error with Firestore on version 10.4.0 Hanging Firestore queries on version 10.4.0 Oct 27, 2023
@MarkDuckworth
Copy link
Contributor

@thomasdao, we released v10.5.2, which rolls back the change in v10.4.0 that was responsible for hanging queries. I verified the fix on two reproductions, including yours. Thanks for your help figuring this out!

MarkDuckworth added a commit that referenced this issue Nov 17, 2023
* google-closure-library: git://github.com/google/closure-library.git#7818ff7

* Tests and hard asserts for hanging query issue - #7652

* Re-enable useFetchStreams

* Create orange-pianos-push.md

* Revert change to yarn.lock for rules-unit-testing/functions/

* git+https for google-closure-library

* Update yarn.lock

* Yarn.lock again

* git+https in yarn.lock

* Increase timeout setting up the repro for hanging queries.

* Clean up error message for hardAssert based on peer feedback.
@firebase firebase locked and limited conversation to collaborators Nov 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants