Hanging query for Firestore #7860

Closed
thomasdao opened this issue Dec 12, 2023 · 47 comments
Comments

@thomasdao

Operating System

Both Mac and Windows

Browser Version

Chrome, Electron Browser window

Firebase SDK Version

10.7.0, 10.7.1

Firebase SDK Product:

Firestore

Describe your project's tooling

Plain Electron app

Describe the problem

This is a new ticket for the hanging query issue, following up on #7771 and #7652.

After updating Firebase to 10.7.0 or 10.7.1, the query becomes a lot slower and frequently gets stuck with the error below:

@firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x5b9a037f transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}

After switching back to 10.6.0, the query completes quickly.

Steps and code to reproduce issue

I've created a minimal sample to reproduce this issue and have shared it with @MarkDuckworth. If you need access to the private repo, please let me know. Thank you!

@thomasdao thomasdao added new A new issue that hasn't be categoirzed as question, bug or feature request question labels Dec 12, 2023
@thomasdao thomasdao changed the title Hanging query for FireStore Hanging query for Firestore Dec 12, 2023
@jbalidiong jbalidiong added needs-attention and removed new A new issue that hasn't be categoirzed as question, bug or feature request labels Dec 12, 2023
@ehsannas ehsannas self-assigned this Dec 12, 2023
@ehsannas
Contributor

Thanks for reporting @thomasdao. I'll try to reproduce it

@thomasdao
Author

@ehsannas thanks, I've invited you to the sample project :)

@ehsannas
Contributor

Thanks @thomasdao. I am able to see the error in the logs from your repo. I do, however, see that each such log message is followed by an UNAVAILABLE code from the backend, which means it's a legitimate error returned from the backend to the SDK. It's plausible that the newer WebChannel version has become much more efficient at sending parallel requests to the backend, such that you're hitting a request rate limit for a single client. This error code is retryable with a backoff, which means the SDK will recover and rerun the query after some delay.
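
To illustrate what "retryable with a backoff" means in general, here is a minimal TypeScript sketch of retrying an operation with exponential backoff. It is not the SDK's internal implementation: `retryWithBackoff`, `RetryOptions`, and the `isRetryable` predicate are hypothetical names, and the delay values a caller passes in are assumptions.

```typescript
// Hypothetical helper for illustration only; not part of the Firebase SDK.
interface RetryOptions {
  maxAttempts: number;    // total attempts, including the first one
  initialDelayMs: number; // delay before the second attempt
  factor: number;         // multiplier applied to the delay after each failure
  maxDelayMs: number;     // upper bound on any single delay
  isRetryable: (err: unknown) => boolean; // e.g. true for an "unavailable" error
  sleep?: (ms: number) => Promise<void>;  // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  op: () => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let delay = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;
      await sleep(delay);
      // Grow the delay for the next failure, capped at maxDelayMs.
      delay = Math.min(delay * opts.factor, opts.maxDelayMs);
    }
  }
}
```

With a backoff like this, a client that hits a rate limit slows itself down instead of retrying immediately, which is why a retryable error usually shows up as a delay rather than a permanent failure.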

Please take a look at:
https://firebase.google.com/docs/firestore/real-time_queries_at_scale#understand_high_write_traffic_in_the_system
https://firebase.google.com/docs/firestore/best-practices#ramping_up_traffic

@thomasdao
Author

thomasdao commented Dec 14, 2023

@ehsannas I've never seen the UNAVAILABLE code, even if I wait for more than 10 minutes.

I don't find the explanation that the "newer WebChannel version has become much more efficient at sending parallel requests" very logical: the same type of query works with version 10.6.0, which indicates that the server is able to handle that query and that the problem is likely with the newer version of the client.

I've tested adding a delay of 1 second between each paginated query to reduce server load, and see the same error @firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}.
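
For readers who want to try the same experiment, the following is a rough TypeScript sketch of fetching pages one at a time with a fixed pause between them. The reporter's actual code is not shown in this thread, so `fetchAllPages`, `Page`, and `FetchPage` are hypothetical names, and the page fetcher is passed in as a callback rather than tied to any particular SDK call.

```typescript
// Hypothetical types: one page of results plus a cursor for the next page.
interface Page<T, C> {
  items: T[];
  nextCursor: C | null; // null when there are no more pages
}

type FetchPage<T, C> = (cursor: C | null) => Promise<Page<T, C>>;

async function fetchAllPages<T, C>(
  fetchPage: FetchPage<T, C>,
  delayMs: number,
  // Injectable sleep so the pause can be skipped in tests.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T[]> {
  const all: T[] = [];
  let cursor: C | null = null;
  do {
    const page: Page<T, C> = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
    // Pause only when another page is going to be requested.
    if (cursor !== null) await sleep(delayMs);
  } while (cursor !== null);
  return all;
}
```

In a Firestore app, the callback would presumably wrap a paginated `getDocs` call built with `query`, `orderBy`, `startAfter`, and `limit`, with the last document snapshot of each page serving as the cursor. Calling `fetchAllPages(fetchPage, 1000)` then matches the 1-second delay described above.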

@phileasthefogg

I'm also running into this error. Subscription seems to work fine for a while and then gets dropped with the same RPC 'Listen' stream transport error. Any ideas on what this might be or where to catch the error?

@IvanKYW

IvanKYW commented Feb 6, 2024

Same issue after upgrading AngularFire to 17.0.1, which depends on firebase ^10.7.0.

In one of our projects, queries have become slower and occasionally run into the @firebase/firestore: Firestore (10.7.2): WebChannelConnection RPC 'Listen' stream error. The other, smaller project works fine.

I tried experimentalForceLongPolling as mentioned in #7968, but no luck. Downgrading to 10.6.0 seems to resolve the issue.

@ghinda

ghinda commented Feb 7, 2024

I'm also seeing the same issue with hanging snapshot queries for a while, with the same type of WebChannelConnection RPC 'Listen' stream ... transport error.

Sometimes, after failing with the error, the snapshot query retries and returns correct data after a couple of minutes, but most times it just hangs indefinitely. In our case, it only happens with queries that would return a large amount of data (hundreds of docs containing fairly large strings).

The issues started with versions 10.4.x. They were then fixed in versions 10.6.x, but are now back again with 10.7.x. I've also tested the latest 10.8.0, and the issue is still there. As a summary:

  • 10.3.1: issue not present
  • 10.4.x: issue shows up
  • 10.5.x: issue still present
  • 10.6.x: issue fixed
  • 10.7.x / 10.8.0: issue shows up again

Using experimentalForceLongPolling does not seem to make a difference.

I wasn't able to reproduce it in a local or staging environment, as it only seems to show up in our production environment where we have around ~40K snapshot listeners / ~10K active connections, as reported in the Firebase console.

@MrDavidRios

I'm also running into this error since upgrading to v10.7.0, and much like @phileasthefogg, getting the same RPC 'listen' stream transport error. This is a small project (< 10 active connections at a time), and I'm able to reproduce it in both local and production environments.

@thomasdao
Author

Hi @ehsannas, not sure if you have been able to work on this issue? Maybe @MarkDuckworth can take a look. This issue has prevented us from updating to the latest version. Thank you!

@alex-dokienko

The same issue happens in my project (using Flutter). In the beginning everything was fine (I've been using Firestore for about 6 months), but now I'm suddenly getting it all the time (maybe the data sets have grown; with a smaller database I didn't experience it before).

@ehsannas
Contributor

@MrDavidRios Would you be able to share your project in which you're able to consistently reproduce this issue? (feel free to point me to a github repo). Thanks!

@hiroro-work

This phenomenon seems to be more likely to occur in a slow network environment.
By setting "Fast 3G" or "Slow 3G" in Network of DevTools, we were able to reproduce the phenomenon even in an environment where it does not usually occur.

@dconeybe
Contributor

dconeybe commented Mar 4, 2024

(note to googlers: this may be related to support case b/325591749, which reports similar webchannel issues when the network is throttled)

@jorgsiegel

Same thing happens in our project. Unfortunately I can't downgrade to firebase 10.6.0 (without much effort) because of AngularFire and Angular dependencies.
It still happens on firebase 10.9.0 ...

@thomasdao
Author

This issue has been happening since December last year and affects multiple projects, but it has not received any update. I'm on the Blaze plan but cannot update the library to the latest version, and it's really frustrating. Could you please share whether any of you are investigating this issue? Thank you! @MarkDuckworth @dconeybe @ehsannas

@MarkDuckworth MarkDuckworth self-assigned this Apr 4, 2024
@MarkDuckworth
Contributor

@thomasdao, I'll touch base with the team and see if I can move this forward.

@jorgsiegel

This problem affects users in our production apps.
We are also in the middle of developing a new app and can consistently reproduce the error. It seems to be connected to the size of Firestore documents. Our documents are max. 300,000 bytes, which is far below the limit specified on the official Firestore documentation page (1 MiB / 1,048,576 bytes) and we are fetching max. 40 documents in a single query.
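
As a quick sanity check on those numbers, here is a small TypeScript sketch of the worst-case payload for a single query. The constants come from the figures quoted above; `worstCasePayloadBytes` is a hypothetical helper, and the estimate ignores protocol and encoding overhead.

```typescript
// Figures quoted in the comment above.
const MAX_DOC_BYTES = 300000;              // largest document in the app
const MAX_DOCS_PER_QUERY = 40;             // most documents fetched by one query
const FIRESTORE_DOC_LIMIT_BYTES = 1048576; // documented 1 MiB per-document limit

// Hypothetical helper: upper bound on document bytes returned by one query.
function worstCasePayloadBytes(docs: number, bytesPerDoc: number): number {
  return docs * bytesPerDoc;
}

const totalBytes = worstCasePayloadBytes(MAX_DOCS_PER_QUERY, MAX_DOC_BYTES);
const totalMiB = totalBytes / (1024 * 1024);

console.log(`per-document size is within the limit: ${MAX_DOC_BYTES < FIRESTORE_DOC_LIMIT_BYTES}`);
console.log(`worst-case query payload: ${totalBytes} bytes (~${totalMiB.toFixed(1)} MiB)`);
```

So each document is well under the per-document limit, but a single query can still pull roughly 12 million bytes, which is consistent with other reports in this thread that larger result sets are more likely to hit the problem.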

We would highly appreciate if the Firebase team could check what changed in recent versions and fix it soon.

@Valansch

Valansch commented Apr 4, 2024

Thank you @MarkDuckworth.

Just to second @thomasdao & @jorgsiegel,
this has long been a part of the stable releases and affects our users. For various reasons we are unable to downgrade.
We have a long-lived GCP ticket open regarding this.
I have a feeling this happens more often the bigger the result set is. We run an SPA where we stream about 5000 documents, all well in the region of 1KB. When the queries fail, they restart over and over, resulting in the client downloading 100MB for what should be 5MB. We have no workaround for this.

Would really appreciate to see some progress here.

@valeriangalliat

We're also encountering this issue (running 10.8)

Tried 10.11 and it's still happening, but as suggested above downgrading to 10.6 fixed it

@dconeybe
Contributor

dconeybe commented Apr 12, 2024

I have a potential fix for this issue. Would anyone be willing/able to test it out? The fix is in #8145 (NOTE: it is still a work-in-progress). Please comment on the PR with the outcome of your experiment (rather than commenting here on the issue).

You will need to build the Firestore SDK yourself, but, thankfully, it's relatively straightforward.

  1. npm install -g yarn
  2. git clone --depth 100 https://github.com/firebase/firebase-js-sdk.git (if using an existing clone of this repo, make sure you're at a commit that includes #8145) git clone -b dconeybe/WebChannelOnOpenFix_Bug325591749 --depth 100 https://github.com/firebase/firebase-js-sdk.git
  3. cd firebase-js-sdk
  4. yarn
  5. yarn build
  6. cd packages/firestore
  7. yarn build:debug
  8. cp -r dist ~/YOUR_PROJECT/node_modules/@firebase/firestore
  9. rebuild your project and test it out

Note that the --depth 100 argument to git is just an optimization to pull about 8MB instead of 30MB. Feel free to omit that argument.

Note that the extra yarn build:debug command is optional, and produces Firestore's index.esm2017.js with all of the code mangling, code stripping, and optimizations disabled. This will produce more readable compiled code and stack traces without mangled names that are much easier to make sense of.

The "cp" command will copy the compiled Firestore JavaScript bundles into your own project's node_modules directory, clobbering the ones that npm downloaded. Make sure to restore the production version (e.g. by deleting the node_modules directory and re-running npm install) when done testing out this fix.

@thomasdao
Author

@MarkDuckworth I checked out your branch and followed the instructions from #7860 (comment). Please see the attached log, thanks!

firebase_log.txt

@MarkDuckworth
Contributor

Thanks @thomasdao.

In my local tests, when I see WebChannelConnection RPC 'Listen' stream X transport errored: ..., the STAT_EVENT logging shows that the root cause was expected/normal. Furthermore I saw the SDK recover gracefully.

In your logs, the STAT_EVENTs leading up to the WebChannelConnection error are different. I'm trying to understand why. The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?

@MarkDuckworth
Contributor

Also @thomasdao, can you provide the Firebase project ID you used when creating firebase_log.txt? Is it the same project ID from your shared repro? We want to review server logs.

@thomasdao
Author

thomasdao commented Apr 24, 2024

@MarkDuckworth

"The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?"

Yes, I can still reproduce this issue. Sometimes the query can complete, but the next time I run it again, the query would hang.

"Is it the same project ID from your shared repro?"

Yes it's the same project ID.

@MarkDuckworth
Contributor

Version 10.11.1 was released today and rolls back the WebChannel config to be equivalent to the 10.6 (and 10.5.2) releases. I have tested with @thomasdao's reproduction and I'm seeing the queries complete consistently and quickly. Errors WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1} were not observed.

@thomasdao
Author

@MarkDuckworth thank you, I tried 10.11.1 and found the query can complete quickly.

Just curious, is WebChannel really superior to the FetchXmlHttpFactory? What's the problem with FetchXmlHttpFactory?

@IslamElKassas

Friends,
It has already been fixed by the Firebase team in the newest version, 10.11.1 (April 25, 2024):

Cloud Firestore
Prevent spurious "Backend didn't respond within 10 seconds" errors when network is in fact responding, but slowly. See GitHub PR #8145.
https://firebase.google.com/support/release-notes/js

@thesoicalapp91

thesoicalapp91 commented Aug 14, 2024

Wed 14 Aug 2024 - Still happening in "firebase": "^10.12.5". This issue is constant to the point where Firebase (and therefore the app) is completely unusable. Neither downgrading to 10.6 nor using the following settings fixed the issue:

experimentalForceLongPolling: true,
useFetchStreams: false,

Can somebody shed some light on what's going on with this issue? Is it even being addressed? It's never been a problem before, and I've been using Firebase for 3 to 4 years now.

@rakesh-snippyly

We are also facing this issue!! It has impacted all our users in prod! Please help us fix this asap!!!

@fabhed

fabhed commented Aug 20, 2024

Started getting this in production as well. Been on an older version 10.1.0 for a long time without any issues, until today.

@firebase/firestore: Firestore (10.1.0): Could not reach Cloud Firestore backend. Connection failed 1 times. Most recent error: FirebaseError: [code=unavailable]: The operation could not be completed


Update: Trying the latest sdk version, the issue still persisted.

@junhyeokkwak

This is impacting our production app (oversee.shop) as well. Entire auth & direct connections to Firebase are completely down.

@vitorbarbosa19

Same issue in our prod app

@ehsannas
Contributor

Hi folks, did you change the SDK version you're using in production?

@vitorbarbosa19

I'm now using 10.13.0, but only because I thought an upgrade could fix the issue.
Before the errors started to appear, I was on 10.11.1.

@junhyeokkwak

We also updated to 10.13.0 and are still facing the issue.

@fabhed

fabhed commented Aug 20, 2024

Got this from the GCP support:

By the details I have found, it seems that there are several customers impacted at the moment, but our product specialist team is currently engaged to address the matter as soon as possible. I will try to see if there is more information that can be gathered through this channel so we can expedite the resolution.

@MarkDuckworth
Contributor

A WebChannelConnection transport error does not necessarily indicate that the behavior you are seeing is related to the original issue, which was fixed in 10.11.1. There are normal network conditions that could lead to the failure of the Web Channel transport. The SDK is able to recover from these.

We may be seeing a different issue/behavior, so it would be more helpful to know whether you are also seeing "INTERNAL ASSERTION FAILED" in your SDK logs. Also, when you see these WebChannelConnection errors, does the SDK then resume normal behavior? If so, how long does it take?

If you are able to create and share SDK debug logs with us, that will help us diagnose. You can share logs in a GitHub Gist or through a private repo shared with me and @ehsannas. Thanks.
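
To answer the two questions above from a captured log, a small script can count the relevant lines. The sketch below is only an illustration: `summarizeSdkLog` and `LogSummary` are hypothetical names, and the matching relies on the message fragments quoted in this thread rather than on any documented log format. Debug logs themselves can be enabled in the JS SDK with `setLogLevel('debug')` from `firebase/firestore`.

```typescript
// Hypothetical summary of one captured SDK log.
interface LogSummary {
  transportErrors: number;    // "WebChannelConnection ... transport errored" lines
  internalAssertions: number; // "INTERNAL ASSERTION FAILED" lines
}

function summarizeSdkLog(logText: string): LogSummary {
  let transportErrors = 0;
  let internalAssertions = 0;
  for (const line of logText.split(/\r?\n/)) {
    // Match on the fragments quoted in this thread, not a formal log schema.
    if (line.includes("WebChannelConnection") && line.includes("transport errored")) {
      transportErrors++;
    }
    if (line.includes("INTERNAL ASSERTION FAILED")) {
      internalAssertions++;
    }
  }
  return { transportErrors, internalAssertions };
}
```

For example, running `summarizeSdkLog` over the text of a saved log file gives a quick count to include alongside the raw log when sharing it.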

@rakesh-snippyly

For us it never recovered. It just stopped making any network calls altogether. We had not updated the SDK version, and then all of a sudden it broke.

@MarkDuckworth
Contributor

Thanks for the info @rakesh-snippyly.

For anyone experiencing a sudden start of this behavior without updating the SDK, are you able to provide an approximate time you started experiencing it?

@junhyeokkwak

junhyeokkwak commented Aug 20, 2024

The issue started happening for us without the SDK version update. Not sure the exact time it started happening, but sometime between 5pm EDT yesterday (Monday Aug 19) and 10am EDT today (Tuesday Aug 20)

@junhyeokkwak

junhyeokkwak commented Aug 20, 2024

For some reason, it just started working for us (without any changes to the deployed version).

@rakesh-snippyly

Same thing happened to us - it's working now on its own!
We should find out what happened.

@junhyeokkwak

Same, would love to know what happened here

@rakesh-snippyly

Hi any updates?

@ehsannas
Contributor

"We should find what happened"

Yes. We are tracking the investigation internally (Googlers see b/361546680 and b/361546352). While issues that appear and disappear without SDK changes often point to some issue in the network and/or backend, we will also take the action to improve the error messages in the SDK in future releases.

@wevertoum this looks like a different issue (Bad Request?)

I don't believe there's any ongoing issue here. I will close this since it has now become an amalgam of different discussions on various issues on different SDK versions, and that the original issue has been resolved. Please do open new issues if you encounter anything.

@benmatela

benmatela commented Sep 3, 2024

In my case, I was using GitHub Actions to deploy, and it looks like that's where the problem was.
Once I ran manual deployments locally with firebase deploy, the issue just disappeared.
Possibly some package was not working well, or invalid secrets were being denied by Firebase.

Using "firebase": "^10.6.0",

@firebase firebase locked and limited conversation to collaborators Sep 22, 2024