[Bug]: Heartbeat not being updated while user edits files or has terminal activity. #5969
Comments
Hmmm.... any ideas @code-asher? I could have sworn it was implemented upstream which is why we removed it. |
@antofthy does this happen with the latest version? My thinking for how you can help us troubleshoot:
|
The tricky part is using something to monitor the heartbeat file. I have a small shell script that returns the age of the file (as part of our idle test) and starts beeping when the file gets more than 2 minutes old, so I know when the heartbeat has stopped working. In a docker setup with a mounted home you can watch the heartbeat file from outside the running docker instance (just as our idle timer does), or from a separate docker connection to the instance. Now what do you mean by 'unit test'? I download code-server from the git version releases. I will create a new docker image with the latest version. |
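The age-and-beep monitor described above is not shown in the thread, so the following is only a minimal sketch of how such a check could look. The names `HEARTBEAT_FILE`, `STALE_AFTER`, `file_age`, `is_stale` and `watch_heartbeat` are illustrative assumptions, not code-server settings, and GNU `stat` is assumed.

```shell
#!/usr/bin/env bash
# Sketch: report the age of a heartbeat file and beep once it is stale.
# HEARTBEAT_FILE and STALE_AFTER are illustrative names, not code-server options.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-$HOME/.local/share/code-server/heartbeat}"
STALE_AFTER="${STALE_AFTER:-120}"   # seconds; the "2 minutes" mentioned above

# Print the age of a file in whole seconds (GNU stat assumed).
file_age() {
  local mtime now
  mtime=$(stat --printf '%Y' "$1") || return 1
  now=$(date +%s)
  echo $(( now - mtime ))
}

# Succeed (exit 0) when the file is older than the given number of seconds.
is_stale() {
  local age
  age=$(file_age "$1") || return 2
  (( age > $2 ))
}

# Poll once a second and ring the terminal bell while the heartbeat is stale.
watch_heartbeat() {
  while true; do
    if is_stale "$HEARTBEAT_FILE" "$STALE_AFTER"; then
      printf '\a'
    fi
    sleep 1
  done
}
```

Sourcing the file and running `watch_heartbeat` in a spare terminal (inside or outside the container, as long as the home directory is mounted) would give the audible warning described above.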
Yes it still does it in the release of 4.9.1 |
I have now uploaded a screenshot with a monitor shell script running in the code-server terminal. It shows that the heartbeat had stopped, even though I am obviously active in the code-server terminal. Here is a copy of the script:

    while true; do
      # current time in epoch seconds (bash builtin, no external date call)
      printf -v now '%(%s)T' -1
      # seconds since the heartbeat file was last modified
      idle=$((now - $(stat --printf "%Y\n" ~/.local/share/code-server/heartbeat) ))
      printf -v human "%5d:%02d" $((idle/60)) $((idle%60))
      # the heartbeat should update about once a minute, so allow 65 seconds
      if (( idle > 65 ))
      then echo -n "$human" $'\e[7m***** STOPPED *****\e[0m\r'
      else echo -n "$human" $' Heartbeat Running \r'
      fi
      sleep 1
    done

At this point I can do anything in the terminal or in an editor panel and the heartbeat does NOT restart. I do not know how long it has been like this, or whether the refactoring caused the problem. I only know that it was recently reported that user docker instances shut down unexpectedly (idle timeout) even though users were editing files (an hour after heartbeat updates stopped and failed to resume). I myself did not get involved until after it was reported, and did not believe it until I ran monitors that showed this happening. |
If you name a version I can download to test the situation on that version. |
Oh I meant so we could add a test here somewhere: https://github.com/coder/code-server/tree/main/test/unit Thanks for all the notes! @code-asher any ideas on what we should do to fix this? |
Except I don't use the Git code, I use the release code, as part of a larger "Student Software Development Environment". Maybe restoring the old code would help, I am not certain. Do you know in what version the code was switched to using VS Code? I can test against a version before it was changed to confirm. I am not surprised it was not discovered, as it can take a long time for the heartbeat to stop. Really it is obvious to me that the communications link to the browser is still open and active, as the terminal can do things and file changes are being saved. It's just that the code-server heartbeat code is not recognising that the communication with the browser is still open and active. |
@jsjoeio My thinking is either the Node code we call to get active connections is somehow wrong (code-server/src/node/routes/index.ts, lines 35 to 42 in 134e9b4).
Or the heartbeat code itself is buggy. Maybe we can start with some debug logging that tells us when the heartbeat is running, when it schedules itself to run, and when it stops and why it thinks it should stop ("stopping heartbeat because server reports there are no connections" for example). |
Great ideas! Okay I'll add it to the next milestone. |
I just realized I should clarify something: the heartbeat code that was removed in favor of upstream's implementation refers to a web socket heartbeat (a ping/pong message between server and client) that prevents reverse proxies from terminating connected sockets due to inactivity (it would cause a reconnect every 60 seconds or whatever the reverse proxy timeout was set to). This code-server activity heartbeat is a separate thing, with different code and an entirely different purpose. I experimented with the heartbeat but so far have had no luck reproducing the issue. I did find a separate bug that seems to have surfaced with the latest Node v16, but I am not sure it could cause this issue as well. I added a (debug level) log in the meantime so we can see if it gets triggered, but we will need to make another release to get it out. |
Yes I believe we have noticed heartbeat not timing out for users that are coming via a VPN from home. As for what I did... I start code-server, open a terminal and run a small script (as shown above) to watch the timestamp. I then do nothing, and just watch it for a long time (go do other things in other windows). Now that is fine and expected. I want the heartbeat to stop on inactivity. The problem is that it does not start again with file editing or new terminal activity. I will repeat my tests when a new release is out (release number?) |
Thank you for the notes! I like the beep idea. Next release will be 4.10.0, should be out soon after we test the release candidate for a bit. I had code-server focused so maybe the key for reproduction is to switch to other windows to allow it to go idle. |
I have tested 4.10.0. I ran my heartbeat check and waited for it to time out. Same as before, the heartbeat only restarted on editing a new file or reloading the browser tab. Now I could add a 'prompt command' to reset the heartbeat on terminal command activity, but I should not need to! NOTE: previously the heartbeat file was updated once a minute, as long as the browser was connected. |
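For reference, the 'prompt command' workaround mentioned above could look roughly like the sketch below, placed in `~/.bashrc` inside the container. It is only a workaround, not a fix: the function name `touch_heartbeat` and the `HEARTBEAT_FILE` variable are illustrative, and the file is only touched when bash draws a new prompt, so a long-running command would not count as activity.

```shell
# Sketch of a PROMPT_COMMAND workaround: touch the heartbeat file before each prompt.
# HEARTBEAT_FILE and touch_heartbeat are illustrative names, not part of code-server.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-$HOME/.local/share/code-server/heartbeat}"

touch_heartbeat() {
  # Only touch an existing file so a wrong path does not create stray files.
  [ -e "$HEARTBEAT_FILE" ] && touch "$HEARTBEAT_FILE"
  return 0
}

# Prepend to any existing PROMPT_COMMAND instead of overwriting it.
PROMPT_COMMAND="touch_heartbeat${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```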
Just tried version 4.12.0. I had a while loop actively running on a code-server shell command line, producing output. Again the heartbeat only started updating again after opening a new file for editing. |
That is unfortunate. I just tried running 4.12.0 in the background (in an unfocused Chromium tab) for over an hour but my heartbeat is running steady. In this test I opened code-server and then tabbed away, I did not open any files or the terminal or interact with it in any way. I am monitoring the heartbeat file in a separate standalone terminal. Have you seen any new logs with |
I used v4.13.0. Again I went to do other things... the heartbeat stopped almost immediately. I left it again to wait for it to stop, which it did quickly. Without even opening the browser tab, just doing things in other tabs... the heartbeat restarted! Waiting for it to stop... it did. I did something in the browser, not even opening the tab, and the heartbeat restarted... I think whatever the problem was has been resolved. Yea! Thank you for your help, and for whatever change was made, whether by you or upstream... |
I have no idea what could have changed this but glad to hear it is
working! Although it is supposed to keep beating as long as the
tab is open even if there is no activity so maybe there is still
something wrong.
|
The last test was while I was at work, so no VPN was involved, though there was still an nginx ingress proxy to the container running code-server. Maybe it is something to do with the nginx proxy. I am working from home today, so there will be a VPN involved. |
Okay, the problem persists, exactly as previously described. The heartbeat stops, and only restarts on editing a new file or reloading the tab. No other action seems to get the heartbeat to restart, NOT editing files, NOT typing CLI commands. The difference between this and the previous test (where any activity in any part of the browser reactivated the heartbeat) is that I am not only connecting via an nginx proxy to the code-server docker swarm container, but also going through a Fortigate VPN system to the nginx proxy. What log file should I be looking at for relevant log entries? |
There should be debug-level logs around the heartbeat in code-server's output or I never got the chance to test with a VPN but I will try early next week to see if it reproduces for me. |
Is there an existing issue for this?
OS/Web Information
code-server --version: v4.8.3 (also tested against v4.9.1 and v4.7.3)
4.8.3 977b853 with Code 1.72.1
Steps to Reproduce
Expected
Heartbeat file
$HOME/.local/share/code-server/heartbeat
should have its time updated (touched) at least once every minute while there is ANY activity. Essentially, if a browser is connected to it, the heartbeat should update. But it stops after a while. Alternatively, the heartbeat could update on any auto file save, menu activity, or terminal window activity.
Actual
After a period of time (could be anywhere from 15 to 45 minutes) the heartbeat stops updating.
At which point it will NOT start again even if user starts to edit existing open files, making changes, saving using ctrl-S, closing open file editing, or perform commands in terminal. Actual changes are being made, and confirmed both in the file system, or even from looking via the code-server terminal, and yet heartbeat does not reflect that the user has performed some action, and as such is NOT idle.
The heartbeat only starts updating again, if a new file is opened, or browser tab (google chrome, or brave) is refreshed, or re-opened. And then only for a VERY short time (up to 5 minutes or so) before it stops again, even though user is active and changes are being made (as above). Sometimes with only a single heartbeat update being seen.
This is BAD, as the idle-timeout system we use relies on the heartbeat (see below). It can't use web requests, as there are no new requests: the actual changes are made through already open connections, be it via the terminal or file editing.
Logs
No log output has been seen in any log without verbose turned on.
However with the length of times involved the size of the verbose log is HUGE!
Especially as the 'File Watcher' also watches the file it logs to for the File Watching while verbose!
EG: .local/share/code-server/logs/.../remoteagent.log
Looking at around the time the heartbeat stops 2023-01-13_15:14:54
And removing any line with the pattern
File Watcher.*share/code-server
I get the following from
remoteagent.log
...I seriously doubt this is of any use.
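The filtering step described above can be sketched as a one-line `grep` wrapper. The function name `filter_watcher_noise` is only illustrative, and the log path is left as a placeholder because the full path is elided in the report.

```shell
# Sketch: drop the self-referential File Watcher lines from a verbose log.
# Reads the files given as arguments, or stdin when none are given.
filter_watcher_noise() {
  grep -v -E 'File Watcher.*share/code-server' "$@"
}

# Example use (placeholder path):
#   filter_watcher_noise path/to/remoteagent.log | less
```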
Screenshot/Video
No response
Are you accessing code-server over HTTPS?
Notes
I have recently had complaints from students and staff saying the Heartbeat has become flaky.
The heartbeat is used by our docker system to determine when a user is no longer using the development environment they are running so it can be closed down after an hour of non-use.
I am finding that the heartbeat stops updating even though files are still being edited or the terminal is being used. The heartbeat does start to work again if a different file is opened for editing, or the whole browser tab is refreshed, but not from actual use with files being edited or the terminal being used. The heartbeat file, which normally updates once a minute, just stops being updated.
Heartbeat is VITAL for the determination of the automatic shutdown of docker instances, as code-server holds web connections open, rather than making new requests (as jupyter notebook does). The Idle checker checks...
1/ user started or checked the status of docker instance from a control panel
2/ there has been any web request logged by the docker ingress service for that user on the code-server port (8443) or on other ports the user may have used in their docker environment (apache ports).
3/ the code-server heartbeat file was last updated.
System is basically as originally described in
#792 (comment)
Only with the addition of web request checks, as per...
#1274 (comment)
to the 'idle timeout' check script, which runs every ten minutes: it looks back over the last 15 minutes of ingress logs for user web requests on specific ports and records what it finds in another timestamp file.
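To make the role of the heartbeat in this decision concrete, here is a minimal sketch of such an idle check, assuming each of the three activity sources is represented by a timestamp file. The actual script is not shown in this issue, so `IDLE_LIMIT`, `newest_mtime`, `is_idle` and the file names in the usage note are all hypothetical, and GNU `stat` is assumed.

```shell
#!/usr/bin/env bash
# Sketch of the idle decision: a user counts as active if ANY of the
# timestamp files is recent. All names here are hypothetical placeholders.
IDLE_LIMIT="${IDLE_LIMIT:-3600}"   # one hour of non-use, as described above

# Print the newest modification time (epoch seconds) among the existing files given.
newest_mtime() {
  local newest=0 f m
  for f in "$@"; do
    [ -e "$f" ] || continue
    m=$(stat --printf '%Y' "$f") || continue
    if (( m > newest )); then newest=$m; fi
  done
  echo "$newest"
}

# Succeed (exit 0) when every timestamp file is older than IDLE_LIMIT seconds.
is_idle() {
  local now newest
  now=$(date +%s)
  newest=$(newest_mtime "$@")
  (( now - newest > IDLE_LIMIT ))
}
```

A call such as `is_idle panel.stamp webrequest.stamp ~/.local/share/code-server/heartbeat` would then report idle only when all three are stale. Because code-server keeps its connections open, the first two go quiet during normal editing, which is why a stalled heartbeat leads directly to a shutdown.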
The heartbeat problem may be related to the refactor of the heartbeat in...
#5544