Skip to content

[Bug]: Heartbeat not being updated while user edits files or has terminal activity. #5969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 tasks done
antofthy opened this issue Jan 13, 2023 · 22 comments
Open
2 tasks done
Labels
bug Something isn't working
Milestone

Comments

@antofthy
Copy link

antofthy commented Jan 13, 2023

Is there an existing issue for this?

  • I have searched the existing issues

OS/Web Information

  • Web Browser: chrome, brave
  • Local OS: linux, windows
  • Remote OS: docker running debian (for example php:8.0.6-apache docker image)
  • Remote Architecture: x86_64
  • code-server --version: v4.8.3 also tested against v4.9.1, and v4.7.3
    4.8.3 977b853 with Code 1.72.1

Steps to Reproduce

  1. Run code server and login in a web browser
  2. Continuously watch the heartbeat file in seperate window, to see when it was last updated (how old it is)
  3. Edit file and generally work in a open terminal window. Or just go do other things until heartbeat stops.
  4. After this do thins with code-server and VERY few times will get heartbeat to start updating again!

Expected

Heartbeat file $HOME/.local/share/code-server/heartbeat should have its time updated (touched) at least once every minute, while there is ANY activity. Essentially if a browser is connected to it, heartbeat should update. BUt it stops after a while.

Alternatively heartbeat could update on any auto-file save, menu activity, or terminal window activity.

Actual

After a period of time (could be anywhere from 15 to 45 minutes) the heartbeat stops updating.

At which point it will NOT start again even if user starts to edit existing open files, making changes, saving using ctrl-S, closing open file editing, or perform commands in terminal. Actual changes are being made, and confirmed both in the file system, or even from looking via the code-server terminal, and yet heartbeat does not reflect that the user has performed some action, and as such is NOT idle.

The heartbeat only starts updating again, if a new file is opened, or browser tab (google chrome, or brave) is refreshed, or re-opened. And then only for a VERY short time (up to 5 minutes or so) before it stops again, even though user is active and changes are being made (as above). Sometimes with only a single heartbeat update being seen.

This is BAD as the idle-timeout system we use relies on the heartbeat (see below). It can't use web requests as there no new requests, as actual changes were made though open connections. Be it via terminal or file editing.

Logs

No log output has been seen in any log without verbose turned on.
However with the length of times involved the size of the verbose log is HUGE!
Especially as the 'File Watcher' also watches the file it logs to for the File Watching while verbose!
EG: .local/share/code-server/logs/.../remoteagent.log

Looking at around the time the heartbeat stops 2023-01-13_15:14:54
And removing any line with the pattern File Watcher.*share/code-server
I get the following from remoteagent.log...

[2023-01-13 15:12:39.157] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www/.local
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www/.local/share
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)]  >> normalized [ADDED] /var/www/20230113-1512-01-
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)]  >> normalized [ADDED] /var/www/20230113-1512-01-/var
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)]  >> normalized [ADDED] /var/www/20230113-1512-01-/var/www
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)]  >> normalized [ADDED] /var/www/20230113-1512-01-/var/www/.local
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)]  >> normalized [ADDED] /var/www/20230113-1512-01-/var/www/.local/share
[2023-01-13 15:15:00.644] [remoteagent] [trace] ExtensionManagementService.refreshReportedCache
[2023-01-13 15:15:00.644] [remoteagent] [trace] ExtensionManagementService.refreshControlCache {"malicious":[],"deprecated":{}}
[2023-01-13 15:16:00.640] [remoteagent] [trace] Started scanning user extensions
[2023-01-13 15:16:00.642] [remoteagent] [trace] Scanned user extensions: 0

I seriously doubt this is of any use.

Screenshot/Video

No response

Are you accessing code-server over HTTPS?

  • I am using HTTPS.

Notes

I have recently had complaints from students and staff saying the Heartbeat has become flaky.
The heartbeat is used by our docker system to determine when a user is no longer using the development environment they are running so it can be closed down after an hour of non-use.

I am finding the heartbeat stopped updating, even though files are still being edited or the terminal is being used. The heartbeat does start to work again if a different file is opened for editing, or the whole browser tab is refreshed. But not from actual use with files being edited or the terminal being used. The heart beat file, which normally updates once a minute just stops being updated.

Heartbeat is VITAL for the determination of the automatic shutdown of docker instances, as code-server holds web connections open, rather than making new requests (as jupyter notebook does). The Idle checker checks...
1/ user started or checked the status of docker instance from a control panel
2/ there has been any web request logged by the docker ingress service for that user on the code-server port (8443) or on other ports user may used in their docker environment (apache ports).
3/ the code-server heartbeat file was last updated.

System is basically as originally described in
#792 (comment)
Only with the addition of web request checks being added, as per...
#1274 (comment)
to the 'idle timeout' check script that runs every ten minutes (it looks back and records in another timestamp file) and user web requests for specific ports in the last 15 minutes of ingress logs.

The heartbeat may be related to the refactor of the heartbeat in...
#5544

@antofthy antofthy added bug Something isn't working triage This issue needs to be triaged by a maintainer labels Jan 13, 2023
@antofthy antofthy changed the title [Bug]: [Bug]: Heartbeat not being updated while user edits files or has terminal activity. Jan 13, 2023
@jsjoeio
Copy link
Contributor

jsjoeio commented Jan 13, 2023

Hmmm.... any ideas @code-asher? I could have sworn it was implemented upstream which is why we removed it.

@jsjoeio
Copy link
Contributor

jsjoeio commented Jan 13, 2023

@antofthy does this happen with the latest version?

My thinking for how you can help us troubleshoot:

  1. ideally reproduce as easy as possible
  2. better? reproduce in a unit test
  3. then we can look into a fix

@antofthy
Copy link
Author

antofthy commented Jan 14, 2023

The tricky part is using something to monitor the heartbeat file. I have a small shell script to return the age of the file (appear of our idle test), and then start beeping me when the file gets more than 2 minute old so I know when heartbeat has stopped working.

In a docker set up with a mounted home you can watch the heartbeat file outside the running docker instance (just as our idle timer is doing). Or on a separate docker connection to the docker instance.

Now what do you mean by 'unit test'. I download code-server from the git version releases.

I will create a new docker image with the latest version.

@antofthy
Copy link
Author

Yes it still does it in the release of 4.9.1
https://github.com/coder/code-server/tree/v4.9.1

@antofthy
Copy link
Author

antofthy commented Jan 14, 2023

I have now uploaded a screen shot with a monitor shell script in the code-server terminal, that shows heart beat had stopped, even though I am obviously active in code-server terminal.

Here is a copy of the script..

while true; do
  printf -v now '%(%s)T' -1
  idle=$((now - $(stat --printf "%Y\n" ~/.local/share/code-server/heartbeat) ))
  printf -v human "%5d:%02d" $((idle/60)) $((idle%60))
  if (( idle > 65 ))
  then echo -n "$human" $'\e[7m***** STOPPED *****\e[0m\r'
  else echo -n "$human" $' Heartbeat Running \r'
  fi
  sleep 1
done

capture_2023-01-14_14-28-09

At this point I can do anything in terminal or in a editor panel and heartbeat does NOT restrart.
until either the browser tab is refreshed, or I specifically open another file for editing. And even then the heartbeat does not run for very long, before it stops again.

I do not know how long it has been like this. Or if the refactoring caused the problem, only that it was recently reported that user docker instances shutdown expectantly (idle timeout) even though users were editing files (an hour after heartbeat updates stopped and failed to update).

I myself did not get involved until after it was reported, and did not believe it until I ran monitors that show this happening.

@antofthy
Copy link
Author

antofthy commented Jan 16, 2023

Had a lucky break in that I ran a variation of the script, and had the heartbeat stop not long after running it...
capture_2023-01-16_14-46-51
You can see the heartbeat updated at 14:40:20 but then did not update again. It did not update even after more terminal, or file editing as mentioned, so the web connection to code-server was still open.

@antofthy
Copy link
Author

If you name a version I can download to test the situation on that version.

@jsjoeio
Copy link
Contributor

jsjoeio commented Jan 17, 2023

Oh I meant so we could add a test here somewhere: https://github.com/coder/code-server/tree/main/test/unit

Thanks for all the notes!

@code-asher any ideas on what we should do to fix this?

@antofthy
Copy link
Author

Except I don't use the GIT code, I used the release code, as part of a larger "Student Software Development Environment".
This environment has been used code-server from version 1

Maybe restoring the old code may help.. I am not certain. Do you know what version the code was switched to using VS code? I can test against a version before it was changed to confirm. I am not surprised it was not discovered as it can take a long time for heartbeat to stop.

Really it is obvious to me that the communications link to the browser is still open an active, as terminal can do things, and file changes are being saved. It's just that code-server heartbeat code is not recognising that the communications with browser is still open and active.

@code-asher
Copy link
Member

code-asher commented Jan 18, 2023

@jsjoeio My thinking is either the Node code we call to get active connections is somehow wrong:

return new Promise((resolve, reject) => {
app.server.getConnections((error, count) => {
if (error) {
return reject(error)
}
logger.debug(plural(count, `${count} active connection`))
resolve(count > 0)
})

Or the heartbeat code itself is buggy. Maybe we can start with some debug logging that tells us when the heartbeat is running, when it schedules itself to run, and when it stops and why it thinks it should stop ("stopping heartbeat because server reports there are no connections" for example).

@jsjoeio
Copy link
Contributor

jsjoeio commented Jan 18, 2023

Great ideas! Okay I'll add it to the next milestone.

@jsjoeio jsjoeio removed the triage This issue needs to be triaged by a maintainer label Jan 18, 2023
@jsjoeio jsjoeio added this to the January 2022 milestone Jan 18, 2023
@code-asher
Copy link
Member

code-asher commented Feb 8, 2023

I just realized I should clarify something, the heartbeat code that was removed in favor of upstream's implementation refers to a web socket heartbeat (a ping/pong message between server to client) that prevents reverse proxies from terminating connected sockets due to inactivity (it would cause a reconnect every 60 seconds or whatever the reverse proxy timeout was set to).

This code-server activity heartbeat is a separate thing, different code and purpose entirely.

I experimented with the heartbeat but so far no luck reproducing the issue. I did find a separate bug that seems to have surfaced with latest Node v16 but am not sure it could cause this issue as well. I added a (debug level) log in the meantime so we can see if it gets triggered but will need to make another release to get it out.

@antofthy
Copy link
Author

antofthy commented Feb 10, 2023

Yes I believe we have noticed heartbeat not timing out for users that are coming via a VPN from home.
Not really a major issue, as it will terminate (timeout) when they finally terminate their VPN. Typicaly only staff stay logged in, and not the students (general users).

As for what I did...

I start codeserver, open a terminal and run a small script (as shown above) to watch the timestamp.
As an admin for the server I can also do this outside code-server but that does not appear to matter.
I typically add a $'\a' to the 'stopped' string to also general a annoying terminal beep, to notify me when the heartbeat has finally stopped.

I then do nothing, and just watch it for a long time (go do other things in other windows).

Now that is fine and expected. I want heartbeat to stop on inactivity.

The problem is it does not start again, with file editing, or new terminal activity.
except on the conditions I have mentioned... EG: reload the browser tab, or open a new file for edit.

I will repeat my tests, when a new relase is out (release number?)

@code-asher
Copy link
Member

code-asher commented Feb 13, 2023

Thank you for the notes! I like the beep idea.

Next release will be 4.10.0, should be out soon after we test the release candidate for a bit.

I had code-server focused so maybe the key for reproduction is to switch to other windows to allow it to go idle.

@antofthy
Copy link
Author

antofthy commented Feb 20, 2023

I have tested 4.10.0. Ran my heartbeat check and waited for it to time out.

Same as before, heartbeat only restarted on editing a new file, or reloading browser tab.
all other actions did not reset the heartbeat, even though I could edit and save and file already being edited. and do interactive commands in a terminal.

Now I could add a 'prompt command' to reset heartbeat on terminal command activity, but I should not need to!

NOTE: previously heartbeat file will be updated once a minute, as long as the browser was connected.
This is the behaviour which I actually expect.

@antofthy
Copy link
Author

antofthy commented May 8, 2023

Just tried version 4.12.0

I had a while a loop actively running on the a code-server shell command line, producing output,
and the mouse moving in and out of the code-server browser tab, at regular intervals.
the heartbeat file again stopped working after about 30 minutes.

Again the heart beat only started updating again after opening a new file for editing.

@code-asher
Copy link
Member

That is unfortunate. I just tried running 4.12.0 in the background (in an unfocused Chromium tab) for over an hour but my heartbeat is running steady. In this test I opened code-server and then tabbed away, I did not open any files or the terminal or interact with it in any way. I am monitoring the heartbeat file in a separate standalone terminal.

Have you seen any new logs with --log debug? At least one for the heartbeat was added in 4.10.0 if I recall correctly.

@antofthy
Copy link
Author

I used v4.13.0
heartbeat starts. I did a few things edits files, etc. then moved away.
heartbeat stopped a few minutes later.
This is fine... an expected...
On editing a new file and the heartbeat restarted. GOOD

Again went to do other things... heartbeat stopped almost immetaially.
closed a file I was editing, heart beat restarted. GOOD

Left it again, to wait for it to stop. which did quickly.

Without even optioning the browser tab, just doing thing in other tabs... heartbeat restarted!

Waiting for it to stop.. It did. I did something in the browser, not even opening the tab, and heartbeat restarted...

I think whatever the problem was has been resolved.
I hardly have to do any activity in the browser and it become active.

Yea!

Thank you for your help, and whatever change was made. whether by you or upstream...

@code-asher
Copy link
Member

code-asher commented May 31, 2023 via email

@antofthy
Copy link
Author

antofthy commented May 31, 2023

The last tests was while I was at work, so no VPN involved. Though there was still a nginx ingress proxy to the container running code-server. May be it is something to do with nginx proxy.

Working from home today, so there will be a VPN involved today.

@antofthy
Copy link
Author

antofthy commented Jun 2, 2023

Okay the the problem persists, exactly as previously described. The heartbeat stops, and only restarts on editing a new file, or reloading the tab. No other actions seems to get the heartbeat to restart, NOT editing files, typing CLI commands.

The difference between this and the previous (where any activity in any part f the browser reactivates the heartbeat) is that I am not only connecting via a nginx proxy to the code-server docker swarm container, but also going though a Fortigate VPN system to the nginx proxy.

What log file should i be looking at for relevant log entries?

@code-asher
Copy link
Member

There should be debug-level logs around the heartbeat in code-server's output or ~/.local/share/code-server/coder-logs with --log debug. I think there is also a trace log so --log trace could show that although it will be a lot of output.

I never got the chance to test with a VPN but I will try early next week to see if it reproduces for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants