DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1
-
Terminated immediately ...
-
@bearer said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I will try the screen though - what does that offer?
It's a terminal multiplexer / window manager or something like that. It means DCS will keep running if you have a network glitch: if you run DCS in the foreground and SSH drops, all the processes in that shell are terminated; with screen they can keep running.
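For context, the effect bearer describes can be illustrated without screen. Below is a minimal Python sketch (an illustration of the same idea as nohup/setsid, not what screen does internally) that starts a command with `start_new_session=True`. That makes the child call setsid() before running the command, so it gets its own session and no controlling terminal, and a terminal hangup when SSH drops does not send it SIGHUP. The function name `launch_detached` is hypothetical.

```python
import os
import subprocess
import sys


def launch_detached(cmd, log_path=os.devnull):
    """Start cmd in a new session so a terminal hangup does not reach it.

    start_new_session=True makes the child call setsid() before exec, moving
    it into a new session and process group with no controlling terminal, so
    it does not receive the SIGHUP sent when the SSH terminal goes away.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,  # nothing to read from once detached
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor


if __name__ == "__main__":
    # Hypothetical example: a short-lived command that survives logout.
    proc = launch_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    print("started detached PID", proc.pid)
```

Unlike screen, this gives you no way to reattach to the process's console later; anything it prints only ends up in the log file.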
It's not a network glitch. When it happens it tends to take out the entire Pi. Completely. So using screen isn't going to help. I've managed to get the network to stay up about 3 or 4 times out of 40 or so crashes, in which case screen isn't needed as you can still issue commands.... everything just takes an age to respond. But yeh, as soon as the SSH connection goes a power cycle is the only fix.
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Well this sucks - it's just done the same thing with RC9 ..... I had screen running at the time and it reported nothing ....
That sucks. It's only been RC10 that I've had this issue on. To the extent that the first time it happened I went and checked to see if my router had died.
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Terminated immediately ...
That's interesting; it means the CPU is able to terminate the session relatively cleanly, as opposed to just freezing, although it doesn't help you. (See below for a correction.)
-
I wonder if something didn't uninstall or got overwritten in the 'downgrade' process. I never used RC9; I came straight from RC6.
-
@ChrisP said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
So using screen isn't going to help.
It's mostly a precaution to avoid terminating the process if the session is interrupted for other reasons.
-
@bearer I should say that there is no disconnect, just zero response. No messages, it just stops... I use a commercial tool (Secure CRT 8.7) and it still thinks it is connected, but hitting Enter just causes an on-screen line feed.
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
@bearer I should say that there is no disconnect, just zero response. No messages, it just stops... I use a commercial tool (Secure CRT 8.7) and it still thinks it is connected, but hitting Enter just causes an on-screen line feed.
Ah, that is more what I was expecting. It would terminate after 30-60 seconds or so as a timeout, which in turn means the Pi froze or was too busy to close the connection. Still good info one way or the other.
-
@bearer said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
it would terminate after 30-60 seconds or so as a timeout
Yup it does ...
-
(I wonder if setting process affinity could confine the hang and leave a core free to run SSH etc., if that's possible in Raspbian. Anyway, that's it for me today.)
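If anyone wants to try the affinity idea, here is a minimal Python sketch, assuming Linux and root (or CAP_SYS_NICE) privileges; the function names are hypothetical and this has not been tested against DCS. One detail matters: on Linux `sched_setaffinity()` applies to a single thread, so a multi-threaded .NET process like DCS needs every thread listed under /proc/<pid>/task updated. systemd's `CPUAffinity=` setting in the service file would be the persistent way to do the same thing.

```python
import os


def cores_except(reserved, available):
    """Return the CPU set left after reserving `reserved` cores for other work."""
    remaining = set(available) - set(reserved)
    if not remaining:
        raise ValueError("reserving these cores would leave the process no CPU")
    return remaining


def pin_all_threads(pid, cpus):
    """Apply a CPU affinity mask to every thread of `pid` (Linux only).

    sched_setaffinity() acts on one thread ID, so each entry under
    /proc/<pid>/task is updated; threads created afterwards inherit the
    mask of the thread that spawns them.
    """
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)
```

For example, with the PID from `systemctl show --property=MainPID duetcontrolserver`, calling `pin_all_threads(pid, cores_except({0}, os.sched_getaffinity(pid)))` would keep DCS off core 0 and leave it for sshd and the rest of the system.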
-
I found this in the Duet web server log:
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:19:11 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Executed action DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer) in 339302.5643ms
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Executed endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Request finished in 339446.6255ms 101
Apr 26 19:19:11 duet3 DuetWebServer[1106]: warn: DuetWebServer.Services.ModelObserver[0]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Failed to synchronize machine model
Apr 26 19:19:11 duet3 DuetWebServer[1106]: System.Net.Sockets.SocketException (107): Transport endpoint is not connected
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPI.Utility.JsonHelper.ReceiveUtf8Json(Socket socket, CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPI/Utility/JsonHelper.cs:line 154
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPIClient.BaseConnection.ReceiveJson(CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/BaseConnection.cs:line 294
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPIClient.SubscribeConnection.GetMachineModelPatch(CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/SubscribeConnection.cs:line 100
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetWebServer.Services.ModelObserver.Execute() in /home/christian/duet/DuetSoftwareFramework/src/DuetWebServer/Services/ModelObserver.cs:line 156
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:59:29 duet3 DuetWebServer[1106]: warn: DuetWebServer.Services.ModelObserver[0]
Apr 26 19:59:29 duet3 DuetWebServer[1106]: Failed to synchronize machine model
Apr 26 19:59:29 duet3 DuetWebServer[1106]: System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (99): Cannot assign requested address /var/run/dsf/dcs.sock
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at DuetAPIClient.BaseConnection.Connect(ClientInitMessage initMessage, String socketPath, CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/BaseConnection.cs:l
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at DuetWebServer.Services.ModelObserver.Execute() in /home/christian/duet/DuetSoftwareFramework/src/DuetWebServer/Services/ModelObserver.cs:line 131
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Request starting HTTP/1.1 GET http://10.100.2.225/machine
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executing endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:59:30 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executed action DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer) in 6.6056ms
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executed endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Request finished in 7.2117ms 101
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Request starting HTTP/1.1 GET http://10.100.2.225/machine
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Executing endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:59:32 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
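Logs pasted like this often lose their line breaks. A small Python sketch like the one below can split such text back into journal entries and keep only the warnings, failures and exceptions, which makes the "DCS is not started" and socket errors easier to spot. It assumes journalctl's default "short" output format; the function names and the list of markers are my own choices, not anything DSF provides.

```python
import re

# journalctl's default "short" output starts every entry with
# "<Mon> <day> <HH:MM:SS> <host> <identifier>[<pid>]: ".
PREFIX = r"[A-Z][a-z]{2} [ \d]?\d \d{2}:\d{2}:\d{2} \S+ [^\s\[]+\[\d+\]: "
ENTRY_START = re.compile(f"(?={PREFIX})")  # zero-width split point
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2} [ \d]?\d \d{2}:\d{2}:\d{2}) ")

# Substrings that mark an entry worth reading (an arbitrary choice).
MARKERS = ("fail:", "warn:", "Failed", "Exception", "DCS is not started")


def split_entries(flat_text):
    """Split journal text whose newlines were lost back into one entry each."""
    parts = ENTRY_START.split(flat_text)
    return [p.strip() for p in parts if p.strip()]


def interesting(entries):
    """Yield (timestamp, message) for entries that look like errors or warnings."""
    for entry in entries:
        head = re.match(PREFIX, entry)
        if head is None:
            continue  # not a journal line; skip it
        message = entry[head.end():]
        if any(marker in message for marker in MARKERS):
            yield TIMESTAMP.match(entry).group(1), message
```

You could feed it the pasted text above, or the output of `journalctl -u duetwebserver` (assuming that is the unit name on your install).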
Same error even in RC9
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I feel the need for a compatibility matrix for the 3 main components: which versions of RRF work with which versions of DWC.
Interesting. Duet3 + Pi 4B, 4 gig. I've been having random hangs that take a power cycle to clear. I am also on RC10, as of mid evening yesterday. I was not certain this was happening, nor certain that it started at RC whatever, so I have not reported anything, yet.
Now that I think about it, it came on hard when I switched to RC10. I had to power cycle at least eight or ten times last night.
I typically have a DWC, a VNC and an SSH session running. They all just hang. Attempting to start a new SSH session also hangs (note: not refused; it connects and never gets a password prompt).
I will see what data I can gather.
-
Can I ask all you guys with issues, if the DWC is NOT connected, does it still lock up?
-
I've never run without it; I've only ever connected via WiFi. It would take me a while to set up if the way to test is to put the SD card into the Duet itself.
-
I believe he's saying, "start a job, and then close DWC".
Yes, I tried that. SSH only, no VNC, no DWC. Still locked within a few minutes.
-
@Danal said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I believe he's saying, "start a job, and then close DWC".
Yes, I tried that. SSH only, no VNC, no DWC. Still locked within a few minutes.
Yeah that was it. I wanted to make sure it wasn't DWC related. I've been running RC10 + 2.1.1 and printing fine but I don't use the DWC.
Something to try... The systemd service file for the DCS was changed to set
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=20
which may be contributing to the problem.
Edit /lib/systemd/system/duetcontrolserver.service and remove those 2 lines, then reboot and see if that helps.
-
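For anyone who would rather script gtj0's suggestion, here is a minimal Python sketch that drops the two directives from the unit file text and keeps a backup first. The function names and the backup location are assumptions, and it needs to run with sudo. Since the file lives under /lib/systemd/system, a later DSF package update may well put the lines back. Without the reboot, `sudo systemctl daemon-reload` followed by a restart of the service is needed for systemd to pick up the change.

```python
import re

# The two directives to remove from the [Service] section.
DIRECTIVES = ("CPUSchedulingPolicy", "CPUSchedulingPriority")


def strip_directives(unit_text, names=DIRECTIVES):
    """Return unit_text with any `Name=value` lines for the given names removed."""
    pattern = re.compile(rf"^\s*(?:{'|'.join(map(re.escape, names))})\s*=")
    kept = [line for line in unit_text.splitlines(keepends=True)
            if not pattern.match(line)]
    return "".join(kept)


def edit_unit_file(path, backup_path):
    """Back up the unit file, then rewrite it without the scheduling lines."""
    with open(path) as f:
        original = f.read()
    with open(backup_path, "w") as f:
        f.write(original)  # restore from here if the change makes things worse
    with open(path, "w") as f:
        f.write(strip_directives(original))
```

For example: `edit_unit_file("/lib/systemd/system/duetcontrolserver.service", "/root/duetcontrolserver.service.bak")`.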
@gtj0 said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Edit /lib/systemd/system/duetcontrolserver.service and remove those 2 lines, then reboot and see if that helps.
Will do. I've made a bunch of other changes, so let me re-verify the hangs are real, then I will try that. THANKS!
-
I'm a little late to the reporting, but to help confirm: I too have seen full system lockups and print connection losses with RRF 3.01-R10, DWC 2.1.5 and DSF 2.1.1.
I was able to recover and get it working with:
sudo systemctl restart duetcontrolserver
No system hardware changes, ribbons, or otherwise, just the new Beta install.
duetcontrolserver would go to 400% CPU usage. Painstakingly getting to the terminal (I have a screen attached directly to my Pi), I was able to open a terminal and issue the fix. SSH and web access were dead, so I couldn't remote in.
After the duetcontrolserver restart, system will work but still get random disconnects. I will try the modification above and report back too.
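On the 400% figure: top reports per-process CPU usage per core, so 400% means four cores fully busy, which on a four-core Pi such as the Pi 4 is the whole CPU. That fits the entire system becoming unresponsive. The arithmetic can be sketched in Python from /proc/<pid>/stat, assuming Linux; the function names are hypothetical. It takes the CPU time the process used between two samples (in clock ticks), divides by ticks per second and by the wall-clock interval, and multiplies by 100.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second (usually 100)


def cpu_ticks(stat_line):
    """Return utime + stime (fields 14 and 15) from a /proc/<pid>/stat line."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split after the last ')' and count from field 3 (state) onwards.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, elapsed_s, clk_tck=CLK_TCK):
    """CPU usage in top's per-core convention: 100% = one core fully busy."""
    return (ticks_after - ticks_before) / clk_tck / elapsed_s * 100.0


def sample(pid, interval_s=1.0):
    """Measure a live process over interval_s seconds (Linux only)."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, time.monotonic() - start)
```

With 100 ticks per second, a process that accumulates 400 ticks in one second works out to 400%, i.e. all four cores.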
-
Well... just finished a two hour print, no changes, completed OK.
I will keep data points coming, in either direction.
-
Another fantastic build.
It is like playing the lottery. Sometimes works fine, other times loads of disconnections from DWC and/or error messages like "homing failed" or heaters take forever to turn on!
Back to RRF3.01_RC3!!!!
-
@chas2706 Are you experiencing this specific issue?