Should M999 terminate the DSF core application?
-
There has been some discussion going on about the default behaviour of M999 in DSF. While some support the idea of making M999 terminate DuetControlServer because it's part of the controller system, others say this step is redundant. While I will not state my opinion about this topic here, I can confirm that DCS had issues with resource invalidation after M999 in the past which have been thoroughly fixed in v1.3.2. This means DCS does not necessarily require a reset after M999.
In either case, I'll make this behaviour configurable in v1.3.3 but we need to agree on a reasonable default setting so I am starting a discussion here. Please put some comments below about what you think should be the default behaviour and why.
-
Allowing unauthenticated input/gcode from DWC to cause any change to the SBC that cannot be recovered from DWC is imho a bad idea, and it goes beyond just M999. (I.e. if M999 is to work, DSF has to work, which doesn't guarantee it will work after restarting)
(Edit: The security concern could be addressed by running DSF as an unprivileged user and/or prompting DWC for a sudo password to carry out operations requiring a privileged user/root, but that doesn't address the lockout potential)
Edit2: I also think one should distinguish between normal operation and the need to reset faults caused by immature code - during normal operation on mature code the need to reset DSF shouldn't exist.
-
-
If DCS will restart on abnormal termination automatically, then my opinion is that M999 should cause an abnormal termination. (so that the process restarts.) I think this should happen with all the DSF related processes on M999.
As an alternative (or extension), add a parameter to M999 allowing to specify what gets restarted. "M999" (with no parameters) simulates standalone duet behavior in restarting all the daemons/services related to "duet."
M999 P1 resets just the duet3 board
M999 Px resets a combination of the duet board and SBC services specified by the bits set in 'x'. (If there are SBC services dependent on each other, the dependents would be reset along with whatever they depend on.) (Edit: Using P as the parameter, instead of another letter, would prevent confusion with mixing "PERASE" with some other combination.)
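A minimal sketch of how the bitmask idea above could be decoded, purely for illustration. Only "P1 = just the board" comes from the proposal; the other bit assignments, the names `ResetTarget`, `DEPENDENTS` and `targets_for`, and the assumption that DuetWebServer depends on DuetControlServer are hypothetical and not part of DSF.

```python
from enum import IntFlag


class ResetTarget(IntFlag):
    """Hypothetical bit assignments for the proposed M999 P parameter."""
    BOARD = 1  # Duet 3 board, so "M999 P1" resets just the board
    DCS = 2    # DuetControlServer (assumed bit)
    DWS = 4    # DuetWebServer (assumed bit)


# Hypothetical dependency map: service -> services that depend on it.
DEPENDENTS = {
    ResetTarget.DCS: ResetTarget.DWS,
}

ALL_TARGETS = ResetTarget.BOARD | ResetTarget.DCS | ResetTarget.DWS


def targets_for(p_value=None):
    """Return the targets to reset for 'M999' (no P) or 'M999 Px'."""
    if p_value is None:
        # No parameter: restart everything related to "duet".
        return ALL_TARGETS
    # Ignore bits that do not correspond to a known target.
    selected = ResetTarget(p_value & int(ALL_TARGETS))
    # Pull in dependents until nothing new is added.
    changed = True
    while changed:
        changed = False
        for service, dependents in DEPENDENTS.items():
            if service in selected and (dependents & selected) != dependents:
                selected |= dependents
                changed = True
    return selected
```

With these assumed bits, `targets_for(1)` selects only the board, while `targets_for(2)` selects DCS and, through the dependency map, DWS as well.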
-
DCS will restart after 10 seconds once it is terminated unless it is running from the console. The upcoming implementation for DCS 1.3.3 will have an extra timeout in the shutdown routine to ensure it terminates abnormally if the regular shutdown as initiated from M999 fails.
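The "extra timeout in the shutdown routine" could look roughly like the sketch below. This is not the DCS implementation: the function name `shutdown_with_deadline`, its parameters and the 5 second default are assumptions made for illustration only.

```python
import os
import threading


def shutdown_with_deadline(graceful_shutdown, timeout_s=5.0, force_exit=None):
    """Run graceful_shutdown(); if it has not finished after timeout_s
    seconds, call force_exit() so the process ends abnormally and the
    service manager restarts it.

    Returns True if the graceful path finished in time, False otherwise.
    """
    if force_exit is None:
        # os._exit ends the process immediately with a non-zero status,
        # without running cleanup handlers.
        def force_exit():
            os._exit(1)

    # Run the regular shutdown in a daemon thread so a hang cannot block us.
    worker = threading.Thread(target=graceful_shutdown, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The regular shutdown is stuck: terminate abnormally.
        force_exit()
        return False
    return True
```

Passing `force_exit` explicitly makes the fallback testable without actually killing the process.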
-
Three reasons that I believe M999 should force a restart of the entire DSF:
First: The philosophic statement that the Duet system is based on "Gcode Everywhere". There are sub-statements about configuration, operation, etc; the underlying core principle is quite clear. Any and every configuration, run (a job) and support task is accomplished by issuing gcode commands. Introduction of things that cannot be done via a Gcode is moving away from this core principle.
Therefore, I'd re-ask the question: Should Duet abandon "Gcode Everywhere" as a driving principle?
Second: From the perspective of an end-user, Duet in the configuration of Duet 3 board + SBC (e.g. Raspberry Pi) is a system, not a stack of layers. An end user expects to put G-Code commands into the documented APIs, and get movement, or configuration, or resets, or whatever, out the other end.
Note: From the perspective of the technically literate end user, who desires to help Duet mature and debug products, there are many things that can be done to help isolate and diagnose problems. Back on D2 systems, it is very common to look at different logs, or use a photo of the board, or similar, to help separate things. But... but... when operating the system, all that should disappear.
Third: There have been reproducible conditions that cannot be recovered from Gcode consoles (DWC and/or PanelDue and/or third party) that require a DSF restart to recover. Those are being cleaned up; however, no software ever becomes completely bug free.
For a 'counter example': RRF now runs under an RTOS. How about if M999 restarted the movement planner and pulse generator tasks, but not the ??? or main task? That would make no sense. You can see where this is going, so I won't burden you with reading all the metaphors. The summary is: Reset the system.
There's more, but I already have a wall of text, and the 'more' circles around those themes: Gcode everywhere, One system, Operational Consistency.
-
@garyd9 said in Should M999 terminate the DSF core application?:
If DCS will restart on abnormal termination automatically, then my opinion is that M999 should cause an abnormal termination. (so that the process restarts.) I think this should happen with all the DSF related processes on M999.
As an alternative (or extension), add a parameter to M999 allowing to specify what gets restarted. "M999" (with no parameters) simulates standalone duet behavior in restarting all the daemons/services related to "duet."
M999 P1 resets just the duet3 board
M999 Px resets a combination of the duet board and SBC services specified by the bits set in 'x'. (If there are SBC services dependent on each other, the dependents would be reset along with whatever they depend on.) (Edit: Using P as the parameter, instead of another letter, would prevent confusion with mixing "PERASE" with some other combination.)
This has been discussed. What is the default with no arguments?
-
And, "security" keeps coming up. I'm really not sure why. sudo gets mentioned, etc. None of that needs to enter into an implementation of duetcontrolserver restart. Perhaps folks are thinking that duetwebserver would need to kill duetcontrolserver? There is a tested pull out there that does not do this... (tested by two end-users)
As garyd9 points out, all duetcontrolserver needs to do is exit (it can even be a normal exit). The way the service is defined to Raspbian, it will be noticed as missing and re-started within a few seconds. There is no piece of this sequence (exit, get restarted by the system) that is any different from normal startup/shutdown. In particular, there is no change to the "attack surface" in security terminology.
What if it is so hung that it won't self exit? If it is alive enough to M999 the board, it is alive enough to self exit. If not, a power cycle is in order.
-
This may have been discussed before, but if there is an argument for restarting dsf, shouldn't the same logic apply to restarting the SBC? So why not have M999 reboot the pi as well as the control board?
-
Personally, I'd be in favor of a full SBC restart as a result of an M999. In my opinion, this should also be the default.
If there is a position that people are running other things on the Pi that they don't want disrupted, two thoughts:
- An optional parameter to limit. Default is reboot SBC. Option 1 is restart DSF, option 2 is restart only Firmware. (Or whatever similar).
And...
- What is the effect on the "other things running on the Pi" if the only recovery option turns out to be "power cycle"?
And again, what should the default behavior, no options, of M999 be?
-
@Danal said in Should M999 terminate the DSF core application?:
This has been discussed. What is the default with no arguments?
It's mentioned in the post:
"M999" (with no parameters) simulates standalone duet behavior in restarting all the daemons/services related to "duet."
-
@Danal said in Should M999 terminate the DSF core application?:
And, "security" keeps coming up. I'm really not sure why. sudo gets mentioned, etc. None of that needs to enter into an implementation of duetcontrolserver restart. Perhaps folks are thinking that duetwebserver would need to kill duetcontrolserver? There is a tested pull out there that does not do this... (tested by two end-users)
Partly because the problem goes way beyond M999, and you still end up with an unprivileged user affecting a privileged process.
-
@Danal said in Should M999 terminate the DSF core application?:
Personally, I'd be in favor of a full SBC restart as a result of an M999. In my opinion, this should also be the default.
...And again, what should the default behavior, no options, of M999 be?
I think I disagree with the default rebooting the entire SBC operating system. First, that would require su permissions (while terminating processes running as 'duet' would only require the duet user.) Second, it creates possible unintended side effects.
In other words, rebooting the entire SBC takes "duet" functionality outside of the "duet" sandbox.
-
@bearer said in Should M999 terminate the DSF core application?:
Partly because the problem goes way beyond M999, and you still end up with an unprivileged user affecting a privileged process.
I truly don't understand that statement. EVERY SINGLE g-code handled by duetcontrolserver is "an unprivileged user affecting a privileged process".
If you mean things like changing the hostname or IP, that is (still? once again?) seeing the Pi as somehow 'separate' from the Duet system. It is, in a D3+Pi configuration, literally, the WiFi network interface for the Duet system, and the "Gcode everywhere" core principle says that M552 "has the correct level of privilege" to set that IP.
-
I agree with @Danal with respect to why M999 should cause a reset of some sort on the SBC. While running RC6 I frequently had to restart DCS because the system became unresponsive, and restarting solved it; I did this by SSHing into the Pi. This is okay while I'm sitting next to the printer with a laptop testing stuff, but long-term this isn't an acceptable way of recovering control (the alternative being a power cycle), not only because most users aren't going to know how to do this or even care about learning how, but because even "power users" aren't always going to have the tools immediately to hand. So it makes sense that M999 / Emergency Stop on DWC should be able to recover control quickly and efficiently.
As for whether this should be in the form of resetting DCS or a full restart of the SBC, personally, I'm not keen on the idea of having to reset the whole SBC. My guess is that typically it'll only need to be DCS that needs a restart and if the SBC really needs a whole reboot then the system probably has bigger issues that need addressing.
-
@garyd9 said in Should M999 terminate the DSF core application?:
@Danal said in Should M999 terminate the DSF core application?:
Personally, I'd be in favor of a full SBC restart as a result of an M999. In my opinion, this should also be the default.
...And again, what should the default behavior, no options, of M999 be?
I think I disagree with the default rebooting the entire SBC operating system. First, that would require su permissions (while terminating processes running as 'duet' would only require the duet user.) Second, it creates possible unintended side effects.
In other words, rebooting the entire SBC takes "duet" functionality outside of the "duet" sandbox.
Yeah, I don't have much hope of actually getting broad consensus to make that the default.
People still see the SBC as somehow separate from the Duet system. Example: "Sandbox". Example: "SU permissions". That entire way of thinking is very Pi centric. It stopped being a Pi the moment it got bolted into the printer.
Again, I don't really expect to convince people of this. So let's ask the broader community:
Assume M999 can reset:
- just the board
- board + DSF
- board + DSF (somewhat implicit in the reboot) + Pi (reboot)
What should the default, M999 with no arguments, be?
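To make the question concrete, here is a small sketch of the three scopes with the no-argument default left as a setting rather than hard-coded. The names `ResetScope`, `PARAM_TO_SCOPE` and `scope_for`, and the parameter values 0/1/2, are hypothetical; nothing in this thread settles on them.

```python
from enum import Enum


class ResetScope(Enum):
    BOARD = "board"
    BOARD_AND_DSF = "board+dsf"
    BOARD_DSF_AND_SBC = "board+dsf+sbc"


# Hypothetical parameter values; the thread does not agree on any.
PARAM_TO_SCOPE = {
    0: ResetScope.BOARD,
    1: ResetScope.BOARD_AND_DSF,
    2: ResetScope.BOARD_DSF_AND_SBC,
}


def scope_for(param, default):
    """Return the reset scope for an M999 call.

    param is None when M999 is sent without arguments; in that case the
    configured default decides, which is exactly the open question here.
    """
    if param is None:
        return default
    try:
        return PARAM_TO_SCOPE[param]
    except KeyError:
        raise ValueError(f"unsupported M999 parameter value: {param}") from None
```

Whatever default is chosen only changes the second argument, e.g. `scope_for(None, ResetScope.BOARD_AND_DSF)`; an explicit parameter always wins over the default.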
-
Just throwing in a few thoughts: There could be a parameter for M999 that tells DCS to restart provided M999 only resets the Duet 3 + expansion boards. In addition, we will need a new M-code to either shut down or restart the Pi on demand. All of these actions could be made easily accessible on the Machine Settings page of DWC for users of DSF.
PS: I agree more G-codes should affect the machine configuration if users are on DuetPi. We will come up with a new DSF plugin to achieve that. I don't want to distract too much from the actual question though: What should M999 do by default?
-
@chrishamm said in Should M999 terminate the DSF core application?:
ll come up with a new DSF plugin to achieve that. I don't want to distract too much from the actual question though: What should M999 do by default?
Personally, my vote would be for:
- M999 resets the board, expansion boards and DSF by default
- M999 with some parameter can reset the board, expansion boards and the whole SBC
- M999 with some other parameter runs some *.g file to "park" the physical system and then safely shuts down the SBC
-
@ChrisP said in Should M999 terminate the DSF core application?:
@chrishamm said in Should M999 terminate the DSF core application?:
ll come up with a new DSF plugin to achieve that. I don't want to distract too much from the actual question though: What should M999 do by default?
Personally, my vote would be for:
- M999 resets the board, expansion boards and DSF by default
- M999 with some parameter can reset the board, expansion boards and the whole SBC
- M999 with some other parameter runs some *.g file to "park" the physical system and then safely shuts down the SBC
Sounds great. Q: Is that third one really the existing "pause" or "stop" g-code followed by an M999 with the shutdown param?
-
@Danal said in Should M999 terminate the DSF core application?:
People still see the SBC as somehow separate from the Duet system
I wonder why; where did you buy your SBC?