Should M999 terminate the DSF core application?
-
@gtj0 said in Should M999 terminate the DSF core application?:
@Danal said in Should M999 terminate the DSF core application?:
So let me ask it this way. A popular use for RPi is a "whole home" or "home automation" controller. Would anyone consider running this function on the physical Pi embedded in their 3D Printer? Probably no technical reason you can't... but it would seem very non-optimal to combine these things. That leads back to the philosophic question: Is the SBC bolted into a given printer part of the Duet implementation on that printer? or is it somehow separate?
I think it's "part of the printer" BUT it may be doing things not directly involved in printing, like running your DuetLapse or maybe later, even slicing. I'd hate to send M999 and have the SBC restart in the middle of combining all those jpegs into an mp4 or aborting a slice.
So if the default M999 is "board + DSF" and it takes an extra parm to reboot the SBC, would that be a good balance?
-
@Danal said in Should M999 terminate the DSF core application?:
@gtj0 said in Should M999 terminate the DSF core application?:
@Danal said in Should M999 terminate the DSF core application?:
So let me ask it this way. A popular use for RPi is a "whole home" or "home automation" controller. Would anyone consider running this function on the physical Pi embedded in their 3D Printer? Probably no technical reason you can't... but it would seem very non-optimal to combine these things. That leads back to the philosophic question: Is the SBC bolted into a given printer part of the Duet implementation on that printer? or is it somehow separate?
I think it's "part of the printer" BUT it may be doing things not directly involved in printing, like running your DuetLapse or maybe later, even slicing. I'd hate to send M999 and have the SBC restart in the middle of combining all those jpegs into an mp4 or aborting a slice.
So if the default M999 is "board + DSF" and it takes an extra parm to reboot the SBC, would that be a good balance?
Works For Me!
-
Yeah, I think that's a pretty good balance spot.
I also think it answers the question that Chris originally asked. Fun discussions... and a conclusion! Will wonders never cease!
-
@Danal said in Should M999 terminate the DSF core application?:
This is an open question among many people. I have a Pi on a CNC, a Duet printer, and a couple of monitors. I haven't done a shutdown on any of them, I've just yanked power, in years.
This is like saying "I manually spin my stepper motors really fast all the time while they are still plugged into the duet, and never once have I experienced a problem."
Frequently not having a failure doesn't mean it's not a reckless thing to do, and your argument could even encourage other people to do the same reckless thing. There is simply no (reasonably feasible) way to ensure a clean shutdown of a read/write filesystem, Raspbian's included, on sudden power loss. It might even be a good idea on these SBCs to disable the filesystem write cache and force all the logs, etc., to be written to a tmpfs.
@Danal said in Should M999 terminate the DSF core application?:
So let me ask it this way. A popular use for RPi is a "whole home" or "home automation" controller. Would anyone consider running this function on the physical Pi embedded in their 3D Printer? Probably technically possible... but it would seem very non-optimal to combine these things. That leads back to the philosophic question: Is the SBC bolted into a given printer part of the Duet implementation on that printer? or is it somehow separate?
On a current Raspberry Pi, no, but only because of the limitations of the board. If I had an SBC with a better processor and much faster file I/O, I'd have no problem whatsoever doing that. I could justify it as both are functions that control physical appliances.
-
Re: Shutdown.
We are going to disagree on this one.
And, no, those two situations are not at all parallel. There is an RPM of steppers that WILL cause damage. This is simply provable, in moments (I won't ask anyone to demonstrate). There is no way to prove or disprove that fsck on startup will or won't handle any possible power down state, simply because you can't prove a negative.
Again, I really didn't want to reopen this debate. Every individual should do what they believe is right. Please, no one should change their habit based on THIS discussion.
Re: Running multiple things.
So you'd keep your printer powered on at all times to run your home auto? (Or maybe power up and down subsections...) And you'd accept interruptions to your home auto when you had no choice but to power cycle the printer? And so on and so forth. There is no right answer here, but this all seems contrary to the mission of "single board, embedded, computers".
I sincerely hope SBCs are headed the other way... more of them (not fewer), single "function" (maybe "area of function" is a better phrase), and so forth. More specialized, not more shared.
Again, we may have to agree to disagree.
-
@Danal said in Should M999 terminate the DSF core application?:
Re: Shutdown.
Again, I really didn't want to reopen this debate.
Why keep debating it immediately before stating you didn't want to reopen it?
So you'd keep your printer powered on at all times to run your home auto? (Or maybe power up and down subsections...) And you'd accept interruptions to your home auto when you had no choice but to power cycle the printer? And so on and so forth. There is no right answer here, but this all seems contrary to the mission of "single board, embedded, computers".
If you look above, you'll see that I don't see the SBC as integrated as part of the printer. I see it as something that controls the printer. Just as a computer controls my webcam, and the computer running my home automation controls a light switch. So, I wouldn't power down my home automation if/when I power down the printer: I'd turn off the printer and leave the SBC running. (I already do this.)
"Single board computer" doesn't mean "single purpose computer." As well, it's not embedded within the Duet. An embedded system wouldn't be sold separately by a different company, with a different physical footprint, as an optional part. It would be "embedded." As well, most devices that actually embed Linux-based OSes (such as home internet routers) configure the OS to be more tolerant of power outages (by not caching filesystem (or NVRAM) writes, using tmpfs for logs, etc.) and do other things that essentially make the underlying OS (and general purpose processor) invisible.
-
@garyd9 said in Should M999 terminate the DSF core application?:
Why keep debating it immediately before stating you didn't want to reopen it?
Because you keep posting about it and I do wish that bystanders know there are two, widespread, operational practices. I don't want people to change in either direction without more research on their own.
-
@Danal said in Should M999 terminate the DSF core application?:
@garyd9 said in Should M999 terminate the DSF core application?:
Why keep debating it immediately before stating you didn't want to reopen it?
Because you keep posting about it and I do wish that bystanders know there are two, widespread, operational practices. I don't want people to change in either direction without more research on their own.
That's fair. I hope you don't take offense at my debating with you. Discussion is the best way to share different views/opinions. Once in a while, new and better ideas are the result.
-
@garyd9 said in Should M999 terminate the DSF core application?:
@Danal said in Should M999 terminate the DSF core application?:
@garyd9 said in Should M999 terminate the DSF core application?:
Why keep debating it immediately before stating you didn't want to reopen it?
Because you keep posting about it and I do wish that bystanders know there are two, widespread, operational practices. I don't want people to change in either direction without more research on their own.
That's fair. I hope you don't take offense at my debating with you. Discussion is the best way to share different views/opinions. Once in a while, new and better ideas are the result.
Actually, I enjoy the heck out of these discussions! I'm always a little worried about the person at the other end, so it is very nice to hear your statement.
-
Just to save @chrishamm some reading, a consensus emerged above:
M999 should:
- With no parms, restart RRF on the board and DSF (at least duetcontrolserver) on the Pi.
- With optional parms, it should also be capable of:
  - Restarting RRF on the board only
  - Restarting DSF (dcs?) only
  - Rebooting the entire Pi.
Anyone correct anything if it is wrong, or even in the slightest misleading.
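For anyone who wants to see the summary above as a lookup table, here is a rough sketch in Python (DSF itself is C#, this is only an illustration). The scope names "board", "dsf" and "sbc" are made up for the example; the thread did not settle on a parameter letter or values. It also assumes that rebooting the Pi takes DSF down with it and says nothing about the board, since that was not specified above.

```python
from enum import Flag, auto


class Target(Flag):
    RRF = auto()  # RepRapFirmware on the Duet board
    DSF = auto()  # DuetControlServer (at least) on the Pi
    SBC = auto()  # full reboot of the Pi


# Hypothetical scope names, not an agreed M999 parameter.
SCOPES = {
    None: Target.RRF | Target.DSF,    # plain M999: board + DSF
    "board": Target.RRF,              # RRF on the board only
    "dsf": Target.DSF,                # DSF only
    "sbc": Target.DSF | Target.SBC,   # reboot the Pi (DSF goes down with it)
}


def restart_targets(scope=None):
    """Return the set of things a restart request with this scope would touch."""
    try:
        return SCOPES[scope]
    except KeyError:
        raise ValueError(f"unknown restart scope: {scope!r}") from None
```

Keeping the default entry (no scope) as "board + DSF" matches the balance agreed earlier: the Pi itself is only rebooted when explicitly asked for.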
-
To continue this discussion a bit further...
https://github.com/chrishamm/DuetSoftwareFramework/issues/120
The current implementation seems to create a race condition:
- M999 is issued.
- RRF is restarted and DCS is killed.
- RRF looks for config.g from DCS but DCS isn't there.
- RRF hangs at "Off".
- DCS starts but RRF is Off so there's nothing you can do.
- Issuing M999 again restarts the cycle.
I'd like to suggest that on startup, DCS check RRF state and if it hasn't already loaded config.g, load it.
@dc42 is there a way for DCS to tell if RRF has or has not run config.g? I don't think we'd want DCS to restart RRF if it had run config.g but was Off for some other reason. Just if it hadn't initialized the first time.
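To make the sequence above concrete, here is a toy model in Python. It is not RRF or DCS code: the classes, the `config_loaded` flag and the `check_config` switch are all invented for illustration, and whether RRF can actually report that it has not yet run config.g is exactly the open question in this post.

```python
class Firmware:
    """Toy stand-in for RRF."""

    def __init__(self):
        self.config_loaded = False
        self.status = "Off"

    def boot(self, dcs):
        # On reset, ask the SBC for config.g once; stay "Off" if nobody answers.
        self.config_loaded = False
        self.status = "Off"
        if dcs.running:
            self.run_config()

    def run_config(self):
        self.config_loaded = True
        self.status = "Idle"


class ControlServer:
    """Toy stand-in for DCS."""

    def __init__(self):
        self.running = False

    def start(self, firmware, check_config=False):
        self.running = True
        # Suggested fix: if the firmware never ran config.g, run it now.
        if check_config and not firmware.config_loaded:
            firmware.run_config()

    def stop(self):
        self.running = False


def m999(firmware, dcs, check_config):
    """Replay the steps from the list above and return the final status."""
    dcs.stop()                         # DCS is killed ...
    firmware.boot(dcs)                 # ... while RRF restarts and finds no DCS
    dcs.start(firmware, check_config)  # DCS comes back afterwards
    return firmware.status
```

With `check_config=False` the model ends in "Off", which is the hang described above; with `check_config=True` it ends in "Idle". The check only fires when config.g was never loaded, so a board that is Off for some other reason would be left alone.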
-
Interesting. Shouldn't RRF, on every startup, look for the SD files first, and if not found, wait in a known state for DSF to contact it? That was my impression of how ordinary power on worked.
On a restart, what would be different?
-
@Danal Yeah, now that I think about it, that's what should have happened I guess. I've got a print going now but I'll test some other scenarios in a bit.
-
@dc42 FYI...
After M122 DCS had shut down while RRF was still starting...
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (7 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 160520 of which 20 recycled
Exception stack ram used: 308
Never used ram: 77788
Tasks: NETWORK(ready,2084) HEAT(blocked,1452) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(suspended,216) MAIN(running,5108) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:00:08 ago, cause: software
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 40.1, current 40.4, max 40.6
Supply voltage: min 2.3, current 2.3, max 25.4, under voltage events: 0, over voltage events: 0, power good: no
12V rail voltage: min 0.4, current 0.4, max 12.2, under voltage events: 1
Driver 0: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 1: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 2: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 3: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 4: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 5: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Date/time: 1970-01-01 00:00:00
Slowest loop: 2.68ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0 0, running macro
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 0.66ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 0, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 8745ms ago
RX/TX seq numbers: 0/3
SPI underruns 0, overruns 0
Number of disconnects: 1
Buffer RX/TX: 0/0-0
ok
After DCS restarts. Note that RRF recognizes that the DCS is running but doesn't do anything about it.
Connection to Linux established!
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (12 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 160520 of which 20 recycled
Exception stack ram used: 308
Never used ram: 77788
Tasks: NETWORK(ready,2084) HEAT(blocked,1452) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(suspended,216) MAIN(running,4468) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:01:26 ago, cause: software
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 40.0, current 40.1, max 40.6
Supply voltage: min 0.1, current 0.2, max 2.3, under voltage events: 0, over voltage events: 0, power good: no
12V rail voltage: min 0.2, current 0.3, max 0.4, under voltage events: 1
Driver 0: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 1: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 2: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 3: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 4: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 5: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Date/time: 1970-01-01 00:00:00
Slowest loop: 2.07ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 1.04ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 0, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 27ms ago
RX/TX seq numbers: 595/597
SPI underruns 0, overruns 0
Number of disconnects: 1
Buffer RX/TX: 0/0-0
ok
After pressing reset button with DCS running. All good.
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (10 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 162360 of which 36 recycled
Exception stack ram used: 300
Never used ram: 75940
Tasks: NETWORK(ready,2076) HEAT(blocked,1196) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(blocked,216) MAIN(running,4840) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:00:07 ago, cause: reset button
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 37.4, current 39.7, max 39.8
Supply voltage: min 0.1, current 25.4, max 26.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 0.3, current 12.2, max 12.2, under voltage events: 0
Driver 0: standstill, reads 27124, writes 11 timeouts 0, SG min/max 0/0
Driver 1: standstill, reads 27125, writes 11 timeouts 0, SG min/max 0/0
Driver 2: standstill, reads 27125, writes 11 timeouts 0, SG min/max 0/0
Driver 3: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Driver 4: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Driver 5: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-04-18 18:10:13
Slowest loop: 3.90ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 0.47ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 27, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 16ms ago
RX/TX seq numbers: 1770/287
SPI underruns 0, overruns 0
Number of disconnects: 0
Buffer RX/TX: 0/0-0
-
Also, running M98 P"config.g" fixes the issue if you can't get to the reset button.
-
@jay_s_uk said in Should M999 terminate the DSF core application?:
Also, running M98 P"config.g" fixes the issue if you can't get to the reset button.
Good point!
-
The best way to solve this might just be for the DCS to defer its shutdown until RRF has finished reading the config.g file.
-
@gtj0 said in Should M999 terminate the DSF core application?:
The best way to solve this might just be for the DCS to defer its shutdown until RRF has finished reading the config.g file.
A restart is to get out of being hung. Waiting for things to happen is a way to get hung again (or stay hung).
DCS really needs to "fire" a reset at the SPI interface, and then exit as directly as possible (no cleanup). At one point, I coded this, literally just 'exit(8)' right after the function call to send the board the restart, and it was tested by me and one other (maybe even you, @gtj0? whoever it was re-built it into a 64 bit build of DCS) and it worked fine. 'Worked' meaning that it force restarted both the board and DCS, and it resulted in a fully running system (with no other actions).
I haven't looked at the RC8 code for this... but at one point Chris rejected the pull because it didn't do cleanup (plus some other more philosophic reasons). To me, that red button in the upper corner of the screen should work very much like the 'reset' pin on a microcontroller CPU. Not one more instruction... just restart. Or as close as reasonably possible.
-
@Danal Yeah it was me that tested it. I agree that it should be an immediate reset, at least from an RRF perspective. When you press the button it's usually because something bad/dangerous is happening and you just want to STOP. The problem is getting it going again.
-
@Danal You need to ensure that the reset packet actually makes it to RRF, so I had to change a few extra spots in DCS. But I'll check again whether I actually exit the SPI loop when requested, and if that is not the case, I'll enforce it now. Either way, I'll try to fix it in DSF next.
With the new code, Environment.Exit is still the last resort in case the internal termination request fails after 4 seconds. See the last few lines in Program.cs for further details.
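The "ask nicely, then force it after 4 seconds" pattern described here can be sketched roughly like this. This is Python for illustration only, not the actual Program.cs code; `shutdown_with_deadline`, `request_stop` and `stopped` are invented names, and `os._exit` simply plays the role that Environment.Exit has in DSF.

```python
import os
import threading


def shutdown_with_deadline(request_stop, stopped, timeout=4.0, hard_exit=os._exit):
    """Ask the main loop to stop; force-exit if it has not stopped in time.

    request_stop: callable that signals the main loop to terminate.
    stopped: threading.Event the main loop sets once it has really exited.
    Returns True if the loop stopped on its own, False if hard_exit was called
    (only reachable when hard_exit is replaced by something that returns).
    """
    request_stop()
    if stopped.wait(timeout):
        return True
    hard_exit(1)  # last resort: the internal termination request did not finish
    return False
```

Usage would be something like `shutdown_with_deadline(loop.cancel, loop.done_event)`, with both names standing in for whatever the real main loop provides. The hard exit is a parameter only so the fallback path can be exercised without killing the process.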