Duet 3 Softwarecrash?
-
@phaedrux Not right after, but there was no other action between. The problem happened more or less at 21.15 (screenshot) and the M122 one hour later.
-
@metty I have added this to my list to look at. If it happens again, please post a new M122 report taken afterwards.
-
@metty your MCU shows 50+ deg C
Maybe consider some cooling on the Duet 3 board
-
@martin7404 Thanks for the hint. I will keep an eye on the temperature. 50 deg C does not look problematic to me, but I am not familiar enough with the hardware to be sure.
-
@metty the Duet boards should be ok up to 80 degrees C
-
@dc42 I found the printer today again in the same state. I suspect several hours passed between the crash and the M122:
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.3 (2021-06-15 21:45:47) running on Duet 3 MB6HC v1.01 or later (standalone mode)
Board ID: 08DJM-956BA-NA3TJ-6JKDJ-3SN6S-19AQV
Used output buffers: 1 of 40 (16 max)
=== RTOS ===
Static ram: 150904
Dynamic ram: 94224 of which 0 recycled
Never used RAM 109064, free system stack 200 words
Tasks: NETWORK(ready,26.2%,234) ETHERNET(notifyWait,0.0%,124) HEAT(delaying,0.0%,331) Move(notifyWait,0.0%,302) CanReceiv(notifyWait,0.1%,774) CanSender(notifyWait,0.0%,374) CanClock(delaying,0.0%,339) TMC(notifyWait,8.2%,93) MAIN(running,65.4%,924) IDLE(ready,0.0%,29), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:05:17 ago, cause: software
Last software reset at 2021-07-02 23:54, reason: StackOverflow, Platform spinning, available RAM 109064, slot 1
Software reset code 0x4100 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0040080e BFAR 0x00000000 SP 0x2045ffbc Task ETHE Freestk 64050 bad marker
Stack: 20421698 204216cc 0047a959 00000000 20424c4c 000003e8 00479fc9 20421a74 20424c4c 00000000 00f00000 e000e000 c0000000 00000000 0047a0e5 00479e68 21000000 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Error status: 0x00
Step timer max interval 127
MCU temperature: min 43.8, current 44.1, max 44.3
Supply voltage: min 24.2, current 24.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Driver 0: position 0, standstill, reads 44362, writes 15 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 44362, writes 15 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 44364, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 44362, writes 16 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 44367, writes 11 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 44367, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2021-07-02 23:59:59
Slowest loop: 3.83ms; fastest: 0.05ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 2.2ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 0ms, bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = 2 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is ready with "M122 " in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== CAN ===
Messages queued 2900, received 15276, lost 0, longest wait 2ms for reply type 6031, peak Tx sync delay 7, free buffers 49 (min 48), ts 1587/1586/0
Tx timeouts 0,0,0,0,0,0
=== Network ===
Slowest loop: 2.71ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 5 2 2 2 2 0 0 0
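A note on timing: the report itself gives a way to estimate when the reset happened, by subtracting the "Last reset ... ago" value from the "Date/time" value. The sketch below does that arithmetic in Python. The function name `reset_time` is made up for illustration, and it assumes the uptime is always printed as hh:mm:ss, which is only known to hold for the reports in this thread.

```python
from datetime import datetime, timedelta

def reset_time(report_time, uptime):
    """Estimate when the board last reset.

    report_time: the 'Date/time' value from M122, e.g. '2021-07-02 23:59:59'
    uptime:      the 'Last reset ... ago' value, assumed to be hh:mm:ss
    """
    now = datetime.strptime(report_time, "%Y-%m-%d %H:%M:%S")
    hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    return now - timedelta(hours=hours, minutes=minutes, seconds=seconds)

# Values taken from the report above.
print(reset_time("2021-07-02 23:59:59", "00:05:17"))  # 2021-07-02 23:54:42
```

The result agrees with the "Last software reset at 2021-07-02 23:54" line in the same report.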
-
@metty said in Duet 3 Softwarecrash?:
Last software reset at 2021-07-02 23:54, reason: StackOverflow, Platform spinning, available RAM 109064, slot 1
Software reset code 0x4100 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0040080e BFAR 0x00000000 SP 0x2045ffbc Task ETHE Freestk 64050 bad marker
Stack: 20421698 204216cc 0047a959 00000000 20424c4c 000003e8 00479fc9 20421a74 20424c4c 00000000 00f00000 e000e000 c0000000 00000000 0047a0e5 00479e68 21000000 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Error status: 0x00
Thanks for the report. @dc42 will want to see that.
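For anyone collecting several of these reports, the reset lines have a regular enough layout that the main fields can be pulled out with a few regular expressions and compared between crashes. This is only a sketch: the dictionary keys and the function name `parse_reset` are invented here, and the patterns are based solely on the lines quoted in this thread, not on any documented format.

```python
import re

# Reset lines copied from the quoted M122 report.
RESET_LINES = (
    "Last software reset at 2021-07-02 23:54, reason: StackOverflow, "
    "Platform spinning, available RAM 109064, slot 1\n"
    "Software reset code 0x4100 HFSR 0x00000000 CFSR 0x00000000 "
    "ICSR 0x0040080e BFAR 0x00000000 SP 0x2045ffbc Task ETHE "
    "Freestk 64050 bad marker"
)

# Hypothetical field names mapped to patterns that match this report's layout.
FIELDS = {
    "when": r"Last software reset at (\d{4}-\d{2}-\d{2} \d{2}:\d{2})",
    "reason": r"reason: (\w+)",
    "available_ram": r"available RAM (\d+)",
    "slot": r"slot (\d+)",
    "code": r"Software reset code (0x[0-9a-fA-F]+)",
    "task": r"Task (\w+)",
    "freestk": r"Freestk (\d+)",
}

def parse_reset(text):
    """Return the matched fields as strings, or None where a field is absent."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result

info = parse_reset(RESET_LINES)
print(info["when"], info["reason"], info["task"], info["freestk"])
```

Running the same function over a later report makes it easy to see whether the reason, task and stack words are identical from one crash to the next.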
-
@phaedrux
It happened again, so is it a stack overflow for sure? Would increasing the stack size be a solution? Or is there another way to check what was responsible for the overflow?
If I post more M122 reports, I guess you cannot read any more information than you already have, is that correct?
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.3 (2021-06-15 21:45:47) running on Duet 3 MB6HC v1.01 or later (standalone mode)
Board ID: 08DJM-956BA-NA3TJ-6JKDJ-3SN6S-19AQV
Used output buffers: 3 of 40 (29 max)
=== RTOS ===
Static ram: 150904
Dynamic ram: 94224 of which 0 recycled
Never used RAM 109064, free system stack 200 words
Tasks: NETWORK(ready,26.4%,254) ETHERNET(notifyWait,0.1%,124) HEAT(delaying,0.0%,325) Move(notifyWait,0.0%,302) CanReceiv(notifyWait,0.1%,774) CanSender(notifyWait,0.0%,374) CanClock(delaying,0.0%,339) TMC(notifyWait,8.2%,93) MAIN(running,65.1%,924) IDLE(ready,0.0%,29), total 100.0%
Owned mutexes:
=== Platform ===
Last reset 01:14:31 ago, cause: software
Last software reset at 2021-07-10 09:01, reason: StackOverflow, Platform spinning, available RAM 105380, slot 0
Software reset code 0x4100 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0042780e BFAR 0x00000000 SP 0x2045ffbc Task ETHE Freestk 64050 ok
Stack: 20421698 204216cc 0047a959 00000000 20424c4c 000003e8 00479fc9 20421a74 20424c4c 00000000 00f00000 e000e000 c0000000 00000000 0047a0e5 00479e68 21000000 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Error status: 0x00
Step timer max interval 164
MCU temperature: min 45.4, current 45.5, max 45.7
Supply voltage: min 24.2, current 24.2, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Driver 0: position 0, standstill, reads 18369, writes 0 timeouts 0, SG min/max not available
Driver 1: position 0, standstill, reads 18370, writes 0 timeouts 0, SG min/max not available
Driver 2: position 0, standstill, reads 18370, writes 0 timeouts 0, SG min/max not available
Driver 3: position 0, standstill, reads 18370, writes 0 timeouts 0, SG min/max not available
Driver 4: position 0, standstill, reads 18370, writes 0 timeouts 0, SG min/max not available
Driver 5: position 0, standstill, reads 18370, writes 0 timeouts 0, SG min/max not available
Date/time: 2021-07-10 10:15:38
Slowest loop: 3.68ms; fastest: 0.05ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 2.2ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 0ms, bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = 2 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== CAN ===
Messages queued 137, received 729, lost 0, longest wait 0ms for reply type 0, peak Tx sync delay 4, free buffers 49 (min 49), ts 76/76/0
Tx timeouts 0,0,0,0,0,0
=== Network ===
Slowest loop: 2.55ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 2 2 2 2 2 0 0 0
-
Can you also provide the gcode file that provoked the crash?
-
@metty I'm sorry, your issue was on my list to look at but as the information in the M122 report is not a clear case of a stack overflow, I hadn't finished looking into it.
We've just released RRF 3.4.0beta1. It has the same size stack for the task that crashed as in 3.3.0. So I've made another build with a larger stack available at https://www.dropbox.com/sh/razha1aikzytuw5/AAAEZnuoDAtUqklUsSnvNzKHa?dl=0. Please try this build.
If it crashes again, please post another M122 report after the crash. If it appears to fix the issue, please post an M122 report taken after the machine has done a lot of printing, so that I can check on the stack usage.
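For readers who want to watch the stack figures themselves between reports, the "Tasks:" line can be split into per-task entries with a short script. This sketch assumes the last number in each entry is that task's free stack, which is how these reports are usually read but is not stated anywhere in this thread. The function name `parse_tasks` and the dictionary keys are made up for illustration.

```python
import re

# Tasks line copied from the second M122 report in this thread.
TASKS_LINE = (
    "Tasks: NETWORK(ready,26.4%,254) ETHERNET(notifyWait,0.1%,124) "
    "HEAT(delaying,0.0%,325) Move(notifyWait,0.0%,302) "
    "CanReceiv(notifyWait,0.1%,774) CanSender(notifyWait,0.0%,374) "
    "CanClock(delaying,0.0%,339) TMC(notifyWait,8.2%,93) "
    "MAIN(running,65.1%,924) IDLE(ready,0.0%,29), total 100.0%"
)

# One entry looks like NAME(state,cpu%,number).
ENTRY = re.compile(r"(\w+)\((\w+),([\d.]+)%,(\d+)\)")

def parse_tasks(line):
    """Map each task name to its state, CPU share and last number (assumed free stack)."""
    return {
        name: {"state": state, "cpu": float(cpu), "free": int(free)}
        for name, state, cpu, free in ENTRY.findall(line)
    }

tasks = parse_tasks(TASKS_LINE)

# List tasks from the smallest assumed stack margin to the largest.
for name in sorted(tasks, key=lambda n: tasks[n]["free"]):
    print(f"{name:10s} {tasks[name]['free']:5d}")
```

Comparing this listing from a report on the standard build with one from the larger-stack build would show whether the ETHERNET figure has changed.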
-
@dc42 Thanks a lot for preparing a firmware build for me. I can test it at the end of the week. I will keep you informed whether the crash happens again or not.
@Phaedrux If you want, I can share the G-code file. I guess it will be difficult to find a problem there. I could not find a pattern in this error; it happened suddenly. I was printing the same part with the same code 12 times, and it crashed maybe every 4th time.
-
@metty do you have any update for me on whether the new firmware build solved the problem?
-
@metty have you any update?
-
@dc42 Thanks for coming back to me. The last few days were crazy. Unfortunately, I was not able to print or build anything on the printer last week.
That should be better now. I was just able to install the firmware and will test it immediately. Since the error only occurs sporadically, we will probably only know in a week whether it has made a difference. I will be in touch again!
-
@metty we've today released RRF 3.4.0 beta 2. This includes the larger stack size.
-
@dc42 Nice, thanks! I have just installed 3.4 beta 2.