Print ended prematurely 30 hrs in 3.4 beta 7 SBC
-
I noticed my printer was awfully quiet and when I checked I found it had just stopped with the head parked in the middle of doing some infill. The heaters had shut off and were cooling. The 12864 had some conflicting info where it showed the heaters still on but they were not, but otherwise showed printer was in a non-printing state. I checked DWC and found no errors. The status page said job completed but clearly it was not. I dumped a M122 shown below.
Luckily I was able to use M114 and G92 in a hacked gcode to recover the last 7 hours of the print. But I’m disturbed that it just ended early without errors. I just simulated the original file and it completed without error. The line it stopped on during the print was in a stretch of uninteresting G1 commands. Any other areas to investigate?
1/3/2022, 11:13:36 PM: M122: === Diagnostics ===
RepRapFirmware for Duet 3 Mini 5+ version 3.4.0beta7 (2021-12-16 12:22:56) running on Duet 3 Mini5plus WiFi (SBC mode)
Board ID: RNXYU-8296U-D65J0-40KMW-2503Z-ZP7HW
Used output buffers: 1 of 40 (12 max)
=== RTOS ===
Static ram: 103460
Dynamic ram: 104412 of which 0 recycled
Never used RAM 30144, free system stack 118 words
Tasks: SBC(ready,138.9%,480) HEAT(notifyWait,1.3%,342) Move(notifyWait,87.1%,273) CanReceiv(notifyWait,0.0%,942) CanSender(notifyWait,0.0%,358) CanClock(delaying,0.4%,331) TMC(notifyWait,37.6%,105) MAIN(running,121.3%,484) IDLE(ready,0.2%,29) AIN(delaying,25.3%,264), total 412.1%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 29:39:08 ago, cause: software
Last software reset at 2022-01-02 23:34, reason: User, none spinning, available RAM 30824, slot 2
Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00487000 BFAR 0xe000ed38 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU revision 3, ADC conversions started 106756829, completed 106756829, timed out 0, errs 0
Step timer max interval 1479
MCU temperature: min 20.8, current 22.1, max 32.2
Supply voltage: min 23.7, current 23.9, max 24.2, under voltage events: 0, over voltage events: 0, power good: yes
Heap OK, handles allocated/used 99/0, heap memory allocated/used/recyclable 2048/84/84, gc cycles 0
Events: 0 queued, 0 completed
Driver 0: pos 31583, standstill, SG min 0, read errors 0, write errors 1, ifcnt 105, reads 2294, writes 14, timeouts 0, DMA errors 0, CC errors 0
Driver 1: pos 10279, standstill, SG min 0, read errors 0, write errors 1, ifcnt 188, reads 2287, writes 21, timeouts 0, DMA errors 0, CC errors 0
Driver 2: pos 20360, standstill, SG min 0, read errors 0, write errors 1, ifcnt 188, reads 2287, writes 21, timeouts 0, DMA errors 0, CC errors 0
Driver 3: pos 0, standstill, SG min 0, read errors 0, write errors 1, ifcnt 188, reads 2286, writes 21, timeouts 0, DMA errors 0, CC errors 0
Driver 4: pos 0, standstill, SG min 0, read errors 1, write errors 1, ifcnt 59, reads 2298, writes 9, timeouts 0, DMA errors 0, CC errors 0
Driver 5: pos 0, standstill, SG min 0, read errors 0, write errors 1, ifcnt 188, reads 2286, writes 21, timeouts 1, DMA errors 0, CC errors 0, failedOp 0x6c
Driver 6: pos 0, standstill, SG min 0, read errors 0, write errors 1, ifcnt 188, reads 2286, writes 21, timeouts 0, DMA errors 0, CC errors 0
Date/time: 2022-01-04 05:13:35
Cache data hit count 4294967295
Slowest loop: 1057.14ms; fastest: 0.09ms
=== Storage ===
Free file entries: 10
SD card 0 not detected, interface speed: 0.0MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 83, segments created 35, maxWait 31364ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 5538039, completed 5538039, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 2], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters 0 -1, chamber heaters 3 -1, ordering errs 0
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File* is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty
=== CAN ===
Messages queued 960761, received 0, lost 0, boc 0
Longest wait 0ms for reply type 0, peak Tx sync delay 0, free buffers 17 (min 17), ts 533744/0/0
Tx timeouts 0,29,533743,0,0,426986 last cancelled message type 4514 dest 127
=== SBC interface ===
Transfer state: 4, failed transfers: 0, checksum errors: 0
RX/TX seq numbers: 31931/31931
SPI underruns 0, overruns 0
State: 5, disconnects: 1, timeouts: 1, IAP RAM available 0x0f918
Buffer RX/TX: 0/0-0
=== Duet Control Server ===
Duet Control Server v3.4-b7
Code buffer space: 4096
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 0
Full transfers per second: 41.80, max time between full transfers: 48.2ms, max pin wait times: 32.3ms/10.7ms
Codes per second: 0.00
Maximum length of RX/TX data transfers: 3528/132 -
I forgot I took a screenshot at failure. Notice it thinks it is 100% complete, though Z is 25 when final should be 66 for this print. The heaters are off but have active temps. I have not seen that before.
-
I wonder if
failedOp 0x6c
in your M122 is relevant.dc42 has previously mentioned that those failedOp usually do not matter because the firmware retries. I do not know what that means, but perhaps it is just a red herring. it jumped out at me when I looked at your M122. Tho, I'm not used to RRF3+ m122 reports so maybe those are more normal than I know.
-
@bot I don’t know how to read that log either. Driver 5 is also only one with a timeout, so may be a real issue.
Driver 4 is unused which might explain the read error there.
-
-
@jltx were there any messages on the console when the print stopped?
There are various ways to turn the heaters off without changing the temperatures, for example a M0 command would do this. It would also stop the print. So perhaps the printer executed a M0 command. We saw this sort of thing very occasionally a few years ago when the user has a PanelDue attached and the wiring from it was picking up noise, but we haven't seen it since we enabled checksums on data received from PanelDue by default. So if you have a PanelDue attached, check that the M575 line in config.g that enables it uses S1 not S0, and uses the default baud rate 57600.
-
@dc42 no errors in console. I have a 12864 display. If an errant M0 were somehow emitted would that status screen show 100% complete?
-
@jltx said in Print ended prematurely 30 hrs in 3.4 beta 7 SBC:
@dc42 no errors in console. I have a 12864 display. If an errant M0 were somehow emitted would that status screen show 100% complete?
Yes if the M0 appeared to be in the file, then yes. If it came from outside, I'm not sure, but I suspect it would again.
Did you check the console for error messages? An error reading from the SD card would cause the job to be terminated prematurely. If you don't have "Do not close error messages automatically" selected in the DWC settings then the error message popup that you get when you are not on the Console page will close automatically.
-
@dc42 I did check the console and it had no errors reported. But I’m not clear that it captures anything when my laptop is asleep. I feel like it skips over anything incoming while asleep.
Is there a mode I can set the treats M0 like M1? I can’t ever think when I’d use M0. I never have before. More likely I think is your theory on SD read error (though I don’t know what that would happen either, just more likely). But wouldn’t that get captured in M122? Also the failure mode there should be a pause by firmware to allow for restart, e.g don’t turn off bed heater or motors and send warning. I was lucky that I happened to catch it. I was about to go to bed and it would have been ruined the next morning.