Solved DSF 6HC Pi Disconnections over SPI
-
I have a non-printer system consisting of 13 stepper motors across 1 6HC and 3 3HCs.
The 6HC is in SBC mode with a Raspberry Pi running Bullseye.
All Duet boards are at 3.5.2 firmware.
We are using the Duet system in CNC mode as a motion control system to perform an assembly operation.
The application consists of 70-ish macro files that are called from a "main.g" to perform an assembly process. A single cycle could consist of 50+ M98 commands being called from main.g, depending on what is going on in the cycle.
Our cycle time is about 60 seconds. We run for about 100 minutes before we run out of parts and need to reload the machine and start up again.
A PyQt Python program is running on the Pi using Python-DSF to monitor the Duet and provide an HMI for the operator (think start/stop buttons, status indicators, cycle counts, etc.).
We are polling the full OM every 500 ms using the HTTP request POST method.
We send commands from the PyQt program using the CommandConnection method on an as-needed basis. This only happens in manual mode and setup mode.
Every so often, we lose connection to the 6HC: our connection goes from Idle or Busy to Disconnected.
It occurs at random parts of the process, so we do not think it's related to a specific macro file but to something else.
When issuing M122 commands during the process, I can see the TfrRdy pin glitches value incrementing.
The glitch value increases by about 15 glitches per cycle.
After reading through the forums a bit, we have tried:
- different ribbon cables (4 different cables from Duet 6HC boxes, so "factory")
- moving wiring away from ribbon cable connection points on the Pi and Duet
Our cable runs from the 6HC and under it to the Pi located right beside the 6HC.
We are mounted to an aluminum backplane with brass standoffs for the 6HC and 3HCs and plastic standoffs for the Pi.
The 6HC and 3HCs are powered by an industrial 24V power supply.
The Pi is powered by a 24V to 5V/5A buck converter via USB-C connection.
We have 20 limit switches, a couple of vacuum switches and photoeyes connected to the Duet boards either directly or through 24V relays to convert down as needed.
There are a handful of 24V outputs driven by the Duet to 5V relays.
We are using a NeoPixel LED strip connected to its own 5V power supply.
I'm at the point where I'm about to tear a strip of tinfoil off my hat to shroud the cable and see if that helps.
M122 at start of latest cycle:
M122 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode) Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA Used output buffers: 1 of 40 (24 max) === RTOS === Static ram: 155360 Dynamic ram: 96808 of which 88 recycled Never used RAM 20152, free system stack 146 words Tasks: SBC(2,ready,1.6%,699) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.1%,211) CanReceiv(6,nWait 1,0.0%,794) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.5%,55) MAIN(2,running,87.6%,444) IDLE(0,ready,1.2%,29), total 100.0% Owned mutexes: HTTP(MAIN) === Platform === Last reset 00:06:46 ago, cause: software Last software reset at 2024-06-25 14:14, reason: User, Gcodes spinning, available RAM 20472, slot 2 Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x00 MCU temperature: min 39.6, current 40.5, max 40.6 Supply voltage: min 22.2, current 23.7, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.0, current 12.3, max 12.7, under voltage events: 0 Heap OK, handles allocated/used 297/229, heap memory allocated/used/recyclable 6144/4524/372, gc cycles 4 Events: 0 queued, 0 completed Driver 0: standstill, SG min 0, mspos 792, reads 6434, writes 17 timeouts 0 Driver 1: standstill, SG min 0, mspos 312, reads 6434, writes 17 timeouts 0 Driver 2: standstill, SG min 0, mspos 520, reads 6434, writes 17 timeouts 0 Driver 3: standstill, SG min 0, mspos 392, reads 6430, writes 21 timeouts 0 Driver 4: standstill, SG min 0, mspos 856, reads 6430, writes 21 timeouts 0 Driver 5: standstill, SG min 0, mspos 472, reads 6430, writes 21 timeouts 0 Date/time: 2024-06-25 14:21:08 Slowest loop: 44.04ms; fastest: 0.05ms === Storage === Free file entries: 20 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === 
Move === DMs created 125, segments created 9, maxWait 74593ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00 next step interrupt due in 1144749 ticks, disabled Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0 === DDARing 0 === Scheduled moves 168, completed 166, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 10], CDDA state 3 === DDARing 1 === Scheduled moves 12, completed 12, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 4], CDDA state -1 === Heat === Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Movement locks held by SBC, null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is doing "M400" in state(s) 0 0 0 0, running macro Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 File2 is idle in state(s) 0 Queue2 is idle in state(s) 0 Q0 segments left 0, axes/extruders owned 0x0000600 Q1 segments left 0, axes/extruders owned 0x0001000 Code queue 1 is empty === CAN === Messages queued 3842, received 10135, lost 0, errs 1, boc 0 Longest wait 1ms for reply type 6018, peak Tx sync delay 68, free buffers 50 (min 48), ts 2032/2031/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 RX/TX seq numbers: 32144/32144 SPI underruns 0, overruns 0 State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x24cfc Buffer RX/TX: 24/72-0, open files: 0 === Duet Control Server === Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit) HTTP+Executed: > Executing M122 SBC: Buffered code: M400 Buffered code: M598 Buffered codes: 48 bytes total >> Doing macro main.g, started by M98 P"main.g" >> Number 
of flush requests: 1 >>> Doing macro nt/wrap_bundle.g, started by M98 P"nt/wrap_bundle.g" >>> Suspended code: G90 >>> Suspended code: G1 W0 F5000 >>> Suspended code: G91 >>>> Doing macro nt/roll_1.g, started by M98 P"nt/roll_1.g" >>>> Number of flush requests: 1 Code buffer space: 4024 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 36 Full transfers per second: 80.39, max time between full transfers: 47.4ms, max pin wait times: 33.6ms/13.1ms Codes per second: 3.53 Maximum length of RX/TX data transfers: 6055/908
M122 many cycles later:
6/25/2024, 3:07:06 PM M122 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode) Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA Used output buffers: 1 of 40 (33 max) === RTOS === Static ram: 155360 Dynamic ram: 97000 of which 32 recycled Never used RAM 20016, free system stack 134 words Tasks: SBC(2,rWait:,2.1%,699) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.2%,211) CanReceiv(6,nWait 1,0.0%,794) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.6%,53) MAIN(2,running,86.2%,444) IDLE(0,ready,1.8%,29), total 100.0% Owned mutexes: HTTP(MAIN) === Platform === Last reset 00:52:44 ago, cause: software Last software reset at 2024-06-25 14:14, reason: User, Gcodes spinning, available RAM 20472, slot 2 Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x00 MCU temperature: min 40.3, current 40.5, max 40.9 Supply voltage: min 22.2, current 23.7, max 23.8, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.0, current 12.3, max 12.7, under voltage events: 0 Heap OK, handles allocated/used 297/227, heap memory allocated/used/recyclable 6144/4132/20, gc cycles 47 Events: 0 queued, 0 completed Driver 0: standstill, SG min 0, mspos 776, reads 146, writes 0 timeouts 0 Driver 1: ok, SG min 0, mspos 747, reads 146, writes 0 timeouts 0 Driver 2: standstill, SG min 0, mspos 136, reads 146, writes 0 timeouts 0 Driver 3: standstill, SG min 0, mspos 40, reads 147, writes 0 timeouts 0 Driver 4: standstill, SG min 0, mspos 472, reads 147, writes 0 timeouts 0 Driver 5: standstill, SG min n/a, mspos 712, reads 147, writes 0 timeouts 0 Date/time: 2024-06-25 15:07:06 Slowest loop: 36.73ms; fastest: 0.05ms === Storage === Free file entries: 20 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 
=== Move === DMs created 125, segments created 9, maxWait 8547ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00 next step interrupt due in 12 ticks, disabled Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0 === DDARing 0 === Scheduled moves 1889, completed 1888, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 45], CDDA state 3 === DDARing 1 === Scheduled moves 252, completed 252, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 18], CDDA state -1 === Heat === Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Movement locks held by null, null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is idle in state(s) 0 0 0, running macro Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 File2 is idle in state(s) 0 Queue2 is idle in state(s) 0 Q0 segments left 0, axes/extruders owned 0x0000004 Code queue 0 is empty Q1 segments left 0, axes/extruders owned 0x0001000 Code queue 1 is empty === CAN === Messages queued 5042, received 13141, lost 0, errs 0, boc 0 Longest wait 1ms for reply type 6061, peak Tx sync delay 142, free buffers 50 (min 48), ts 2622/2622/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 5, checksum errors: 5 RX/TX seq numbers: 60875/60875 SPI underruns 7, overruns 5 State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x24cfc Buffer RX/TX: 0/0-0, open files: 0 === Duet Control Server === Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit) HTTP+Executed: > Executing M122 SBC: >> Doing macro main.g, started by M98 P"main.g" >> Number of flush requests: 1 >>> Doing macro 
pick_place/place_bundle.g, started by M98 P"pick_place/place_bundle.g" Code buffer space: 4096 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 662 Full transfers per second: 79.55, max time between full transfers: 53.9ms, max pin wait times: 33.9ms/9.9ms Codes per second: 5.09 Maximum length of RX/TX data transfers: 8176/908
So the glitches are increasing but the system is still running. Then, out of the blue, it will stop and we get Disconnected in DWC and in our PyQt Python program. We restart DCS and the machine starts up again.
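To put a number on how fast the counters climb, here is a rough sketch that pulls the SPI-related counters out of two M122 dumps and divides the increase by the number of cycles in between. The function names and the trimmed sample strings are illustrative only; the field labels and values are copied from the two dumps above, and the 60-second cycle time is the figure quoted earlier. The glitch counter is printed in the Duet Control Server section, so it presumably starts over when DCS restarts; the comparison only makes sense between dumps taken without a restart in between.

```python
import re

# Counters as they are labelled in the M122 output quoted above.
COUNTER_PATTERNS = {
    "glitches": r"TfrRdy pin glitches: (\d+)",
    "checksum_errors": r"checksum errors: (\d+)",
    "failed_transfers": r"failed transfers: (\d+)",
}
UPTIME_PATTERN = r"Last reset (\d+):(\d+):(\d+) ago"


def parse_m122(text):
    """Return uptime in seconds plus the SPI-related counters from one M122 dump."""
    hours, minutes, seconds = map(int, re.search(UPTIME_PATTERN, text).groups())
    result = {"uptime_s": hours * 3600 + minutes * 60 + seconds}
    for name, pattern in COUNTER_PATTERNS.items():
        result[name] = int(re.search(pattern, text).group(1))
    return result


def per_cycle(first, second, cycle_s=60.0):
    """Counter increase per machine cycle between two dumps."""
    cycles = (second["uptime_s"] - first["uptime_s"]) / cycle_s
    return {name: (second[name] - first[name]) / cycles for name in COUNTER_PATTERNS}


# Trimmed excerpts of the two dumps above (only the fields the parser reads).
start = parse_m122(
    "Last reset 00:06:46 ago, cause: software "
    "failed transfers: 0, checksum errors: 0 "
    "TfrRdy pin glitches: 36"
)
later = parse_m122(
    "Last reset 00:52:44 ago, cause: software "
    "failed transfers: 5, checksum errors: 5 "
    "TfrRdy pin glitches: 662"
)
rates = per_cycle(start, later)
print({name: round(value, 2) for name, value in rates.items()})
```

For these two dumps that works out to roughly 13.6 glitches per 60-second cycle, which is in line with the "about 15 per cycle" estimate, and about one checksum error every nine cycles.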
Here is the DCS log from when the latest disconnection happened and we restarted it:
-- Journal begins at Wed 2024-06-19 11:43:20 EDT, ends at Tue 2024-06-25 15:13:49 EDT. --
Jun 25 15:03:09 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_actuators.g on channel SBC
Jun 25 15:03:11 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_actuators.g
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0xffdb3c31)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Restarting full transfer because an unexpected response code has been received (code 0x00000001)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0x9aaf5092)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Restarting full transfer because an unexpected response code has been received (code 0x00000001)
Jun 25 15:03:16 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_bundle_grip.g on channel SBC
Jun 25 15:03:17 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_bundle_grip.g
Jun 25 15:03:19 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_bundle_grip.g on channel SBC
Jun 25 15:03:20 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_bundle_grip.g
Jun 25 15:03:21 A1000-2 DuetControlServer[15703]: [info] Starting macro file led/solid.g on channel SBC
Jun 25 15:03:21 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file led/solid.g
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_gripper.g on channel SBC
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6dbac3ae, got 0x5e32a259)
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Lost connection to Duet (Timeout while waiting for transfer ready pin)
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file functions/set_gripper.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file functions/set_gripper.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Failed to find corresponding state for code flush request, falling back to current state
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"functions/set_gripper.g" X"Off"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file pick_place/place_utensil.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file pick_place/place_utensil.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"pick_place/place_utensil.g" X"Knife"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file main.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file main.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"main.g"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Failed to find suitable stack level for flush request, falling back to current one
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Main process exited, code=exited, status=70/SOFTWARE
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Failed with result 'exit-code'.
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Consumed 9min 19.410s CPU time.
Jun 25 15:13:39 A1000-2 systemd[1]: duetcontrolserver.service: Scheduled restart job, restart counter is at 10.
Jun 25 15:13:39 A1000-2 systemd[1]: Stopped Duet Control Server.
Jun 25 15:13:39 A1000-2 systemd[1]: duetcontrolserver.service: Consumed 9min 19.410s CPU time.
Jun 25 15:13:39 A1000-2 systemd[1]: Starting Duet Control Server...
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Duet Control Server v3.5.2
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Written by Christian Hammacher for Duet3D
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Licensed under the terms of the GNU Public License Version 3
Jun 25 15:13:40 A1000-2 DuetControlServer[20004]: [info] Settings loaded
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] Environment initialized
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] Connection to Duet established
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] IPC socket created at /run/dsf/dcs.sock
Jun 25 15:13:41 A1000-2 systemd[1]: Started Duet Control Server.
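For anyone wanting to track these warnings over a longer run, the journal above can be tallied with a few lines of Python. This is only a sketch: the message fragments and the sample lines are copied from the log above, while the `summarize` helper and the event names are made up for illustration and assume the journal's default short timestamp format.

```python
import re
from collections import Counter

# Message fragments taken from the journal excerpt above.
EVENTS = {
    "crc_error": "Bad data CRC32",
    "transfer_restart": "Restarting full transfer",
    "lost_connection": "Lost connection to Duet",
}
# Short journal format: "Jun 25 15:03:13 host DuetControlServer[pid]: message"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ DuetControlServer\[\d+\]: (.*)$")


def summarize(lines):
    """Count SPI-related warnings and record when the connection was lost."""
    counts = Counter()
    lost_at = []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # systemd lines, headers and blank lines are skipped
        stamp, message = match.groups()
        for name, fragment in EVENTS.items():
            if fragment in message:
                counts[name] += 1
                if name == "lost_connection":
                    lost_at.append(stamp)
    return counts, lost_at


sample = [
    "Jun 25 15:03:11 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_actuators.g",
    "Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0xffdb3c31)",
    "Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Restarting full transfer because an unexpected response code has been received (code 0x00000001)",
    "Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6dbac3ae, got 0x5e32a259)",
    "Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Lost connection to Duet (Timeout while waiting for transfer ready pin)",
    "Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Main process exited, code=exited, status=70/SOFTWARE",
]
counts, lost_at = summarize(sample)
print(dict(counts), lost_at)
```

The same function could be fed the full journal for the duetcontrolserver unit, one line per list entry, to see how many CRC warnings typically precede each lost connection.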
To get things to reset, I press Emergency Stop on DWC or stop/start DCS or power off/power on.
I am working on getting the Subscribe method working to poll the OM, as opposed to Http Request Post to cut down on the traffic over SPI.
Just wondering if there is something blatantly obvious that we are doing wrong in our implementation of dsf-python?
Are we hammering DWC too much/often?
-
@davidjryan UPDATE
Subscribe method implemented so no more HTTP request POSTs, everything is DSF command connection and subscribe.
Subscribe full/patch interval is 0.1s after last successful response. I'm seeing about 3-4 subscribe updates per second. I added the 0.1s to give DCS a "breather" from the object model requests (full or patch).
Still getting about the same amount of TfrRdy pin glitches per cycle.
I am waiting for the first disconnection.
Latest M122:
6/25/2024, 6:29:25 PM M122 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode) Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA Used output buffers: 1 of 40 (33 max) === RTOS === Static ram: 155360 Dynamic ram: 97000 of which 0 recycled Never used RAM 20048, free system stack 136 words Tasks: SBC(2,ready,2.2%,699) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.2%,211) CanReceiv(6,nWait 1,0.0%,794) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.5%,53) MAIN(2,running,86.1%,444) IDLE(0,ready,1.9%,29), total 100.0% Owned mutexes: HTTP(MAIN) === Platform === Last reset 03:07:19 ago, cause: software Last software reset at 2024-06-25 15:22, reason: User, Platform spinning, available RAM 20016, slot 0 Software reset code 0x6000 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0044a000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x00 MCU temperature: min 41.7, current 41.9, max 42.5 Supply voltage: min 22.1, current 23.6, max 23.8, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 11.9, current 12.3, max 12.7, under voltage events: 0 Heap OK, handles allocated/used 297/231, heap memory allocated/used/recyclable 6144/5116/948, gc cycles 156 Events: 0 queued, 0 completed Driver 0: standstill, SG min 0, mspos 776, reads 38193, writes 0 timeouts 0 Driver 1: standstill, SG min 0, mspos 488, reads 38193, writes 0 timeouts 0 Driver 2: standstill, SG min 0, mspos 232, reads 38193, writes 0 timeouts 0 Driver 3: standstill, SG min 0, mspos 984, reads 38193, writes 0 timeouts 0 Driver 4: standstill, SG min 0, mspos 488, reads 38193, writes 0 timeouts 0 Driver 5: standstill, SG min n/a, mspos 504, reads 38194, writes 0 timeouts 0 Date/time: 2024-06-25 18:29:24 Slowest loop: 44.63ms; fastest: 0.05ms === Storage === Free file entries: 20 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write 
time 0.0ms, max retries 0 === Move === DMs created 125, segments created 9, maxWait 8474ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00 next step interrupt due in 128657 ticks, disabled Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0 === DDARing 0 === Scheduled moves 6261, completed 6257, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 62], CDDA state 3 === DDARing 1 === Scheduled moves 883, completed 882, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 23], CDDA state 3 === Heat === Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Movement locks held by null, null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is doing "M42 P6 S0" in state(s) 0 0 0 0 0, running macro Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 File2 is idle in state(s) 0 Queue2 is idle in state(s) 0 Q0 segments left 1, axes/extruders owned 0x0000600 Code queue 0 is empty Q1 segments left 0, axes/extruders owned 0x0001000 Code queue 1 is empty === CAN === Messages queued 6367, received 16618, lost 0, errs 0, boc 0 Longest wait 1ms for reply type 6061, peak Tx sync delay 10, free buffers 50 (min 48), ts 3310/3310/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 33, checksum errors: 36 RX/TX seq numbers: 12802/12802 SPI underruns 45, overruns 22 State: 5, disconnects: 3, timeouts: 3 total, 3 by SBC, IAP RAM available 0x24cfc Buffer RX/TX: 0/0-0, open files: 0 === Duet Control Server === Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit) HTTP+Executed: > Executing M122 SBC: Buffered code: M42 P6 S0 Buffered codes: 40 bytes total >> Doing 
macro main.g, started by M98 P"main.g" >> Number of flush requests: 1 >>> Doing macro nt/wrap_bundle.g, started by M98 P"nt/wrap_bundle.g" >>> Suspended code: G90 >>> Suspended code: G1 W0 F5000 >>> Suspended code: G91 >>>> Doing macro nt/roll_1.g, started by M98 P"nt/roll_1.g" >>>> Suspended code: M400 >>>> Suspended code: M598 >>>> Number of flush requests: 1 >>>>> Doing macro functions/set_actuators.g, started by M98 P"functions/set_actuators.g" X"Retract" >>>>> Number of flush requests: 1 Code buffer space: 4096 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 729 Full transfers per second: 83.88, max time between full transfers: 50.7ms, max pin wait times: 37.0ms/22.5ms Codes per second: 5.00 Maximum length of RX/TX data transfers: 8080/600
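The pacing described in this update (wait 0.1 s after each successful response, then ask for the next update) can be sketched as below. This is not the actual HMI code: `paced_updates`, `implied_response_time` and the fake update source are hypothetical names for illustration, and `fetch_update` stands in for whatever blocking call on the subscribe connection returns the next full model or patch.

```python
import time


def paced_updates(fetch_update, pause_s=0.1, max_updates=None, sleep=time.sleep):
    """Yield updates from fetch_update(), pausing pause_s after each response.

    fetch_update is any blocking callable returning the next object model
    update (a hypothetical stand-in for the subscribe connection's call).
    """
    delivered = 0
    while max_updates is None or delivered < max_updates:
        update = fetch_update()
        yield update
        delivered += 1
        sleep(pause_s)  # the "breather" before asking for the next update


def implied_response_time(updates_per_second, pause_s=0.1):
    """Average time one update takes to arrive, given the observed update rate."""
    return 1.0 / updates_per_second - pause_s


# Demo with a fake update source and a recorded (not real) sleep.
fake_source = iter([{"state": {"status": "idle"}}, {"state": {"status": "busy"}}])
pauses = []
updates = list(
    paced_updates(lambda: next(fake_source), max_updates=2, sleep=pauses.append)
)
print(updates, pauses)
print(round(implied_response_time(3), 3), round(implied_response_time(4), 3))
```

With the 0.1 s pause, seeing 3-4 updates per second implies each update takes roughly 0.15 to 0.23 s to arrive, so the pause is a sizeable share of the loop time but not the dominant one.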
-
@davidjryan one for @chrishamm
-
@davidjryan It looks like the link between Duet and SBC is picking up interference. I suggest you ground the entire machine properly and make sure you don't have any stepper wires very close to the ribbon cable. Shielding the ribbon cable may help as well. At least the number of checksum errors should be around 0 and on your setup that's way higher.
You don't really need that 0.1s delay but feel free to use it. DWC can use that as well to avoid excessive rendering load, although it's turned off by default.
-
@chrishamm UPDATE
Changes since last message:
We have rotated our 6HC board so now the Duet and Pi connections are 50mm away from one another. We are still using the Duet-supplied cable. The Pi is on a Raspberry Pi-supplied power supply (so not on our 24V to 5V buck converter anymore).
The rest of my system is grounded to the earth ground of the 24V power supply, which is grounded to the 120V outlet, which is grounded to earth. When I put a meter on any ground test point and the cabinet or earth ground, I have continuity.
When you say "ground" the machine properly, what can I do for the Duet? There is a SHIELD_GND lug on the Duet between the USB and ethernet connectors. I saw the forum thread that it's for shielded ethernet cables. Would running a ground wire from my machine ground to this lug help at all?
My wiring setup is in the bottom image below.
What causes TfrRdy pin glitches, failed transfers and checksum errors (as reported by M122)?
Which of the above is more concerning or are they all interrelated?
Are any of the above programming related or are they hardware related or a combo?
Should I be looking at my programming?
I have a testbed Duet 6HC on the bench with a single motor attached to it. I use it to test 6HC/3HC/1HCL boards, motors, and such before we put them on our machines. The Pi 4 and Duet are on plastic standoffs just loosely sitting on a table. It uses the same power supply as our machines and the same software code for polling the object model and interacting with the Duet. After running all night and 215000 continuous object model reads and other DCS interactions, there are 2 TfrRdy pin glitches and 0 checksum errors on that setup. Granted, the test bench does not exercise the 6HC like the machine does, nor do we have 13 motors on it, nor are there 3-5 motors running simultaneously the majority of the time. I wrapped a stepper motor cable around the ribbon cable to try to "induce" noise but it had no effect.
M122 of the Testbed this morning after running for 13-ish hours:
6/27/2024, 8:48:20 AM M122 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.01 (SBC mode) Board ID: 08DJM-9P63L-DJMSS-6JTDA-3SN6S-1VHRB Used output buffers: 1 of 40 (31 max) === RTOS === Static ram: 155360 Dynamic ram: 94072 of which 40 recycled Never used RAM 89936, free system stack 176 words Tasks: SBC(2,ready,1.4%,821) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.0%,242) CanReceiv(6,nWait 1,0.0%,771) CanSender(5,nWait 7,0.0%,334) CanClock(7,delaying,0.0%,339) TMC(4,nWait 6,9.8%,53) MAIN(2,running,87.8%,444) IDLE(0,ready,0.9%,29), total 100.0% Owned mutexes: HTTP(MAIN) === Platform === Last reset 16:18:55 ago, cause: software Last software reset at 2024-06-26 16:29, reason: User, Gcodes spinning, available RAM 90144, slot 0 Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0044a000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x00 MCU temperature: min 46.0, current 46.0, max 46.3 Supply voltage: min 23.6, current 23.8, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0 Heap OK, handles allocated/used 297/202, heap memory allocated/used/recyclable 4096/4032/512, gc cycles 4 Events: 0 queued, 0 completed Driver 0: ok, SG min 0, mspos 276, reads 59271, writes 0 timeouts 0 Driver 1: standstill, SG min n/a, mspos 8, reads 59272, writes 0 timeouts 0 Driver 2: standstill, SG min n/a, mspos 8, reads 59272, writes 0 timeouts 0 Driver 3: standstill, SG min n/a, mspos 8, reads 59272, writes 0 timeouts 0 Driver 4: standstill, SG min n/a, mspos 8, reads 59272, writes 0 timeouts 0 Driver 5: standstill, SG min n/a, mspos 8, reads 59272, writes 0 timeouts 0 Date/time: 2024-06-27 08:48:20 Slowest loop: 30.37ms; fastest: 0.06ms === Storage === Free file entries: 20 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 
0 === Move === DMs created 125, segments created 3, maxWait 1555ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00 next step interrupt due in 6 ticks, disabled Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0 === DDARing 0 === Scheduled moves 58, completed 57, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state 3 === DDARing 1 === Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Movement locks held by SBC, null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is doing "G4 P1" in state(s) 0 0 0, running macro Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 File2 is idle in state(s) 0 Queue2 is idle in state(s) 0 Q0 segments left 0, axes/extruders owned 0x0000001 Code queue 0 is empty Q1 segments left 0, axes/extruders owned 0x0000000 Code queue 1 is empty === CAN === Messages queued 54, received 86, lost 0, errs 0, boc 0 Longest wait 0ms for reply type 0, peak Tx sync delay 6, free buffers 50 (min 50), ts 54/54/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 RX/TX seq numbers: 61209/61209 SPI underruns 0, overruns 0 State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x24cfc Buffer RX/TX: 0/0-0, open files: 0 === Duet Control Server === Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit) HTTP+Executed: > Executing M122 SBC+ProcessInternally: >> Macro testx.g: Executing set var.numLoops = var.numLoops - 1 >>> Macro testx2.g: Executing set 
global.sCycleStatus = "Finished testx2" SBC: Buffered code: G4 P1 Buffered codes: 32 bytes total >> Doing macro testx.g, started by M98 P"testx.g" >> Number of flush requests: 1 >>> Doing macro testx2.g, started by M98 P"testx2.g" >>> Number of flush requests: 1 Code buffer space: 4096 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 2 Full transfers per second: 42.77, max time between full transfers: 34.9ms, max pin wait times: 29.6ms/2.5ms Codes per second: 4.26 Maximum length of RX/TX data transfers: 7564/520
Test Bed Wiring
Machine Wiring
-
@davidjryan UPDATE
Just wondering if anyone in-the-know from Duet can comment on the questions in my last post?
We grounded the SHIELD_GND lug between the USB and ethernet ports, no noticeable effect.
The last change was to put two separate 24V power supplies on the machine. One is for the Duet 6HC and 3HCs, one is for all other 24V/12V/5V requirements, and the Pi is on its own standalone "wall wart" power supply.
Since then, we've only had one communication disconnection. So it "seems" to be happening less, but...
The TfrRdy pin glitches are still high, as are the checksum errors. We keep waiting for the other shoe to drop.
Our test bed ran all weekend with 1,200,000+ object model queries plus 125,000+ M122 queries (one a second), plus 20,000+ M98 P"xxxx" commands (calls the same macro every time, which moves a single axis for 10s, every 20s). There were 33 pin glitches and 0 checksum errors. Again, not apples to apples with our machine, but we have somewhat of a control to compare to and to try different things suggested by folks.
Our next change is to shorten the ribbon cable between the Pi and Duet. We just put it on and it's half the length of the cable included with the 6HC.
Here are the last two M122 back-to-back queries after the addition of the second power supply but before the ribbon change. The M122 from 6/28 is also AFTER the last disconnection AFTER the second power supply was added. 7/1 is from this morning, so it's been almost 500 cycles since the last disconnection. We've had stretches like that in the past with our "original" setup, so we will continue to see if it'll get past 1000 cycles.
If we see more disconnections, I'll try swapping 6HCs and then Pi 4s.
7/1/2024, 10:30:17 AM
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode)
Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA
Used output buffers: 1 of 40 (33 max)
=== RTOS ===
Static ram: 155360
Dynamic ram: 97000 of which 0 recycled
Never used RAM 19976, free system stack 134 words
Tasks: SBC(2,rWait:,1.9%,697) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.2%,211) CanReceiv(6,nWait 1,0.0%,771) CanSender(5,nWait 7,0.0%,325) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.5%,53) MAIN(2,running,87.0%,444) IDLE(0,ready,1.4%,29), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 72:28:31 ago, cause: software
Last software reset at 2024-06-28 10:01, reason: User, Gcodes spinning, available RAM 20048, slot 0
Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU temperature: min 33.2, current 34.2, max 34.7
Supply voltage: min 23.9, current 24.0, max 24.2, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 11.9, current 12.4, max 12.9, under voltage events: 0
Heap OK, handles allocated/used 297/229, heap memory allocated/used/recyclable 6144/5044/916, gc cycles 689
Events: 0 queued, 0 completed
Driver 0: standstill, SG min 0, mspos 488, reads 6199, writes 8 timeouts 0
Driver 1: ok, SG min 0, mspos 548, reads 6199, writes 8 timeouts 0
Driver 2: standstill, SG min 0, mspos 984, reads 6195, writes 12 timeouts 0
Driver 3: standstill, SG min 0, mspos 232, reads 6198, writes 8 timeouts 0
Driver 4: standstill, SG min 0, mspos 1000, reads 6198, writes 8 timeouts 0
Driver 5: standstill, SG min 0, mspos 680, reads 6199, writes 8 timeouts 0
Date/time: 2024-07-01 10:30:16
Slowest loop: 49.94ms; fastest: 0.05ms
=== Storage ===
Free file entries: 20
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, segments created 12, maxWait 613674ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00
next step interrupt due in 4 ticks, disabled
Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0
=== DDARing 0 ===
Scheduled moves 27957, completed 27956, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 343], CDDA state 3
=== DDARing 1 ===
Scheduled moves 3949, completed 3949, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 118], CDDA state -1
=== Heat ===
Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
=== GCodes ===
Movement locks held by SBC, null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC* is doing "M400" in state(s) 0 0 0, running macro
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x0000004
Code queue 0 is empty
Q1 segments left 0, axes/extruders owned 0x0001000
Code queue 1 is empty
=== CAN ===
Messages queued 44350, received 116164, lost 0, errs 0, boc 0
Longest wait 1ms for reply type 6061, peak Tx sync delay 380, free buffers 50 (min 48), ts 23450/23450/0
Tx timeouts 0,0,0,0,0,0
=== SBC interface ===
Transfer state: 5, failed transfers: 33, checksum errors: 33
RX/TX seq numbers: 64962/64962
SPI underruns 27, overruns 0
State: 5, disconnects: 1, timeouts: 1 total, 1 by SBC, IAP RAM available 0x24cfc
Buffer RX/TX: 140/220-0, open files: 0
=== Duet Control Server ===
Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit)
HTTP+Executed:
> Executing M122
SBC:
Buffered code: M400
Buffered code: M98 P"functions/check_pause.g"
Buffered codes: 80 bytes total
>> Doing macro main.g, started by M98 P"main.g"
>> Number of flush requests: 1
>>> Doing macro pick_place/pick_utensil.g, started by M98 P"pick_place/pick_utensil.g" X"Slot 3"
>>> Number of flush requests: 1
Code buffer space: 3876
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 6163
Full transfers per second: 79.27, max time between full transfers: 62.6ms, max pin wait times: 46.7ms/21.3ms
Codes per second: 3.77
Maximum length of RX/TX data transfers: 8052/652

6/28/2024, 4:18:16 PM
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode)
Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA
Used output buffers: 1 of 40 (31 max)
=== RTOS ===
Static ram: 155360
Dynamic ram: 97000 of which 0 recycled
Never used RAM 20048, free system stack 134 words
Tasks: SBC(2,nWait 7,2.2%,697) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.2%,211) CanReceiv(6,nWait 1,0.0%,771) CanSender(5,nWait 7,0.0%,326) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.5%,53) MAIN(1,running,86.1%,444) IDLE(0,ready,1.9%,29), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 06:16:30 ago, cause: software
Last software reset at 2024-06-28 10:01, reason: User, Gcodes spinning, available RAM 20048, slot 0
Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU temperature: min 35.4, current 36.0, max 36.4
Supply voltage: min 23.9, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 11.9, current 12.3, max 12.7, under voltage events: 0
Heap OK, handles allocated/used 297/227, heap memory allocated/used/recyclable 6144/6036/1920, gc cycles 367
Events: 0 queued, 0 completed
Driver 0: standstill, SG min 0, mspos 728, reads 42327, writes 0 timeouts 0
Driver 1: standstill, SG min 0, mspos 232, reads 42326, writes 0 timeouts 0
Driver 2: standstill, SG min 0, mspos 968, reads 42326, writes 0 timeouts 0
Driver 3: standstill, SG min 0, mspos 600, reads 42326, writes 0 timeouts 0
Driver 4: standstill, SG min 0, mspos 216, reads 42327, writes 0 timeouts 0
Driver 5: standstill, SG min n/a, mspos 920, reads 42327, writes 0 timeouts 0
Date/time: 2024-06-28 16:18:15
Slowest loop: 47.79ms; fastest: 0.05ms
=== Storage ===
Free file entries: 20
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, segments created 9, maxWait 8486ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00
no step interrupt scheduled
Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0
=== DDARing 0 ===
Scheduled moves 15081, completed 15081, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 131], CDDA state -1
=== DDARing 1 ===
Scheduled moves 2157, completed 2157, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 46], CDDA state -1
=== Heat ===
Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
=== GCodes ===
Movement locks held by null, null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC* is idle in state(s) 0 0 0, running macro
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x0000005
Code queue 0 is empty
Q1 segments left 0, axes/extruders owned 0x0001000
Code queue 1 is empty
=== CAN ===
Messages queued 12198, received 31740, lost 0, errs 0, boc 0
Longest wait 1ms for reply type 6061, peak Tx sync delay 148, free buffers 50 (min 48), ts 6347/6347/0
Tx timeouts 0,0,0,0,0,0
=== SBC interface ===
Transfer state: 1, failed transfers: 19, checksum errors: 19
RX/TX seq numbers: 13186/13186
SPI underruns 15, overruns 0
State: 1, disconnects: 1, timeouts: 1 total, 1 by SBC, IAP RAM available 0x24cfc
Buffer RX/TX: 0/0-0, open files: 0
=== Duet Control Server ===
Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit)
HTTP+Executed:
> Executing M122
SBC:
>> Doing macro main.g, started by M98 P"main.g"
>> Number of flush requests: 1
>>> Doing macro carts/check_cart_height.g, started by M98 P"carts/check_cart_height.g"
Code buffer space: 4096
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 2989
Full transfers per second: 96.55, max time between full transfers: 54.4ms, max pin wait times: 40.5ms/23.4ms
Codes per second: 4.91
Maximum length of RX/TX data transfers: 7788/640
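For anyone wanting to compare the SPI-related counters between two M122 reports like the ones above without reading them by eye, here is a minimal Python sketch. It only does text matching on the labels that appear in the reports (`failed transfers`, `checksum errors`, `SPI underruns`, `disconnects`, `TfrRdy pin glitches`, `Last reset`). The function names `parse_m122` and `diff_counters` are made up for illustration, and it assumes the counters keep accumulating between the two reports, which has not been verified against the firmware or DCS source.

```python
import re

# Labels exactly as they appear in the M122 text; the dictionary keys are
# illustrative names chosen for this sketch.
PATTERNS = {
    "failed_transfers": r"failed transfers: (\d+)",
    "checksum_errors": r"checksum errors: (\d+)",
    "spi_underruns": r"SPI underruns (\d+)",
    "disconnects": r"disconnects: (\d+)",
    "tfrrdy_glitches": r"TfrRdy pin glitches: (\d+)",
}
UPTIME = re.compile(r"Last reset (\d+):(\d+):(\d+) ago")


def parse_m122(text):
    """Extract the SPI-related counters and uptime (seconds) from M122 text."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = int(match.group(1))
    match = UPTIME.search(text)
    if match:
        hours, minutes, seconds = (int(g) for g in match.groups())
        counters["uptime_s"] = hours * 3600 + minutes * 60 + seconds
    return counters


def diff_counters(older, newer):
    """Return newer - older for every counter present in both reports.

    Assumption: counters are cumulative between the two reports (no reset).
    """
    return {key: newer[key] - older[key] for key in older if key in newer}
```

Calling `diff_counters(parse_m122(older_text), parse_m122(newer_text))` on the two pasted reports gives the growth of each counter plus the elapsed uptime in `uptime_s`, which makes it easier to see whether a wiring or power change actually moved the numbers.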
-
@davidjryan said in DSF 6HC Pi Disconnections over SPI:
The last change was to put two separate 24V power supplies on the machine. One is for the Duet 6HC and 3HCs, one is for all other 24V/12V/5V requirements, and the Pi is on its own standalone "wall wart" power supply.
Generally, when using more than one power supply, we recommend connecting the negative terminals (i.e. GND, not earth) of the power supplies together, especially in the case of mainboards and toolboards connected via CAN. This eliminates the impact of a potential difference between boards when the power supplies are not in phase or are producing slightly different voltages, which can interrupt the low-voltage signals between boards.
In an SBC setup, the ribbon cable connects GND between the boards, which should mitigate any potential difference between the Duet and RPi. So check the ground on both the Duet and RPi, and between them. As you're using a wall wart power supply for the Pi, it may not have an earth and so may rely on the GND from the Duet; check that this is a solid connection. It may also help to plug the wall wart power supply into a power socket as close as possible to the one used for the Duet PSU, i.e. not into a separate wall socket, but into the same multi-way power cable.
This should, at least, make sure that it's not a power supply or ground issue causing the errors.
Ian
-
Update: 10-23-2024
After tabling this issue for the past couple of months, we had to revisit it when we performed a major upgrade to our system. The upgrade brought the issue back in full force, to the point where our "automatic" workaround (basically, detection of the issue followed by a software reset) no longer worked.
We replaced a lot of our 24V hardware with new versions of the same devices and went to different, dual 24V power supplies as well.
After the upgrade, we ran into the disconnection issue almost immediately when trying to run the equipment. We would get tens, sometimes close to hundreds, of TfrRdy pin glitches in a matter of seconds, and the Pi/Duet would disconnect within 30s. We disconnected one 24V device at a time and ran some tests until we found that a brushed 24V/2A vacuum pump was putting out excessive noise on the 24V wire, which was feeding back to the Pi/Duet. We put a capacitor across the pump's terminals and the TfrRdy pin glitches went to 0. We've been running for a couple of days now and we haven't had a single disconnection between the Pi and Duet.
It looks like our problem was excessive noise generated by poor component selection/sourcing. We tested the other pumps and devices on the system and we are seeing varying amounts of noise but none to the extent of that one particular vacuum pump.
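The one-device-at-a-time test described above can be tabulated with a few lines of Python. This is only a sketch: `glitch_rate` and `rank_suspects` are hypothetical helper names, and the glitch counts would come from reading "TfrRdy pin glitches" in M122 at the start and end of each test run. The idea is to compare the glitch rate with everything connected against the rate with each device unplugged, and list the devices whose removal reduces the rate the most.

```python
def glitch_rate(start_count, end_count, seconds):
    """Glitches per second over one test run, from two M122 readings."""
    if seconds <= 0:
        raise ValueError("test run must have a positive duration")
    return (end_count - start_count) / seconds


def rank_suspects(baseline_rate, rates_with_device_removed):
    """Sort devices by how much the glitch rate dropped when each was unplugged.

    rates_with_device_removed maps a device name to the glitch rate measured
    with only that device disconnected. The largest drop comes first.
    """
    drops = {
        name: baseline_rate - rate
        for name, rate in rates_with_device_removed.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

For example, with placeholder numbers (not measurements from this machine), `rank_suspects(3.0, {"pump_a": 0.0, "valve_bank": 2.9})` puts `pump_a` first, since unplugging it removed all of the glitches.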
I'll mark this one as solved!
-