I have a non-printer system consisting of 13 stepper motors across 1 6HC and 3 3HCs.
The 6HC is in SBC mode with a Raspberry Pi running Bullseye.
All Duet boards are at 3.5.2 firmware.
We are using the Duet system in CNC mode as a motion control system to perform an assembly operation.
The application consists of 70-ish macro files that are called from a "main.g" to perform an assembly process. A single cycle could consist of 50+ M98 commands being called from the main.g, depending on what is going on in the cycle.
Our cycle time is about 60 seconds. We run for about 100 minutes before we run out of parts and need to reload the machine and start up again.
A PyQt Python program runs on the Pi and uses dsf-python to monitor the Duet and provide an HMI for the operator (think start/stop buttons, status indicators, cycle counts, etc.).
We poll the full object model (OM) every 500 ms via HTTP POST requests.
We send commands from the PyQt program using the CommandConnection method on an as-needed basis. This only happens in manual mode and setup mode.
Every so often, we lose connection to the 6HC - our connection goes from Idle or Busy to Disconnected.
It occurs at random points in the process, so we do not think it is tied to a specific macro file.
When issuing M122 commands during the process, I can see the TfrRdy pin glitches value incrementing.
The glitch value increases by about 15 glitches per cycle.
After reading through the forums a bit, we have tried:
- different ribbon cables (4 different cables from Duet 6HC boxes, so "factory")
- moving wiring away from ribbon cable connection points on the Pi and Duet
The ribbon cable runs out from the 6HC, under the board, to the Pi, which is mounted right beside the 6HC.
Everything is mounted to an aluminum backplane, with brass standoffs for the 6HC and 3HCs and plastic standoffs for the Pi.
The 6HC and 3HCs are powered by an industrial 24V power supply.
The Pi is powered by a 24V to 5V/5A buck converter via USB-C connection.
We have 20 limit switches, a couple of vacuum switches and photoeyes connected to the Duet boards either directly or through 24V relays to convert down as needed.
There are a handful of 24V outputs driven by the Duet to 5V relays.
We are using a NeoPixel LED strip connected to its own 5V power supply.
I'm at the point where I'm about to tear a strip of tinfoil off my hat to shroud the cable and see if that helps.
M122 at start of latest cycle:
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode)
Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA
Used output buffers: 1 of 40 (24 max)
=== RTOS ===
Static ram: 155360
Dynamic ram: 96808 of which 88 recycled
Never used RAM 20152, free system stack 146 words
Tasks: SBC(2,ready,1.6%,699) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.1%,211) CanReceiv(6,nWait 1,0.0%,794) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.5%,55) MAIN(2,running,87.6%,444) IDLE(0,ready,1.2%,29), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:06:46 ago, cause: software
Last software reset at 2024-06-25 14:14, reason: User, Gcodes spinning, available RAM 20472, slot 2
Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU temperature: min 39.6, current 40.5, max 40.6
Supply voltage: min 22.2, current 23.7, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.3, max 12.7, under voltage events: 0
Heap OK, handles allocated/used 297/229, heap memory allocated/used/recyclable 6144/4524/372, gc cycles 4
Events: 0 queued, 0 completed
Driver 0: standstill, SG min 0, mspos 792, reads 6434, writes 17 timeouts 0
Driver 1: standstill, SG min 0, mspos 312, reads 6434, writes 17 timeouts 0
Driver 2: standstill, SG min 0, mspos 520, reads 6434, writes 17 timeouts 0
Driver 3: standstill, SG min 0, mspos 392, reads 6430, writes 21 timeouts 0
Driver 4: standstill, SG min 0, mspos 856, reads 6430, writes 21 timeouts 0
Driver 5: standstill, SG min 0, mspos 472, reads 6430, writes 21 timeouts 0
Date/time: 2024-06-25 14:21:08
Slowest loop: 44.04ms; fastest: 0.05ms
=== Storage ===
Free file entries: 20
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, segments created 9, maxWait 74593ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00
next step interrupt due in 1144749 ticks, disabled
Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0
=== DDARing 0 ===
Scheduled moves 168, completed 166, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 10], CDDA state 3
=== DDARing 1 ===
Scheduled moves 12, completed 12, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 4], CDDA state -1
=== Heat ===
Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
=== GCodes ===
Movement locks held by SBC, null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC* is doing "M400" in state(s) 0 0 0 0, running macro
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x0000600
Q1 segments left 0, axes/extruders owned 0x0001000
Code queue 1 is empty
=== CAN ===
Messages queued 3842, received 10135, lost 0, errs 1, boc 0
Longest wait 1ms for reply type 6018, peak Tx sync delay 68, free buffers 50 (min 48), ts 2032/2031/0
Tx timeouts 0,0,0,0,0,0
=== SBC interface ===
Transfer state: 5, failed transfers: 0, checksum errors: 0
RX/TX seq numbers: 32144/32144
SPI underruns 0, overruns 0
State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x24cfc
Buffer RX/TX: 24/72-0, open files: 0
=== Duet Control Server ===
Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit)
HTTP+Executed:
> Executing M122
SBC:
Buffered code: M400
Buffered code: M598
Buffered codes: 48 bytes total
>> Doing macro main.g, started by M98 P"main.g"
>> Number of flush requests: 1
>>> Doing macro nt/wrap_bundle.g, started by M98 P"nt/wrap_bundle.g"
>>> Suspended code: G90
>>> Suspended code: G1 W0 F5000
>>> Suspended code: G91
>>>> Doing macro nt/roll_1.g, started by M98 P"nt/roll_1.g"
>>>> Number of flush requests: 1
Code buffer space: 4024
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 36
Full transfers per second: 80.39, max time between full transfers: 47.4ms, max pin wait times: 33.6ms/13.1ms
Codes per second: 3.53
Maximum length of RX/TX data transfers: 6055/908
M122 many cycles later:
6/25/2024, 3:07:06 PM M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.02 or later (SBC mode)
Board ID: 08DJM-9P63L-DJMSS-6JKD0-3SN6M-9UHZA
Used output buffers: 1 of 40 (33 max)
=== RTOS ===
Static ram: 155360
Dynamic ram: 97000 of which 32 recycled
Never used RAM 20016, free system stack 134 words
Tasks: SBC(2,rWait:,2.1%,699) HEAT(3,nWait 6,0.0%,351) Move(4,nWait 6,0.2%,211) CanReceiv(6,nWait 1,0.0%,794) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,346) TMC(4,nWait 6,9.6%,53) MAIN(2,running,86.2%,444) IDLE(0,ready,1.8%,29), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:52:44 ago, cause: software
Last software reset at 2024-06-25 14:14, reason: User, Gcodes spinning, available RAM 20472, slot 2
Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU temperature: min 40.3, current 40.5, max 40.9
Supply voltage: min 22.2, current 23.7, max 23.8, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.3, max 12.7, under voltage events: 0
Heap OK, handles allocated/used 297/227, heap memory allocated/used/recyclable 6144/4132/20, gc cycles 47
Events: 0 queued, 0 completed
Driver 0: standstill, SG min 0, mspos 776, reads 146, writes 0 timeouts 0
Driver 1: ok, SG min 0, mspos 747, reads 146, writes 0 timeouts 0
Driver 2: standstill, SG min 0, mspos 136, reads 146, writes 0 timeouts 0
Driver 3: standstill, SG min 0, mspos 40, reads 147, writes 0 timeouts 0
Driver 4: standstill, SG min 0, mspos 472, reads 147, writes 0 timeouts 0
Driver 5: standstill, SG min n/a, mspos 712, reads 147, writes 0 timeouts 0
Date/time: 2024-06-25 15:07:06
Slowest loop: 36.73ms; fastest: 0.05ms
=== Storage ===
Free file entries: 20
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, segments created 9, maxWait 8547ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00
next step interrupt due in 12 ticks, disabled
Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0
=== DDARing 0 ===
Scheduled moves 1889, completed 1888, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 45], CDDA state 3
=== DDARing 1 ===
Scheduled moves 252, completed 252, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 18], CDDA state -1
=== Heat ===
Bed heaters -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
=== GCodes ===
Movement locks held by null, null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC* is idle in state(s) 0 0 0, running macro
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x0000004
Code queue 0 is empty
Q1 segments left 0, axes/extruders owned 0x0001000
Code queue 1 is empty
=== CAN ===
Messages queued 5042, received 13141, lost 0, errs 0, boc 0
Longest wait 1ms for reply type 6061, peak Tx sync delay 142, free buffers 50 (min 48), ts 2622/2622/0
Tx timeouts 0,0,0,0,0,0
=== SBC interface ===
Transfer state: 5, failed transfers: 5, checksum errors: 5
RX/TX seq numbers: 60875/60875
SPI underruns 7, overruns 5
State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x24cfc
Buffer RX/TX: 0/0-0, open files: 0
=== Duet Control Server ===
Duet Control Server version 3.5.2 (2024-06-12 07:12:47, 64-bit)
HTTP+Executed:
> Executing M122
SBC:
>> Doing macro main.g, started by M98 P"main.g"
>> Number of flush requests: 1
>>> Doing macro pick_place/place_bundle.g, started by M98 P"pick_place/place_bundle.g"
Code buffer space: 4096
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 662
Full transfers per second: 79.55, max time between full transfers: 53.9ms, max pin wait times: 33.9ms/9.9ms
Codes per second: 5.09
Maximum length of RX/TX data transfers: 8176/908
So the glitches are increasing but the system is still running. Then, out of the blue, it will stop and we get Disconnected in DWC and in our PyQt Python program. We restart DCS and the machine starts up again.
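As a cross-check on the "about 15 per cycle" figure, here is a rough sketch that works out the glitch rate from the two M122 reports above. The function names are made up for illustration, the 60-second cycle is the nominal figure quoted earlier, and only the three relevant lines of each report are reproduced.

```python
import re
from datetime import datetime

def parse_m122(text):
    """Pull the timestamp and the SBC-link counters out of one M122 report."""
    stamp = re.search(r"Date/time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", text)
    glitches = re.search(r"TfrRdy pin glitches: (\d+)", text)
    checksum = re.search(r"checksum errors: (\d+)", text)
    return {
        "time": datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S"),
        "glitches": int(glitches.group(1)),
        "checksum_errors": int(checksum.group(1)),
    }

def glitches_per_cycle(first, second, cycle_seconds=60.0):
    """Average glitch count per cycle between two parsed reports."""
    elapsed = (second["time"] - first["time"]).total_seconds()
    return (second["glitches"] - first["glitches"]) / elapsed * cycle_seconds

# Relevant lines copied from the two reports above.
early = parse_m122("""
Date/time: 2024-06-25 14:21:08
Transfer state: 5, failed transfers: 0, checksum errors: 0
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 36
""")
late = parse_m122("""
Date/time: 2024-06-25 15:07:06
Transfer state: 5, failed transfers: 5, checksum errors: 5
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 662
""")

rate = glitches_per_cycle(early, late)
print(f"{rate:.1f} glitches per 60 s cycle")  # prints "13.6 glitches per 60 s cycle"
```

Over the 46 minutes between the two reports that averages a little under 14 glitches per nominal cycle, in line with the estimate, while the checksum error count went from 0 to 5 in the same window.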
Here is the DCS log from when the latest disconnection happened and we restarted it:
-- Journal begins at Wed 2024-06-19 11:43:20 EDT, ends at Tue 2024-06-25 15:13:49 EDT. --
Jun 25 15:03:09 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_actuators.g on channel SBC
Jun 25 15:03:11 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_actuators.g
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0xffdb3c31)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Restarting full transfer because an unexpected response code has been received (code 0x00000001)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0x9aaf5092)
Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Restarting full transfer because an unexpected response code has been received (code 0x00000001)
Jun 25 15:03:16 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_bundle_grip.g on channel SBC
Jun 25 15:03:17 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_bundle_grip.g
Jun 25 15:03:19 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_bundle_grip.g on channel SBC
Jun 25 15:03:20 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file functions/set_bundle_grip.g
Jun 25 15:03:21 A1000-2 DuetControlServer[15703]: [info] Starting macro file led/solid.g on channel SBC
Jun 25 15:03:21 A1000-2 DuetControlServer[15703]: [info] SBC: Finished macro file led/solid.g
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_gripper.g on channel SBC
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6dbac3ae, got 0x5e32a259)
Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Lost connection to Duet (Timeout while waiting for transfer ready pin)
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file functions/set_gripper.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file functions/set_gripper.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Failed to find corresponding state for code flush request, falling back to current state
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"functions/set_gripper.g" X"Off"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file pick_place/place_utensil.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file pick_place/place_utensil.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"pick_place/place_utensil.g" X"Knife"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Aborting orphaned macro file main.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [info] Aborted macro file main.g
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: ==> Cancelling unfinished starting code: M98 P"main.g"
Jun 25 15:13:29 A1000-2 DuetControlServer[15703]: [warn] SBC: Failed to find suitable stack level for flush request, falling back to current one
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Main process exited, code=exited, status=70/SOFTWARE
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Failed with result 'exit-code'.
Jun 25 15:13:34 A1000-2 systemd[1]: duetcontrolserver.service: Consumed 9min 19.410s CPU time.
Jun 25 15:13:39 A1000-2 systemd[1]: duetcontrolserver.service: Scheduled restart job, restart counter is at 10.
Jun 25 15:13:39 A1000-2 systemd[1]: Stopped Duet Control Server.
Jun 25 15:13:39 A1000-2 systemd[1]: duetcontrolserver.service: Consumed 9min 19.410s CPU time.
Jun 25 15:13:39 A1000-2 systemd[1]: Starting Duet Control Server...
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Duet Control Server v3.5.2
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Written by Christian Hammacher for Duet3D
Jun 25 15:13:39 A1000-2 DuetControlServer[20004]: Licensed under the terms of the GNU Public License Version 3
Jun 25 15:13:40 A1000-2 DuetControlServer[20004]: [info] Settings loaded
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] Environment initialized
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] Connection to Duet established
Jun 25 15:13:41 A1000-2 DuetControlServer[20004]: [info] IPC socket created at /run/dsf/dcs.sock
Jun 25 15:13:41 A1000-2 systemd[1]: Started Duet Control Server.
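To test the impression that the failures are not tied to one macro, a log like the one above can be tallied by the macro that was most recently started when each warning appeared. This is only a sketch: the function name is hypothetical, the patterns match just the message texts visible in this excerpt, and "most recently started" is not the same as "still running" (a parent macro may have resumed by then).

```python
import re
from collections import Counter

START = re.compile(r"Starting macro file (\S+) on channel")
TROUBLE = re.compile(r"Bad data CRC32|Lost connection to Duet")

def macros_before_warnings(lines):
    """Count warnings against the macro that was most recently started."""
    last_macro = None
    hits = Counter()
    for line in lines:
        started = START.search(line)
        if started:
            last_macro = started.group(1)
        elif TROUBLE.search(line):
            hits[last_macro] += 1
    return hits

# A few lines copied verbatim from the journal excerpt above.
sample = [
    "Jun 25 15:03:09 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_actuators.g on channel SBC",
    "Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0xffdb3c31)",
    "Jun 25 15:03:13 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6610709d, got 0x9aaf5092)",
    "Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [info] Starting macro file functions/set_gripper.g on channel SBC",
    "Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Bad data CRC32 (expected 0x6dbac3ae, got 0x5e32a259)",
    "Jun 25 15:09:24 A1000-2 DuetControlServer[15703]: [warn] Lost connection to Duet (Timeout while waiting for transfer ready pin)",
]

hits = macros_before_warnings(sample)
for macro, count in hits.most_common():
    print(count, macro)
```

Run over a full day of journal output, a flat spread across many macros would support the "random" reading, while a single macro dominating would point back at whatever that macro switches.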
To get things to reset, I press Emergency Stop on DWC or stop/start DCS or power off/power on.
I am working on getting the Subscribe method working to receive OM updates, instead of polling via HTTP POST requests, to cut down on the traffic over SPI.
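A minimal sketch of what the Subscribe approach could look like, for anyone following along. It assumes dsf-python exposes SubscribeConnection, SubscriptionMode.PATCH, get_object_model() and get_object_model_patch() as in the library's published examples, and that a patch arrives either as a JSON string or as an already-parsed dict; both should be checked against the installed version. The names status_from_patch and watch_status are invented for this example.

```python
import json

def status_from_patch(patch):
    """Return state.status if this partial update carries it, else None."""
    if isinstance(patch, (str, bytes)):
        patch = json.loads(patch)
    return patch.get("state", {}).get("status")

def watch_status(on_status):
    """Block on DCS updates instead of polling; calls on_status(new_status)."""
    # Imported here so the helper above can be tried without DSF installed.
    from dsf.connections import SubscribeConnection, SubscriptionMode

    conn = SubscribeConnection(SubscriptionMode.PATCH)
    conn.connect()
    try:
        conn.get_object_model()  # the first message is the complete model
        while True:
            # Each later message carries only what changed since the last one.
            status = status_from_patch(conn.get_object_model_patch())
            if status is not None:
                on_status(status)
    finally:
        conn.close()

# The helper can be exercised on its own with hand-written patches:
print(status_from_patch('{"state": {"status": "busy"}}'))
print(status_from_patch({"move": {"axes": []}}))
```

In a PyQt program the blocking watch_status loop would presumably live in a worker thread and hand each new status to the GUI through a signal, rather than being called from the main thread.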
Just wondering if there is something blatantly obvious that we are doing wrong in our implementation of dsf-python?
Are we hammering DWC too much/often?