Canbus intermittent disconnections
-
Hello, I'm having CAN disconnection issues on a large cartesian system (1.5x1.5x0.3m travel yzx) which uses a Duet 6HC and 4 x 1HCL daisy chained together.
My questions are mostly:
- What can cause CAN signal disconnections?
- Is there a way to make it fail-safe when it does disconnect?
Here's more detail about the problem:
The Y axis is driven by 2 motors, one on each side. The order the boards are connected in is 6HC, then 1HCL for Y1, another for Y2, another for Z, and another for X.
Previously I've had an issue with Y2 board and onwards disconnecting mid-move, which has caused the axis to rack severely and damage itself. This has required using the e-stop, so it was difficult to troubleshoot the boards since they lost power.
Currently a new issue has cropped up where I'm running a series of moves over and over in a macro, moving between 3 different positions in YZ and then moving X in each one. The macro randomly stops sometimes, but when it does it's always in the same one of the 3 YZ positions. The duet reports that the X 1HCL board has disconnected, but when checking the lights on the board it seems connected, and if I send commands it responds. So it appears to briefly disconnect just for a moment, and that coincides with when a command is asked of it, causing the system to stop.
Checking the CAN wires, they have strain relief and there's plenty of slack in them, so the connector isn't getting pulled. I've manually pulled on the connector and moved it about, but haven't been able to force it to disconnect. I'm not sure what's causing this random disconnect, or why it always happens in the same position. Sometimes it happens a few minutes after starting the macro, sometimes it takes half an hour.
So, back to my questions. What could cause disconnects? And how can we make sure that when one does happen, all motors stop, rather than one Y motor carrying on and racking the system?
-
@Herve_Smith Haven't been able to spot anything mechanical so far. I've not been logging so far, but have turned that on now.
-
@jjem said in Canbus intermittent disconnections:
The duet reports that the X 1HCL board has disconnected
Next time that happens, send M122 B# where # is the address of that board, and look to see whether the reported Last Reset Time ties in with when the system was last started or restarted, or whether it is very recent. Also look at the last reset reason.
-
@dc42 Just did this in pronterface:
>>> m98 p"simulaterunning.g"
SENDING:M98 P"simulaterunning.G"
After it carried out a few moves I got the following when it tried to home X, which it does after every move in the above macro:
Error: Response timeout: CAN addr 53, req type 6037, RID=253
[ERROR] Error: Response timeout: CAN addr 53, req type 6037, RID=253
Error: Failed to enable endstops
[ERROR] Error: Failed to enable endstops
A few seconds later I sent M122:
>>> m122 b53
SENDING:M122 B53
Diagnostics for board 53:
Duet EXP1HCL firmware version 3.4.6 (2023-07-21 14:14:45)
Bootloader ID: SAME5x bootloader version 2.4 (2021-12-10)
All averaging filters OK
Never used RAM 52212, free system stack 173 words
Tasks: Move(notifyWait<null>,0.0%,108) HEAT(notifyWait<null>,0.0%,108) CanAsync(notifyWait<null>,0.0%,65) CanRecv(notifyWait<null>,0.0%,79) CanClock(notifyWait<null>,0.0%,72) TMC(notifyWait<null>,31.5%,360) CLSend(notifyWait<null>,0.0%,152) MAIN(running<null>,66.6%,399) IDLE(ready<null>,0.0%,40) AIN(notifyWait<null>,1.9%,265), total 100.0%
Last reset 00:09:49 ago, cause: power up
Last software reset data not available
Closed loop enabled: no, pre-error threshold: 0.00, error threshold: 0.00, encoder type none
Driver 0: pos -1575, 5.0 steps/mm,standstill, SG min 0, mspos 128, reads 47822, writes 0 timeouts 0, steps req 5250 done 4200
Moves scheduled 126, completed 126, in progress 0, hiccups 0, step errors 0, maxPrep 40, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0
Peak sync jitter -8/5, peak Rx sync delay 716, resyncs 0/0, no step interrupt scheduled
VIN voltage: min 47.2, current 47.4, max 47.5
V12 voltage: min 12.1, current 12.1, max 12.2
MCU temperature: min 37.2C, current 40.3C, max 40.3C
Last sensors broadcast 0x00000000 found 0 25 ticks ago, 0 ordering errs, loop time 0
CAN messages queued 1540, send timeouts 0, received 1057, lost 0, free buffers 37, min 37, error reg b0000
dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 1417, adv 34985/36088
"Last reset 00:09:49 ago, cause: power up" does match with when I turned it on. I sent M122 only a few seconds after it had the disconnection, so it doesn't appear to have logged the disconnection. Is that expected behaviour?
-
@jjem please share the contents of simulaterunning.g. The M122 report indicates that board 53 didn't reboot.
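For anyone repeating this check, here is a minimal Python sketch of the comparison described above: pull the "Last reset" time and cause out of a pasted M122 B# report and compare it with how long the machine has been powered. The function names, the tolerance and the assumption that the time is always printed as hh:mm:ss are mine and are not part of RRF; the sample text is taken from the report for board 53.

```python
import re

# Matches the "Last reset" entry of an M122 report, whether or not the
# report's line breaks survived copy and paste.
LAST_RESET = re.compile(
    r"Last reset (\d+):(\d{2}):(\d{2}) ago, cause: (.+?)"
    r"(?=\s+Last software reset|\s*$)",
    re.MULTILINE,
)

def parse_last_reset(m122_text):
    """Return (seconds_since_reset, cause), or None if the entry is missing."""
    match = LAST_RESET.search(m122_text)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
    return hours * 3600 + minutes * 60 + seconds, match.group(4)

def rebooted_since_power_on(m122_text, seconds_since_power_on, tolerance_s=60):
    """True if the board reset noticeably later than the machine was powered.

    seconds_since_power_on is supplied by the user (hypothetical input);
    tolerance_s allows for the delay between power-up and sending M122.
    """
    parsed = parse_last_reset(m122_text)
    if parsed is None:
        raise ValueError("no 'Last reset' entry found")
    seconds_since_reset, _cause = parsed
    return seconds_since_reset < seconds_since_power_on - tolerance_s

report = ("Last reset 00:09:49 ago, cause: power up "
          "Last software reset data not available")
print(parse_last_reset(report))                                     # (589, 'power up')
print(rebooted_since_power_on(report, seconds_since_power_on=600))  # False
```

A result of False with cause "power up" is consistent with the board having stayed powered throughout, which points towards the CAN connection rather than a board reset.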
-
@dc42 simulaterunning.g calls 18 macros repeatedly. Each macro homes X then moves to a different YZ position. Here's a sample of 5 of the macros. The other 13 are identical apart from the YZ coordinates.
G90 ;Set absolute positioning
G28 X0 ;Home X axis
G1 Z1549.7 Y227.52 F60000 ;move

G90 ;Set absolute positioning
G28 X0 ;Home X axis
G1 Z864.5 Y869 F60000 ;move

G90 ;Set absolute positioning
G28 X0 ;Home X axis
G1 Z179.3 Y387.5 F60000 ;move

G90 ;Set absolute positioning
G28 X0 ;Home X axis
G1 Z1549.7 Y1029.37 F60000 ;move

G90 ;Set absolute positioning
G28 X0 ;Home X axis
G1 Z179.3 Y1029.37 F60000 ;move
Does M122 only log when a board reboots, and not when it disconnects?
-
I've discovered that the disconnection always happens at a certain Y position. I'd assumed it was an issue with the cable to the X board since that was the one reporting the issue, but it seems that even if the Y or Z boards disconnect, the error is only reported for the board that is asked to do something, in this case X, and not for the other boards. M122 doesn't show a reset on the other boards either, so it's just a guess that it's actually the Z board disconnecting rather than the X (although that does cause the X to disconnect too, because they're chained).
So I'm going to try replacing the CAN bus cable on the Y axis which runs to the Z board. I think it may have a break in it, since the connection is poor at a particular Y position.
-
Cable to Z board has been replaced and that's fixed the problem. There wasn't a break in the wires, but when doing a continuity test between the ends and flexing it, the signal was intermittent. The new cable is more durable so should last longer without strands breaking.
It was difficult to troubleshoot because board 53 was reporting an error when it was actually board 52 that had disconnected. When a board reports an error, would it be possible to make the mainboard check all the other boards and report whether any of them have disconnected as well? That way you could figure out much more quickly which board in the chain was the first to disconnect, which would make troubleshooting a lot easier.
-
@jjem I'm glad you solved it. If the CAN bus is wired in the order main board -> board 53 -> board 52 then a problem with the cable from the main board to board 53 will result in both board 52 and board 53 becoming disconnected. If the disconnection is short then only one of them may register in RRF.
-
@dc42 it was main board -> board 50 -> board 51 -> board 52 -> board 53. The disconnection was between 51 and 52 but it only reported an error for board 53. I don't have a great understanding of the workings of CAN bus. Does it have a heartbeat signal to check whether boards are connected, or will it only know if they're disconnected when trying to get them to do something?
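To make the chain reasoning concrete, here is a small Python sketch of which boards lose contact with the main board when one cable in a daisy chain fails. It only models the wiring order given above; the function name and the "main" label are made up for illustration and nothing here reflects how RRF itself tracks boards.

```python
def unreachable_boards(chain, broken_link):
    """Return the boards cut off from the main board by one failed cable.

    chain       -- boards in wiring order, starting with the main board
    broken_link -- (upstream, downstream) pair of adjacent boards whose
                   connecting cable has failed
    """
    upstream, downstream = broken_link
    index = chain.index(downstream)
    if index == 0 or chain[index - 1] != upstream:
        raise ValueError("broken_link must join two adjacent boards")
    # Everything from the downstream side of the break onwards is cut off.
    return chain[index:]

chain = ["main", 50, 51, 52, 53]
print(unreachable_boards(chain, (51, 52)))  # [52, 53]
```

So with this wiring, an error reported for board 53 is consistent with a fault in any cable upstream of it, and the other boards need checking from the main board outwards.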
-
@jjem in RRF 3.4 and later there is a heartbeat, but a disconnection shorter than 2 seconds may not be noticed.
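As a rough illustration of why a short dropout can slip past a timeout-based check, here is a generic Python sketch. It is not RRF's implementation: the 2-second figure comes from the reply above, while the function name and the example timestamps are invented for the example.

```python
def detect_timeouts(heard_at, timeout_s=2.0):
    """Return (last_heard, next_heard) pairs where a board stayed silent
    for longer than timeout_s. Shorter gaps are not reported."""
    return [
        (earlier, later)
        for earlier, later in zip(heard_at, heard_at[1:])
        if later - earlier > timeout_s
    ]

# Hypothetical times (in seconds) at which a board was heard from.
heard_at = [0.0, 0.5, 1.0, 2.5, 3.0, 6.0]
print(detect_timeouts(heard_at))  # [(3.0, 6.0)]
```

The 1.5-second gap between 1.0 and 2.5 goes unnoticed, while the 3-second gap is flagged. In the same way, a cable that drops out only briefly while flexing may show up only when a command happens to be sent during the dropout.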