Canbus failure modes
-
What are the expected failure modes during a canbus disconnection? It doesn't currently appear to fail safely and I've seen some odd behaviours.
One behaviour I've noticed is that if a board disconnects during a homing routine it will repeat the entire routine again from the start. This is great, it's a safe way to handle the disconnection.
Another behaviour which is not as safe, is where during a 2 axis move I think the board for one axis disconnected. The other axis carried on. The axis which disconnected seemed to assume that it had carried out a move which it actually hadn't so it had incorrect data of where it was and crashed when it performed the next move.
I've touched on another way in which I've had it fail not-safely here:
https://forum.duet3d.com/topic/33687/canbus-intermittent-disconnections
I would really like to know what the expected behaviour is for canbus disconnections. It would also be great to have some features implemented to make them safer. Of course I'll try to fix the root cause of the disconnection, but since there's always a risk of a wire breaking (for example), safe failure modes are important.
-
@jjem said in Canbus failure modes:
One behaviour I've noticed is that if a board disconnects during a homing routine it will repeat the entire routine again from the start. This is great, it's a safe way to handle the disconnection.
I suspect that's because if you try to home all axes, it first runs homeall.g, then if any axes are still not homed it runs the individual homing files for those axes.
Another behaviour which is not as safe, is where during a 2 axis move I think the board for one axis disconnected. The other axis carried on. The axis which disconnected seemed to assume that it had carried out a move which it actually hadn't so it had incorrect data of where it was and crashed when it performed the next move.
How is this different from a stepper motor wire becoming disconnected?
Since RRF 3.4.0, if the main board stops receiving data from a CAN-connected expansion board for more than 5 seconds, an event of type "expansion_timeout" is raised. If a CAN-connected board announces itself again unexpectedly (probably because it has just reset), an event of type "expansion_reconnect" is raised. So you do have some ability to detect CAN bus issues and take some action.
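To make the timing concrete, here is a rough Python sketch of the detection logic as described above. It is only an illustration, not the RepRapFirmware implementation: the `BoardWatchdog` class and its method names are made up for this example, and only the 5 second threshold and the two event names come from the description.

```python
from dataclasses import dataclass, field

TIMEOUT_S = 5.0  # threshold quoted in the description above


@dataclass
class BoardWatchdog:
    """Hypothetical tracker for when each expansion board was last heard from."""
    timeout_s: float = TIMEOUT_S
    last_seen: dict = field(default_factory=dict)
    timed_out: set = field(default_factory=set)

    def on_status(self, board: int, now: float) -> None:
        """Record ordinary traffic from a board."""
        self.last_seen[board] = now
        self.timed_out.discard(board)

    def on_announce(self, board: int, now: float) -> list:
        """A board announces itself; unexpected if it was already known."""
        events = []
        if board in self.last_seen:
            events.append(("expansion_reconnect", board))
        self.last_seen[board] = now
        self.timed_out.discard(board)
        return events

    def poll(self, now: float) -> list:
        """Report each board that has been silent for longer than the timeout."""
        events = []
        for board, seen in self.last_seen.items():
            if board not in self.timed_out and now - seen > self.timeout_s:
                self.timed_out.add(board)  # report each silence only once
                events.append(("expansion_timeout", board))
        return events
```

With this model, a board that goes quiet for 4.5 seconds and then resumes produces no event at all. An event appears only once the silence exceeds 5 seconds, or when the board announces itself a second time.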
-
@dc42 It was just a single axis I was trying to home. Sometimes it would home, do its bounce off, and then appear to lose connection before doing the final slower homing move. When this happened it would appear to run the entire homing routine again for that axis.
I see your point about stepper motor wires, however I think this is different because it seems to be a signal integrity problem rather than a broken connection, which isn't something I've ever had issues with on stepper motors. Also, since you can daisy-chain CAN boards, an issue with a single wire could knock out lots of motors and sensors rather than a single component.
Thank you for raising the event types. I had not spotted that CAN disconnections were included. Looking at the documentation now, it says (RRF 3.5.0-beta.4 and later only), not RRF 3.4.0. Which one is it? I'm not against upgrading to the beta firmware if that's what it takes. However, we are finding that the disconnections often last a lot less than 5s, so I don't know if it's applicable to this scenario.