Duet sometimes really slow? - I2C error or?
-
This just happened again, this time I was sitting in front of the printer when it happened. First the pauses, then eventually (after a couple of minutes) the motor phases warning. This time it was phase B for drivers 1 and 8.
This time I managed to capture the M122 immediately as the problem happened (while the print was slowly progressing).
Lots of timeout errors...
I2C nak errors 0, send timeouts 511032, receive timeouts 0, finishTimeouts 511032
I have also captured the behavior on video, before receiving my first motor phase warning:
https://www.youtube.com/watch?v=w1QNw_RJozM
Note that as I write this, the printer is again working fine after clicking the emergency stop button and starting another print. I did not even leave enough time for the bed to cool down from the print that failed.
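For anyone who wants to log these counters rather than read them by eye, here is a minimal sketch that pulls the four I2C numbers out of a saved M122 capture. It assumes the line is worded exactly as in the capture quoted above; other firmware versions may word it differently, and the function and key names are made up for illustration.

```python
import re

# Pattern for the I2C summary line as worded in the M122 capture quoted above.
# Assumption: other firmware versions may format this line differently.
I2C_LINE = re.compile(
    r"I2C nak errors (\d+), send timeouts (\d+), "
    r"receive timeouts (\d+), finishTimeouts (\d+)"
)


def parse_i2c_counters(m122_text):
    """Return the four I2C counters as a dict, or None if the line is absent."""
    match = I2C_LINE.search(m122_text)
    if match is None:
        return None
    nak, send, receive, finish = (int(group) for group in match.groups())
    return {
        "nak_errors": nak,
        "send_timeouts": send,
        "receive_timeouts": receive,
        "finish_timeouts": finish,
    }


# The line from the capture above, used as sample input.
sample = (
    "I2C nak errors 0, send timeouts 511032, "
    "receive timeouts 0, finishTimeouts 511032"
)
counters = parse_i2c_counters(sample)
print(counters)
```

Running this against successive M122 captures makes it easier to see whether the timeout counters are growing or holding steady.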
-
@fulg said in motor phase A may be disconnected reported by driver(s) 0 1 2:
I2C does not go through that GND wire.
No, it doesn't. But a degraded GND connection leads to I2C errors. It is recommended to use the thickest possible (12-14 AWG) solid core wire with the shortest possible length.
I2C issues are most certainly the cause for the slow/stop motion.
Also, I have a hunch: since the "motor phase A/B disconnected" warning is only issued if this state persists for at least 500ms, I2C timeout issues could confuse or influence the time measurement. But that is more of a wild guess, because I am anything but familiar with how the I2C code interacts with the rest of the code.
-
So I have changed my wiring from 20AWG to a single 16AWG pair (connected exactly like the documentation, a single pair from the PSU and short wires from the Duet to the Duex). The behavior has not changed, the wiring was not to blame.
Reproducing this is easy: leave the printer with the motors energized for 24-48 hours. Eventually the I2C timeouts take over and pause movements during a print, essentially ruining it. After that I get the random driver failures, which are likely caused by the I2C timeouts and are not the root cause of the problem.
One thing that might be important, my printer uses 4 motors for Z, and all four are on the Duex. XYE are on the Duet, the rest are unused. So at any given time, 7 steppers are energized. Perhaps the recent rewrites in 2.02/2.03 have some unintended side effects?
The printer is not a new build, I have been printing for hundreds if not thousands of hours with RRF without problems. I will try going back to earlier RRF releases...
I have attached the new diagnostics output:
diagnostics.txt
You can find a complete copy of my config here.
-
@fulg said in motor phase A may be disconnected reported by driver(s) 0 1 2:
...The printer is not a new build, I have been printing for hundreds if not thousands of hours with RRF without problems. I will try going back to earlier RRF releases...
It seems that unfortunately you too are experiencing these I2C errors. There are quite a few threads about this; the latest one (apart from this one) is here: https://forum.duet3d.com/topic/10313/printer-pausing-between-commands/24. See @dc42's post dated 07 May 2019, 9:13 and try that.
From what I have been able to discover, the earliest thread regarding this issue is from around July 2018. There are no threads or posts reporting I2C errors prior to that date. Did you by any chance change the firmware around the time you started seeing these problems? If so, do you happen to know what version you were on before?
-
@deckingman said in motor phase A may be disconnected reported by driver(s) 0 1 2:
From what I have been able to discover, the earliest thread regarding this issue is from around July 2018. There are no threads or posts reporting I2C errors prior to that date.
Now, that might be of importance. I checked, and RRF 2.01beta1 was released in July 2018. This means that up until RRF 2.0 no one reported I2C issues, which at least hints at something having changed between 2.0 and 2.01beta1.
@dc42 You know your changes best. Is there anything in 2.01beta1 that persists until this day and also might be responsible for these issues?
Or was M122 simply only extended to include these numbers? The changelog lists a couple of modifications to M122 but none regarding I2C.
EDIT: found the discussion in the thread mentioned by @deckingman above and see that there are already cases with RRF 2.0.
-
@wilriker Yes. As I stated in the other thread that I linked to, it seems that there were no reports of any of these issues which can be associated with I2C misbehaviour until around the same time that the firmware changed to RTOS. It could of course be pure coincidence, but it's the only common denominator that I have been able to find across multiple users with multiple generations of boards. Mine are among the very first pre-production boards, both Duet Ethernet and Duex5, and they had no problems from when I got them in December 2016 until around July 2018. Other users who have reported similar problems have more recent hardware.
-
@deckingman Thank you for this, I was feeling quite alone! I have since downgraded to 2.01, as I know 2.02 is also affected by this (I was discussing with someone else on Discord with the same printer / same problem and they are using 2.02).
I tend to upgrade RRF only when non-beta/non-RC versions are available, I don't know why I upgraded to 2.03beta3. Last year I was turning off the motors after each print, it is only recently that I've started to keep the steppers powered all the time.
EDIT: 2.03beta3, not RC3. It is especially important to get this right now that RC1 actually exists.
-
@fulg The earliest known report of similar issues was in this thread https://forum.duet3d.com/topic/6077/inconsistent-delays-during-homing-and-other-movements and the OP of that thread stated that he was using RRF 2.0. When I first reported my own experiences of these issues I was using 2.01 (RTOS) (2018-07-26b2). Of course, it may just be coincidence that there are no reports of any similar issues with firmware which pre-dates the 2.0 RTOS series.
-
@deckingman I found the other threads later and noticed the problem also occurs in 2.0. I had hopes when seeing the I2C resistors hack but sadly you later confirmed it does not fix this problem.
I will leave my 2.01 setup running for now until the next occurrence (to validate Duex detection with M115 after a soft reset, as asked in another thread), and then downgrade again to 1.21.
The repro is easy for me, I just leave the steppers powered all the time. As I said before my setup is somewhat unique in that I have 4 Z motors on the Duex that are constantly moving, most printers are not built like that. Sometimes I get the error within a few hours but so far I've always reproduced it with 1 or 2 days of having the steppers powered-but-idling.
I have never seen the problem occur when turning off the steppers after each print (even if the printer itself stayed on), although I suppose it could have happened mid-print and I was just lucky until now.
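To pin down how long the steppers have to sit powered before the timeouts start, the send-timeout counter from M122 could be sampled periodically and scanned for the first increase. The sketch below assumes you already have such samples; the readings shown are invented for illustration and are not measurements from this printer.

```python
def first_timeout_onset(samples):
    """Find when the send-timeout counter first increased.

    samples: list of (hours_since_steppers_energized, send_timeouts),
    in time order. Returns the hours value of the first sample whose
    counter is higher than the previous sample's, or None if it never rose.
    """
    for (_, previous_count), (hours, count) in zip(samples, samples[1:]):
        if count > previous_count:
            return hours
    return None


# Invented readings, for illustration only (not real data).
readings = [(0, 0), (12, 0), (24, 0), (36, 1500), (48, 90000)]
onset = first_timeout_onset(readings)
print(onset)
```

With enough runs, the onset times could show whether the failure really needs a day or more of idling or whether it is closer to random.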
-
@fulg For me, it's a bit different. I've only had one or two print failures, neither of which was more than a few hours long. Usually, I get the failure quite soon after powering up the printer, during the first homing moves or shortly after. On many occasions that has been after the printer has been powered down for a few days. I have had the issue during homing, cycled the power, then homed again and done a 30 plus hour print without further problems. The way it manifests itself is long pauses between moves, but no reports of motor phase errors.
It's this randomness that makes it so hard to track down. Many users report different problems. Some have the Duex5 board well populated, others have only a few things connected. The only thing in common between users is the I2C errors.
-
In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.
I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).
-
@deckingman said in motor phase A may be disconnected reported by driver(s) 0 1 2:
In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.
I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).
Thanks for confirming this. Now all I need to do is find a way to reproduce the problem so that I can test possible fixes.
-
@dc42 said in motor phase A may be disconnected reported by driver(s) 0 1 2:
Thanks for confirming this. Now all I need to do is find a way to reproduce the problem so that I can test possible fixes.
Best of luck with that! I certainly can't find a way to reproduce it - I wish I could. It seems to me like it's just something that randomly happens. But I'm more than happy to try anything you can think of to provoke it.
BTW, the print I started after restarting the firmware using M999 was a tad under 4 hours and completed without any further I2C errors or problems.
-
I thought I might have found a way to provoke the misbehaviour but now I'm not so sure.
I started by doing exactly the same thing as yesterday but without physically changing a roll of filament. That is: turn on printer, home all, drop bed 100mm using DWC, extrude 300mm of filament at 5mm/sec using DWC, retract 1mm, home all again. During the second home all, I got the dreaded pauses and I2C errors just like yesterday. I cleared this by resetting the firmware, then ran M115, which confirmed that the Duex5 was recognised. Then I ran the entire sequence again, but this time I didn't get any issues. The only thing different was that on the second run, I didn't start with the printer powered off. I'll try this when I get time later.
-
I had fitted my Duex5 with 2.2k resistors, and have very beefy and super short ground lines. I could not get to work on the machine because the Hiwin rails were on backorder. Now everything is assembled. I was running calibration movements to level out the heads and had the exact same thing happen: the machine started moving slowly, with I2C timeouts. I had previously tried toggling the fans repeatedly to try to trigger I2C timeouts or errors, and I2C was fine. The behavior that triggered it is switching tools and having that tool move to center to verify Z alignment. Basically I'm virtually guaranteed to trigger I2C timeouts during a calibration session (about an hour of positioning). Is there anything else I should do?
-
@dc42 I don't have a Duex but it seems this is a never ending story with the I2C timeouts. Did you consider doing a small redesign of Duet and Duex to use a chip like PCA9615?
I think even without a twisted pair cable, at this length it should greatly increase resistance to any interference.
Just my 2 cents, I know it will not help existing users but could help in the future.
-
@dragonn said in Duet sometimes really slow? - I2C error or?:
@dc42 I don't have a Duex but it seems this is a never ending story with the I2C timeouts. Did you consider doing a small redesign of Duet and Duex to use a chip like PCA9615?
I think even without a twisted pair cable, at this length it should greatly increase resistance to any interference.
Just my 2 cents, I know it will not help existing users but could help in the future.
Yes I expect the use of those chips would help. But it would require a redesign of both the Duet and the DueX, and new Duets wouldn't be compatible with older DueX boards, or vice versa.
As well as the extra I2C pullup resistors, another hardware change that might help is to put series resistors of around 100 to 180 ohms each in the two I2C conductors in the ribbon cable. If this works, it would be an easy change for us to make to future DueX boards.
I'm about to set up a Duet/DueX5 with a deliberately bad ground connection, to see if I can provoke this problem.
Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.
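As a rough sanity check on the series-resistor idea, the sketch below estimates the logic-low voltage a receiver would see when the driving device sits on the far side of the series resistor from the pull-up. Everything numeric here is an assumption for illustration: 3.3 V logic, an effective pull-up of 2.2k (the value mentioned earlier in the thread, not a confirmed board value), a driver output-low voltage of 0.4 V, and the I2C-bus specification's input-low limit of 0.3 × VDD. It is a simple DC divider and ignores rise time and bus capacitance.

```python
def low_level_at_receiver(vdd, r_pullup, r_series, v_ol=0.4):
    """Estimate the logic-low voltage seen on the pull-up side of a series resistor.

    Simple DC model: the driver holds its own pin at v_ol, and the series
    resistor plus pull-up form a divider between v_ol and vdd.
    """
    return v_ol + (vdd - v_ol) * r_series / (r_series + r_pullup)


VDD = 3.3           # assumed logic supply in volts
R_PULLUP = 2200.0   # assumed effective pull-up in ohms (hypothetical)
VIL_MAX = 0.3 * VDD  # input-low threshold from the I2C-bus specification

# Evaluate both ends of the suggested 100 to 180 ohm range.
results = {}
for r_series in (100.0, 180.0):
    v_low = low_level_at_receiver(VDD, R_PULLUP, r_series)
    results[r_series] = v_low
    print(f"Rs={r_series:.0f} ohm: low level {v_low:.3f} V, limit {VIL_MAX:.3f} V")
```

Under these assumptions both ends of the 100 to 180 ohm range leave the low level below the input-low limit, but a stronger (lower value) pull-up would eat into that margin.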
-
@dc42 said in Duet sometimes really slow? - I2C error or?:
Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.
Well, a power cycle is a hardware reset even if it is initiated by software, I think? Or was there another software way of recovering other than a power cycle?
-
@martin1454 said in Duet sometimes really slow? - I2C error or?:
@dc42 said in Duet sometimes really slow? - I2C error or?:
Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.
Well, a power cycle is a hardware reset even if it is initiated by software, I think? Or was there another software way of recovering other than a power cycle?
There is another thread which has gone OT to be about this same issue. In that thread, Ian said at https://forum.duet3d.com/post/9349:
In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.
I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).
I am assuming that Ian meant M999, not M199.
-
@dc42 Ah, okay - will try an M999 next time.