Duet sometimes really slow? - I2C error or?

fulg

@deckingman Thank you for this, I was feeling quite alone! I have since downgraded to 2.01, as I know 2.02 is also affected by this (I was discussing with someone else on Discord with the same printer / same problem and they are using 2.02).

I tend to upgrade RRF only when non-beta/non-RC versions are available, I don't know why I upgraded to 2.03beta3. Last year I was turning off the motors after each print, it is only recently that I've started to keep the steppers powered all the time.

EDIT: 2.03beta3, not RC3. It is especially important to get this right now that RC1 actually exists.

deckingman

@fulg The earliest known report of similar issues was in this thread https://forum.duet3d.com/topic/6077/inconsistent-delays-during-homing-and-other-movements and the OP of that thread stated that he was using RRF 2.0. When I first reported my own experiences of these issues I was using 2.01 (RTOS) (2018-07-26b2). Of course, it may just be coincidence that there are no reports of any similar issues with firmware which pre-dates the 2.0 RTOS series.

fulg

@deckingman I found the other threads later and noticed the problem also occurs in 2.0. I had hopes when seeing the I2C resistors hack but sadly you later confirmed it does not fix this problem.

I will leave my 2.01 setup running for now until the next occurrence (to validate Duex detection with M115 after a soft reset, as asked in another thread), and then downgrade again to 1.21.

The repro is easy for me, I just leave the steppers powered all the time. As I said before my setup is somewhat unique in that I have 4 Z motors on the Duex that are constantly moving, most printers are not built like that. Sometimes I get the error within a few hours but so far I've always reproduced it with 1 or 2 days of having the steppers powered-but-idling.

I have never seen the problem occur when turning off the steppers after each print (even if the printer itself stayed on), although I suppose it could have happened mid-print and I was just lucky until now.

deckingman

@fulg For me, it's a bit different. I've only had one or two print failures, neither of which was more than a few hours long. Usually, I get the failure quite soon after powering up the printer, during the first homing moves or shortly after. On many occasions that has been after the printer has been ~~idle~~ powered down for a few days. I have had the issue during homing, cycled the power, then homed again and done a 30 plus hour print without further problems. The way it manifests itself is long pauses between moves but no reports of motor phase errors.
It's this randomness that makes it so hard to track down. Many users report different problems. Some have the Duex5 board well populated, others have only a few things connected. The only thing in common between users is the I2C errors.

deckingman

In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.

I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).
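
The recovery check described above (diagnostics, firmware restart, then M115 to confirm the Duex5 is detected) could be scripted roughly as follows. This is only a sketch: `send` is a hypothetical callable that sends one G-code line and returns the reply text, and the reply layout (an `ELECTRONICS:` field followed by `FIRMWARE_DATE:`) is an assumption rather than something confirmed in this thread.

```python
# Hypothetical sketch: `send` is an assumed callable that sends one G-code
# line and returns the firmware's text reply. It is not a real Duet API.
# A real script would also have to wait and reconnect after M999.


def duex_present(m115_reply: str) -> bool:
    """Return True if the ELECTRONICS field of an M115 reply mentions a DueX."""
    marker = "ELECTRONICS:"
    if marker not in m115_reply:
        return False
    electronics = m115_reply.split(marker, 1)[1]
    # Cut at the next field, assumed here to be FIRMWARE_DATE.
    electronics = electronics.split("FIRMWARE_DATE:", 1)[0]
    return "duex" in electronics.lower()


def soft_recover(send) -> bool:
    """Capture diagnostics, restart the firmware, then check for the DueX."""
    send("M122")  # diagnostics (including I2C error counts) before the restart
    send("M999")  # restart the firmware without cycling the power
    return duex_present(send("M115"))
```

If `soft_recover` returns False, the expansion board was not reported after the restart, which would be worth noting alongside the M122 output.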

dc42

@deckingman said in motor phase A may be disconnected reported by driver(s) 0 1 2:

In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.

I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).

Thanks for confirming this. Now all I need to do is find a way to reproduce the problem so that I can test possible fixes.

deckingman

@dc42 said in motor phase A may be disconnected reported by driver(s) 0 1 2:

Thanks for confirming this. Now all I need to do is find a way to reproduce the problem so that I can test possible fixes.

Best of luck with that!! I certainly can't find a way to reproduce it - I wish I could. It seems to me like it's just something that randomly happens. But I'm more than happy to try anything you can think of to provoke it.

BTW, the print I started after restarting the firmware using M999 was a tad under 4 hours and completed without any further I2C errors or problems.

deckingman

I thought I might have found a way to provoke the misbehaviour but now I'm not so sure.

I started by doing exactly the same thing as yesterday but without physically changing a roll of filament. That is, turn on printer, home all, drop bed 100mm using DWC, extrude 300mm of filament at 5mm/sec using DWC, retract 1mm, home all again. During the second home all, I got the dreaded pauses and I2C errors just like yesterday. I cleared this by resetting the firmware then ran M115 which confirmed that the Duex5 was recognised. Then I ran the entire sequence again but this time, I didn't get any issues. The only thing different was that on the second run, I didn't start with the printer powered off. I'll try this when I get time later.

kazolar

I had fitted my duex5 with 2.2k resistors, and have very beefy and super short ground lines. I could not get to working on the machine due to hiwin rails being on backorder, but now everything is assembled. I was running calibration movements to level out the heads and I had the exact same thing happen: the machine started moving slowly, with i2c timeouts. I had previously tried toggling the fans repeatedly to try to trigger i2c timeouts or errors, and i2c was fine. The behavior that triggered it is switching tools and having that tool move to center to verify z alignment. Basically I'm virtually guaranteed to trigger i2c timeouts during a calibration session (about an hour of positioning). Anything else I should do?

dragonn

@dc42 I don't have a Duex but it seems this is a never ending story with the I2C timeouts. Did you consider doing a small redesign of Duet and Duex to use a chip like PCA9615?
I think even without a twisted pair cable, at this length it should greatly increase resistance to any interference.

Just my 2 cents, I know it will not help existing users but could help in the future.

dc42

@dragonn said in Duet sometimes really slow? - I2C error or?:

@dc42 I don't have a Duex but it seems this is a never ending story with the I2C timeouts. Did you consider doing a small redesign of Duet and Duex to use a chip like PCA9615?
I think even without a twisted pair cable, at this length it should greatly increase resistance to any interference.

Just my 2 cents, I know it will not help existing users but could help in the future.

Yes I expect the use of those chips would help. But it would require a redesign of both the Duet and the DueX, and new Duets wouldn't be compatible with older DueX boards, or vice versa.

As well as the extra I2C pullup resistors, another hardware change that might help is to put series resistors of around 100 to 180 ohms each in the two I2C conductors in the ribbon cable. If this works, it would be an easy change for us to make to future DueX boards.

I'm about to set up a Duet/DueX5 with a deliberately bad ground connection, to see if I can provoke this problem.

Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.

Martin1454

@dc42 said in Duet sometimes really slow? - I2C error or?:

Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.

Well, a power cycle is a hardware reset even if it is done using SW to initiate it, I think? Or was there any other SW way of recovering other than a power cycle?

dc42

@martin1454 said in Duet sometimes really slow? - I2C error or?:

@dc42 said in Duet sometimes really slow? - I2C error or?:

Ian has shown that when the problem occurs, the I2C subsystem can be recovered without a hardware reset. So recovering from this error should be possible.

Well, a power cycle is a hardware reset even if it is done using SW to initiate it, I think? Or was there any other SW way of recovering other than a power cycle?

There is another thread which has gone OT to be about this same issue. In that thread, Ian said at https://forum.duet3d.com/post/9349:

In case anyone is following this thread but not one of the other I2C error related threads, I can now confirm that running M199 to reset the firmware fixes the issue so no need to cycle the power. At least that's how it is for me.

I can also confirm that when running M115 immediately after M999, it reports that the Duex5 is present. Since I had the error and ran M199 to restart the firmware, without powering down, I've re-homed the printer and it is now happily printing away (at least for now).

I am assuming that Ian meant M999, not M199.

Martin1454

@dc42 ah okay - Will try a M999 next time

kazolar

@dc42 for whatever it's worth, whenever I've encountered this behavior a reboot via the stop button on the screen doesn't help.
Also I never ran into these slow behaviors until upgrading to 2.05/7, so I will try the code base from prior to the code rewrite. 07 seems to be making it worse, where the end stops just stop working instead of just being slow. Also I can't see how I can make my ground connection any better: it is literally as short a run as possible, and I can't fit a thicker set of wires into a ferrule (that would fit into the terminal block) than what's on there now.

deckingman

@dc42 said in Duet sometimes really slow? - I2C error or?:

I am assuming that Ian meant M999, not M199.

Ooops - yes indeed I did mean M999. I managed to type M999 correctly one time out of three

dragonn

@dc42

@dc42 said in Duet sometimes really slow? - I2C error or?:

Yes I expect the use of those chips would help. But it would require a redesign of both the Duet and the DueX, and new Duets wouldn't be compatible with older DueX boards, or vice versa.

Two jumpers on the Duet and Duex that could be used to bypass the PCA chip would allow compatibility with older hardware. I know this adds extra complexity but it may be worth it.

deckingman

I might be homing in on a way to provoke this problem but it's difficult. For the third day running I've been able to get it to happen at exactly the same point.

The sequence is as in my post above but I have to start with the machine powered off and it has to have been powered off overnight. It only happens once a day. If I power down the machine and start the sequence again, it won't misbehave a second time, but for 3 days running I've been able to do exactly the same sequence and get exactly the same problem at exactly the same point in the sequence. I've had my suspicions that it seems more likely to happen after the machine has been powered down for a considerable time but I have no idea why that should be. Capacitance that takes a long time to decay?? Not my area of expertise.

For 3 days running, the problem has started at the same point in the second iteration of my homeall file. So to reiterate, the sequence is as follows:

Turn on machine and connect to DWC (using Firefox)
Run home all through DWC
Drop bed 100mm through DWC
Heat hot end to 190 deg C
Extrude 300mm of filament through extruder 0 at 5mm/sec using DWC (as if loading a new reel of filament).
Retract 1mm at 5mm /sec through DWC
Run home all again through DWC.
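
For anyone wanting to try the same steps as plain G-code rather than DWC buttons, here is a minimal sketch that builds the command list. Several things are assumptions and not taken from the post: that G28 runs the home all file, that tool 0 drives extruder 0, that a positive relative Z move lowers the bed (as the comments in the home all file suggest), and the F600 bed speed. The function names are made up for illustration.

```python
def mm_per_sec_to_feedrate(speed_mm_s: float) -> int:
    """G-code F values are in mm/min, so 5 mm/sec becomes F300."""
    return round(speed_mm_s * 60)


def repro_sequence(drop_mm=100, temp_c=190, load_mm=300,
                   retract_mm=1, speed_mm_s=5):
    """Return the G-code lines for the power-on reproduction sequence."""
    feed = mm_per_sec_to_feedrate(speed_mm_s)
    return [
        "G28",                         # home all (first pass)
        "G91",                         # relative moves
        f"G1 Z{drop_mm} F600",         # drop bed (assumes +Z lowers the bed)
        "G90",                         # back to absolute moves
        "T0",                          # select tool 0 (assumed to use extruder 0)
        f"M109 S{temp_c}",             # heat hot end and wait
        "M83",                         # relative extrusion
        f"G1 E{load_mm} F{feed}",      # extrude as if loading a new reel
        f"G1 E-{retract_mm} F{feed}",  # retract
        "G28",                         # home all again (where the pauses appeared)
    ]


if __name__ == "__main__":
    print("\n".join(repro_sequence()))
```

Per the description above, the sequence only misbehaved when started from a machine that had been powered off overnight, so sending these lines to an already running printer may well show nothing.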

My home all file is pretty complicated. Here it is in its entirety, with the point where the I2C errors (or at least the pauses between moves) occur marked.

T0 ; select a tool - any one will do
M104 S140; heat to 140 but don't wait

;*****Home XYUV (lower 2 gantries)

M584 X0 U3 Y1 V4 P5; temporarily map drives to U and V axes

M906 X400 U400 Y400 V400 Z1200 ; reduce motor currents

G91 ; set to use relative coordinates

G1 Z5 F600 ; move bed down 5 mm

G1 X-380 U-380 Y-380 V-380 F4800 S1; move all 4 axes fairly quickly until one or other triggers a switch

G1 X-380 U-380 F4800 S1; now move just X and U fairly quickly left until one or other triggers a switch

G1 X-380 S1; coarse home X
G1 U-380 S1; coarse home U

G1 X10 U10 F600 ; Go back a few mm

G1 X-380 U-380 F360 S1; Move slowly to X and U axis endstops once more and stop when one triggers

G1 X-380 F360 S1 ; fine home X
G1 U-380 F360 S1 ; fine home U

G1 Y-380 V-380 F4800 S1; now move Y and V fairly quickly until one or other triggers a switch

G1 Y-380 S1; coarse home Y
G1 V-380 S1; coarse home V

G1 Y10 V10 F600; Go back a few mm

G1 Y-380 V-380 F360 S1; Move slowly to Y and V axis endstops once more and stop when one triggers

G1 Y-380 F360 S1 ; fine home Y
G1 V-380 F360 S1 ; fine home V

;****Now home upper Gantry

M584 X6 Y9 ; map upper motors to X and Y
M574 X1 S1 C5 ; map end stop 5(E2) to X axis
M574 Y1 S1 C6 ; map end stop 6 (E3) to Y axis

G1 X-380 Y-380 F4800 S1; move X and Y fairly quickly until one or other switch triggers
G1 X-380 F4800 S1 ; coarse home X
G1 Y-380 F4800 S1 ; coarse home Y

G1 X10 Y10 F600 ; back off a few mm

G1 X-380 F360 S1 ; fine home X
G1 Y-380 F360 S1 ; fine home Y

M574 X1 S1 C0 ; put X axis end stop switch back to standard mapping
M574 Y1 S1 C1 ; put Y axis end stop switch back to standard mapping

M584 X0:3:6 Y1:4:9 Z2 U10 V11 E5:7:8 P3; Put axes motors back to standard configuration

;******Now home Z

G90; set to absolute coordinates

G1 X180 Y180 F12000; move to more or less the centre of the bed

M109 S140 ; continue heating hot end to 140 but this time wait

;change to faster probing speed
M558 F450
G30 ; FAST home Z using values from G31

*********This is the point where pauses due to I2C errors become apparent

G91 ;relative
G1 Z5 F300 ; lower bed
G90 ;absolute

;change back to slower probing speed
M558 F180
G30 ; SLOW home Z

G91 ;relative
G1 Z5 F300
G90 ; back to absolute

M906 X1800 U1800 Y1800 V1800 Z1800 ; set motor currents back to defaults

M104 S0; set hot end temp back to zero

End of home all file.

I'll try the entire sequence of power up, home, drop bed, extrude and repeat home again tomorrow to see if I can provoke it for the 4th day in succession.

dc42

That's interesting.

I tried to reproduce it yesterday, using separate ground wires from the PSU to the Duet and DueX, three Nema 23 motors running at 2A moving continuously driven from the Duet, and a diode connected between Fan0- and the E2 endstop pin, with the fan set to 10Hz frequency so that the E2 endstop input toggles at 10Hz and forces the Duet to read the status via I2C frequently. The first time I tried this, I got an I2C lockup after 9 minutes. But I was not able to reproduce it again.

I have reviewed the I2C code and made some changes. In particular, when an I2C error is detected, I now reset the I2C controller on the Duet and retry. Up to 3 tries are done. Each I2C reset is recorded and the reset count is included in the M122 log along with the other I2C stats. Also I have made a change that I planned a long time ago, which is to use a separate RTOS task to monitor the DueX5 state and do the I2C transactions when it changes. This should substantially reduce the latency of the endstop inputs on the DueX.
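
As a rough illustration of the reset-and-retry behaviour described above, here is a sketch in Python. It is not the actual firmware code. `do_transfer`, `reset_controller`, `I2CError` and `I2CStats` are made-up names, and resetting the controller after the final failed try as well is an assumption.

```python
class I2CError(Exception):
    """Stand-in for whatever signals a failed or timed-out I2C transaction."""


class I2CStats:
    """Counters of the kind that could be reported in a diagnostics log."""

    def __init__(self):
        self.resets = 0


def transfer_with_retry(do_transfer, reset_controller, stats, max_tries=3):
    """Attempt a transaction up to max_tries times, resetting after each error."""
    for attempt in range(1, max_tries + 1):
        try:
            return do_transfer()
        except I2CError:
            reset_controller()     # bring the I2C controller back to a known state
            stats.resets += 1      # record every reset so it can be reported later
            if attempt == max_tries:
                raise              # all tries used up: let the caller see the error
```

With this shape, a transaction that fails twice and then succeeds returns normally and leaves a reset count of 2, which is the kind of figure the M122 statistics mentioned above would let a user spot.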

I intend to release this in firmware 2.03RC2 later today. Ian, would you be able to upgrade to this release and see if you are able to reproduce the problem on it?

deckingman

@dc42 said in Duet sometimes really slow? - I2C error or?:

I intend to release this in firmware 2.03RC2 later today. Ian, would you be able to upgrade to this release and see if you are able to reproduce the problem on it?

Yes of course.

Interestingly, I repeated the exact same sequence for the 4th day running and got exactly the same result. That is to say, pauses between moves that occurred in exactly the same part of the second home all sequence. As before, an M122 showed I2C errors which were cleared with a subsequent M999. I have no idea why I can't reproduce this unless the printer has been powered down overnight. I've switched the printer off and will try again later today to see if a 1 hour or 2 hour power down will provoke it.

That's 4 days running that I've had exactly the same thing happen, so at least it looks like we are getting close to having a method that will provoke the problem with some degree of confidence, even if we don't know how or what part of the sequence is responsible.

This kind of reminds me of a problem I had in a previous life with a customer who had a V12 E type Jaguar. This thing would break down but only when he had driven from Luton to the Severn bridge on the Welsh border and stopped to pay the toll. I got there in the end but it was a bitch to diagnose.