Duet 2 underruns high loop times stuttering
-
- Upgrading to RRF3 requires the time I don't have. The actual goal would be to move to duet 3 at some later point (after the covid stuff is over and I have time) -- I wanted to do so, but the rewire job would take a significant amount of time, and I can't (our group can't either) have my quad be down for longer than an evening -- the rewire of the stepper connectors was as long of a down time as I could realistically do. I would do this project at a later date -- now, with pressing needs - I can't.
- That would be good, but if these were in 2.05.something, it would be much easier for me to deploy it to test.
- Just reading all the steps I need to do to move to RRF3 - it's over a day of troubleshooting.
Phase A&B warnings were happening after the underruns and stuttering started -- they were simply a symptom of the slow loop. They were happening on steppers which were not even doing anything -- part of the 4th extruder and its X axis -- it was parked, so that didn't make much sense.
I only had legitimate phase warnings when the connectors melted -- and obviously those have been sorted. - I have not calibrated the MCU temp, but I check everything with a thermal camera, and nothing is hot -- everything is below 30C -- even my external 2209s don't reach body temperature. My cooling is really good; I have a lot of air blowing across the board, exhausting through various vents.
Question -- it's highly likely the problem is 2 fold in 2.05 -- and most likely it's fixed in 2.05.1 -- but the same reason that now results in cold power on or idle - to require 40 minutes of printing to print properly is the reason I'm having this problem. I can literally power a warm system off for a few minutes -- power it up -- hit print and have it work...cold, I get underruns -- my sense is that's an indication of a hardware fault. Anything software is reproducible every time doing the same thing -- hardware tends to act funny if there is a cold solder joint somewhere and magically works when warm.
Having said that, why do you not believe this is a hardware issue? Isn't the fact that a print from idle fails and a warmed-up print succeeds an indication of that?
Here is how I see it -- if I order a new duet 2 and it fixes the issue, the likelihood of me getting a duet 3 is zero; I would need a duet 3 and probably 4 or 5 expansion modules, and that was the eventual plan. The likelihood of me coming out of this with a positive experience is also zero. It's not a question of money; my employer has offered to pay for the cost of the replacement.
-
@kazolar said in Duet 2 underruns high loop times stuttering:
Having said that, why do you not believe this is a hardware issue?
one old TV repair hint ... how about you turn off all your fancy cooling of the duet, heat it up with a hairdryer and see if then it will give you a good print without requirement of 40min "warmup time"
-
@arhi I've actually thought about doing that --I'm thinking about doing exactly that tomorrow morning. Right now it's already warmed up and printing happily.
-
Have you described your system completely somewhere else? It seems like you have additional 2209 steppers hooked up externally? Which axes are those driving?
Also, is this the PSU you are using? https://www.eyeboot.com/24v-600w-dc-power-supply.html
-
@bot yes that's the power supply. The 2209s are driving Y and U axis, and extruder 0, and 1.
-
duplication mode -- underruns. This worked before, not now:
11:12:00 PM M122
=== Diagnostics ===
RepRapFirmware for Duet 2 WiFi/Ethernet version 2.05 running on Duet Ethernet 1.02 or later + DueX5
Board ID: 08DGM-9T6BU-FG3S0-7JTD4-3S06K-1A4ZD
Used output buffers: 1 of 24 (21 max)
=== RTOS ===
Static ram: 25708
Dynamic ram: 96332 of which 0 recycled
Exception stack ram used: 448
Never used ram: 8584
Tasks: NETWORK(ready,616) HEAT(blocked,1136) DUEX(blocked,164) MAIN(running,1668) IDLE(ready,156)
Owned mutexes: I2C(DUEX)
=== Platform ===
Last reset 00:44:07 ago, cause: software
Last software reset at 2020-04-25 22:27, reason: User, spinning module GCodes, available RAM 8560 bytes (slot 3)
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0441f000 BFAR 0xe000ed38 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 16.5ms, max retries 0
MCU temperature: min 24.1, current 26.0, max 26.6
Supply voltage: min 24.2, current 24.5, max 25.0, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: standstill, SG min/max 0/333
Driver 1: standstill, SG min/max not available
Driver 2: standstill, SG min/max not available
Driver 3: standstill, SG min/max 0/242
Driver 4: standstill, SG min/max not available
Driver 5: standstill, SG min/max not available
Driver 6: standstill, SG min/max 21/422
Driver 7: standstill, SG min/max 144/451
Driver 8: standstill, SG min/max 72/422
Driver 9: standstill, SG min/max 45/438
Date/time: 2020-04-25 23:11:57
Cache data hit count 4294967295
Slowest loop: 215.62ms; fastest: 0.08ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 157, MinFreeDm: 105, MaxWait: 574595ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 1335, completed moves: 1304, StepErrors: 0, LaErrors: 0, Underruns: 0, 31
=== Heat ===
Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 1.0
Heater 1 is on, I-accum = 0.3
Heater 2 is on, I-accum = 0.4
=== GCodes ===
Segments left: 0
Stack records: 2 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is idle in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 267.21ms; fastest: 0.06ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 2 of 8
Interface state 5, link 100Mbps full duplex
-
@dc42 at this point I tried a duplication print which worked fine last week, even when I had trouble with triplicate -- and this print is now causing underruns and stuttering. I ask again -- is the board under warranty? I did the same thing that has worked in the past, and it didn't work. I'm not sure I can get this print to work at all now. I formatted the card, copied the files over, and am trying it again.
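
The figures that matter most for this problem in an M122 report are the "Slowest loop" lines and the "Underruns" counters. When comparing many reports, it can help to pull them out automatically. The following is a minimal sketch in Python, assuming the report has been copied from the console as plain text. The function names and the 50 ms threshold are hypothetical choices for illustration, not part of RepRapFirmware or Duet Web Control.

```python
import re

# Assumed threshold: the thread mentions about 3-4 ms when healthy and
# over 200 ms when stuttering, so 50 ms is an arbitrary dividing line.
SLOW_LOOP_THRESHOLD_MS = 50.0


def parse_m122(report):
    """Collect every 'Slowest loop' value and the 'Underruns' pair, if present."""
    slowest = []
    underruns = None
    for line in report.splitlines():
        loop_match = re.search(r"Slowest loop: ([\d.]+)ms", line)
        if loop_match:
            slowest.append(float(loop_match.group(1)))
        underrun_match = re.search(r"Underruns: (\d+), (\d+)", line)
        if underrun_match:
            underruns = (int(underrun_match.group(1)), int(underrun_match.group(2)))
    return {"slowest_loop_ms": slowest, "underruns": underruns}


def has_slow_loop(report, threshold_ms=SLOW_LOOP_THRESHOLD_MS):
    """Return True if any reported slowest loop exceeds the assumed threshold."""
    return any(ms > threshold_ms for ms in parse_m122(report)["slowest_loop_ms"])


# Three lines copied from the report in this thread.
SAMPLE = """\
Slowest loop: 215.62ms; fastest: 0.08ms
Scheduled moves: 1335, completed moves: 1304, StepErrors: 0, LaErrors: 0, Underruns: 0, 31
Slowest loop: 267.21ms; fastest: 0.06ms
"""

if __name__ == "__main__":
    print(parse_m122(SAMPLE))
    print("slow loop detected:", has_slow_loop(SAMPLE))
```

Running this on a report taken early in a print and again after a failure makes it easier to see whether loop times climb before the underruns appear.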
-
@dc42 I had trouble getting all the ear relievers to stay put -- so I slowed down the first layer to 15mm/sec -- after 2 hours that froze up with an underrun -- that's an idex print. Now I wanted to see if it underruns sooner, and tried speeding it up and at 30mm/sec it seems to be sticking OK -- so I ran that -- and that print is giving me much lower loop times. I will wait for another 10 minutes, but this has a shot of working -- is a super slow first layer a problem? I know I've used that technique before with no issues.
-
@kazolar said in Duet 2 underruns high loop times stuttering:
Question -- it's highly likely the problem is 2 fold in 2.05 -- and most likely it's fixed in 2.05.1 -- but the same reason that now results in cold power on or idle - to require 40 minutes of printing to print properly is the reason I'm having this problem.
There was an important bug fix in 2.05.1 (a 1-byte buffer overflow) and the consequences of that bug are unknown. You should definitely use a firmware build that includes that bug fix. It was in file OutputBuffer.cpp.
I can literally power a warm system off for a few minutes -- power it up -- hit print and have it work...cold, I get underruns -- my sense is that's an indication of a hardware fault. Anything software is reproducible every time doing the same thing -- hardware tends to act funny if there is a cold solder joint somewhere and magically works when warm.
Tasks: NETWORK(ready,616) HEAT(blocked,1136) DUEX(blocked,164) MAIN(running,1668) IDLE(ready,156)
Owned mutexes: I2C(DUEX)
That's significant. It means that the Duet was communicating with the DueX when you ran M122. Were you doing anything that might cause that? For example, changing the speed of a fan connected to the DueX; or toggling an endstop or other switch connected to the DueX?
If it is a hardware problem then I think a likely cause is a poor solder joint between the SX1509B chip on the DueX board and the PCB. We've seen trouble with that before. That could cause spurious input transitions on the pin with the bad joint, leading to extra I2C traffic to read the changed input, leading in turn to increased loop times and underruns.
Do you have any normally-open switches connected to endstop inputs on the DueX5, or to GPIO pins on the DueX5?
I will add a counter in RRF3 to count the number of I2C transactions and display a transactions/minute count in the diagnostics. As you are building your own firmware, you could add a similar count in your RRF2 build.
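
The counter idea can be sketched independently of the firmware. The following is a minimal illustration in Python rather than the C++ that RepRapFirmware is written in, and it is not the code that went into RRF. The class name, the `record` and `report_and_reset` methods, and the choice to reset the count each time it is reported are all assumptions made for this example.

```python
class I2CTransactionCounter:
    """Counts transactions and reports a per-minute rate since the last report.

    Hypothetical helper: in real firmware, record() would be called from the
    I2C transfer routine and report_and_reset() from the diagnostics output.
    """

    def __init__(self, now_s):
        self._count = 0
        self._window_start_s = now_s

    def record(self):
        """Call once for every I2C transaction."""
        self._count += 1

    def report_and_reset(self, now_s):
        """Return transactions per minute since the last report, then restart."""
        elapsed_s = now_s - self._window_start_s
        rate = self._count * 60.0 / elapsed_s if elapsed_s > 0 else 0.0
        self._count = 0
        self._window_start_s = now_s
        return rate


if __name__ == "__main__":
    counter = I2CTransactionCounter(now_s=0.0)
    for _ in range(120):  # simulate 120 transactions in 30 seconds
        counter.record()
    print("transactions/minute:", counter.report_and_reset(now_s=30.0))
```

A rate that jumps while nothing connected to the DueX is changing state would point toward spurious input transitions of the kind described above.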
-
@dc42 this print is only 2 extruders, but all the z axis motors are on duex5, and one of the fans used by the 2 extruders is on duex5. That's it. The 2nd extruder has some minimal stuff overflowed to duex5.
Yes, I am building my own firmware. Can you give me the code pointer to add the counter? I'll do that.
I'll switch over to 2.05.1 in the process.
If you're saying it's the duex5, it would make some sense. There was one instance when I just rerouted a wire going to it, just moved it for better management, didn't disturb anything, just unplugged an end stop and plugged it back in, and on power up the duet refused to boot. It kept cycling. I disconnected everything from the duex5, and the duet booted up. Then I plugged it all back in, one at a time, and everything worked again. So then it sounds like I need a new duex5. I've had i2c issues with it before. Seems my duex5 might have been glitchy for a while, and the times the z axis motors got singed could have caused some issues with it.
-
@dc42 I don't use normally open; you guys don't recommend normally open, and it makes sense not to use it. I have 3 normally closed switches, and the rest are optical powered switches which also act normally closed.
The duex5 is older; it was also purchased from filastruder, in May 2017. Is that under warranty, or do I need to order a replacement? Kinda sucks cause I just had that board out when I replaced the 4 z axis connectors. I'm not proficient enough in microsoldering to try to reflow an smd chip.
-
I just want to clarify something:
You have the 5 stepper drivers on the Duet board, are they all being used?
Then you have 5 more stepper drivers on the duex, are they all being used?
And you also have 4 external drivers, driving Y, U, E0 and E1.
It seems that may be more motors than the firmware can handle? I thought 11 or 12 was the most RRF2 could manage. If you are using all of these, that's 14. Can you clarify?
-
@bot yes. dc42 helped me add 2 more. It's not in the official firmware because it's 2 extra ops which would not normally be used by anyone else with a duet 2. It has worked perfectly fine for the last 3+ years. There is a way to reuse the pt100 pins on the duet and duex5. Same way that the LCD pins are being reused for 2 extra steppers.
-
@kazolar Ahh, gotcha. Thanks.
-
@dc42 so we're back to the i2c mess. I looked through some other posts regarding i2c -- and I saw some related to not running wires next to or along the ribbon connecting the 2 boards...agh...that's the change that I did make when I added extra cable chains: I changed some wiring paths inside the enclosure. God help me, i2c with duet+duex5 is so glitchy. I re-routed cables around the ribbon as best I could. It's kinda tricky because the case isn't really designed for that, but -- OK, I found ways. I just started a print -- the same one that failed 2 times yesterday before somehow magically working to completion overnight. Well, I am running it now, and all loop times are low -- 3-4ms -- it's early in the print, but that looks promising. So it's not just the heavy grounding wire; the ribbon needs to be clear of anything -- that might appear to be the issue here -- not defective hardware, or cold solder, or firmware -- but cross talk on that ribbon. I'm using some insulated ribbons for my PC -- PCIEX extension ribbons -- they're expensive -- about $30 each, but they have a lot of protection against this kind of cross talk, shielding and such. Is that something that is worth considering? I am going to design a new case to split the duet and duex5 apart from each other. The way it is now, they're one on top of the other (as I believe is intended), but that leads to some odd wiring runs that make it very difficult to avoid the ribbon.
I will do a triple head print later tonight -- see if that works without a reboot -- so far I'm too far into this duplication print, with very low loop times, to consider a possible failure.
-
@dc42 question -- as it seems (too early to conclude) that i2c is to blame for the issues -- why is i2c not showing any errors or timeouts in m122 reports? Freeing up the ribbon from interference from heater and other wires appears (for now) to have solved the issue -- but how come all this i2c interference caused slow loop times and no i2c errors? It would have been obvious and easy to investigate that path if there were some. The poor SD cards got blamed, and they appear to have been faultless.
-
well that was it -- I just did a bunch of starts and stops of a triple headed print -- trying to re-acquire my z offset and I did not reset anything and the current print is running with normal loop times, this would have inevitably triggered high loop times previously. Good to know no physical defect on anything -- just that ribbon cable must be treated like it's a newborn -- I didn't even think about it when I rerouted all the wires to the 2nd cable chain how they were connecting to where they had to connect to. I am still confused why i2c never showed any errors or timeouts or anything -- just increased the loop time tremendously. Thank you for your patience in sorting this out -- I do wish this ribbon came shielded -- I'd gladly pay extra for a shielded 50 pin cable -- I've been searching for one -- honestly at this point $100 for a cable that would not be bothered by things around it would be a bargain.
-
Glad we found the smoking gun.
Doing some googling it would appear that there is shielded ribbon cable available from digikey and mouser, and in certain other hobbies it looks like it's often DIYed with metallic tape.
-
I've thought about doing that, and I know how the shielded PC PCIEX extensions are made -- but I have tried that with similar cheap PCIEX cables -- putting HVAC tape around the cable and then putting regular insulation over it so that the aluminum tape wouldn't short anything -- and it didn't make the cable perform any better. I had purchased some inexpensive similar IDE style extensions which claimed to support PCIE-4X, and PCIE-4X devices would not work with them; 1X would. Doing the DIY trick of insulating etc. did not help. Getting a cable for 4x the money that was already shielded worked. So I'd have to hunt around for the proper cable that's premade, 50 pin, and the right spacing.
-
this can be marked solved -- I don't know how to.