Duet 2.05 memory leak?
-
@dc42 I was able to get it going again by simply hitting reset -- no power down -- and immediately selecting a file to print. That's how it's printing now. It seems that if I do anything prior to initiating the print, it starts to show significant max loop times, and then at the 30 minute mark it can't keep up and starts getting underruns. I tweaked the OctoPrint settings not to wait for OK before I started, because it was timing out on basic stuff -- it still hit underruns at about the same point. And when I ran M122 in the DWC console I'd see loop times in the 50ms range, so it clearly wasn't going well. I checked the model and the file for small segments, and there aren't any. I took that set of models into Cura with adjustments for minimum travels, and it generated a bigger gcode file than S3D. I'll try PrusaSlicer -- not sure what else to try. The problem has gotten progressively worse. The gcode file I'm printing is 12 MB; the Cura-generated one with adjustments for minimum segments is 19 MB. I can print that file next and see what happens?
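For anyone who wants to repeat the small-segment check on their own file, here is a minimal Python sketch. It assumes absolute XY coordinates and moves written as "G0 ..." or "G1 ..." with a space after the command; the function name and the 0.2 mm threshold are illustrative choices, not values taken from the firmware or any slicer.

```python
import math
import re

# An X or Y word followed by its number, e.g. "X228.135".
XY_WORD = re.compile(r"([XY])(-?\d*\.?\d+)")

def short_segments(gcode_lines, min_len_mm=0.2):
    """Count XY moves shorter than min_len_mm (hypothetical threshold).

    Assumes absolute coordinates and moves written as "G0 ..." or "G1 ...".
    Returns (short_count, total_count).
    """
    x = y = None
    short = total = 0
    for raw in gcode_lines:
        line = raw.split(";", 1)[0].strip().upper()  # strip comments
        if not line.startswith(("G0 ", "G1 ")):
            continue
        words = dict(XY_WORD.findall(line))
        nx = float(words["X"]) if "X" in words else x
        ny = float(words["Y"]) if "Y" in words else y
        # Only measure once both the old and new positions are known.
        if None not in (x, y, nx, ny) and (nx, ny) != (x, y):
            total += 1
            if math.hypot(nx - x, ny - y) < min_len_mm:
                short += 1
        x, y = nx, ny
    return short, total
```

Passing an open file object (for example `short_segments(open("print.gcode"))`, where the file name is a placeholder) gives a rough count to compare between slicers.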
-
The different size could be Cura being more verbose with comments, so it's still worth trying.
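A quick way to test that idea is to measure how much of each file is comment text. The sketch below is only an illustration: it treats everything from the first `;` on a line as a comment, and `comment_share` is a made-up helper name.

```python
def comment_share(gcode_text):
    """Return the fraction of characters that sit in ';' comments."""
    comment_chars = total_chars = 0
    for line in gcode_text.splitlines():
        total_chars += len(line)
        # partition() splits at the first ';' (sep is '' if there is none).
        _code, sep, comment = line.partition(";")
        comment_chars += len(sep) + len(comment)
    return comment_chars / total_chars if total_chars else 0.0
```

If the Cura and S3D files show a similar share, comments alone would not explain the 12 MB versus 19 MB difference.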
-
@T3P3Tony I did a Prusa Slicer version -- it's a little bit bigger, but the paths look good -- I'm trying it now. The previous print just finished. I only added my custom header and tail to the Prusa version to make sure the right stuff happens. I did not reset the board -- just uploaded the Prusa version to see what happens.
-
@T3P3Tony Prusa version was doing funny things with perimeters -- sorta failed on its own, kinda chewed up the perimeters of the model. I canceled it -- I'd need to mess with it to make it work -- Cura is next.
-
@T3P3Tony ok so the cura version -- 3rd print without a reset - right as it got to post5.g -- it was doing it in slow motion -- doing a move -- thinking doing another -- move. So I reset it -- will try again
-
@T3P3Tony ok -- so cura sliced code seemed a bit more sane than Prusa, but it didn't look good, but I let it run -- and it just failed -- underruns and stuttering.
I optimized it for short paths to simplify it -- still...same result. Gonna clean the bed, reset, and kick off the S3D print -- that actually looks good and works after a reset (fingers crossed)
-
@T3P3Tony @dc42
Here is where I am at -- I am running 2.05RC. 2.05.1 is a lot less predictable: it seems to work ok, then stops working and gets into a stuttering underrun condition. I can't trust it from print to print, so I went back to 2.05RC -- it's at least repeatable, and I can get shields out now without checking on it every 5 minutes.
- I can tell right away from M122 once the print actually starts laying down plastic. I run M122 twice, 30 seconds apart, and if the second time I get
Slowest loop: 3.99ms; fastest: 0.08ms
then it will work through to the end of the print. I can now tell within 5 minutes of starting the print, instead of waiting the full 30 minutes for the underruns and stuttering.
If I get
Slowest loop: 63.64ms; fastest: 0.07ms
then it will underrun by the 30 minute mark.
- This is what works: I preheat the bed and nozzles, clean everything off, press reset, then start the S3D print. The print gets slowest loop times in the 5ms range and runs through to conclusion -- 5hrs 15 min give or take.
- This procedure works -- and has worked for the entire month I've been printing shields until I started messing around with 2.05.1. I don't have to copy the file over, I can keep using the same file.
- If I do anything prior to starting the print without pressing reset -- I will get the high slowest loop numbers and inevitable underrun.
- The behavior is absolutely repeatable with S3D or Cura prints (in terms of underrun condition). I am sure I can tune the Cura version to look better, and probably the Prusa version, both looked like trash on layer one to varying degrees -- but both were optimized to eliminate small movements, S3D doesn't have that feature, but this print doesn't have any -- or many.
Question is -- is this hardware or software? I have done triplicate prints before without having to do the "three finger salute" before starting -- so either something died in the hardware, or it's a software problem. David's documentation on underruns talks about voltage regulators, etc. I have no indication that something is wrong there, but the only alterations between then and now are that I switched to optical end stops on the XYUVWA axes (the others, for the bed, are still mechanical) and went from 32 teeth pulleys to 20 teeth. I have done a 40 hour long multi extrusion print with the current setup with no issues.
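The early-warning check described in this post (read "Slowest loop" from a second M122 shortly after the print starts) could be automated along these lines. This is a sketch, not a firmware feature: the 10 ms cut-off is an assumption picked between the roughly 5 ms seen on good runs and the 50 ms and up seen on bad ones, and the function names are made up.

```python
import re

SLOWEST_LOOP = re.compile(r"Slowest loop:\s*([\d.]+)ms")

def main_loop_ms(m122_text):
    """Return the first 'Slowest loop' figure in an M122 reply, or None.

    In the dumps in this thread the first occurrence is in the Platform
    section; a second one appears later under '=== Network ==='.
    """
    match = SLOWEST_LOOP.search(m122_text)
    return float(match.group(1)) if match else None

def looks_bad(m122_text, limit_ms=10.0):
    """True if the slowest loop exceeds limit_ms (an assumed cut-off)."""
    value = main_loop_ms(m122_text)
    return value is not None and value > limit_ms
```

With the two lines quoted above, the 3.99ms reading passes and the 63.64ms reading is flagged.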
-
Another update -- I did not power down -- did the exact same procedure. I removed the shields, set bed temp, set extruder temp (keep the bed hot and set the extruders to 150 so I can pull off any stuck-on filament), then after an alcohol wipe-down of the build area, I pressed stop on the LCD, then selected the print -- it's working perfectly. THE only way I can print in triplicate reliably is to do this process. I don't need to delete and reupload the file. No other tricks involved. If I try printing it again without the reset, there's a 50/50 chance it will spin up the loop time and fail due to underruns. If I keep doing this process, I can keep doing a lot of shields with no interruption.
There is a leak, or some missed clean-up routine, in this triplicate mode. I have offsets specified such that my bed 0 is at left front -- I don't use 0,0 at center, too convoluted on non-delta machines. So my offsets correspond with 0,0 at front left. With those offsets triplication works perfectly fine and mirror works fine, but so far as I can demonstrate, triplication in 2.05RC works only after a reset. In 2.05.1 it works kinda better, but only in that I can do more prints before a failure, and I can't predict when it will happen, so that's not very helpful. I am doing the shields while doing my day job, so I can't be monitoring M122 results every few seconds -- it's not practical. Do the setup, reset the firmware, press print, walk away -- that works, rinse, repeat.
-
Just to confirm: if you do a software reset of the Duet, for example using the Emergency Stop button or M999, it clears the problem sufficiently to do another print?
If that's the case, please can you do the following next time:
- Send M122 after powering up and resetting the printer
- After the print has finished, send M122 again
- Then send M21 and see whether that is sufficient to allow a new print to succeed. If the M21 command reports an error, send M22 and then M21.
Post the M122 results here.
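For reference, the post-print part of those steps can be scripted roughly as below. The sketch deliberately leaves the transport open: `send` is a caller-supplied function (hypothetical here) that sends one G-code line and returns the reply text, and detecting a failed M21 by looking for the word "error" in the reply is an assumption about the reply format.

```python
def post_print_checks(send):
    """Run M122, then M21, falling back to M22 + M21 if M21 reports an error.

    `send` takes a G-code string and returns the firmware's reply text.
    Returns a list of (command, reply) pairs to post back to the thread.
    """
    log = [("M122", send("M122"))]
    reply = send("M21")
    log.append(("M21", reply))
    if "error" in reply.lower():  # assumed way of spotting a failed mount
        log.append(("M22", send("M22")))
        log.append(("M21", send("M21")))
    return log
```

The returned log keeps every reply, so the M122 output is not lost if the remount then changes the printer's behaviour.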
-
@kazolar I had a thought. This only happens when you run the 3rd hot end, in triplication, and not in mirror/duplication. Is the 3rd hot end heater connected to the Duex? I wonder if something about the PWM on the 3rd heater is interfering with the SD card. One obvious candidate is that you have not tied the grounds from the Duet and Duex together; see https://duet3d.dozuki.com/Wiki/Duex2_and_Duex5_Features#Section_Wiring
Alternatively, there may be subtle differences between the firmwares that cause the PWM frequency to be different, or the signalling between the boards to shift, enough to cause a stutter in the command queue (as you have had the same problem streaming from Octoprint/USB). This is just a guess; I think the PWM code has been unchanged for a long time.
Ian
-
@dc42 I'll run the test this weekend, Sunday most likely. I need to get as many shields made as possible today and tomorrow -- I have a pickup of 200 shields for a local hospital on Sunday morning.
@droftarts I actually had some issues with I2C a while back, and David suggested thickening up the grounding wire between the Duet and the Duex5. I got the thickest wire that could possibly be inserted into a ferrule and did that. David had also added some code to reset I2C if it hits an issue, but my M122 has not shown any I2C errors since I improved the grounding. Trust me, there is a very short 14awg silicone shielded wire crimped in ferrules connecting my 2 boards, so grounding between them is no longer an issue.
-
@dc42 M21, on its own or combined with M22, does not help. I just tried it after I replaced the 4 Z stepper connectors. Recall that a while back I had an issue with the included stepper connectors getting singed; you suggested Molex, and I had those in place for well over a year with no singeing until today. This time the singeing was very bad, melting the housing. I ended up taking the Duex5 out and soldering wires with heavy duty JS style connectors in place of the Z stepper connectors (I have 4 lead screws -- upon further inspection 2 had developed some singe marks).
I didn't save M122 -- I will make sure to save it on Sunday; I hope not to be dealing with unrelated electrical issues. What was telling was that even with M21 being run, the loop times were high: max loop times until stuttering started were 50-70ms, and none were lower than 12ms -- I checked M122 every minute or so. When stuttering started I canceled the print, kept the bed at temperature, cleared everything, then reset the board -- just M999 (or the stop button in this case). Then I started the same print -- it's running now, and all loop times are 5ms and lower.
-
@dc42
What's interesting, and what makes this look more like a hardware issue, is that the first print after the machine has been off for a couple of hours fails: high loop times, then underruns. Prints done afterwards work. I did not do anything different: preheat, reset, hit print, and the first print fails in the same fashion.
Here is the M122 right before it starts failing:
M122
=== Diagnostics ===
RepRapFirmware for Duet 2 WiFi/Ethernet version 2.05 running on Duet Ethernet 1.02 or later + DueX5
Board ID: 08DGM-9T6BU-FG3S0-7JTD4-3S06K-1A4ZD
Used output buffers: 1 of 24 (21 max)
=== RTOS ===
Static ram: 25708
Dynamic ram: 96332 of which 0 recycled
Exception stack ram used: 472
Never used ram: 8560
Tasks: NETWORK(ready,616) HEAT(blocked,1144) DUEX(blocked,164) MAIN(running,1668) IDLE(ready,156)
Owned mutexes: I2C(DUEX)
=== Platform ===
Last reset 00:20:52 ago, cause: software
Last software reset at 2020-04-25 11:06, reason: User, spinning module GCodes, available RAM 8504 bytes (slot 1)
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f000 BFAR 0xe000ed38 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 25.6, current 26.0, max 26.2
Supply voltage: min 24.1, current 24.4, max 24.6, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: ok, SG min/max 0/313
Driver 1: standstill, SG min/max not available
Driver 2: standstill, SG min/max 0/252
Driver 3: ok, SG min/max 0/305
Driver 4: ok, SG min/max 0/332
Driver 5: standstill, SG min/max not available
Driver 6: standstill, SG min/max 64/229
Driver 7: standstill, SG min/max 144/295
Driver 8: standstill, SG min/max 88/251
Driver 9: standstill, SG min/max 41/218
Date/time: 2020-04-25 11:27:19
Cache data hit count 2554842663
Slowest loop: 47.93ms; fastest: 0.08ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 154, MinFreeDm: 8, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 8715, completed moves: 8675, StepErrors: 0, LaErrors: 0, Underruns: 6, 0
=== Heat ===
Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 1.0
Heater 1 is on, I-accum = 0.6
Heater 2 is on, I-accum = 0.7
Heater 4 is on, I-accum = 0.5
=== GCodes ===
Segments left: 1
Stack records: 2 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is doing "G1 X228.135 Y232.279 E1.2574" in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 7.26ms; fastest: 0.06ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 2 of 8
Interface state 5, link 100Mbps full duplex
-
Is there any metalwork (e.g. printer enclosure) close to the SD card socket? If so, is that metalwork connected to Duet ground, either directly or through a resistor? I'm wondering whether static buildup might be a factor.
-
@dc42 nope, printer is in a plastic enclosure -- no metal parts touch the network jack
Here is the actual failure:
12:22:43 PM M122
=== Diagnostics ===
RepRapFirmware for Duet 2 WiFi/Ethernet version 2.05 running on Duet Ethernet 1.02 or later + DueX5
Board ID: 08DGM-9T6BU-FG3S0-7JTD4-3S06K-1A4ZD
Used output buffers: 1 of 24 (22 max)
=== RTOS ===
Static ram: 25708
Dynamic ram: 96332 of which 0 recycled
Exception stack ram used: 448
Never used ram: 8584
Tasks: NETWORK(ready,616) HEAT(blocked,1144) DUEX(suspended,164) MAIN(running,1668) IDLE(ready,156)
Owned mutexes:
=== Platform ===
Last reset 00:37:46 ago, cause: software
Last software reset at 2020-04-25 11:44, reason: User, spinning module GCodes, available RAM 8748 bytes (slot 0)
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0441f000 BFAR 0xe000ed38 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 25.9, current 26.2, max 26.6
Supply voltage: min 24.1, current 24.3, max 24.6, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: ok, SG min/max 0/333
Driver 1: standstill, SG min/max not available
Driver 2: ok, SG min/max 0/1023
Driver 3: ok, SG min/max 0/721
Driver 4: ok, SG min/max 0/332
Driver 5: standstill, SG min/max not available
Driver 6: standstill, SG min/max 26/241
Driver 7: standstill, SG min/max 135/297
Driver 8: standstill, SG min/max 71/249
Driver 9: standstill, SG min/max 29/222
Date/time: 2020-04-25 12:22:39
Cache data hit count 4294967295
Slowest loop: 151.20ms; fastest: 0.08ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 120, MinFreeDm: 6, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 26743, completed moves: 26706, StepErrors: 0, LaErrors: 0, Underruns: 69, 78
=== Heat ===
Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 1.0
Heater 1 is on, I-accum = 0.6
Heater 2 is on, I-accum = 0.6
Heater 4 is on, I-accum = 0.5
=== GCodes ===
Segments left: 0
Stack records: 2 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is idle in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 77.03ms; fastest: 0.06ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
Interface state 5, link 100Mbps full duplex
-
@kazolar said in Duet 2.05 memory leak?:
Here is the actual failure:
Do you mean after you have just had the SD card read error?
-
@dc42 here is the observable behavior -- happened 3 times in a row.
First print -- high loop times, inevitable failure due to underrun. I tried canceling it about 15 minutes in, when it looked like it would fail, but then the next print failed too. So basically a 30-40 minute run is required.
The next print, running now, is going to be fine -- loop times are great:
M122
=== Diagnostics ===
RepRapFirmware for Duet 2 WiFi/Ethernet version 2.05 running on Duet Ethernet 1.02 or later + DueX5
Board ID: 08DGM-9T6BU-FG3S0-7JTD4-3S06K-1A4ZD
Used output buffers: 3 of 24 (21 max)
=== RTOS ===
Static ram: 25708
Dynamic ram: 96332 of which 0 recycled
Exception stack ram used: 464
Never used ram: 8568
Tasks: NETWORK(ready,748) HEAT(blocked,1236) DUEX(suspended,168) MAIN(running,1668) IDLE(ready,156)
Owned mutexes:
=== Platform ===
Last reset 00:14:25 ago, cause: software
Last software reset at 2020-04-25 12:28, reason: User, spinning module GCodes, available RAM 8584 bytes (slot 1)
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0441f000 BFAR 0xe000ed38 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 25.9, current 26.3, max 26.5
Supply voltage: min 24.1, current 24.3, max 24.6, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: ok, SG min/max 0/319
Driver 1: standstill, SG min/max not available
Driver 2: ok, SG min/max 0/242
Driver 3: ok, SG min/max 0/290
Driver 4: ok, SG min/max 0/301
Driver 5: standstill, SG min/max not available
Driver 6: standstill, SG min/max 59/235
Driver 7: standstill, SG min/max 151/286
Driver 8: standstill, SG min/max 82/248
Driver 9: standstill, SG min/max 47/214
Date/time: 2020-04-25 12:43:13
Cache data hit count 1720212453
Slowest loop: 4.75ms; fastest: 0.08ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 152, MinFreeDm: 56, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 4720, completed moves: 4708, StepErrors: 0, LaErrors: 0, Underruns: 3, 0
=== Heat ===
Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 1.0
Heater 1 is on, I-accum = 0.4
Heater 2 is on, I-accum = 0.5
Heater 4 is on, I-accum = 0.4
=== GCodes ===
Segments left: 1
Stack records: 2 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is doing "G1 X110.719 Y48.457 E36.9405" in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 10.72ms; fastest: 0.06ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
Interface state 5, link 100Mbps full duplex
-
@dc42 stuttering -- failure of print -- just had it, yes. I reset and started again -- working fine now
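The three M122 dumps posted in this thread differ mainly in a handful of fields: slowest loop 47.93 / 151.20 / 4.75 ms, MinFreeDm 8 / 6 / 56, and Underruns "6, 0" / "69, 78" / "3, 0". To compare dumps without reading each one line by line, a small parser like the following could pull those fields out. It is a sketch only; the field names in the dictionary are made up, and it takes the first match of each pattern in the text.

```python
import re

# Patterns for the fields that differ between the good and bad runs.
SLOWEST = re.compile(r"Slowest loop:\s*([\d.]+)ms")
MIN_FREE_DM = re.compile(r"MinFreeDm:\s*(\d+)")
UNDERRUNS = re.compile(r"Underruns:\s*(\d+),\s*(\d+)")

def summarize(m122_text):
    """Extract a few key figures from one M122 dump (None if absent)."""
    slow = SLOWEST.search(m122_text)       # first hit: Platform section
    free = MIN_FREE_DM.search(m122_text)
    under = UNDERRUNS.search(m122_text)
    return {
        "slowest_loop_ms": float(slow.group(1)) if slow else None,
        "min_free_dm": int(free.group(1)) if free else None,
        "underruns": (int(under.group(1)), int(under.group(2))) if under else None,
    }
```

Running `summarize` on each saved dump and printing the results side by side makes the before/after-reset difference easy to see.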
-
@dc42 I think we're going around in circles -- 2.05.1 likely would fix the software issue, but there is clearly a hardware fault. I HAVE to let it run for 30+ minutes and fail before I can reset and start a series of prints that will succeed. That 30 minute time frame may be the time it takes for a cold solder joint somewhere to heat up, after which things start running fine. This board was purchased on Aug 3 2018 -- Filastruder order #36936
It's either covered by warranty or it's not. If not, I'll order a new one from Filastruder. I said that I was ready to go ahead and do this. I was asked to slice this in different slicers and try different SD cards -- I have. I am printing face shields for hospital workers. I have a donation of 200 being picked up to go to a local hospital tomorrow. I'd rather lose more sleep and waste time troubleshooting what at this point feels like a bad solder joint -- is it worth my time to take the board out and look at it under a scope? It could be on the SD card connector, or another trace. It resolves itself after 30 minutes of printing. The question at this point is how the replacement is being handled.
-
@dc42 what makes you think it's still an SD card error? I had the same underrun error reproduced with OctoPrint. I am using a branded SD card which tests out at the top of the speed spectrum. The predictable behavior now is that after the machine has been either idle or off for some time, it takes 1 failed print due to an underrun, which happens around 30-40 min (not the same spot each time anymore), and then after a reset the next set of prints work fine as long as I don't let it cool down. So, as some suggested, a cold solder joint which warms up and works fine after some heat-up time is the culprit here. I'm going to wait for a response as to how to proceed -- if I don't get one, I'll order a new Duet 2 from Filastruder.
@droftarts Considering the importance of these prints, and my past experience here, I am rather disappointed and surprised with the level of support. YouTubers who do questionable videos (I can name some -- who I used to watch) get free boards sent to them; I guess I don't have that kind of a public image. I will be releasing the build vlog of this printer, which has been long overdue as I started filming it 3 years ago. The last few episodes will feature my experience with using this machine to print these shields. I am donating countless hours of my time; our group is getting daily requests from local hospitals, and even those further away, for ear relievers. I am kinda getting burned out, mostly due to the issues with the Duet board. I was dealing with these behaviors initially, and was willing to keep troubleshooting this. Now we're at a crossroads. I have played the "do this, do that" steps, which would be fine if we weren't in a time crunch in a pandemic where every shield I produce is another doctor, nurse, or EMT worker who won't get sick. I don't quite understand the lack of urgency here, or maybe you're not taking this seriously enough -- I have had first hand accounts of doctors dying in New York, and young nurses succumbing to the illness in the Bronx.