Duet 2.05 memory leak?
-
@T3P3Tony
1 - it's reproducible with 2.03 RC 2 as well. Curiously enough, I did a bunch of triplicate prints in the summer with this version with no issues, so I tried it -- but the problem happens there as well.
2 - semi reproducible -- it seems that if I keep printing the same file, the 3rd or 4th print of that file will trigger the issue. Resetting the board doesn't help. Making a fresh copy of the same exact file -- or formatting the card (even without resetting or powering the board down, formatting requires a power down) -- seems to get it to a working state (as it is now)
3. When the problem occurs and I see stuttering -- underruns go into triple digits very quickly. Last night, on the 4th print, at the 27 minute mark it was fine and I thought I was in the clear. I was watching a movie -- when I checked my phone 10 minutes later, underruns were at 200+. It's technically exactly 30 minutes after the lead screw compensation finishes -- it may take a couple of minutes from the moment I hit start to when all temps stabilize and it does the probing routine. So it's technically closer to 32-33 minutes, but the 30 minute mark is right after the screen displays the result of lead screw compensation.
4. Setting up this print with another slicer is going to take some time. I have very specific settings in S3D which are tuned to produce these shields as fast as possible. Triplicate mode differs from duplicate primarily in that the 2nd Y gantry (technically U) is involved, and the printer vibrates more -- it's a 200 lb machine, but the gantries are more akin to CNC gantries, so there is a lot of force involved.
I've not had it happen in duplicate mode, but I've only run duplicate prints a couple of times. It seems to take 3-4 prints to basically render the file unprintable.
5. For whatever it's worth -- with 2.05 -- if I did anything (that involves movement) -- even tried to home the machine prior to starting this triplicate print -- I'd get this behavior -- 30 min after lead screw compensation it would start getting underruns, and my approach was to simply reset the board -- pre-heat, clean the nozzles -- I need to clean at least tool 0 -- I use a piezo to probe. Then I'd start the print and that worked for a while... then even that stopped working, and the only thing that would resolve things is a clean copy or a card reformat.
6. Yesterday I had the most success -- 3 error free prints in a row after I formatted an 8 GB SanDisk card with 32 KB cluster size. After the failure I switched to the new 32 GB SanDisk card I got yesterday, formatted with 64 KB cluster size. The first print with that card was fine; the 2nd, which I tried this morning, didn't work. I deleted the file -- uploaded a new copy -- it's running fine at 3 hrs 36 min -- no underruns worth noting (22 at the beginning, but no ill effect from them), 0 since.
@kazolar you are not running stock firmware right? you are patching it and compiling yourself?
running a few prints using octoprint instead of from the duet sd card would imo completely remove the sd card from the picture (or prove it has to do with the sdcard/sdcard socket), so that might be a good test.
as for the processes and extruders, ideamaker started as a copy of s3d, so it has most of the stuff s3d has plus some extras; you might want to test it out. it is not open source but it is free
-
@arhi Yes, I'm compiling my own firmware. This firmware has done successful 40 hour prints with thousands of tool changes, so if I had a bug in the patch I've been carrying forward to each version, I'd have seen it by now. This patch, by the way, is like 8 lines -- as a programmer you can screw stuff up in one line, but my changes are really minor.
I could set up a Pi as a test unit. If I get more underruns today on the next print, I'll do that tonight.
@kazolar said in Duet 2.05 memory leak?:
1 - it's reproducible with 2.03 RC 2 as well. Curiously enough, I did a bunch of triplicate prints in the summer with this version with no issues, so I tried it -- but the problem happens there as well.
OK, and you can't try 3.01 because you need to port your change?
2 - semi reproducible -- it seems that if I keep printing the same file, the 3rd or 4th print of that file will trigger the issue. Resetting the board doesn't help. Making a fresh copy of the same exact file -- or formatting the card (even without resetting or powering the board down, formatting requires a power down) -- seems to get it to a working state (as it is now).
3. When the problem occurs and I see stuttering -- underruns go into triple digits very quickly. Last night, on the 4th print, at the 27 minute mark it was fine and I thought I was in the clear. I was watching a movie -- when I checked my phone 10 minutes later, underruns were at 200+. It's technically exactly 30 minutes after the lead screw compensation finishes -- it may take a couple of minutes from the moment I hit start to when all temps stabilize and it does the probing routine. So it's technically closer to 32-33 minutes, but the 30 minute mark is right after the screen displays the result of lead screw compensation.
The combination of #2 and #3 is utterly bizarre to me. Do you run the Z levelling before each print, or just on start up? If it's before each print, what happens if you run it once on startup, then follow your new file process, and then do not repeat the Z levelling?
Also have you tried downloading the file once it starts giving the errors (preferably without a reset) and then comparing the file on the SD card with the copy on your PC?
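One way to do that comparison is sketched below in Python. It is only an illustration, not a Duet tool: it assumes the file has already been downloaded from the SD card to the PC, and the function names are hypothetical. It hashes both copies and, if they differ, reports the byte offset of the first difference.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact position inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One file is a prefix of the other (length mismatch).
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference
            offset += len(a)
```

For example, `first_difference("pc_copy.gcode", "sd_copy.gcode")` (placeholder file names) returning `None` would mean the copy on the card is byte-for-byte identical to the one on the PC.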
- Setting up this print with another slicer is going to take some time. I have very specific settings in S3D which are tuned to produce these shields as fast as possible. Triplicate mode differs from duplicate primarily in that the 2nd Y gantry (technically U) is involved, and the printer vibrates more -- it's a 200 lb machine, but the gantries are more akin to CNC gantries, so there is a lot of force involved.
I've not had it happen in duplicate mode, but I've only run duplicate prints a couple of times. It seems to take 3-4 prints to basically render the file unprintable.
- For whatever it's worth -- with 2.05 -- if I did anything (that involves movement) -- even tried to home the machine prior to starting this triplicate print -- I'd get this behavior -- 30 min after lead screw compensation it would start getting underruns, and my approach was to simply reset the board -- pre-heat, clean the nozzles -- I need to clean at least tool 0 -- I use a piezo to probe. Then I'd start the print and that worked for a while... then even that stopped working, and the only thing that would resolve things is a clean copy or a card reformat.
- Yesterday I had the most success -- 3 error free prints in a row after I formatted an 8 GB SanDisk card with 32 KB cluster size. After the failure I switched to the new 32 GB SanDisk card I got yesterday, formatted with 64 KB cluster size. The first print with that card was fine; the 2nd, which I tried this morning, didn't work. I deleted the file -- uploaded a new copy -- it's running fine at 3 hrs 36 min -- no underruns worth noting (22 at the beginning, but no ill effect from them), 0 since.
What does the MCU temperature read when you are 30+ minutes into a successful print, vs when you are 30+ minutes into a print and it starts stuttering?
-
- my changes are really small. Moving from 2.05 to 2.05.1 took about 20 minutes, mostly sorting out the different dependencies so I can compile it; applying my changes is the equivalent of apply patch, compile -- but to try 3.01x I'd need to sort out building it (that part usually takes the most time), and AFAIK there are some differences in 3 vs 2 in regards to movement and command order -- I had linked my startup config gcode earlier and you can see if anything in there is not compatible with v3
- The machine homes before each print -- there is a homing switch on all independent lead screws -- then it runs g32 -- which does the probing to do lead screw compensation -- I do it this way so the position of the bed is known prior to running lead screw compensation, so yes, it does it for every print -- because after the print one of the last commands is m18 -- so then we're no longer at a known state. Homing z gets all the lead screws close -- my Z compensation adjustments are < 0.5mm -- I adjusted those end stops to be as close to reality as possible.
- I have not tried to download the file -- good idea. A print just completed and I am about to start another one -- and 30-32 minutes after that I'll know where I stand. My MCU never goes above 29C -- I have two 40 mm fans blowing across the board -- I can run the printer for 40 hours straight and the board is 20C off, and that's about as high as it goes when it's active. Voltage is rock solid at 24-24.7 -- it's a 600w PSU, the printer never uses more than 300w -- the bed is AC.
Stay tuned, will start the next print now.
-
@kazolar said in Duet 2.05 memory leak?:
I have not tried to download the file -- good idea.
I'm curious to see the results.
Which DWC version are you using? There were some CRC checking issues in certain versions which I believe were all resolved with FW 2.05.1 and DWC 2.0.7. Are you using that combination?
-
@kazolar said in Duet 2.05 memory leak?:
m18 -- so then we're no longer at a known state. Homing z gets all the lead screws close -- my Z compensation adjustments are < 0.5mm -- I adjusted those end stops to be as close to reality as possible.
If you can remove the M18 from the end gcode so there is no need to rehome, and then see -- it would be good if we can tie this down to running that Z levelling, or not.
-
@Phaedrux I actually am not a big fan of the new web UI so I'm using reprap.htm -- and that goes to 1.22.6
The DWC that is loaded is Duet Web Control 2.0.0-RC2. Not to be a critic, but the new look is not my thing, I liked the old one better -- glad to still be able to use it.
@T3P3Tony
The problem is my startup gcode does that too, it goes through a cycle of putting every tool into standby -- I'd have to be very careful with what I change in these sequences. The combination tfree5.g tpost0.g -- has some odd consequences. I am not sure what's going on -- I haven't sorted that gcode out -- so I could try and cut out all but the essentials in the start and stop gcode -- obviously I can only do that during the day when I'm here to observe. My plan for today is -- if it underruns again -- the print is running now, only a few minutes in -- I will set up octoprint and feed gcode that way -- if things work via octoprint, then it's pretty clear the fault is in the sdcard circuitry.
@T3P3Tony by the way, the lead screw compensation result -- probing 5 points, center, and near each lead screw -- is 0.004 in the print that just started. I've seen it at all zeros before -- gotta love the single start lead screws and a milled aluminum plate.
-
@kazolar said in Duet 2.05 memory leak?:
I actually am not a big fan of the new web UI
Fair enough, but if it's a CRC issue that's corrupting the files it would be good to test with a known working DWC version. If you update to 2.0.7 you can still use the 1.x version as before.
-
@Phaedrux will do after this print either fails or succeeds -- about 10 minutes left to find out
-
40 minutes into the 2nd print of the day -- had to change 2 filament spools -- this thing eats filament like you wouldn't believe. No underruns (if one was going to happen, it would have happened already). I actually read the lead screw compensation result from an earlier print -- this one said 0.001 -- I'd say the bed is flat.
-
@Phaedrux so a late night for me -- very late night, well, actually didn't go to bed. But I got some interesting results.
I did as you asked -- I updated my DWC, went with 2.1 -- and the very next print -- I got underruns -- I then figured I'd bypass the SD card and run from octoprint -- I thought that would solve it -- nope, underruns again, stuttering, the full boat -- so I figured I'd look at my firmware changes a bit -- and I removed a few of them to bring things back to more stock -- left just the minimum I need for the config. Again -- underruns -- stuttering, and it just ends up freezing and rebooting -- not right away, but 30 minutes into the print as usual -- basically crashing, with an m122 showing the slowest loop at over 4 seconds; by then it's complaining about drivers, earth, moon, and the stars. So I figured I'd return my firmware back to how I had it before -- as it did work several times during the day. Tried again -- underruns.
Next -- I gave formatting the SD card another shot -- tried 32kb cluster size, just to see what would happen -- this card doesn't work right at that size -- it's a 32gb card, so that was glitchy from the start -- all gcodes were executing slowly -- so I formatted again to 64kb -- and -- long story short -- I'm back to 2.05RC -- not 2.05.1 -- I need to build that again and figure out what I reversed and shouldn't have -- but I broke one of my changes -- and stupidly I didn't back up the binary that was working today -- so I'm back at a firmware that did work (with resets) -- and is working now. I am far enough into the print that I know it will be fine... so here is the interesting part.
During the first 30 minutes I initially thought it could go either way; well, now, after doing M122 a few times, I know that's not the case. I see that if the slowest loop starts inching up towards 100ms -- it will eventually fail and start to underrun -- the issue is not the sd card, it's the fact that the loop gets bogged down with something -- now that it's fine, the slowest loop is 6-10ms -- hence no underruns. So blaming the SD card was a mistake -- I wish I hadn't pried up the posts on the ethernet adapter to look at the sd card solder joints -- because that's not fully secure anymore.
So it's pretty obvious that the sd card is not at fault -- underruns are happening because the loop gets slow -- why -- that's not something I have a clue about, but I can see that DWC upgrade made things worse. I'm almost tempted to go back to the 1.22.6 DWC and remove 2.0x whatever -- it looks like as soon as I updated the DWC -- it literally took me all night to get it to start to print successfully. BTW -- I am back to my backed up version of the older DWC -- 2.0.0RC2 -- the combination appears to work better.
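The "watch the slowest loop" check described above could be automated along these lines. This is a rough Python sketch, not part of the firmware or DWC: it assumes the M122 report contains a line of the form `Slowest loop: 6.12ms; fastest: 0.07ms` (the exact wording may differ between firmware versions), and the thresholds are only illustrative, taken from the 6-10ms and ~100ms figures in this post.

```python
import re

# Assumed report format: "Slowest loop: 6.12ms; fastest: 0.07ms"
SLOWEST_LOOP_RE = re.compile(r"Slowest loop:\s*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def slowest_loop_ms(m122_text):
    """Extract the slowest loop time in ms from M122 output, or None if absent."""
    match = SLOWEST_LOOP_RE.search(m122_text)
    return float(match.group(1)) if match else None


def classify(ms, ok_up_to=10.0, slow_from=100.0):
    """Bucket a loop time using illustrative (hypothetical) thresholds."""
    if ms <= ok_up_to:
        return "ok"
    if ms >= slow_from:
        return "slow"
    return "watch"
```

Pasting the text of each M122 run into `slowest_loop_ms` during the first 30 minutes would give a simple trend to watch, rather than waiting for the underrun counter to climb.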
@kazolar said in Duet 2.05 memory leak?:
I can see that DWC upgrade made things worse
Not sure this is connected as DWC should not be influencing the loop time.
I think it's time to bite the bullet and try a different slicer!
-
@T3P3Tony I started setting up Cura. It's taking some time to get it set up with the extruders -- I kinda remember why I paid for S3D, Cura is as slow now as it was 5 years ago. I can see it's a more complex geometry -- I'll try to re-slice it in Cura for the next print and see where it goes
-
@T3P3Tony so just moving the basics into Cura -- it's estimating the time spot on -- but it's generating a gcode file that is 50% bigger -- probably that will cause more problems than it fixes? I could try it, but my thinking was to eliminate small movements -- I also adjusted the mesh in Fusion 360 to be less dense, and that cut the gcode down a bit -- but it still hit underrun errors. I am not sure of the point in trying a different slicer if the gcode I am getting is bigger -- either it will do a lot more travels, or create even more small movements
-
@kazolar, when the problem occurs, have you tried uploading a different file to SD card, and then printing the original file again? I'm wondering whether it is the act of writing to the SD card that clears the problem.
Also, are you running with logging (M929) enabled or disabled? Logging will generate occasional writes to the SD card, e.g. when a print starts or finishes.
-
@dc42 I've been trying different things all night -- doesn't seem to be sd card specific. I tried printing from octoprint -- bypassing the SD card, and still hit underruns. I don't have M929 enabled. It seems that a full reset does clear the cobwebs -- I kept trying to preheat, then clean the nozzle, then print -- but for the print that is working fine -- 2+ hours into it now -- I preheated, cleaned the nozzle, then hit reset -- then print -- that worked.
-
@kazolar said in Duet 2.05 memory leak?:
It seems that a full reset does clear the cobwebs
Is that with or without powering down and up again?
@kazolar said in Duet 2.05 memory leak?:
I tried printing from octoprint -- bypassing the SD card, and still hit underruns.
Underruns are likely when printing over USB if the GCode file contains sequences of short segments, unless the sending program knows that when talking to a Duet, it doesn't need to wait for an OK reply to each command. I've seen many reports (not from Duet users) that printing from Octoprint is sometimes slower than printing from SD card. So results using Octoprint are only significant if you compare multiple runs using Octoprint.
-
@dc42 I was able to get it going again by simply hitting reset -- no power down -- and immediately selecting a file to print. That's how it's printing now. It seems that if I do anything prior to initiating the print -- it starts to show significant max times in loop, and then at the 30 minute mark it can't keep up and starts getting underruns. I tweaked the octoprint settings not to wait for OK before I started, because it was timing out on basic stuff -- it still hit underruns at about the same point. And I could see in the DWC console when I did m122 -- I'd see loop times around 50ms -- so it clearly wasn't going well. I checked the model and the file for small segments, and there aren't any. I took that set of models into Cura with adjustments for minimum travels, and it generated a bigger gcode file than S3D -- I'll try prusa slicer -- not sure what else to try. The problem has gotten progressively worse. The gcode file I'm printing is 12 MB; the Cura generated one with adjustments for minimum segments is 19 MB -- I can print that file next and see what happens?
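For comparing the S3D and Cura files on more than size alone, a quick count of short XY moves may help. The Python sketch below is only an illustration under stated assumptions: absolute coordinates, only G0/G1 moves, only the X and Y axes (so U-axis motion in triplicate mode is ignored), and a hypothetical 0.2 mm threshold for what counts as a "short" segment.

```python
import math
import re

# One letter followed by a number, e.g. "X10.5" or "E-0.8"
WORD_RE = re.compile(r"([A-Za-z])\s*([-+]?\d*\.?\d+)")


def short_segment_stats(lines, threshold_mm=0.2):
    """Return (total_xy_moves, short_xy_moves) for G0/G1 lines.

    Assumes absolute positioning; threshold_mm is an illustrative value.
    """
    x = y = None
    total = short = 0
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        words = {k.upper(): float(v) for k, v in WORD_RE.findall(line)}
        if words.get("G") not in (0.0, 1.0):
            continue  # not a linear move
        if "X" not in words and "Y" not in words:
            continue  # Z-only or extruder-only move
        nx = words.get("X", x)
        ny = words.get("Y", y)
        # Only measure once both the old and new positions are known.
        if None not in (x, y, nx, ny):
            total += 1
            if math.hypot(nx - x, ny - y) < threshold_mm:
                short += 1
        x, y = nx, ny
    return total, short
```

Running it over both files, e.g. `short_segment_stats(open("shields.gcode"))` with a placeholder file name, would show whether the larger Cura file really contains more short moves or simply more travel.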