Duet 2.05 memory leak?
-
@droftarts there are some behaviors with tool changes I don't like, and that don't work with my slicers. What you are saying doesn't make sense. I ran the print 40 times over the last 2 weeks using the soft reset method between prints, nothing else. If I just home one of the axes before starting the same print, the problem starts happening. If it was anything electrical, the problem wouldn't go away with a soft reset. My build changes are the same patch I've been carrying forward since version 1.21. I've done triplicate printing before, including back to back, and the first time I've had issues with it is with some point release of 2.0. I'm almost tempted to say the problem started with the 2.05 release. I've had the machine make over 300 shields and it worked perfectly fine last night after a soft reset. I had a brain fart prior to that and did an extra home of multiple axes, ran that print, and it started complaining about one driver or another and stuttering. Soft reset, clean the bed, start again, perfect print.
-
@kazolar my advice would be to return to the firmware version that causes you least problems, probably 2.04, if you want reliability, until you get the chance to update and test your version of 2.05.1.
If it ‘works perfectly’, but not after doing x, y or z, and that is repeatable, then it should be traceable. I appreciate you don’t have time to look into it now, though. I was just trying to suggest other things I know can cause similar issues.
Ian
-
@droftarts I am also kind of in a crunch, so no time to dig out the last working build. Soft resets work for now... I was starting a print late last night, had a brain fart and ran my macro to home the axes besides Z. Then I started a print and came back 30 minutes later to stuttering. I opened the ticket, and in the meantime adjusted some nozzle calibration. The stuttering is pretty violent. I hit soft reset and have a perfect set of 9 shields on the bed ready this morning.
-
@dc42 so the consequence of the 1-byte buffer overflow was the issues I was experiencing. I have applied the firmware this morning. I specifically ran my macro which homes everything except Z to check all movement was fine, then did M18. After that I ran my triplicate print of shields -- worked, 4 hour print, no issues. After that, without a reset, I started another triplicate print of 9 shields and I am almost 1 1/2 hours into it, well past the predictable point when errors would have occurred. I could literally time it: exactly 30 minutes after the lead screw compensation finished, the phase warnings would start. So far, the fix looks like it has worked. Since today is a holiday, I could get the firmware updated and keep printing shields. I'll be able to kick off another set of 9 tonight before going to bed, so if 3 in a row succeed... I call it good.
-
@kazolar glad you got it fixed. And thank you for the support you’re giving health workers!
Ian
-
@droftarts doing what I can. I'm right outside of NYC, and I'm in contact with a lot of healthcare professionals. Getting a shield out shows them that we care and that they're not alone in this, as well as protecting them. I've personally heard too many stories from front line workers in my area of preventable tragedies, kids who should still have both their parents. The printer is running now. I've given away 350 shields (full sets with acetate and elastic) just since this weekend. It's overwhelming. I'm glad to have resolved this issue, and the printers have produced over 100 shields in the last 2 days. I'm doing shields with visors based on feedback from nurses and EMS workers. Since I've had turnarounds as quick as a set of 50 being picked up in the morning and ending up in the field by night time, I'm getting very fast feedback for design changes. The visor design is the winner. Not the quickest to print, but it's what works. Thank you for the help. The machines have gone through almost 20kg of PETG this week and more is on its way. I hope to be printing more fun things eventually.
-
@kazolar CNC Kitchen has done a good video on speeding up prints, might help churn more out! https://youtu.be/_bt1UZAnxnA
Which design are you finding most popular with healthcare professionals? Link to yours or similar?
Ian
-
@droftarts yes, I was already doing all those tips, that's how I am getting the yields I am.
I actually used the CNC Kitchen design as inspiration to iterate on the design we started with. I have published the design I've been printing:
Enhanced low weight Modified Prusa Face Shield with a Visor found on #Thingiverse https://www.thingiverse.com/thing:4273009
There are 3 of us in our group. I have much more capacity, so I've distributed many more shields, but other members have been catching up after adding new machines to their effort. Our employer has stepped up to cover our expenses, even paying for other members to get additional machines (Sidewinder X1). Since I've switched to the visor design, our other members have also started printing it, and I'm passing on the tips to squeeze as much speed as possible out of every print. These don't have to be pretty, they just have to work.
-
@dc42 the problem is not fixed. This time it doesn't complain about any driver issues -- just starts stuttering exactly 30 minutes after starting the print. I am trying it after board reset. I may go back to an earlier version. Not sure what's going on.
EDIT:
Went back to 2.03 RC2 -- so far all is good.
Gonna stay on that for now
-
Did you check an M122 to see if there were hiccups?
-
@Phaedrux no hiccups -- it just starts stuttering. I am back to version 2.03 RC2 -- 2nd print with no resets is running fine. Now that's no indication that it's bug free, but it's working, so until I see a reason to move from it I'm staying on this build.
-
So I did a bunch of M122s during the print and finally caught the issue -- underruns. @dc42 the count resets too often, and it would be nice to get an error on the screen when it gets critical. I switched to a brand new Class 10 SD card, and the stuttering and all the weirdness stopped -- back to version 2.05.1. As smart as Duet is, the fact that an SD card is not up to snuff, and/or is dying, should be something you can detect. It took me over 2 weeks of hunting for the issue. Underruns keep resetting, so it's almost impossible to go on that. Now underruns are 0,0 -- and the UI on the LCD is more responsive, showing the list of files and macros in an instant.
-
@kazolar Thanks for your persistence, and your report. SD card problems can have strange, and often not very obvious, effects. I don't know if the firmware can be set to detect SD card issues, that's one for @dc42. You can test an SD card with M122 P104 S[file size in MB]; the result is usually between 2 and 2.5 Mbytes/sec. For me, with a 10MB file: Duet 2 WiFi 2.23 Mbytes/sec, Duet Maestro 2.42 Mbytes/sec.
Ian
-
@kazolar underruns, and any of the other stats like that, are reset each time you run M122.
-
@droftarts there's gotta be something that responds to underruns at some level. Clearly the underruns were getting out of hand. If the firmware simply complained about underruns the way it complains about stepper phase warnings and other things of that nature, it would make troubleshooting a lot easier. Also, the underruns seem to reset more often than just when running M122: I canceled the print and the underrun line in the M122 stats was cleared out.
-
@kazolar How are the underruns actually reported in the M122? Is it just with the error status, or does it show in some other field? If you managed to save a copy of an M122 that shows it, that would be useful.
Ian
-
@droftarts here is what an M122 report looks like with underruns. This is from my own print just now. For me, it seems the underruns are from tiny segments created by Simplify3D for support structures, combined with high speeds and some amount of pressure advance (PA).
4/13/2020, 9:29:48 AM M122
=== Diagnostics ===
RepRapFirmware for Duet 2 WiFi/Ethernet version 2.05.1.1-simple_dynamic_unretraction running on Duet Ethernet 1.02 or later + DueX2
Board ID: 08DGM-956GU-DJMSN-6J9D4-3SJ6K-1BNBF
Used output buffers: 1 of 24 (16 max)
=== RTOS ===
Static ram: 25712
Dynamic ram: 93652 of which 0 recycled
Exception stack ram used: 480
Never used ram: 11228
Tasks: NETWORK(ready,628) HEAT(blocked,1232) DUEX(suspended,160) MAIN(running,3712) IDLE(ready,160)
Owned mutexes:
=== Platform ===
Last reset 23:30:20 ago, cause: power up
Last software reset at 2020-04-11 22:50, reason: Stuck in spin loop, spinning module GCodes, available RAM 11048 bytes (slot 2)
Software reset code 0x4043 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f80f BFAR 0xe000ed38 SP 0x20001f4c Task 0x5754454e
Stack: 00404463 004047e4 81000000 b0000000 412a3fa5 00000000 00000000 3331bb4c 41880000 3e178897 3e1cd04f bdb7f86e 423985c3 4050ac00 3cce8f96 40a00000 4453b9c2 c0000000 40f4ffb7 20000010 00404459 000003c8 00404aa9
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 36.6, current 37.6, max 38.8
Supply voltage: min 23.9, current 24.6, max 25.0, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: ok, SG min/max 0/1023
Driver 1: standstill, SG min/max 0/1023
Driver 2: standstill, SG min/max 0/135
Driver 3: ok, SG min/max 0/1023
Driver 4: standstill, SG min/max not available
Driver 5: standstill, SG min/max not available
Driver 6: standstill, SG min/max not available
Date/time: 2020-04-13 09:29:42
Cache data hit count 4294967295
Slowest loop: 17.11ms; fastest: 0.07ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 158, MinFreeDm: 117, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 1295584, completed moves: 1295544, StepErrors: 0, LaErrors: 0, Underruns: 595, 0
=== Heat ===
Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 0.2
Heater 1 is on, I-accum = 0.5
=== GCodes ===
Segments left: 1
Stack records: 1 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is doing "G1 X-29.037 Y13.502 E0.0004" in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 15.90ms; fastest: 0.06ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
Interface state 5, link 100Mbps full duplex
-
@bot said in Duet 2.05 memory leak?:
=== DDARing ===
Scheduled moves: 1295584, completed moves: 1295544, StepErrors: 0, LaErrors: 0, Underruns: 595, 0
Thanks, I know where to look now!
Ian
-
@droftarts I think I read that the first number is a warning, the 2nd number will cause stutter or a pause if it gets bad. I can tell from switching SD cards, my gcode uploads are faster now -- hitting 700kb/sec -- almost maxing out the 100mb link -- never had over 500 before.
-
@kazolar I think the first value isn't a warning, just an indication that the lookahead function couldn't do something (not sure what) with the time given. It doesn't slow down the print, but is likely not ideal. The second number is a prepare move underrun, which means that the move could not be prepared in time and so the movement must wait. This is much worse than the first one.
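For anyone who wants to watch these two counters from a script, here is a minimal sketch. It assumes the DDARing line always looks like the one in the M122 report quoted in this thread ("Underruns: 595, 0"), and it labels the two values the way they are described above (first lookahead, second prepare). The function names `parse_underruns` and `describe` are made up for illustration and are not part of any Duet tool. Keep in mind that the counters are reset each time M122 runs, so each reading only covers the period since the previous one.

```python
import re

# Assumed layout, taken from the M122 report quoted in this thread:
#   "... LaErrors: 0, Underruns: 595, 0"
UNDERRUNS_RE = re.compile(r"Underruns:\s*(\d+),\s*(\d+)")


def parse_underruns(report: str):
    """Return (lookahead_underruns, prepare_underruns), or None if not found."""
    match = UNDERRUNS_RE.search(report)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe(lookahead: int, prepare: int) -> str:
    """Summarise the counters using the interpretation given in this thread."""
    if prepare > 0:
        return "prepare underruns: moves had to wait, expect stutter"
    if lookahead > 0:
        return "lookahead underruns only: not ideal, but motion did not wait"
    return "no underruns"


sample = (
    "Scheduled moves: 1295584, completed moves: 1295544, "
    "StepErrors: 0, LaErrors: 0, Underruns: 595, 0"
)
counts = parse_underruns(sample)
print(counts, describe(*counts))
```

On the report quoted earlier this gives 595 lookahead underruns and 0 prepare underruns, which fits a print that was not ideal but did not stall.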
Also, since I'm interested in SD card performance at the moment, I noticed your last comment and must correct you somewhat, just for your info: 700 kB/s is not nearly maxing out a 100 Mbps link. 100 Mbps = 12.5 MB/s
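To make the arithmetic explicit, here is a small sketch of the conversion. It assumes decimal units (1 MB = 1000 kB) and ignores protocol overhead, so the real usable capacity of the link is somewhat lower than the raw figure.

```python
def mbps_to_mbytes_per_s(megabits_per_s: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return megabits_per_s / 8


link_capacity = mbps_to_mbytes_per_s(100)  # raw capacity of a 100 Mbps link, in MB/s
upload_rate = 700 / 1000                   # 700 kB/s expressed in MB/s
utilisation = upload_rate / link_capacity  # fraction of the raw capacity in use

print(f"{link_capacity} MB/s capacity, {utilisation:.1%} used")
```

So an upload at 700 kB/s uses only a few percent of the raw link capacity, and the limit is somewhere else.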