Poor print quality with RRF3 - especially 3.2.2.

deckingman

@dc42 Ref the print quality - let's go with bad rather than acceptable. But the way things have been, it's possible that the next print might be worse (or better).

I can confirm that everything is exactly the same - including PA (and it's still the same reel of filament). I tell a lie - the part cooling fans are no longer connected, since one of the most acceptable prints was done without any part cooling.

I've downloaded the binaries - it might be a day or two before I get a chance to install them and run another print.

As I'm running a beta, is there anything else you'd like me to check?

droftarts

@deckingman did you swap extruders?

Ian

deckingman

@droftarts said in Poor print quality with RRF3 - especially 3.2.2.:

@deckingman did you swap extruders?

Ian

Er no. I thought it best to only change one thing - in this case the firmware. But thanks for reminding me. I'll do another print "as is" with David's latest binaries, then I'll change to another extruder and repeat.

dc42

@deckingman said in Poor print quality with RRF3 - especially 3.2.2.:

I tell a lie - the part cooling fans are no longer connected since one of the most acceptable prints was done without any part cooling.

What filament are you printing with? If it's PLA then please reconnect the print cooling fans, because in my experience print cooling makes a huge difference with PLA, and some of the issues with the prints you posted before looked to me like the effects of insufficient cooling. If you are using PETG then print cooling seems to me to be far less important.

deckingman

@dc42 That's not borne out by previous prints https://drive.google.com/drive/folders/1XmFXYBGnj3rXJRLg1muSNBd18AbP2ZDl?usp=sharing. The reason why I disconnected the fans was because people were talking about intermittent fan issues. So I disconnected them completely to see just how bad it would be, and was surprised when I ended up with one of the best prints of the batch. Therefore I deduced that running without fans would eliminate one potential variable (knowing that decent prints could be produced with that configuration).

I use the auto cooling feature of Slic3R in any case so the fans don't do much if the layer time is less than circa 15 seconds or if it's doing bridges. And the fans are arranged so that no direct air flows over the nozzle. It's something that I have played around with a lot over the years.

Maybe it's because I print temp towers without fans and so end up using lower temperatures than a lot of people. Again, something that I've played around with a lot over the years.

deckingman

@dc42 David,

Tried the binaries from your dropbox. The main board reports being 3.3beta1+1 but the expansion boards come back as being 3.3beta1 (2021-02-14 16:32:08) which I think is the same as last time. Is that what you would expect?

Anyway, as before, after installing all the updates, which went as expected with all 4 boards updating and then being prompted to do a reset, I cycled the power (just to be sure), then ran an initial M122 on all 4 boards and downloaded the console text. Then I heated the bed and ran a home all and repeated the M122 reports. Then I started a print which was going well - extremely well - in fact one of the best I've seen for a long time. Part way through, I did another set of M122s.

A short time after that, the printer went into self destruct mode. It looks like a massive shift in the X direction and the carriage was doing its best to head butt the side of the printer frame into pulp. But the UV axis didn't suffer the same shift, which has caused the cabling and Bowden tubes to tear the hot end off its Kinematic mount.

I had the forethought to pause the print rather than bang the emergency stop button. So at that point, I ran another sequence of M122s on all the boards. I could not however download the console output using DWC - it just didn't do anything. So I managed to copy the text and paste it into notepad++ and save that file.

I then turned the printer off and re-aligned the gantries, but with the hot end dangling on whatever wires haven't been torn out. I didn't see any obvious shorts but it might well have blown a fuse. Then I tried jogging the gantries around and everything seems normal. To be clear, all motors and belts are functioning as normal, so the sudden massive X shift doesn't appear to have been caused by a mechanical failure.

I don't see much from a cursory glance at the M122 reports, but then I don't really know what I'm looking at.

I've uploaded all the files to another folder on my Google Drive. As before, the console outputs for the initial, post homing and mid print M122s are in separate folders. The console output after I hit pause is in a file called "Console dump after pause" (there are two M122s for board 3 - that was just me sending M122 B3 twice by mistake). I've also posted a picture of the carriage showing the hot end torn out of its mount and the pile of spaghetti that it tried to produce while it was head butting the printer frame into submission (it lost the fight). But if you look below that, you'll see the start of a print which was looking really quite nice.

BTW, the XY and Z motors are connected to board 3, UVA and B are connected to the main board and the 6 extruders are connected to boards 1 and 2.

Here is a link to the Google Drive folder: https://drive.google.com/drive/folders/1oLJvwhLCwxKpKRt6UN-UCsyjkjdVFOL_?usp=sharing

It'll be some time before I get all the damage repaired and I'm in a position to test anything else.......

deckingman

Further to my last, I've completed a damage assessment. The belts, motors, idlers etc. all seem OK. The gantries can all be jogged around using my joystick. The hot end heats, so it looks like the heater cartridge, thermistor, and wiring survived.

But the X end stop is permanently triggered so there is an open circuit wire there somewhere. When the hot end got torn off its mount, it destroyed the part cooling fan plate because it was plastic and got melted. But I had disconnected the fans in any case, so I can live with that for now. Also, the "cob" light I had around the nozzle is broken, but again, I can live without that.

Up until the catastrophic failure, the print was actually looking quite good (ignore the spaghetti caused by the sudden massive X shift).

It looks like I can straighten out the hot end mount without having to make any new parts - it's just a bit twisted. And I can live without fans/lights for now. I need to sort out the open circuit X end stop but that shouldn't take long.

The only thing that changed before this all happened was the firmware. Same print file, same settings. Even the same reel of filament. So obviously, I won't try any more prints with this particular firmware.

Phaedrux

Sorry to hear about that head crash and damage. Hopefully the data collected proves useful.

deckingman

@Phaedrux said in Poor print quality with RRF3 - especially 3.2.2.:

Sorry to hear about that head crash and damage. Hopefully the data collected proves useful.

Yes, fingers crossed ......

deckingman

Having slept on it, I realise that I missed something out. Before installing the "Dropbox binaries" I did a test print with pressure advance enabled to see if I still had the same problems with it. I did not - the print looked good. I only did the first couple of layers then aborted it.

But I left it enabled. So the last test print which had the catastrophic failure with what looks like a large random XY excursion, was done with PA so the settings were not identical to all the other prints.

Apologies for forgetting to mention that. I've no idea if it's significant or not, but it likely explains why the print was looking so good before the failure.

o_lampe

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference between

;M572 D0 S0.05	; using the semicolon
or
M572 D0 S0.0     ;  setting S value to 0.0

IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

deckingman

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference between
;M572 D0 S0.05	; using the semicolon
or
M572 D0 S0.0     ;  setting S value to 0.0
IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I simply commented it out using a semi colon (and removed the semi colon to re-enable it).

oliof

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I'm no dc42 but I can use code search on github, and it looks like PA is set to 0.0 by default (also here) if it's not configured differently by issuing M572. So by my reading, not setting it or setting it to 0.0 yields exactly the same configuration.

o_lampe

@oliof Which means, if there's a bug in the PA routine, commenting it out wouldn't help?

oliof

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

dc42

@deckingman, I'm very sorry to hear that your machine was damaged. I'm glad the damage isn't too severe.

You reported that the expansion board firmware version didn't appear to have changed, so I checked that first. When I attempted to upload those same firmware binaries to my test system, M115 reported a different firmware version from yours. So I checked what version it should have reported, and that was different again: "Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53)".

It turned out that although DWC uploads firmware binaries to /firmware, RRF was still fetching binaries from /sys when upgrading expansion board firmware. So the 3.3beta1 expansion board firmware binaries in /sys were being re-installed instead of the newly-uploaded ones. That explains the version number not changing. I will of course fix this in the next beta.
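A stale expansion board of this kind can be spotted by comparing each board's reported version string with the one the new binaries should report. The following is only a minimal Python sketch: the helper names are hypothetical, the regular expression is modelled on the single version string quoted above, and the three report lines are assumed to have that same shape for every expansion board.

```python
import re

# Pattern modelled on the version string quoted in this post (an assumption
# about the general format, not taken from the firmware source).
VERSION_RE = re.compile(
    r"firmware version (\S+) \((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\)"
)

def parse_version(report):
    """Return (version, build_date) from a version report line, or None."""
    match = VERSION_RE.search(report)
    return (match.group(1), match.group(2)) if match else None

def stale_boards(reports, expected):
    """Return addresses of boards whose reported version differs from expected."""
    return [addr for addr, text in sorted(reports.items())
            if parse_version(text) != expected]

# What the newly uploaded binaries should report.
expected = ("3.3beta1+1", "2021-03-02 15:56:53")

# Illustrative input: the version and date are the ones reported in this
# thread; the surrounding wording of each line is assumed.
reports = {
    1: "Duet EXP3HC firmware version 3.3beta1 (2021-02-14 16:32:08)",
    2: "Duet EXP3HC firmware version 3.3beta1 (2021-02-14 16:32:08)",
    3: "Duet EXP3HC firmware version 3.3beta1 (2021-02-14 16:32:08)",
}

print(stale_boards(reports, expected))
```

With the values reported in this thread, all three expansion boards are flagged, which matches the old binaries in /sys having been re-installed.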

However, the changes between 3.3beta1 and the later expansion board firmware are minor. I have just gone through the commit logs to verify that there have been no critical changes. In particular, the CAN protocols have not changed. So this doesn't explain why your machine crashed.

I next turned to the M122 logs that you posted. Thanks for having the presence of mind to pause the print and take a set of M122 readings before resetting. I was rather expecting to find that board 3 had reset, which would explain some missing X moves. However, in the "Console dump after pause" log, the last reset times of the boards read as follows (ignoring the second M122 B3 after you cancelled the print):

0: Last reset 01:19:09 ago, cause: power up/Last software reset at 2021-03-07 11:22, reason: User
1: Last reset 01:19:18 ago, cause: power up
2: Last reset 01:19:25 ago, cause: power up
3: Last reset 01:19:32 ago, cause: power up

So all four boards were reset at the same time.
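For anyone who wants to run the same check on their own logs, here is a minimal Python sketch that converts the "Last reset" lines above into seconds and compares them. The function name and the regular expression are my own, written against the four lines quoted here rather than any documented M122 format.

```python
import re

# Matches lines of the form "<board>: Last reset hh:mm:ss ago, cause: <cause>".
RESET_RE = re.compile(
    r"^(\d+): Last reset (\d+):(\d+):(\d+) ago, cause: ([^/]+)"
)

def parse_resets(log):
    """Map board address -> (seconds since last reset, reset cause)."""
    result = {}
    for line in log.strip().splitlines():
        m = RESET_RE.match(line.strip())
        if m:
            board, hours, minutes, seconds = (int(m.group(i)) for i in range(1, 5))
            result[board] = (hours * 3600 + minutes * 60 + seconds,
                             m.group(5).strip())
    return result

log = """
0: Last reset 01:19:09 ago, cause: power up/Last software reset at 2021-03-07 11:22, reason: User
1: Last reset 01:19:18 ago, cause: power up
2: Last reset 01:19:25 ago, cause: power up
3: Last reset 01:19:32 ago, cause: power up
"""

resets = parse_resets(log)
uptimes = [secs for secs, _cause in resets.values()]
spread = max(uptimes) - min(uptimes)

print(resets)
print("spread between boards (s):", spread)
```

The spread between the boards is 23 seconds, increasing from board 0 to board 3. That would be consistent with the four M122 commands having been sent one after another, although that ordering is an assumption. None of the boards shows a reset during the print.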

Looking in more detail at those logs, I found a couple of interesting parts:

The M122 B3 report when paused shows the following:

Driver 0: position -1182960, 1600.0 steps/mm, standstill, reads 45627, writes 0 timeouts 0, SG min/max 14/326, steps req 4320 done 4320
Driver 1: position 3200, 80.0 steps/mm, ok, reads 45627, writes 0 timeouts 0, SG min/max 0/1019, steps req 1418006 done 1419078
Driver 2: position -145076, 80.0 steps/mm, ok, reads 45626, writes 0 timeouts 0, SG min/max 0/373, steps req 1426723 done 1427794
Moves scheduled 16699, completed 16698, in progress 1, hiccups 0, step errors 0, maxPrep 85, maxOverdue 5, maxInc 2, mcErrs 0, gcmErrs 0

It's reporting 1 move in progress, yet the number of steps done on drivers 1 and 2 is greater than the number of steps requested. However, I think this may be caused by running the previous M122 when the printer was not paused or otherwise idle, so that steps from moves that were in the queue when the previous M122 was run have been included in the steps-done count.
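To put numbers on that discrepancy, here is a minimal Python sketch that pulls the requested and completed step counts out of the driver lines above and converts the difference to millimetres using the reported steps/mm. The regular expression is written against these three lines only, and the function name is hypothetical.

```python
import re

# Captures driver number, position, steps/mm, and the requested/done counts.
DRIVER_RE = re.compile(
    r"Driver (\d+): position (-?\d+), ([\d.]+) steps/mm,"
    r".*?steps req (\d+) done (\d+)"
)

def step_excess(report):
    """Map driver number -> (done minus requested steps, same difference in mm)."""
    out = {}
    for m in DRIVER_RE.finditer(report):
        driver = int(m.group(1))
        steps_per_mm = float(m.group(3))
        extra = int(m.group(5)) - int(m.group(4))
        out[driver] = (extra, extra / steps_per_mm)
    return out

report = """
Driver 0: position -1182960, 1600.0 steps/mm, standstill, reads 45627, writes 0 timeouts 0, SG min/max 14/326, steps req 4320 done 4320
Driver 1: position 3200, 80.0 steps/mm, ok, reads 45627, writes 0 timeouts 0, SG min/max 0/1019, steps req 1418006 done 1419078
Driver 2: position -145076, 80.0 steps/mm, ok, reads 45626, writes 0 timeouts 0, SG min/max 0/373, steps req 1426723 done 1427794
"""

excess = step_excess(report)
for driver, (steps, mm) in sorted(excess.items()):
    print(f"driver {driver}: {steps} extra steps, about {mm:.2f} mm")
```

Driver 0 matches exactly, while drivers 1 and 2 show 1072 and 1071 more steps done than requested, roughly 13.4 mm each at 80 steps/mm.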

The M122 report for the main board shows this:

Tasks: NETWORK(ready,224) ETHERNET(notifyWait,124) HEAT(delaying,284) CanReceiv(notifyWait,795) CanSender(notifyWait,359) CanClock(delaying,349) TMC(notifyWait,18) MAIN(running,924) IDLE(ready,20)

The numbers are the remaining stack space. I have never seen the TMC stack space go as low as that. It sometimes happens that the actual stack space used is greater than the reported amount due to the compiler allocating stack but not using the bit that is monitored. So it's possible that the TMC task stack is overflowing. There is no other evidence to suggest this; however, I will increase the stack size as a precaution.
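The same check can be scripted. This is a minimal Python sketch that parses the Tasks line above and lists the tasks whose remaining stack falls below a threshold. The threshold of 50 is an arbitrary value chosen for illustration, not a figure from the firmware, and the function name is hypothetical.

```python
import re

# Each task is reported as NAME(state,remaining_stack).
TASK_RE = re.compile(r"(\w+)\((\w+),(\d+)\)")

def low_stack_tasks(tasks_line, threshold):
    """Return (remaining_stack, task_name) pairs below threshold, lowest first."""
    return sorted(
        (int(free), name)
        for name, _state, free in TASK_RE.findall(tasks_line)
        if int(free) < threshold
    )

tasks_line = (
    "Tasks: NETWORK(ready,224) ETHERNET(notifyWait,124) HEAT(delaying,284) "
    "CanReceiv(notifyWait,795) CanSender(notifyWait,359) CanClock(delaying,349) "
    "TMC(notifyWait,18) MAIN(running,924) IDLE(ready,20)"
)

# 50 is an assumed, illustrative threshold.
print(low_stack_tasks(tasks_line, threshold=50))  # [(18, 'TMC'), (20, 'IDLE')]
```

A fixed threshold is crude: IDLE is also flagged here, so the result is only a pointer to which tasks deserve a closer look.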

There are no indications in the reports of lost CAN messages: no send timeouts in the main board M122, and no 'oos' count in the M122 B3.

I plan to proceed as follows:

Review the changes between the beta1 main board firmware and the version I provided on Dropbox, and the CAN transmit fifo driver that the new firmware uses.
I already planned to add a 3HC board to my tool changer and use it to drive the X and Y axes, so that I have a machine (not just a bench setup) that uses a 3HC to drive axes. I will do that and then try your print.

Three questions for you:

1. Is the amount of X shift in the photo you posted consistent with the amount of shift being the length of the box it was printing? Or do you think it may have been more?
2. Have you already shared that GCode file, and if so, where?
3. You said that you were unable to download the console and you had to copy-and-paste it. Do you mean that you tried to click on the list icon at the top right of the console (to get the "Download as text" option), but it didn't respond?

dc42

@oliof said in Poor print quality with RRF3 - especially 3.2.2.:

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

RRF does not distinguish between PA never having been set, and being set to 0.0.
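Put another way, a commented-out M572 line and an explicit S0.0 should leave the firmware in the same state. The Python sketch below only models that idea for a config file: it is not RRF code, the function names are hypothetical, and the 0.0 default is taken from what has been said in this thread.

```python
import re

M572_RE = re.compile(r"^\s*M572\b(.*)$", re.IGNORECASE)

def configured_pa(config_text):
    """Map extruder drive number -> PA value from active (uncommented) M572 lines."""
    pa = {}
    for raw in config_text.splitlines():
        code = raw.split(";", 1)[0]  # everything after ';' is a comment
        m = M572_RE.match(code)
        if not m:
            continue
        drives = re.search(r"D([\d:]+)", m.group(1))
        value = re.search(r"S([\d.]+)", m.group(1))
        if drives and value:
            for drive in drives.group(1).split(":"):
                if drive:
                    pa[int(drive)] = float(value.group(1))
    return pa

def effective_pa(config_text, drive):
    """PA in effect for a drive; 0.0 when M572 was never issued for it."""
    return configured_pa(config_text).get(drive, 0.0)

commented_out = ";M572 D0 S0.05\t; using the semicolon"
set_to_zero = "M572 D0 S0.0     ;  setting S value to 0.0"
enabled = "M572 D0 S0.05"

print(effective_pa(commented_out, 0))  # 0.0
print(effective_pa(set_to_zero, 0))    # 0.0
print(effective_pa(enabled, 0))        # 0.05
```

Both ways of disabling PA give the same effective value in this model, so neither one isolates a possible bug in the PA routine better than the other.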

deckingman

@dc42 said in Poor print quality with RRF3 - especially 3.2.2.:

Three questions for you:

Is the amount of X shift in the photo you posted consistent with the amount of shift being the length of the box it was printing? Or do you think it may have been more?

Have you already shared that GCode file, and if so, where?

You said that you were unable to download the console and you had to copy-and-paste it. Do you mean that you tried to click on the list icon at the top right of the console (to get the "Download as text" option), but it didn't respond?

#1 The print is actually a box plus lid. The lid is only 3mm thick (tall) and an unremarkable rectangle, so I haven't included any pics of that before. But at the point of failure, the machine had long finished the lid and was printing just the box. That box is a tad over 70mm wide in X. From the witness mark on the bed, I can see that the right hand edge of that box is about 110mm from the right hand edge of the build plate. So for the head to crash into the frame and try to repeatedly beat the hell out of it, means that there would have to have been a shift of >110 mm - significantly larger than the width of the part it was printing.

#2. No, I hadn't shared the file. I have now uploaded it to the folder on Google Drive that I last linked to (the one that has the console dump file). In fact, I've uploaded the original version as sliced plus the version which has UVAB moves added. The latter is distinguishable by the "UVAB" suffix which is added to the end of the file name - that's the one I've been using but unless you can simulate a CoreXYUVAB, it won't be much use (hence the inclusion of the original).

Note. You'll need to remove the call to the "pre-print" macro. You'll then have to heat the bed and nozzle and send a "T0" before attempting to print or simulate that file (as well as home the printer). I could post the "pre-print" macro but it calls other macros to set the tool temperatures, home the printer, purge and wipe the nozzle etc, so it all gets complicated and it'll just be easier to manually heat the bed and hot end.

#3. That's correct. In all other instances, I've simply selected the "download as text" option but after pressing "pause", that didn't work. Firefox shows me when a file is being downloaded but that didn't happen - there was no indication in Firefox that a download was happening. And no additional files appeared in the list of downloads (only those console.text files that were downloaded prior to the crash).

I can't remember if I pressed cancel before or after attempting to download the console text. I think I pressed pause, tried to download, then pressed cancel and tried to download again, but I can't be 100% sure of that. I've uploaded both the pause.g and cancel.g files to the same place. I can't see anything in either of those that would have prevented downloads from working but you might spot something I've missed.

deckingman

For anyone watching this thread, and for those who have contributed, I just want to say that the Duet team and I have opened up the communication medium that we used at the very start (when Gen 3 was still at the pre-production stage), in order to work together to resolve these issues. That's nothing personal - just that these forums are maybe not the best way to post messages rapidly back and forth between us.

I thought it important to state that fact, in case anyone got the impression that I had been abandoned by the Duet team - that isn't the case and we are working together to try and get to the bottom of wtf is happening.

jens55

Thank you for letting us know.