Poor print quality with RRF3 - especially 3.2.2.

deckingman

I've installed 3.3.beta1 on all boards. I cleared the console, ran M122 on all boards after the instal, then downloaded the console output as a text file. Then I homed the printer, and repeated the "clear console - run M122s -download text" sequence again. Then started a print and repeated that M122 sequence again mid print. And finally, one more time at the end. I've created a folder on Google drive and put each of those 4 console.txt files in a separate folder. So it should be self explanatory - here is a link https://drive.google.com/drive/folders/1m1BnwUsU035TQT_76kcl3Yiz4BPApXg8?usp=sharing

I don't see any nonzero 'oos' counts on the expansion boards, but I do see some send timeouts on the main board in the mid-print and post-print reports (102 mid-print and 373 post-print).

The print itself is still cooling down on the build plate so I haven't looked closely at it but it doesn't look good. Not as bad as some that I've seen, but not as good as "acceptable" ones either.

HTH

deckingman

@droftarts Fortunately for you, I didn't keep a record of your address - so you can come out from under the duvet now.

dc42

@deckingman, thanks for those M122 logs. If the print quality is between the bad ones and the acceptable ones, that is confusing! Can you confirm that you have been running with the same pressure advance setting (perhaps none) throughout all the recent prints?

As you said, you are getting a nonzero number of send timeouts. In theory this should mean that some messages (probably movement messages) have been lost; but in that case the expansion boards should have reported some 'oos' (out-of-sequence) errors. So I am still trying to work out the reason for this apparent conflict. The other user I was working with had a similar effect: a small number of 'oos' errors (and corresponding defects in the print) but several hundred times as many send timeouts.

My suspicion is that the CAN peripheral is delaying the reporting of successful transmissions by longer than the timeout period. However, I would like to eliminate the send timeouts as a possible reason for the issue you are seeing. So please upgrade once more, this time to the firmware binaries at https://www.dropbox.com/sh/q5uqqkjbmhgvlhq/AACYqG0ynLME9ogoLd1zLB2Xa?dl=0. These are later builds of 3.3beta1 and are fully compatible with it, so there is no need to do anything different this time. I expect these binaries to get rid of the send timeouts, and also to get rid of the spurious 'bm' errors reported on the 3HC after homing.

deckingman

@dc42 Ref the print quality - let's go with bad rather than acceptable. But the way things have been, it's possible that the next print might be worse (or better).

I can confirm that everything is exactly the same - including PA (and it's still the same reel of filament). I tell a lie - the part cooling fans are no longer connected since one of the most acceptable prints was done without any part cooling.

I've downloaded the binaries - it might be a day or two before I get chance to instal them and run another print.

As I'm running a beta, is there anything else you'd like me to check?

droftarts

@deckingman did you swap extruders?

Ian

deckingman

@droftarts said in Poor print quality with RRF3 - especially 3.2.2.:

@deckingman did you swap extruders?

Ian

Er no. I thought it best to only change one thing - in this case the firmware. But thanks for reminding me. I'll do another print "as is" with David's latest binaries, then I'll change to another extruder and repeat.

dc42

@deckingman said in Poor print quality with RRF3 - especially 3.2.2.:

I tell a lie - the part cooling fans are no longer connected since one of the most acceptable prints was done without any part cooling.

What filament are you printing with? If it's PLA then please reconnect the print cooling fans, because in my experience print cooling makes a huge difference with PLA, and some of the issues with the prints you posted before looked to me like the effects of insufficient cooling. If you are using PETG then print cooling seems to me to be far less important.

deckingman

@dc42 That's not borne out by previous prints https://drive.google.com/drive/folders/1XmFXYBGnj3rXJRLg1muSNBd18AbP2ZDl?usp=sharing. The reason why I disconnected the fans was because people were talking about intermittent fan issues. So I disconnected them completely to see just how bad it would be, and was surprised when I ended up with one of the best prints of the batch. Therefore I deduced that running without fans would eliminate one potential variable (knowing that decent prints could be produced with that configuration).

I use the auto cooling feature of Slic3R in any case so the fans don't do much if the layer time is less than circa 15 seconds or if it's doing bridges. And the fans are arranged so that no direct air flows over the nozzle. It's something that I have played around with a lot over the years.

Maybe it's because I print temp towers without fans and so end with using lower temperatures than a lot of people. Again, something that I've played around with a lot over the years.

deckingman

@dc42 David,

Tried the binaries from your dropbox. The main board reports being 3.3beta1+1 but the expansion boards come back as being 3.3beta1 (2021-02-14 16:32:08) which I think is the same as last time. Is that what you would expect?

Anyway, as before, after installing all the updates (which went as expected, with all 4 boards updating and then prompting me to do a reset), I cycled the power (just to be sure), then ran an initial M122 on all 4 boards and downloaded the console text. Then I heated the bed and ran a home all and repeated the M122 reports. Then I started a print which was going well - extremely well - in fact one of the best I've seen for a long time. Part way through, I did another set of M122s.

A short time after that, the printer went into self destruct mode. It looks like a massive shift in the X direction and the carriage was doing its best to head butt the side of the printer frame into pulp. But the UV axis didn't suffer the same shift, which caused the cabling and Bowden tubes to tear the hot end off its kinematic mount.

I had the forethought to pause the print rather than bang the emergency stop button. So at that point, I ran another sequence of M122s on all the boards. I could not however download the console output using DWC - it just didn't do anything. So I managed to copy the text and paste it into notepad++ and save that file.

I then turned the printer off and re-aligned the gantries, but with the hot end dangling on whatever wires haven't been torn out. I didn't see any obvious shorts but it might well have blown a fuse. Then I tried jogging the gantries around and everything seems normal. To be clear, all motors and belts are functioning as normal, so the sudden massive X shift doesn't appear to have been caused by a mechanical failure.

I don't see much from a cursory glance at the M122 reports, but then I don't really know what I'm looking at.

I've uploaded all the files to another folder on my Google Drive. As before, the console outputs for the initial, post homing and mid print M122s are in separate folders. The console output after I hit pause is in a file called "Console dump after pause" (there are two M122s for board 3 - that was just me sending M122 B3 twice by mistake). I've also posted a picture of the carriage showing the hot end torn out of its mount and the pile of spaghetti that it tried to produce while it was head butting the printer frame into submission (it lost the fight). But if you look below that, you'll see the start of a print which was looking really quite nice.

BTW, the XY and Z motors are connected to board 3, UVA and B are connected to the main board and the 6 extruders are connected to boards 1 and 2.

Here is a link to google drive folder. https://drive.google.com/drive/folders/1oLJvwhLCwxKpKRt6UN-UCsyjkjdVFOL_?usp=sharing

It'll be some time before I get all the damage repaired and I'm in a position to test anything else.......

deckingman

Further to my last, I've completed a damage assessment. It seems that the belts, motors, idlers etc all seem OK. The gantries can all be jogged around using my joystick. The hot end heats so looks like the heater cartridge, thermistor, and wiring survived.

But the X end stop is permanently triggered so there is an open circuit wire there somewhere. When the hot end got torn off its mount, it destroyed the part cooling fan plate because it was plastic and got melted. But I had disconnected the fans in any case, so I can live with that for now. Also, the "cob" light I had around the nozzle is broken, but again, I can live without that.

Up until the catastrophic failure, the print was actually looking quite good (ignore the spaghetti caused by the sudden massive X shift).

It looks like I can straighten out the hot end mount without having to make any new parts - it's just a bit twisted. And I can live without fans/lights for now. I need to sort out the open circuit X end stop but that shouldn't take long.

The only thing that changed before this all happened was the firmware. Same print file, same settings. Even the same reel of filament. So obviously, I won't try any more prints with this particular firmware.

Phaedrux

Sorry to hear about that head crash and damage. Hopefully the data collected proves useful.

deckingman

@Phaedrux said in Poor print quality with RRF3 - especially 3.2.2.:

Sorry to hear about that head crash and damage. Hopefully the data collected proves useful.

Yes, fingers crossed ......

deckingman

Having slept on it, I realise that I missed something out. Before installing the "Dropbox binaries" I did a test print with pressure advance enabled to see if I still had the same problems with it. I did not - the print looked good. I only did the first couple of layers then aborted it.

But I left it enabled. So the last test print which had the catastrophic failure with what looks like a large random XY excursion, was done with PA so the settings were not identical to all the other prints.

Apologies for forgetting to mention that. I've no idea if it's significant or not, but it likely explains why the print was looking so good before the failure.

o_lampe

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference between

;M572 D0 S0.05	; using the semicolon
or
M572 D0 S0.0     ;  setting S value to 0.0

IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

deckingman

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference between
;M572 D0 S0.05	; using the semicolon
or
M572 D0 S0.0     ;  setting S value to 0.0
IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I simply commented it out using a semi colon (and removed the semi colon to re-enable it).

oliof

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I'm no dc42 but I can use code search on github, and it looks like PA is set to 0.0 by default (also here) if it's not configured differently by issuing M572. So by my reading, not setting it or setting it to 0.0 yields exactly the same configuration.

o_lampe

@oliof Which means, if there's a bug in the PA routine, commenting it out wouldn't help?

oliof

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

dc42

@deckingman, I'm very sorry to hear that your machine was damaged. I'm glad the damage isn't too severe.

You reported that the expansion board firmware version didn't appear to have changed, so I checked that first. When I attempted to upload those same firmware binaries to my test system, M115 reported a different firmware version from yours. So I checked what version it should have reported, and that was different again: "Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53)".

It turned out that although DWC uploads firmware binaries to /firmware, RRF was still fetching binaries from /sys when upgrading expansion board firmware. So the 3.3beta1 expansion board firmware binaries in /sys were being re-installed instead of the newly-uploaded ones. That explains the version number not changing. I will of course fix this in the next beta.

However, the changes between 3.3beta1 and the later expansion board firmware are minor. I have just gone through the commit logs to verify that there have been no critical changes. In particular, the CAN protocols have not changed. So this doesn't explain why your machine crashed.

I next turned to the M122 logs that you posted. Thanks for having the presence of mind to pause the print and take a set of M122 readings before resetting. I was rather expecting to find that board 3 had reset, which would explain some missing X moves. However, in the "Console dump after pause" log, the last reset times of the boards read as follows (ignoring the second M122 B3 after you cancelled the print):

0: Last reset 01:19:09 ago, cause: power up/Last software reset at 2021-03-07 11:22, reason: User
1: Last reset 01:19:18 ago, cause: power up
2: Last reset 01:19:25 ago, cause: power up
3: Last reset 01:19:32 ago, cause: power up

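So all four boards were last reset at the same time, at power up, and none of them reset during the print. The differences of a few seconds between the figures are consistent with the four M122 commands having been sent one after another.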
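For anyone wanting to repeat this check on their own logs, here is a minimal sketch in Python. It is only an illustration: the helper name `uptime_seconds` and the `RESET_LINES` table are made up for this example, and the regular expression assumes the "Last reset HH:MM:SS ago" wording shown in the lines quoted above.

```python
import re

# The four "Last reset" lines quoted above, keyed by board address.
RESET_LINES = {
    0: "Last reset 01:19:09 ago, cause: power up/Last software reset at 2021-03-07 11:22, reason: User",
    1: "Last reset 01:19:18 ago, cause: power up",
    2: "Last reset 01:19:25 ago, cause: power up",
    3: "Last reset 01:19:32 ago, cause: power up",
}


def uptime_seconds(line):
    """Convert 'Last reset HH:MM:SS ago' into seconds since the last reset."""
    match = re.search(r"Last reset (\d+):(\d+):(\d+) ago", line)
    if match is None:
        raise ValueError(f"no reset time found in: {line!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


uptimes = {board: uptime_seconds(line) for board, line in RESET_LINES.items()}
# Difference between the longest and shortest reported uptime, in seconds.
spread = max(uptimes.values()) - min(uptimes.values())
print(uptimes, "spread:", spread, "s")
```

With the figures above, the uptimes differ by 23 seconds in total and increase with the board number, which is what one would expect if the M122 commands were sent in board order. A board that had reset mid-print would show a much shorter uptime than the others.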

Looking in more detail at those logs, I found a couple of interesting parts:

The M122 B3 report when paused shows the following:

Driver 0: position -1182960, 1600.0 steps/mm, standstill, reads 45627, writes 0 timeouts 0, SG min/max 14/326, steps req 4320 done 4320
Driver 1: position 3200, 80.0 steps/mm, ok, reads 45627, writes 0 timeouts 0, SG min/max 0/1019, steps req 1418006 done 1419078
Driver 2: position -145076, 80.0 steps/mm, ok, reads 45626, writes 0 timeouts 0, SG min/max 0/373, steps req 1426723 done 1427794
Moves scheduled 16699, completed 16698, in progress 1, hiccups 0, step errors 0, maxPrep 85, maxOverdue 5, maxInc 2, mcErrs 0, gcmErrs 0

It's reporting 1 move in progress, yet the number of steps done on drivers 1 and 2 is greater than the number of steps requested. However, I think this may be caused by running the previous M122 when the printer was not paused or otherwise idle, so that steps from moves that were in the queue when the previous M122 was run have been included in the steps-done count.
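The size of that discrepancy can be pulled out of the report with a short Python sketch. This is illustrative only: the function name `step_surplus` is invented here, and the regular expression assumes the driver line layout exactly as quoted above.

```python
import re

# Driver lines from the M122 B3 report quoted above.
DRIVER_LINES = [
    "Driver 0: position -1182960, 1600.0 steps/mm, standstill, reads 45627, writes 0 timeouts 0, SG min/max 14/326, steps req 4320 done 4320",
    "Driver 1: position 3200, 80.0 steps/mm, ok, reads 45627, writes 0 timeouts 0, SG min/max 0/1019, steps req 1418006 done 1419078",
    "Driver 2: position -145076, 80.0 steps/mm, ok, reads 45626, writes 0 timeouts 0, SG min/max 0/373, steps req 1426723 done 1427794",
]

PATTERN = re.compile(
    r"Driver (\d+): position -?\d+, ([\d.]+) steps/mm.*steps req (\d+) done (\d+)"
)


def step_surplus(lines):
    """Return {driver: (done - requested steps, the same difference in mm)}."""
    result = {}
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a driver line in the assumed format
        driver = int(match.group(1))
        steps_per_mm = float(match.group(2))
        extra = int(match.group(4)) - int(match.group(3))
        result[driver] = (extra, extra / steps_per_mm)
    return result


for driver, (extra, extra_mm) in step_surplus(DRIVER_LINES).items():
    print(f"driver {driver}: {extra} surplus steps = {extra_mm:.2f} mm")
```

For the lines above this gives no surplus on driver 0, and 1072 and 1071 surplus steps on drivers 1 and 2, which is roughly 13.4 mm at 80 steps/mm.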

The M122 report for the main board shows this:

Tasks: NETWORK(ready,224) ETHERNET(notifyWait,124) HEAT(delaying,284) CanReceiv(notifyWait,795) CanSender(notifyWait,359) CanClock(delaying,349) TMC(notifyWait,18) MAIN(running,924) IDLE(ready,20)

The numbers are the remaining stack space. I have never seen the TMC stack space go as low as that. It sometimes happens that the actual stack space used is greater than the reported amount due to the compiler allocating stack but not using the bit that is monitored. So it's possible that the TMC task stack is overflowing. There is no other evidence to suggest this, however I will increase the stack size as a precaution.
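To spot a low figure like this without reading the line by eye, the Tasks line can be parsed with a few lines of Python. This is a sketch only: `remaining_stack` is a name invented for this example, and it assumes the `NAME(state,number)` layout shown in the line quoted above.

```python
import re

# Tasks line from the main board M122 report quoted above.
TASKS_LINE = (
    "Tasks: NETWORK(ready,224) ETHERNET(notifyWait,124) HEAT(delaying,284) "
    "CanReceiv(notifyWait,795) CanSender(notifyWait,359) CanClock(delaying,349) "
    "TMC(notifyWait,18) MAIN(running,924) IDLE(ready,20)"
)


def remaining_stack(line):
    """Map each task name to the remaining stack figure reported for it."""
    entries = re.findall(r"(\w+)\((\w+),(\d+)\)", line)
    return {name: int(free) for name, _state, free in entries}


stacks = remaining_stack(TASKS_LINE)
# List the tasks from least to most remaining stack.
for name, free in sorted(stacks.items(), key=lambda item: item[1]):
    print(f"{name:10s} {free}")
lowest_task = min(stacks, key=stacks.get)
```

In this report the lowest figure is TMC at 18, followed by IDLE at 20; every other task has more than 100 remaining.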

There are no indications in the reports of lost CAN messages: no send timeouts in the main board M122, and no 'oos' count in the M122 B3.

I plan to proceed as follows:

Review the changes between the beta1 main board firmware and the version I provided on Dropbox, and the CAN transmit fifo driver that the new firmware uses.
I already planned to add a 3HC board to my tool changer and use it to drive the X and Y axes, so that I have a machine (not just a bench setup) that uses a 3HC to drive axes. I will do that and then try your print.

Three questions for you:

Is the amount of X shift in the photo you posted consistent with the amount of shift being the length of the box it was printing? Or do you think it may have been more?
Have you already shared that GCode file, and if so, where?
You said that you were unable to download the console and you had to copy-and-paste it. Do you mean that you tried to click on the list icon at the top right of the console (to get the "Download as text" option), but it didn't respond?

dc42

@oliof said in Poor print quality with RRF3 - especially 3.2.2.:

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

RRF does not distinguish between PA never having been set, and being set to 0.0.
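That point can be illustrated with a toy model in Python. To be clear about the assumptions: this is not RRF's code, the function `pressure_advance` is hypothetical, and it only handles the simple `M572 D<n> S<value>` form used in this thread. It just shows why a commented-out line and an explicit S0.0 end up as the same configuration when the default is 0.0.

```python
import re

# Per the posts above, PA is 0.0 unless an M572 command sets it.
DEFAULT_PA = 0.0


def pressure_advance(config_text, extruder=0):
    """Toy model: return the PA value a config file would leave in effect."""
    pa = DEFAULT_PA
    for raw_line in config_text.splitlines():
        # Everything after ';' is a comment, so a line starting with ';'
        # contributes no command at all.
        code = raw_line.split(";", 1)[0].strip()
        match = re.fullmatch(r"M572\s+D(\d+)\s+S([\d.]+)", code)
        if match and int(match.group(1)) == extruder:
            pa = float(match.group(2))
    return pa


commented_out = ";M572 D0 S0.05\t; using the semicolon"
set_to_zero = "M572 D0 S0.0     ;  setting S value to 0.0"
enabled = "M572 D0 S0.05"

print(pressure_advance(commented_out))
print(pressure_advance(set_to_zero))
print(pressure_advance(enabled))
```

In this model the first two configurations both leave PA at 0.0, and only the third enables it. Whether the firmware skips the PA calculations entirely at 0.0 is a separate question that this sketch does not answer.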