Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.
-
I'll just chime in with one more thing:
When Duet 2 was being developed, near the beginning, it was much like this (except not separated so much -- only DWC and RRF, with the original wifi server or whatever too).
I waited very patiently until the code was mature. I bought several Duet 2 boards (before they were called Duet 2) immediately in their infancy. However, RRF was not at a point that it could really be used for what I wanted to do (IDEX printer).
I just waited! I worked on my own stuff and waited. I felt this was fine. I didn't feel I was owed anything by the developers. If anything, I was super grateful that the developers were working so hard on the code to make it work.
Finally, RRF2 got to a point where it was complete enough and reliable enough to use! Hallelujah!
Then, immediately, all the developers decided to abandon RRF2 in favour of RRF3! RRF2 is not as stable a rock as we think it is, but the developers are going full-bore restarting the "wait and see" cycle for RRF3 users.
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
-
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
All of that has nothing to do with the pervasive attitude that RRF and DSF are two separate things. From the viewpoint of the end user they are one thing with defined external interactions (Gcode, Web API, etc).
Surely not ideal; but RRF and DWC were two separate things before DSF as well, just more mature, and I don't see any reason to suspect it will not return to that state. I'm also pretty sure it's on the Duet3D agenda to unify things as much as possible; the new duet3d GitHub is probably a sign of things to come. As things mature, who wrote what or who supports what will matter less, once the development and support workload is better matched to the available resources.
Meanwhile, the user can choose how to deal with it.
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I withdraw my comment regarding sudo. It diverted attention from the real issue: What is the GCODE to restart the system?
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi on a system level. M550, M552 and a few others were at least a topic. And to some degree it boils down to lacking SUDO M997 gcode.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi on a system level. M550, M552 and a few others were at least a topic.
I don't doubt that was a conscious design decision. It fits very firmly with the blind spot that Duet sees these two as somehow separate or different. Again, gcode in one end, movement out the other. Gcode configuration codes in one end, immediate effect on the running device out the other.
Given the statements in the image below, gcodes should "affect the Pi on the system level". M550 (set name) absolutely should set the name of the Duet system network interface, i.e. the Pi itself. Same with M552 (set IP address). And many more such codes.
Or, is Duet explicitly changing the philosophies stated below? Particularly "All settings are done through G-Code"?
-
@bot said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
What do you perceive that state to be, and why the reference to "while RRF3 is being developed"? We get few complaints these days of RRF 2.05.1 not doing what it is supposed to do. RRF3 core development was completed months ago, to beyond the point where it provided all the features available in RRF2. Nor did I abandon RRF 2 users: I did the RRF 2.05.1 release when I found important bugs in RRF 2.05. But it no longer makes sense to add new features to RRF 2.
If you look at the bug fix lists in the release notes for the last several RRF 3.01-RC releases, you will see that they are almost all either related to new features in RRF3 that Duet 2 and Duet 3 standalone users don't have to use, or are fixes for minor bugs (some of which are also present in RRF 2). [The exceptions were a couple of new bugs in the 12864 display code for the Duet Maestro.] So RRF 3.01-RC is very nearly as stable and reliable as RRF 2.05.1 even though it provides a lot more functionality. You can run RRF 3.01-RC7 on a Duet 2 or Duet 3 in standalone mode with various versions of DWC, so if you don't feel ready to try DWC 2.1.2 yet you can stay with 2.0.7 or even 1.22.6. Many Duet 2 owners are already running RRF 3.01-RC versions.
The only reason that RRF 3.01 stable hasn't been released yet is that I wanted to finalise changes to the communication mechanism between RRF and DSF, which has meant waiting for some major changes to DSF to be implemented and to settle down. With the release of DSF 1.3.0 and now 1.3.2, that is a step closer.
I realise that some users find the change from RRF2 to (standalone) RRF3 painful. We had to make the major changes between the RRF2 and RRF3 configuration mechanisms both to support the new architecture of Duet 3 and because the RRF2 architecture had become too limiting. Specifically, the fixed pin allocations in RRF2 had become problematic for many users, the kludge of "virtual heaters" had passed its best-before date, the status response returned to DWC was getting too large, and the solution to many of the features that users were asking us for was to implement the object model and provide access to it.
I'm sorry that users of Duet 3 + SBC are having to wait longer than we hoped to get access to all the features now provided by RRF 3 in standalone mode, notably conditional GCode. However, DSF 1.3.x + RRF 3.01 RC6/7 now provide the necessary foundation for conditional GCode in DSF, and implementation of the conditional GCode processor in DSF has now started.
-
RRF3 in standalone may be far more mature than I assumed it to be by observing the forums. Sorry for that assumption.
But RRF2 is, IMO, by no means fully-mature. There are some nearly critical bugs that have yet to be solved (such as this networking issue that I have been trying to document).
I also am receiving lots of M122 responses with "error status: 10" and "error status: 18" but which don't seem to be affecting performance.
I'm sure there are also more bugs to be found, some likely very critical too. I wish dearly that I was as talented as you in the areas of coding and the logic behind the firmware. I would very much volunteer my time to analyzing the firmware for bugs and correct behaviour.
Speaking of correct behaviour, I feel that there have been recent changes made to fundamental behaviour that have not been tested enough, and may be contributing to less-than-desirable results that users are still figuring out.
One such change, which I feel was implemented hastily and has not undergone enough testing, is this one from 2.02:
Fixed behaviour when moves call for extrusion amounts smaller than one microstep
Another change was the recent removal of quad/octal fallback when step generation was approaching the limits. This wasn't even documented anywhere except by you casually on the forums, I believe.
Until a release of RRF2 has gone through at least a year of "community testing," I would expect there to be a need to have prompt response to bug reports and addressing behaviour that is sub-optimal. Not necessarily by you (dc42) directly, but definitely with your coordination, since I doubt anyone is as familiar with the firmware as you, at this point.
For RRF2 to be the solid rock I wish it to be, its behaviour needs to be documented, well tested and understood by its users. I think it's a great chance to create an LTS version of RRF, so that RRF3 can play with whatever it wishes to, while users requiring reliability and predictability can stay with RRF2.
-
I wanted to respond to @Danal, @bot, @gtj0 and others who have expressed their frustrations with how we are both conducting and communicating firmware and software development.
Firstly, thanks for all your input; it's genuinely valued.
Without getting bogged down in the history, it is correct that I, and the rest of the team, saw RRF and DSF as two separate entities. This has then manifested itself in a user experience that was at times illogical or awkward. It also allowed features to be deployed with a large gap in time between when they were implemented in RRF and in DSF, causing further frustration. We spent this afternoon discussing this, confirming that it was an issue, and working out what we can do to resolve it. @Danal we used your M999 resetting DCS pull request as an example of the blind spot and we now plan to incorporate the change shortly.
Not all of this is fixable overnight. Aspects such as making networking changes to the Pi via gcode have security implications we want to think through before deciding how to implement them. Christian is working hard on closing the feature gap (and thanks to those who are helping with testing).
I also want to reiterate the point made by David that RRF 2 is not abandoned for those that want to use it as an LTS version while RRF 3 is developed further. We will continue to fix bugs found in RRF 2 (@bot David has the networking issue on his list).
Thanks all once again for the feedback, it has triggered us to look at how we coordinate, package, publish and communicate our software and firmware releases going forward.
-
@T3P3Tony said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
We will continue to fix bugs found in RRF 2
On this note, the build instructions say the master branch of RRF is the current RRF2; but it lags the 2.05.1 tag. Similar confusion around the v3-dev branch and the 3.01-RC7 tag.
All thumbs up on the rest of the post.
-
@T3P3Tony Thanks for listening!
-
@T3P3Tony said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
Not all of this is fixable overnight.
No one expects everything to be resolved overnight. This weekend would be soon enough for most of us, I think.
-
Thank you for that response!
I hope I did not come across as too demanding or anything. I'm perfectly happy with the work you all have done and continue to do. You have gone above and beyond what would be expected of you.
I'm excited to see the future of RRF/Duet, and will be happy to continue to support your products no matter the direction you choose. After all, you always seem to end up in the right place.
The responsiveness and methodologies of the Duet team, and in particular dc42's contributions, are what drove me to the Duet ecosystem in the first place -- way back to Duet 0.8.5, when he was (I think) working on RRF of his own volition, with no financial interest tied to the Duet 0.8.5 boards.
Please, keep up the good work and don't take my comments as anything but hopeful encouragement.
-
Thanks for considering these topics. Timing "is what it is" with changes this big, that's understood. Thanks for taking a look at some of the fundamental directions.