Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.
-
Also, as regards how easy it is to "reproduce in stand-alone", I don't have any Ethernet that will reach. I am not the first owner of this house, and there is not an inch of Cat anything in it (except about 1 foot (about 30 cm) between the cable modem and the main wireless router). I'm also not really motivated to make a special SD card, run special Ethernet, etc., to run the printer in a mode which I will literally never use. It goes back to DSF and RRF being layers of the same thing, and they should be supported that way.
-
OK, rant over. Sort of.
-
And here I am, just sad that RRF 2 isn't getting the attention it needs and deserves! We need an LTS team!
-
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. From a user's perspective, DSF and RRF are the same thing.
I had my sarcastic rant earlier when I quoted DC42's statement that RRF 3.01-RC6 has no bugs.
I agree; I too am not interested in whether any particular bug is within RRF, DCS, or DSF. They should all work together as one whole package.
The issue I have is that like you I love the idea of Duet 3 with SBC but there are many problems at the minute and all I seem to get is "It works ok in standalone mode".
If I wanted "stand alone" I would not have purchased RRF 3 and Raspberry Pi 4!
Rant over!
-
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
and with sudo no less!
This was a topic way back when, sort of intermingled with permissions on the /opt/dsf/sd folder and /dev/spi nodes, and the priority was to get it working first, then revisit.
As such I didn't poke in great detail, but as access to the SPI node can be solved by group permissions, listening on port 80 (or any port below 1024) sounds like the last hurdle. The easy workaround would be nginx as a reverse proxy, which would also ease setting up SSL with something like Let's Encrypt (even if not exposed to the internet).
There are larger issues to deal with first, I guess, but I will say the state of the supporting firmware and software has not been clearly communicated following the release of the hardware.
I believe I said in August that I expected to run RRF2 as the stable version for 6-12 months, and the unfortunate truth is that, with the limited team developing RRF3 + DSF, they need to depend on the community for testing to stand a chance of getting ready for mainstream use in such a short timeframe.
At the end of the day it's up to the user to choose something tried and true, or accept that early adoption comes with a price tag in more than one sense.
-
@gtj0 said
Don't get me started.
Okay, sorry I mentioned it.
@Danal said
OK, rant over. Sort of.
Okay, sorry, won't mention it again! We appreciate all your support!
@chas2706 said
Rant over!
No, really, I'm sorry for suggesting it, I'll never say it again!
@bearer said
At the end of the day it's up to the user to choose something tried and true, or accept that early adoption comes with a price tag in more than one sense.
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware (painstakingly developed over many years). But without community interest and expertise getting it working, reporting bugs and fixing, it will take much longer. So once again, thank you all for your continued support.
Ian
-
@droftarts said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware
The activity shown on GitHub regarding DSF says it all.
-
@gtj0 said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
It's not always easy to just "run in standalone mode". In my case, I have to remove the covers from the printer to get to the SD card and remove the cable between the Duet and the SBC.
I find that I can switch between standalone and SD mode just by inserting the SD card or not, without removing the SBC cable.
-
@chas2706 said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I had my sarcastic rant earlier when I quoted DC42's statement that RRF 3.01-RC6 has no bugs.
I was sure I typed "no known bugs" when I composed the message, however I composed it on a smartphone and somehow the "known" got lost.
There are now some known bugs in RRF 3.01-RC6 so we are preparing to release 3.01-RC7 along with updated DSF and DWC. See https://github.com/dc42/RepRapFirmware/blob/v3-dev/WHATS_NEW_RRF3.md for the changes to RRF.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
At the end of the day it's up to the user to choose something tried and true, or accept that early adoption comes with a price tag in more than one sense.
The disconnects in the way RRF and DSF are being handled by Duet the company apply equally to the 'full' releases.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
and with sudo no less!
This was a topic way back when, sort of intermingled with permissions on the /opt/dsf/sd folder and /dev/spi nodes, and the priority was to get it working first, then revisit.
As such I didn't poke in great detail, but as access to the SPI node can be solved by group permissions, listening on port 80 (or any port below 1024) sounds like the last hurdle. The easy workaround would be nginx as a reverse proxy, which would also ease setting up SSL with something like Let's Encrypt (even if not exposed to the internet).
I withdraw my comment regarding sudo. It diverted attention from the real issue: What is the GCODE to restart the system?
-
@droftarts said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware (painstakingly developed over many years). But without community interest and expertise getting it working, reporting bugs and fixing, it will take much longer. So once again, thank you all for your continued support.
This completely misses the points being discussed by at least three or four vocal users. It is a powerful indication of the "blind spot" within Duet that causes me to invest the energy in typing these responses:
Any rational person would expect a new major section of software to climb a maturity curve. Totally agree with you on that. And introducing major new architecture and function in V3.x, I believe we all expect it to take time to stabilize. Regardless of where it runs or what tech stack it uses or... it will just take time, testing, feedback, and improvement. Agreed, D'accord.
All of that has nothing to do with the pervasive attitude that RRF and DSF are two separate things. From the viewpoint of the end user they are one thing with defined external interactions (Gcode, Web API, etc).
There are numerous examples of this misperception, all of which are seriously complicating the ability to build, deploy, test with the community, upgrade, downgrade, and generally "deal with" the product that fits under the general header of Duet V3. And the folks at Duet appear to be unable to see or acknowledge that this is even happening.
When I said "sort of." above, this is what I meant. I don't believe I am ranting anymore; yet there is still more to discuss. I sincerely hope these strong words are read with the intent they are written: To help Duet get better.
-
@droftarts said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware (painstakingly developed over many years). But without community interest and expertise getting it working, reporting bugs and fixing, it will take much longer. So once again, thank you all for your continued support.
Sadly, RRF3 (Duet3 standalone) is significantly more functional than RRF3 (Duet3+SBC.) As long as this is the case, people (such as myself) will use (and test) the standalone code and ignore the SBC/DSF/etc.
For example, do a search for threads where people request PanelDue working properly (for file commands) with an SBC attached to the Duet. I've seen "it's easy", "will add that soon" and "it's already there, just need the Duet to send the command." Yet... it hasn't happened. Until that "easy", "will be added soon" and "is already there" bit of functionality works, I won't attach the ribbon cable to my RPi4. (I don't have a computer near my printer, and I won't start a print or macro unless I'm standing near the printer.)
How about the conditional gcode stuff? Is that working in SBC mode yet?
The lack of SBC functionality isn't about community interest, reporting bugs, etc. It's about getting DSF up to a more usable state. I'd happily attach my RPi4 to my duet3 board if I could get a similar level of (even untested) functionality - assuming, of course, I'd have reasonable expectations of bugs getting fixed as fast as dc42 fixes RRF3 bugs. (Which is another gripe: There have been long spans of time where DSF has gone untouched while RRF3 has been moving along.)
I just think it's important to get the ordering of "cause" and "result" correct. The lack of community is a result of lack of development. Not the other way around.
Edit: Just to be clear, I'm not really complaining. I'm happy using my duet3 in stand-alone mode while things move along. However, don't imply that I'm part of the reason that DSF (collectively used to mean all the duet s/w running on the SBC) is lagging so far behind RRF3.
-
I'll just chime in with one more thing:
When Duet 2 was being developed, near the beginning, it was much like this (except not separated so much -- only DWC and RRF, with the original wifi server or whatever too).
I waited very patiently until the code was mature. I bought several Duet 2 boards (before they were called Duet 2) immediately in their infancy. However, RRF was not at a point that it could really be used for what I wanted to do (IDEX printer).
I just waited! I worked on my own stuff and waited. I felt this was fine. I didn't feel I was owed anything by the developers. If anything, I was super grateful that the developers were working so hard on the code to make it work.
Finally, RRF2 got to a point where it was complete enough and reliable enough to use! Hallelujah!
Then, immediately, all the developers decided to abandon RRF2 in favour of RRF3! RRF2 is not as stable a rock as we think it is, but the developers are going full-bore restarting the "wait and see" cycle for RRF3 users.
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
-
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
All of that has nothing to do with the pervasive attitude that RRF and DSF are two separate things. From the viewpoint of the end user they are one thing with defined external interactions (Gcode, Web API, etc).
Surely not ideal; but RRF and DWC were two separate things before DSF as well, just more mature, and I don't see any reason to suspect it will not return to that state. I'm also pretty sure it's on the Duet3D agenda to unify things as much as possible; the new Duet3D GitHub is probably a sign of things to come. As things mature, who wrote what or who supports what will matter less, once the development and support workload is better matched to the available resources.
Meanwhile the user can choose how to deal with it.
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I withdraw my comment regarding sudo. It diverted attention from the real issue: What is the GCODE to restart the system?
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi at the system level. M550, M552 and a few others were at least a topic. To some degree it boils down to the lack of a SUDO M997 gcode.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi at the system level. M550, M552 and a few others were at least a topic.
I don't doubt that was a conscious design decision. It fits very firmly with the blind spot that Duet sees these two as somehow separate or different. Again, gcode in one end, movement out the other. Gcode configuration codes in one end, immediate effect on the running device out the other.
Given the statements in the image below, gcodes should "affect the Pi on the system level". M550 (set name) absolutely should set the name of the Duet system's network interface, i.e. the Pi itself. Same with M552 (set IP address), and many more such codes.
Or, is Duet explicitly changing the philosophies stated below? Particularly "All settings are done through G-Code"?
-
@bot said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
What do you perceive that state to be, and why the reference to "while RRF3 is being developed"? We get few complaints these days of RRF 2.05.1 not doing what it is supposed to do. RRF3 core development was completed months ago, to beyond the point where it provided all the features available in RRF2. Nor did I abandon RRF 2 users: I did the RRF 2.05.1 release when I found important bugs in RRF 2.05. But it no longer makes sense to add new features to RRF 2.
If you look at the bug fix lists in the release notes for the last several RRF 3.01-RC releases, you will see that they are almost all either related to new features in RRF3 that Duet 2 and Duet 3 standalone users don't have to use, or are fixes for minor bugs (some of which are also present in RRF 2). [The exceptions were a couple of new bugs in the 12864 display code for the Duet Maestro.] So RRF 3.01-RC is very nearly as stable and reliable as RRF 2.05.1 even though it provides a lot more functionality. You can run RRF 3.01-RC7 on a Duet 2 or Duet 3 in standalone mode with various versions of DWC, so if you don't feel ready to try DWC 2.1.2 yet you can stay with 2.0.7 or even 1.22.6. Many Duet 2 owners are already running RRF 3.01-RC versions.
The only reason that RRF 3.01 stable hasn't been released yet is that I wanted to finalise changes to the communication mechanism between RRF and DSF, which has meant waiting for some major changes to DSF to be implemented and to settle down. With the release of DSF 1.3.0 and now 1.3.2, that is a step closer.
I realise that some users find the change from RRF2 to (standalone) RRF3 painful. We had to make the major changes between the RRF2 and RRF3 configuration mechanisms both to support the new architecture of Duet 3 and because the RRF2 architecture had become too limiting. Specifically, the fixed pin allocations in RRF2 had become problematic for many users, the kludge of "virtual heaters" had passed its best-before date, the status response returned to DWC was getting too large, and the solution to many of the features that users were asking us for was to implement the object model and provide access to it.
I'm sorry that users of Duet 3 + SBC are having to wait longer than we hoped to get access to all the features now provided by RRF 3 in standalone mode, notably conditional GCode. However, DSF 1.3.x + RRF 3.01 RC6/7 now provide the necessary foundation for conditional GCode in DSF, and implementation of the conditional GCode processor in DSF has now started.
-
RRF3 in standalone may be far more mature than I assumed it to be by observing the forums. Sorry for that assumption.
But RRF2 is, IMO, by no means fully-mature. There are some nearly critical bugs that have yet to be solved (such as this networking issue that I have been trying to document).
I also am receiving lots of M122 responses with "error status: 10" and "error status: 18" but which don't seem to be affecting performance.
I'm sure there are also more bugs to be found, some likely very critical too. I wish dearly that I was as talented as you in the areas of coding and the logic behind the firmware. I would very much volunteer my time to analyzing the firmware for bugs and correct behaviour.
Speaking of correct behaviour, I feel that there have been recent changes made to fundamental behaviour that have not been tested enough, and may be contributing to less-than-desirable results that users are still figuring out.
One such change that I feel was implemented hastily, and has not undergone enough testing, is this one from 2.02:
Fixed behaviour when moves call for extrusion amounts smaller than one microstep
Another change was the recent removal of quad/octal fallback when step generation was approaching the limits. This wasn't even documented anywhere except by you casually on the forums, I believe.
Until a release of RRF2 has gone through at least a year of "community testing," I would expect there to be a need to have prompt response to bug reports and addressing behaviour that is sub-optimal. Not necessarily by you (dc42) directly, but definitely with your coordination, since I doubt anyone is as familiar with the firmware as you, at this point.
The solid rock I wish RRF2 to be is one whose behaviour is documented, well tested and understood by its users. I think it's a great chance to create an LTS version of RRF, so that RRF3 can play with whatever it wishes to, while users requiring reliability and predictability can stay with RRF2.
-
I wanted to respond to @Danal , @bot, @gtj0 and others who have expressed your frustrations with how we are both conducting and communicating firmware and software development.
Firstly, thanks for all your input; it's genuinely valued.
Without getting bogged down in the history, it is correct that I, and the rest of the team, saw RRF and DSF as two separate entities. This then manifested itself in a user experience that was at times illogical or awkward. It also allowed features to be deployed with a large gap in time between when they were implemented in RRF and in DSF, causing further frustration. We spent this afternoon discussing this, confirming that it was an issue, and working out what we can do to resolve it. @Danal, we used your pull request for M999 resetting DCS as an example of the blind spot, and we now plan to incorporate the change shortly.
Not all of this is fixable overnight. Aspects such as making networking changes to the Pi via gcode have security implications we want to think through before deciding how to implement them. Christian is working hard on closing the feature gap (and thanks to those who are helping with testing).
I also want to reiterate the point made by David that RRF 2 is not abandoned for those who want to use it as an LTS version while RRF 3 is developed further. We will continue to fix bugs found in RRF 2 (@bot David has the networking issue on his list).
Thanks all once again for the feedback, it has triggered us to look at how we coordinate, package, publish and communicate our software and firmware releases going forward.
-
@T3P3Tony said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
We will continue to fix bugs found in RRF 2
On this note, the build instructions say the master branch of RRF is the current RRF2, but it lags behind the 2.05.1 tag. There is similar confusion around the v3-dev branch and the 3.01-RC7 tag.
All thumbs up on the rest of the post.