Any tips on debugging techniques for out of memory?
-
Have you considered selling the Duet 2, and picking up a more memory capable Duet 3 board?
-
@Phaedrux Yes, I've thought about it. It's time and money, mainly.
Anyway, I'm almost there. I was able to print a simple job; it's only when I went to 2 tools that I had the crash, and then only when I got near the end of the job startup script, where it's most complex. The job proper has very little code supporting the NeoPixels, so once I get there I should be home free.
I may still upgrade later, but right now I need to address the challenge at hand. Almost there.
-
@DonStauffer
Are you compiling the firmware yourself?
If you do, you could turn off all kinematic types other than the one you're using, and likely other functions as well.
This should free up some memory. In terms of tracking the failure, you can only do what @oliof has suggested.
You'll need the source code to trace back from there. I don't know the full details or whether it will tell you which macro was running; I suspect it's more likely to identify which routine in the source.
-
@DonStauffer
You're tilting at windmills. I've followed your posts closely, as I'm quite familiar with the memory management of operating systems and microcontrollers.
-
As you've rightly observed, you contend with other processes for the scarce memory resources: any unforeseen combination of print moves, PA and I/O (SD card reads, comms with PanelDue, MFM or WiFi) can deprive you of just the last byte needed by your script. Without profound knowledge of how RRF and RTOS work (i.e. studying the sources), you can't even guess how many bytes will be available at a given time.
-
If your scripts barely stay within the limits, what happens if you update the RRF firmware, add a filament sensor or change something in your config.g or related macros? After all, the purpose of the Duet is to deliver good-quality prints. You'll end up like Sisyphus, pushing the memory rock uphill again and again.
-
The macros use a scripting language which doesn't allow for any kind of memory control. The interpreter is responsible for local memory management, not you. So you're left with tricks to get by for the moment. Do not confuse tricks with programming: good programmers try to avoid tricks; they look for a better algorithm instead.
@Phaedrux and @OwenD have outlined two possible solutions - switch to more capable hardware or dive into the RRF sources, maybe even modify and recompile the firmware. I’d like to add a third proposal: transfer the illumination task to a dedicated (Arduino or ESP) controller. These devices are pretty cheap, can easily be controlled by the Duet and don’t require exquisite programming skills.
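To give a feel for how little the Duet side needs: a spare pin driven as a general-purpose output is enough to tell such a helper board which pattern to show. The port number and pin name below are placeholders only, so check the pin table for your board:

; define a general-purpose output on a spare expansion pin (placeholder pin name)
M950 P2 C"exp.heater3"
M42 P2 S1    ; raise the line: helper starts the tool-change pattern
M42 P2 S0    ; lower it again: helper returns to its idle pattern

If a simple on/off signal isn't enough, the same idea works over I2C with M260.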
For the reasons listed above, you can’t win the battle for the last byte in RAM. Time to think different: Depending on your skills and preferences, select one of the above proposals.
Edit:
"but right now I need to address the challenge at hand. Almost there."
Hear Sisyphus saying: "almost there"…
Apropos challenge: for a man with a hammer, every task looks like a nail… Given all your posts on this topic for the last couple of months, you should perhaps consider rethinking your tools.
-
@Phaedrux This seems a bit odd.
There's a macro called by daemon.g which calls a couple of other macros. Not extremely intense memory usage there. The macro's purpose is to manage the LED lighting when changing tools, so you can see the old tool cooling and the new tool heating as chaser lights. I use this during my job starting script to draw a line to purge each tool, and it works fine, even though there are probably a couple dozen variables defined in the macros running at the time.
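Roughly, the shape of it is this; the flag name and macro path are simplified, not my exact code:

; daemon.g (sketch only)
while true
  if global.ledTask != 0
    M98 P"0:/macros/UpdateToolLEDs.g"   ; advance the chaser pattern one step
  G4 P250                               ; then wait a quarter second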
I use the same daemon.g call to the same macro during a tool change, to show the old tool cooling and the new tool heating. But my tool change script has no variables at all; it's all conditionals and hard-coded values. Of course, the same 10 variables defined at the beginning of the starting script are still there, but none of the variables from the macro calls the startup makes, because the tool change doesn't do complicated stuff like drawing purge lines. It's really hardly more than changing the active tool, with LEDs.
Guess where it decided to spontaneously reboot? There should be LESS memory usage there, not more! I'm baffled by this.
If I'm gradually losing memory even when it should be released, this bodes ill for the situation. I could pare down memory usage to get everything working for a short job, and think everything's fine, then run a 100 hour job and have it crash on hour 99. I'm concerned about this. If the starting script ran OK, the tool change should too, as far as I can see.
On the other hand, I could infer that ActivateTool macro (or its LED macros called from daemon.g) may be the culprit. So I could pare that down a little, but it still seems mysterious, and I'm still worried about bigger jobs, though I try to keep my tool changes down near the bed to avoid huge prime pillars.
-
@DonStauffer You are not the only user of memory. RRF will itself allocate more memory in certain situations. Most of these allocations take place during system startup and configuration, but some may happen during normal operation of the printer, so it may well be that some of the movements that take place between your start script being run and a tool change have consumed more memory. Often such allocations will not be released.
-
@gloomyandy That's helpful to know. So, other than "have a whole mess of available memory and hope it's enough", how does one know when there's enough available memory?
Hopefully the answer isn't on a job-by-job basis - "if it fails, you don't have enough". That would mean attempting a long job just to find out whether there's enough memory for it.
-
@gloomyandy You know, I'm starting to think it's not really ever getting to the tool change. It was crashing pretty far away from it, with a whole inside of the letter "D" still to print. Then I was able to kill off 2 variables, and it got a little further - into that feature it never started before. Then it crashed. So all it's executing at the time of the crashes is a bunch of G1 commands, with an occasional G92. Just normal slicer stuff. I guess RRF could have another thread running that periodically does stuff asynchronously that uses memory. But this was in the midst of a long stretch of code like this - about 5 layers' worth. Not where I'd expect memory problems to crop up. Not even my code. Simplify3D.
This is a bit confusing, because the actual job file has only 8 variables in it, all in the starting part. The macro called by daemon.g has one single variable. All variables are integers. daemon.g itself has 12 variables. Then there are my globals. I could look at cutting back daemon.g's variable use. But I'm not sure how much I need. It seems like I could get this short test print working and still not know if a bigger print job would crash.
Update: I got rid of 10 of the 12 variables used by daemon.g. This didn't change anything at all; it crashed at exactly the same place. I'm not sure there's anything else I can drop. The uncertainty is the problem, and it would still be a problem even with a little more memory; I'd never know when there would be a crash. The odds would just be better.
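For what it's worth, most of what I dropped were throw-away temporaries; the same values can usually be computed inline in the command that used them. A made-up example of the pattern:

; before: a local just to hold a computed brightness
var level = floor(255 * global.heatFrac)
M150 R{var.level} U0 B0 P255 S8 F0
; after: no local at all
M150 R{floor(255 * global.heatFrac)} U0 B0 P255 S8 F0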
-
@DonStauffer In some situations RRF will need to allocate more memory to allow it to process moves. What does M122 report after starting a print and before it runs out of memory?
-
@gloomyandy I could embed an M122 in the print file and find out. How far before the failure point, or do you want more than one? I can pinpoint where the failure happens in the file and insert the command before that.
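Something like this, dropped in just ahead of the spot where it dies; the moves shown are only placeholder slicer output:

G1 X102.3 Y87.1 E0.0421
M122            ; dump the diagnostics report to the console at this point
G1 X103.0 Y86.8 E0.0398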
-
@gloomyandy Interesting development:
I noticed there's a whole part of the tree of one of my globals which only ever gets used in the job starting script. So at the end of that script, I set the root of that branch to null, apparently freeing all 15 integers below it (and perhaps a bit of array overhead). This appears to have gotten me past the spot where it rebooted before, and I've been through several tool changes since. The tool change still needs some adjustment - red plastic shows up well on black! But it's running.
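In outline it's just one line at the end of the start script; the global's name and index here are made up, mine differ:

; this branch of the global is only ever read earlier in the start script
set global.startCfg[2] = null    ; drop it so the firmware can reclaim those 15 integers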
And the LEDs are performing basically as intended. Also probably a little fine tuning needed there, but actually, pretty decent.
This project has actually taken about 15 months of full-time work. I really, really wanted this to work.
-
The LEDs did a nice job of graphing heating & cooling of tools & bed, leveling & probing the bed (with error values), and showing percent progress (2 digits!) through the print. Everything to do with tool 0 was green; tool 1 was blue. They had 3 intensities, for off, standby and active. The bed is yellow. Probing has its own colors.
The LEDs handled the transition from printing to tool change and back again well. The worst thing about them is a slight lag, maybe about a second or so. So you can look at the web interface and see 60%, but the LEDs still show 58%. Acceptable.
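The progress display, for example, boils down to something like this; I've simplified it to a 10-LED bar rather than the 2-digit readout I actually use:

if job.file.size != null && job.file.size > 0
  var done = floor(10 * job.filePosition / job.file.size)   ; LEDs to light, out of 10
  M150 R0 U255 B0 P64 S{var.done} F1                        ; green segment for work done
  M150 R0 U0 B0 P0 S{10 - var.done} F0                      ; dark remainder ends the chain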
In the process I rewrote all my routines, so they still need fine-tuning for retraction & priming and, of course, tool change ooze. I'll get there. I did it once before.
I think I know the object model by heart by now!
-
@DonStauffer how about getting your macro to also echo the amount of free memory at various points? It's available in the object model.
-
@dc42 OK, about my "knowing the object model by heart" ... except that!
boards[0].freeRam?
-
@DonStauffer yes that's the one. It's the same as Never Used RAM in the M122 report.
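For example, a line like this in a macro (the label text is arbitrary) will print it to the console:

echo "Free RAM: " ^ boards[0].freeRam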
-
@dc42 Thanks. I'll look at it, though my immediate problem seems to be solved. Undoubtedly I'll run out of memory again at some point, but I'm pretty happy now.
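One thought for the longer jobs: I could have daemon.g log it and watch the trend over time. A sketch, assuming echo can redirect to a file on this firmware version, with an arbitrary file name and interval:

while true
  echo >>"0:/sys/ramlog.csv" state.upTime ^ "," ^ boards[0].freeRam
  G4 S30    ; one sample every 30 seconds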
-
@DonStauffer I'd be interested to know whether you have less or more free memory after running a print with 3.6 alpha compared to 3.5.2.
-
@dc42 I'm a little afraid to install an alpha though. Is reverting easy?
-
@DonStauffer assuming you are running in standalone mode not SBC mode, reverting is very easy. To make it even easier you can install just the 3.6 Duet2CombinedFirmware.bin file and continue running DWC 3.5.2. You can ignore the warning about incompatible software versions in this case.
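So the round trip is just: upload the 3.6 Duet2CombinedFirmware.bin, try it, then upload the 3.5.2 binary again to go back. In either direction DWC will offer to install it, or you can flash it yourself:

M997 S0    ; flash the main firmware from the previously uploaded binary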
-
@DonStauffer Just running M122 in the console should provide an idea of what the memory situation is before the print starts and after your startup code has run.