Crashes during printing - "SPI connection has been reset".
-
@jbjhjm Those log settings are far from ideal and in fact I got the same symptoms with persistent logging and long prints as well. The reason is that systemd flushes lots and lots of messages to the SD card at regular intervals, which probably stalls IO access and/or the stdout line at some point (due to the massive amount of log messages; more than 3x the regular G-code file length per print). Either reset LogLevel back to "info" or change the journald storage to "volatile".
When DCS becomes unresponsive at some point (probably during a full SPI transfer and longer than 500ms), RRF thinks the Pi lost communication so it invalidates everything.
If the resets persist with the standard log level, try out a different SD card and/or reduce IO load on the Pi as far as possible.
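For anyone wanting to verify the journald part, here is a minimal sketch (not part of DSF or DuetPi) that reads journald.conf-style text and reports the effective `Storage=` value. It assumes the setting lives in the `[Journal]` section and that an unset value falls back to systemd's default of "auto". The function name `journald_storage` is made up for illustration, and drop-in files under journald.conf.d are not considered.

```python
import configparser


def journald_storage(conf_text: str) -> str:
    """Return the Storage= value from journald.conf-style text.

    Falls back to "auto", systemd's default, when the option is unset
    or commented out. Drop-in override files are NOT taken into account.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # configparser lowercases option names, so "Storage" matches "storage".
    return parser.get("Journal", "Storage", fallback="auto").strip().lower()


def hits_sd_card(storage: str) -> bool:
    """True if this mode can write journal data to persistent storage."""
    # "volatile" keeps the journal in memory only and "none" drops it;
    # "persistent" writes to disk, and "auto" does so when /var/log/journal exists.
    return storage in ("persistent", "auto")
```

Feeding it the contents of /etc/systemd/journald.conf should return "volatile" once the change suggested above has been made; journald needs a restart before a changed setting takes effect.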
-
@chrishamm oh dear, thanks for the warning. Will revert the log settings to be more lightweight.
I did only tweak these yesterday, so all crashes until now happened with standard log settings.
I'm using the SD card shipped with the 6HC, but can try to get a different one.
So your guess is the error appears because the Pi is too busy to respond to RRF in time?
Besides the RRF communication it handles streaming camera data. I can try to lower fps/resolution.
-
@jbjhjm Yes, I think so. If the logging provider hangs during SPI transfers, it's likely to reset the connection state.
-
Experienced a crash again, this time it's been different though.
Duet suddenly stated the print was 100% done. No other errors.
I was not able to do a M115 / M122 this time, but the Raspi still has persistent logging enabled.
I will check these logs later and see if anything useful is in there.
One thing making me suspicious is that I'm having an awful lot of network disconnects.
Whenever a print fails, DWC is offline too, and outside of erroneous situations DWC/Webcam sometimes reacts very slowly too.
I'll do some investigation on how to monitor CPU and network load on the Pi.
-
@jbjhjm it's also worth ensuring you are giving the Pi enough voltage. Look for undervoltage events in the logs.
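Besides the logs, the Pi firmware reports power problems through `vcgencmd get_throttled`. Below is a rough sketch of decoding that value, assuming the usual `throttled=0x...` output format and the bit meanings from the Raspberry Pi documentation. The helper names `decode_throttled` and `read_throttled` are made up for illustration.

```python
import subprocess
from typing import List

# Bit positions as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> List[str]:
    """Turn a line like 'throttled=0x50005' into readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [label for bit, label in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


def read_throttled() -> List[str]:
    """Query the firmware; only works on a Pi with vcgencmd installed."""
    result = subprocess.run(["vcgencmd", "get_throttled"],
                            capture_output=True, text=True, check=True)
    return decode_throttled(result.stdout)
```

An empty list means no flags are set. Any entry mentioning under-voltage points at the power supply or cable rather than at DSF.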
-
@t3p3tony thank you, will do!
I applied a number of changes today, let's see how they work out:
- using 3.4.0b5 now
- disabled logging as suggested by @chrishamm - yesterday's print logged a whopping 1.5 GB.
- reduced webcam resolution
- installed htop to track CPU/Mem performance
- [external] modified my wifi setup; a wifi repeater was causing network performance issues. It would not surprise me if it also affected DWC / pi performance
htop stats say that 70-90 % of CPU load is caused by a chromium process. Unfortunately chromium always runs many parallel processes so it is difficult to investigate what this is actually doing.
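Since htop can list threads as well as processes, one way to get a per-process picture is to sum up `ps` output by command name. This is a small sketch assuming the text comes from `ps -eo pcpu,pid,comm` (procps); the function names are made up for illustration. Note that `ps` reports CPU usage averaged over each process's lifetime, so the numbers will not match htop's live view exactly.

```python
from collections import defaultdict
from typing import Dict, Tuple


def _rows(ps_output: str):
    """Yield (pcpu, pid, comm) tuples, skipping the header and bad lines."""
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        try:
            yield float(fields[0]), int(fields[1]), fields[2].strip()
        except ValueError:
            continue  # header line such as "%CPU PID COMMAND"


def cpu_by_command(ps_output: str) -> Dict[str, float]:
    """Sum the %CPU column per command name."""
    totals = defaultdict(float)
    for pcpu, _pid, comm in _rows(ps_output):
        totals[comm] += pcpu
    return dict(totals)


def heaviest_process(ps_output: str) -> Tuple[int, float, str]:
    """Return (pid, pcpu, comm) of the single busiest process."""
    pcpu, pid, comm = max(_rows(ps_output))
    return pid, pcpu, comm
```

The PID returned by `heaviest_process` can then be looked up in chromium's own task manager to see which tab or helper it belongs to.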
-
@jbjhjm if you are not running DWC local to the Pi then you obviously don't need to run chromium at all. If you are running DWC on that Pi then see what it's at with only DWC open.
-
@t3p3tony no voltage issues reported so far by vcgencmd.
I'm not sure I understand what you meant with your last comment.
DWC is running on the Pi (at least as far as I understand Duet's SBC mode, all that is handled by the Pi while the mainboard only handles printing and reports back values?).
DWC also is the only opened chromium tab.
Nonetheless chromium runs a bunch of different processes.
In the Time/CPU columns you can see though that there is just one chromium process that uses lots of CPU time.
-
@jbjhjm I mean are you running the Pi headless and connecting via a network interface on the Pi to the webserver, or do you have a screen connected to a pi and running DWC in a browser on the Pi?
-
@t3p3tony ah now I get you. It's both, the pi has a permanently attached screen, and I often access DWC through network too for more complex tasks and if I'm not in the same room.
-
ok it seems that the pi's network connection has again crashed just a few minutes ago; it's still listed in the router's active devices list, responds to pings, but DWC does not load anymore. The print is still being executed though.
So I checked what happened on the touch panel: Chrome displayed a white page + a note that it has crashed, asking whether it should reload.
Now this is weird: I dismissed that message, exited fullscreen and then saw another instance of Chrome running below the crashed instance!
I have no clue why it is there. I did not tweak the startup routine provided by duetPi.
After closing the crashed chrome window, it seems that the network connection was recreated too...
Nothing really useful in journal (re-enabled logging hoping to hunt down networking issues). Just way too many network connection losses and reconnects. This is related to the bad wifi that I still have to improve. Disallowed auto-switching frequency bands and 2.4/5 GHz, hopefully that will make the connection a bit more robust.
-
@jbjhjm I hope @chrishamm has some ideas about what to look for in the logs as a cause of this.
-
@t3p3tony when my next print is completed I'll do a full restart and check chrome status right after, if two instances are running and such.
If someone can point me in the right direction for finding the duetPi startup script, I'll check if there's anything unusual going on.
Attached bootlog.txt by the way.
Don't know enough about raspi + linux to spot anything useful unfortunately.
-
The duplicate chromium seems to be related to beta5.
Just did a full restart and the screen showed a crashed chromium window right away.
This has never happened before so I'm quite sure it has to do with beta5.
Opened a bug report to discuss this separately.
https://forum.duet3d.com/topic/25542/3-4-b5-bug-chromium-crashes-on-startup-sbc
-
@jbjhjm it's not really crashed as such, as I outlined in the other thread; rather it's showing that chrome was not shut down properly. I will leave discussion of that to the other thread, but the huge number of chrome tasks is unusual and I am not seeing those.
-
@t3p3tony chrome shutdown/crash is fixed by the solution proposed in other topic!
About the number of processes, that's my fault. I just noticed that htop was showing not only processes but every thread too. I do still see a dozen processes but that is not unusual for chrome.
-
@jbjhjm ok, so we still need to see if you have SPI disconnect errors now.
-
@t3p3tony I will let you know if anything new occurs. Maybe b5 and the tweaked raspi settings helped to make it go away. As the error did not occur often in the past, I'll continue and observe for some days.