Expansion board 1 stopped sending status
-
Hi, I have a problem with my EXP3HC expansion board.
Everything works as it should until the error message "Error: Expansion board 1 stopped sending status" appears. I am running an MB6HC v1.01 (3.5.2) with an EXP3HC v1.01 (3.5.2). This happens randomly and I do not understand why. Sometimes everything works all day, and then there are days when I lose the connection every few minutes. System load does not seem to matter, since it has happened both at idle and mid-print. I am a bit lost since I can’t find any cause for this, and I hope somebody knows what’s going on. I have already searched for similar problems but found nothing. M122 after the connection loss:
M122 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.5.2 (2024-06-11 17:13:58) running on Duet 3 MB6HC v1.01 (standalone mode) Board ID: 08DJM-956BA-NA3TJ-6JTDG-3SS6M-TUBGV Used output buffers: 3 of 40 (30 max) === RTOS === Static ram: 155360 Dynamic ram: 122744 of which 0 recycled Never used RAM 64960, free system stack 138 words Tasks: NETWORK(1,ready,38.7%,161) ETHERNET(5,nWait 7,0.1%,316) HEAT(3,nWait 6,0.0%,321) Move(4,nWait 6,0.0%,211) CanReceiv(6,nWait 1,0.0%,796) CanSender(5,nWait 7,0.0%,329) CanClock(7,delaying,0.0%,348) TMC(4,nWait 6,9.6%,55) MAIN(1,running,51.5%,444) IDLE(0,ready,0.0%,29), total 100.0% Owned mutexes: === Platform === Last reset 00:05:57 ago, cause: software Last software reset at 2024-10-27 17:56, reason: User, Gcodes spinning, available RAM 64448, slot 2 Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0044a000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a Error status: 0x00 Aux0 errors 0,0,0 MCU temperature: min 38.2, current 38.3, max 55.8 Supply voltage: min 23.6, current 23.7, max 23.8, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 11.9, current 11.9, max 12.0, under voltage events: 0 Heap OK, handles allocated/used 99/0, heap memory allocated/used/recyclable 2048/32/32, gc cycles 0 Events: 1 queued, 1 completed Driver 0: standstill, SG min n/a, mspos 1016, reads 65457, writes 16 timeouts 0 Driver 1: standstill, SG min n/a, mspos 1016, reads 65457, writes 16 timeouts 0 Driver 2: standstill, SG min 0, mspos 904, reads 65455, writes 19 timeouts 0 Driver 3: standstill, SG min n/a, mspos 8, reads 65463, writes 11 timeouts 0 Driver 4: standstill, SG min n/a, mspos 72, reads 65460, writes 14 timeouts 0 Driver 5: standstill, SG min n/a, mspos 632, reads 65458, writes 16 timeouts 0 Date/time: 2024-10-27 18:02:46 Slowest loop: 6.00ms; fastest: 0.07ms === Storage === Free file entries: 20 SD card 0 detected, interface speed: 25.0MBytes/sec SD card longest read 
time 4.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, segments created 3, maxWait 208522ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00 no step interrupt scheduled Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0 === DDARing 0 === Scheduled moves 7, completed 7, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === DDARing 1 === Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters 4 5 -1 -1, ordering errs 0 === GCodes === Movement locks held by null, null HTTP is idle in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger is idle in state(s) 0 Queue is idle in state(s) 0 LCD is idle in state(s) 0 SBC is idle in state(s) 0 Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 File2 is idle in state(s) 0 Queue2 is idle in state(s) 0 Q0 segments left 0, axes/extruders owned 0x0000008 Code queue 0 is empty Q1 segments left 0, axes/extruders owned 0x0000000 Code queue 1 is empty === CAN === Messages queued 3267, received 5792, lost 0, errs 1, boc 0 Longest wait 1ms for reply type 6018, peak Tx sync delay 371, free buffers 50 (min 49), ts 1788/1787/0 Tx timeouts 0,0,0,0,0,0 === Network === Slowest loop: 13.78ms; fastest: 0.03ms Responder states: MQTT(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0) HTTP sessions: 1 of 8 = Ethernet = Interface state: active Error counts: 0 0 0 1 0 0 Socket states: 5 2 2 2 2 0 0 0 === Multicast handler === Responder is inactive, messages received 0, responses 0
M122 B1 Diagnostics for board 1: Duet EXP3HC rev 1.01 or earlier firmware version 3.5.2 (2024-06-10 13:24:31) Bootloader ID: SAME5x bootloader version 2.11 (2024-08-09) All averaging filters OK Never used RAM 155752, free system stack 178 words Tasks: Move(3,nWait 7,0.0%,145) HEAT(2,nWait 6,0.0%,89) CanAsync(5,nWait 4,0.0%,62) CanRecv(3,nWait 1,0.0%,73) CanClock(5,nWait 1,0.0%,63) TMC(2,nWait 6,6.8%,59) MAIN(1,running,91.7%,283) IDLE(0,ready,0.0%,39) AIN(2,delaying,1.4%,259), total 100.0% Owned mutexes: Last reset 00:16:09 ago, cause: software Last software reset at 2024-09-06 14:55, reason: HardFault imprec, available RAM 155896, slot 0 Software reset code 0x0060 HFSR 0x40000000 CFSR 0x00000400 ICSR 0x00430803 BFAR 0xe000ed38 SP 0x20004e58 Task HEAT Freestk 156 ok Stack: 00000000 002c0000 20000960 00000000 2000097c 20000980 0002607a 01006000 200056ca 00000001 200056d2 0004f876 00000001 00029ccd 00000000 0002943b 3e178897 ffffffff 00000000 20004b68 20003830 20003780 20003780 00000000 20004f18 0002606f 20004f18 Driver 0: pos -784072, 1600.0 steps/mm, standstill, SG min 0, mspos 744, reads 28317, writes 19 timeouts 0, steps req 0 done 407669 Driver 1: pos -784072, 1600.0 steps/mm, standstill, SG min 0, mspos 440, reads 28318, writes 19 timeouts 0, steps req 0 done 408422 Driver 2: pos -784072, 1600.0 steps/mm, standstill, SG min 0, mspos 232, reads 28319, writes 19 timeouts 0, steps req 0 done 406929 Moves scheduled 4, completed 4, in progress 0, hiccups 0, segs 6, step errors 0, maxLate 0 maxPrep 21, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0, ebfmin 0.00 max 0.00 Peak sync jitter -1/10, peak Rx sync delay 186, resyncs 0/0, no timer interrupt scheduled VIN voltage: min 24.0, current 24.0, max 24.1 V12 voltage: min 12.2, current 12.2, max 12.2 MCU temperature: min 31.8C, current 32.1C, max 33.2C Last sensors broadcast 0x00000070 found 3 112 ticks ago, 0 ordering errs, loop time 0 CAN messages queued 13921, send timeouts 109, received 7768, lost 0, errs 311701, boc 
0, free buffers 38, min 38, error reg 110000 Last cancelled message type 4519 dest 0 dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 405, adv 36990/37082
thanks in advance
-
@chris94 said in Expansion board 1 stopped sending status:
The M122 reports this as the reason for the crash, on the 3HC:
reason: HardFault imprec
The main reason we see these is due to static discharge causing memory corruption. This is generally caused by insufficient grounding of hot end and extruder parts, and stepper motors. See https://docs.duet3d.com/en/User_manual/Connecting_hardware/Power_wiring#grounding
Ian
-
@chris94 from the console output it is clear that an Emergency Stop was commanded. The "Failed to switch off remote heater" and "Expansion board stopped sending status" messages came slightly later and were consequences of the Emergency Stop.
The Hard Fault that @droftarts refers to occurred over a month ago, so it is not related to this event.
So the question is: what commanded the Emergency Stop? Here are some possible reasons:
- M112 was encountered in the GCode file, or was received through an input channel e.g. USB or Telnet (if enabled)
- STOP button pressed on an attached PanelDue
- The wires connecting the PanelDue are picking up interference (e.g. from nearby stepper motor wires) and you have not enabled CRC protection in the corresponding M575 command in config.g
- Emergency Stop button pressed in DWC
HTH David
-
@dc42 said in Expansion board 1 stopped sending status:
The Hard Fault that @droftarts refers to occurred over a month ago
Good point!
Ian
-
Thanks for the quick answers.
@droftarts I am pretty sure that everything is grounded but I am not a professional.
@dc42 I don't think the M112 comes from a G-code or macro as it also occurs when the printer is in idle. I am sitting 2m away from my machine and I was alone in this room so I would say that no EMO button was pressed.
The PanelDue uses a ~1m long unshielded cable so I guess I could start there. Is there a way to check my system for interference?
I do not really understand the difference between the CRC modes, but I am currently running my M575 with P1 S1 B57600. (Great learning experience for me.)
Chris
-
@chris94 What version of the PanelDue firmware is it running? Check on the 'Setup' screen. S1 accepts either a checksum or a CRC, depending on the PanelDue firmware; CRC is used by PanelDue firmware v3.4.1 and later.
Do you have an emergency stop button? If so, how is that wired? If it's NO (normally open) as most of them are, that could also be picking up interference.
Ian
-
@droftarts I have to admit (full of shame) I have never checked my PD firmware and it is still on 3.2.11 but I will update it as soon as possible.
I do have an emergency stop button and it is normally open.
-
@chris94 Try disabling the emergency stop button (comment out the trigger in config.g), and/or updating the PanelDue (see https://docs.duet3d.com/User_manual/RepRapFirmware/Updating_PanelDue).
If that seems to work stably, re-enable the emergency stop, but check the wiring and move it as far as possible from the stepper motor and heater wires.
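As a rough sketch of what "commenting out the trigger" could look like, assuming the button is defined as a general-purpose input and bound to the emergency-stop trigger with M581 (the input number 1 and pin name io5.in are hypothetical, not taken from this thread):

```gcode
; config.g (excerpt), hypothetical input number and pin name
M950 J1 C"io5.in"     ; define GP input 1 on the pin the emergency stop button is wired to
; M581 P1 T0 S1 R0    ; T0 = emergency stop on an inactive-to-active edge, at any time
                      ; (commented out while testing for interference)
```

Restart the board after editing config.g so that the trigger binding from the previous boot is no longer active.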
Ian
-
@droftarts Thanks for the help.
I am already working on the update and I will disable the EMO after that. It will probably take a few days before I can reliably say whether this solves the problem or not. Sometimes everything runs without problems for days, but I will let you know as soon as I know more.
-
@chris94 It's also possible that any housing you have on the PanelDue is moving just enough (usually with temperature changes) to put pressure on the PanelDue Emergency Stop button. Users have had that problem in the past, too.
Ian
-
@chris94 when you've updated your PanelDue firmware, also change the M575 P1 command in config.g from S1 to S4. This ensures that RRF will only recognise a valid CRC from the PanelDue. With S1 it recognises either a valid CRC (which is good protection against interference) or a valid checksum (which is not).
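For illustration, using the port and baud rate that @chris94 quoted earlier, the changed line would look roughly like this (the comment wording is an assumption; only the S value changes):

```gcode
; config.g (excerpt)
; before: M575 P1 S1 B57600   ; accepts a valid CRC or a valid checksum
M575 P1 S4 B57600             ; accepts only a valid CRC from the PanelDue
```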
-
Thanks again for the help
@droftarts I have a clearance gap of 0.5mm between the display and the case. However, I should be able to hear the display beep when something triggers the display and I did not hear anything.
@dc42 PD update is done and I’ve changed my config as you suggested.
The EMO trigger is still active, but I changed the trigger G-code to just display a message and play a sound. I'll let the printer idle for the rest of the day to see if anything happens. If nothing happens today, I will run a test print tomorrow if I have time. Rerouting the EMO cable is on my to-do list this evening.
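A minimal sketch of such a "message and sound only" trigger, assuming the button is GP input 1 on a hypothetical pin io5.in and that trigger number 2 is free; the file name emo_log.txt is also made up, and the log line needs a firmware version that supports redirecting echo to a file:

```gcode
; config.g (excerpt), hypothetical input number, pin name and trigger number
M950 J1 C"io5.in"     ; GP input 1 for the emergency stop button
M581 P1 T2 S1 R0      ; run trigger2.g instead of T0 (emergency stop), at any time

; 0:/sys/trigger2.g
M118 P0 S"EMO input triggered"                            ; report the event on all message outputs
M300 S2000 P500                                           ; beep: 2000 Hz for 500 ms
echo >>"0:/sys/emo_log.txt" state.time, "EMO triggered"   ; optional: append a timestamped log entry
```

The log file makes it easier to see afterwards whether false activations cluster at a particular time of day.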
I guess it's time to wait and see what happens.
-
OK, the two testing days are over and the title of this topic is probably wrong. I thought that the emergency stop was caused by the loss of connection but now I have learned that it was the other way around.
It looks like all of my NO switches are affected by this problem. I can't say why, but I suspect that my printer is not directly the cause of the switch activations because I noticed that the problems almost always occur around the same time.
If I understand this correctly and I replace all NO switches with NC switches, the problem should be solved, right? The only problem that would remain is that my Z-probe cannot be replaced by an NC switch, because I am using a CS067A-HT1 from Metrol, which is very expensive. So I guess I also have to find the cause of this.
The two possible causes I can imagine are either interference from other devices or main power problems. Unfortunately, I don't know how to find out where it comes from. Can I even figure this out by myself and if so, how?
-
@chris94 said in Expansion board 1 stopped sending status:
It looks like all of my NO switches are affected by this problem.
Generally, endstops are only checked while homing, and the rest of the time they are ignored. Interference on these shouldn't be causing emergency stops or glitches, because they are not checked, unless you have them set to trigger for 'over-travel' events.
@chris94 said in Expansion board 1 stopped sending status:
I can't say why, but I suspect that my printer is not directly the cause of the switch activations because I noticed that the problems almost always occur around the same time.
If it is at a specific time of day, I'd guess it's an electrically-noisy heating or air conditioning pump turning on, something like that. You may get around this by putting a surge protector on the mains input, as the interference seems to be external to the machine. What time of day does it cause the switch activations? See this post, which had an issue with a motor causing electrical interference (though the noisy motor was actually part of the machine): https://forum.duet3d.com/post/346470
@chris94 said in Expansion board 1 stopped sending status:
The only problem that would remain is that my Z-Probe cannot be replaced by an NC switch because I am using a CS067A-HT1 from Metrol, which is very expensive.
I doubt you're using the probe during the print, and it seemed that it was the emergency stop that was causing the issue because it is specifically monitored all of the time. So I wouldn't worry too much about this. If you think it is causing issues, you could remove the configuration for it from config.g, and only configure it within macros that need to use it, and un-configure it at the end of the macro.
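A sketch of that approach, with every value below an assumption (probe type, pin name, speeds, trigger height and macro name are placeholders, not taken from this thread):

```gcode
; hypothetical macro, e.g. 0:/macros/probe_once.g
M558 P8 C"io3.in" H5 F120 T6000   ; configure the switch-type probe only for this macro
G31 P500 X0 Y0 Z0.7               ; trigger value, offsets and trigger height (placeholders)
G30                               ; probe at the current XY position
M558 P0                           ; set "no Z probe" again so the input is no longer used as a probe
```

With this layout the corresponding M558 and G31 lines would be removed from config.g, so any macro that probes (homing, bed mesh) has to configure the probe itself first.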
Ian
-
@droftarts said in Expansion board 1 stopped sending status:
Generally, endstops are only checked while homing, and the rest of the time they are ignored. Interference on these shouldn't be causing emergency stops or glitches, because they are not checked, unless you have them set to trigger for 'over-travel' events.
Yes, I do have one as an over-travel safety endstop, because I crashed the axis a few times and that’s not really healthy on an axis driven by two spindles. If I had been smart, I would have installed it as NC.
I doubt you're using the probe during the print, and it seemed that it was the emergency stop that was causing the issue because it is specifically monitored all of the time. So I wouldn't worry too much about this. If you think it is causing issues, you could remove the configuration for it from config.g, and only configure it within macros that need to use it, and un-configure it at the end of the macro.
Thanks for the idea. I will take a look at it.
If it is at a specific time of day, I'd guess it's an electrically-noisy heating or air conditioning pump turning on, something like that. You may get around this by putting a surge protector on the mains input, as the interference seems to be external to the machine. What time of day does it cause the switch activations?
Mostly it occurs at 6 o'clock in the evening, but there have also been days when it has happened 3 times in a row in the afternoon. Unfortunately, my neighbours aren't very helpful when it comes to topics like this. For example, one of my neighbours' doors has been squeaking for 2 years and they don't care (so I'm on my own). The surge protector is already on my list, but I still need to find a place to put it.
I think that the issue is solved for now and if the problem occurs again after I have implemented everything that was discussed here, I will reach out again. Thanks again for the help and for the ideas from both of you.
-