Duet 2 Ethernet DWC 3.3.0 crashes, have to reset to reconnect
-
@trobison It was a long shot. I'm really not sure what the source of the connection drops could be at this point. There's no problem with the config and we've replaced the module. Reason would say it's a network issue, but where or how is hard to determine. It's not impossible that the new module has the same issue as the old one, but we're not sure what the mechanism of failure could be.
-
@phaedrux Is there something like `top` on the Duet 2? I have hooked a single network cable from printer to computer, so I don't think it's anything on the network or funky broadcasts on the network. If there was, I would see this issue while the printer was idle.
It prints well and finishes nicely; I just have difficulty looking at the web page during an active print. I typically load up all the jobs and start them via the screen now. It's not the best solution, but I can see the estimated remaining times or the height of Z.
I will continue to look at the wiring. All the harnesses were supplied by E3D, and they look great.
Regards,
-
@trobison
Are you running any DWC plugins?
Have you posted your config.g yet?
-
@phaedrux The only plugin I have run is the G-Code Viewer.
I have replaced the SD card with a SanDisk Extreme, but no change.
The next change will be swapping out the power supply. I have a spare on the bench and I will install that in a few days.
Here is my config.g file:
; Configuration file for Duet WiFi (firmware version 3.3)
; executed by the firmware on start-up
;
; generated by RepRapFirmware Configuration Tool v3.3.5 on Sat Nov 13 2021 18:20:50 GMT+1100 (Australian Eastern Daylight Time)

; General preferences
M111 S0 ; Debugging off
G21 ; Work in millimetres
M575 P1 S1 B57600 ; enable support for PanelDue
G90 ; send absolute coordinates...
M83 ; ...but relative extruder moves
M550 P"ToolChanger" ; set printer name
M669 K1 ; select CoreXY mode

; Network
M552 S1 P192.168.10.60 ; enable network and set IP address
M553 P255.255.255.0 ; set netmask
M554 P192.168.10.1 ; set gateway
M586 P0 S1 ; Enable HTTP
M586 P1 S0 ; Disable FTP
M586 P2 S1 T0 ; Enable Telnet

; Drives
M569 P0 S0 ; X physical drive 0 goes forwards
M569 P1 S0 ; Y physical drive 1 goes forwards
M569 P2 S1 ; Z physical drive 2 goes backwards
M569 P3 S1 ; E0 physical drive 3 goes backwards
M569 P4 S1 ; E1 physical drive 4 goes backwards
M569 P5 S1 ; E2 physical drive 5 goes backwards
M569 P6 S1 ; E3 physical drive 6 goes backwards
M569 P7 S0 ; Coupler physical drive 7 goes forwards
M569 P8 S0 ; Unused - physical drive 8 goes forwards
M569 P9 S0 ; Unused - physical drive 9 goes forwards
M584 X0 Y1 Z2 C7 E3:4:5:6 ; set drive mapping
M350 X16 Y16 Z16 E16:16:16:16 I1 ; configure microstepping with interpolation
M350 C16 I10 ; Configure microstepping without interpolation
M92 X100 Y100 Z800 C91.022 E395:395:395:395 ; set steps per mm
M566 X900 Y900 Z8 C2 E800:800:800:800 ; set maximum instantaneous speed changes (mm/min) - Jerk Settings
M203 X35000 Y35000 Z1200 C5000 E5000:5000:5000:5000 ; set maximum speeds (mm/min)
M201 X6000 Y6000 Z400 C500 E3000:3000:3000:3000 ; set accelerations (mm/s^2)
M906 X1800 Y1800 Z1330 I30 ; Idle motion motors to 30%
M906 E1000:1000:1000:1000 C500 I10 ; Idle extruder motors to 10%
M84 S30 ; Set idle timeout

; Axis Limits
M208 X-10 Y-7 Z0 S1 ; set axis minima
M208 X350 Y275 Z300 S0 ; set axis maxima
M208 C-45:360

; Endstops
M574 X1 Y1 S3 ; Set X / Y endstop stall detection
M574 C0 Z0 ; No C Z endstop

; Z probe
M558 P8 C"zstop" H3 F360 I0 T20000 ; Set Z probe type to switch, the axes for which it is used and the dive height + speeds
G31 P200 X0 Y0 Z0 ; Set Z probe trigger value, offset and trigger height
M557 X30:333 Y50:249 S40 ; Set probing points

; Heaters
M308 S0 P"bedtemp" Y"thermistor" A"Bed" T100000 B4138 ; configure sensor 0 as thermistor on pin bedtemp
M950 H0 C"bedheat" T0 q10 ; create bed heater output on bedheat and map it to sensor 0
M307 H0 B0 S1.00 ; disable bang-bang mode for the bed heater and set PWM limit
M140 H0 ; map heated bed to heater 0
M143 H0 S200 ; set temperature limit for heater 0 to 200C
M308 S1 P"e0temp" Y"thermistor" A"T0" T100000 B4725 C7.06e-8 ; configure sensor 1 as thermistor on pin e0temp
M950 H1 C"e0heat" T1 ; create nozzle heater output on e0heat and map it to sensor 1
M307 H1 B0 S1.00 ; disable bang-bang mode for heater and set PWM limit
M143 H1 S300 ; set temperature limit for heater 1 to 300C
M308 S2 P"e1temp" Y"thermistor" A"T1" T100000 B4725 C7.06e-8 ; configure sensor 2 as thermistor on pin e1temp
M950 H2 C"e1heat" T2 ; create nozzle heater output on e1heat and map it to sensor 2
M307 H2 B0 S1.00 ; disable bang-bang mode for heater and set PWM limit
M143 H2 S300 ; set temperature limit for heater 2 to 300C
M308 S3 P"duex.e2temp" Y"thermistor" A"T2" T100000 B4725 C7.06e-8 ; configure sensor 3 as thermistor on pin duex.e2temp
M950 H3 C"duex.e2heat" T3 ; create nozzle heater output on duex.e2heat and map it to sensor 3
M307 H3 B0 S1.00 ; disable bang-bang mode for heater and set PWM limit
M143 H3 S300 ; set temperature limit for heater 3 to 300C
M308 S4 P"duex.e3temp" Y"thermistor" A"T3" T100000 B4725 C7.06e-8 ; configure sensor 4 as thermistor on pin duex.e3temp
M950 H4 C"duex.e3heat" T4 ; create nozzle heater output on duex.e3heat and map it to sensor 4
M307 H4 B0 S1.00 ; disable bang-bang mode for heater and set PWM limit
M143 H4 S300 ; set temperature limit for heater 4 to 300C

; Fans
M950 F0 C"fan0" Q0 ; create fan 0 on pin fan0 and set its frequency - Fan Not used
M106 P0 S0 H-1 ; set fan 0 value. Thermostatic control is turned off - Fan not used
M950 F1 C"fan1" Q500 ; create fan 1 on pin fan1 and set its frequency
M106 P1 S1 H1 T70 ; set fan 1 value. Thermostatic control is turned on
M950 F2 C"fan2" Q0 ; create fan 2 on pin fan2 and set its frequency
M106 P2 S0 H-1 ; set fan 2 value. Thermostatic control is turned off
M950 F3 C"duex.fan3" Q500 ; create fan 3 on pin duex.fan3 and set its frequency
M106 P3 S1 H2 T70 ; set fan 3 value. Thermostatic control is turned on
M950 F4 C"duex.fan4" Q0 ; create fan 4 on pin duex.fan4 and set its frequency
M106 P4 S0 H-1 ; set fan 4 value. Thermostatic control is turned off
M950 F5 C"duex.fan5" Q500 ; create fan 5 on pin duex.fan5 and set its frequency
M106 P5 S1 H3 T70 ; set fan 5 value. Thermostatic control is turned on
M950 F6 C"duex.fan6" Q0 ; create fan 6 on pin duex.fan6 and set its frequency
M106 P6 S0 H-1 ; set fan 6 value. Thermostatic control is turned off
M950 F7 C"duex.fan7" Q500 ; create fan 7 on pin duex.fan7 and set its frequency
M106 P7 S1 H4 T70 ; set fan 7 value. Thermostatic control is turned on
M950 F8 C"duex.fan8" Q0 ; create fan 8 on pin duex.fan8 and set its frequency
M106 P8 S0 H-1 ; set fan 8 value. Thermostatic control is turned off

; Tools
M563 P0 S"T0" D0 H1 F2 ; define tool 0
G10 P0 X0 Y0 Z0 ; set tool 0 axis offsets
G10 P0 R0 S0 ; set initial tool 0 active and standby temperatures to 0C
M563 P1 S"T1" D1 H2 F4 ; define tool 1
G10 P1 X0 Y0 Z0 ; set tool 1 axis offsets
G10 P1 R0 S0 ; set initial tool 1 active and standby temperatures to 0C
M563 P2 S"T2" D2 H3 F6 ; define tool 2
G10 P2 X0 Y0 Z0 ; set tool 2 axis offsets
G10 P2 R0 S0 ; set initial tool 2 active and standby temperatures to 0C
M563 P3 S"T3" D3 H4 F8 ; define tool 3
G10 P3 X0 Y0 Z0 ; set tool 3 axis offsets
G10 P3 R0 S0 ; set initial tool 3 active and standby temperatures to 0C

;tool offsets
; !ESTIMATED! offsets for:
; Hemera-tool: X20 Y43.5 Z-6
; G10 P0 X-0.10 Y-1.0 Z-7 - Use this as initial starting point for leveling nozzle heights to avoid crashing nozzles into bed

; Tool Offsets
G10 P0 X-0.00 Y0.00 Z-5.31 ; Tool 0
G10 P1 X-0.32 Y-0.76 Z-5.40 ; Tool 1
G10 P2 X-0.20 Y-0.98 Z-5.42 ; Tool 2
G10 P3 X-0.32 Y-0.96 Z-5.53 ; Tool 3

; Pressure advance was turned off by default.
M572 D0 S0.020 ; pressure advance T0 Hemera
M572 D1 S0.020 ; pressure advance T1
M572 D2 S0.020 ; pressure advance T2
M572 D3 S0.020 ; pressure advance T3

; Custom settings - Switch in tool grabber
;Safty Switches
M950 J0 C"!^e0stop" ; NC Microswitch - In use on Tool Grabber
;M950 J1 C"!^e1stop" ; NO Microswitch Attached yet (Free)

;Duet 2 WiFi and Ethernet: Use this command to tell RRF about the accelerometer:
;M955 P0 C"spi.cs4+spi.cs3"
M593 F42.2 ; cancel ringing at 42.2Hz (https://forum.e3d-online.com/threads/accelerometer-and-resonance-measurements-of-the-motion-system.3445/)
M593 P"daa" F42.2 ; use DAA to cancel ringing at 42.5Hz

; Pebble Wiper Config
M950 S0 C"duex.pwm5"

; Miscellaneous
M501 ; load saved parameters from non-volatile memory
-
@trobison I replaced the power supply, this has not fixed the problem.
Replaced the SD card.
I put the whole system on a 3000 VA UPS to check if there is noisy power causing an issue. This has not fixed the issue.
I have disconnected and reconnected all plugs on the board as well. Same issue.
The issue seems to be getting worse. I have gone through the wiring a few times and cannot see any issues. The printer can sit for hours without an issue. When I send a print, disconnects occur.
Is there anything else I can check?
25/03/2022, 16:34:55 Connection established
25/03/2022, 16:34:54 Connection interrupted, attempting to reconnect...
Operation failed (Reason: Service Unavailable)
25/03/2022, 16:34:42 File 0:/gcodes/Benchy_N2_PetG.gcode selected for printing
25/03/2022, 16:34:41 M32 "0:/gcodes/Benchy_N2_PetG.gcode"
File 0:/gcodes/Benchy_N2_PetG.gcode selected for printing
25/03/2022, 16:09:49 Connection established
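To put numbers on how often and how soon the drops happen, the console excerpt can be parsed with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the DWC console keeps the `DD/MM/YYYY, HH:MM:SS` prefix seen in the excerpt, with detail lines carrying no timestamp.

```python
from datetime import datetime

# Console lines copied from the excerpt in this post (newest first).
CONSOLE = """\
25/03/2022, 16:34:55 Connection established
25/03/2022, 16:34:54 Connection interrupted, attempting to reconnect...
Operation failed (Reason: Service Unavailable)
25/03/2022, 16:34:42 File 0:/gcodes/Benchy_N2_PetG.gcode selected for printing
25/03/2022, 16:34:41 M32 "0:/gcodes/Benchy_N2_PetG.gcode"
File 0:/gcodes/Benchy_N2_PetG.gcode selected for printing
25/03/2022, 16:09:49 Connection established
"""

STAMP = "%d/%m/%Y, %H:%M:%S"  # assumed timestamp layout, 20 characters wide


def parse_events(text):
    """Return (timestamp, message) pairs sorted oldest first."""
    events = []
    for line in text.splitlines():
        try:
            when = datetime.strptime(line[:20], STAMP)
        except ValueError:
            continue  # detail line without its own timestamp
        events.append((when, line[21:].strip()))
    return sorted(events, key=lambda e: e[0])


def outages(events):
    """Pair each interruption with the next re-established connection."""
    result, down_since = [], None
    for when, msg in events:
        if msg.startswith("Connection interrupted"):
            down_since = down_since or when
        elif msg.startswith("Connection established") and down_since is not None:
            result.append((down_since, (when - down_since).total_seconds()))
            down_since = None
    return result


def first_drop_after_print_start(events):
    """Seconds from the first 'selected for printing' to the next drop, or None."""
    started = None
    for when, msg in events:
        if started is None and "selected for printing" in msg:
            started = when
        elif started is not None and msg.startswith("Connection interrupted"):
            return (when - started).total_seconds()
    return None


events = parse_events(CONSOLE)
for start, seconds in outages(events):
    print(f"dropped at {start:%H:%M:%S}, back after {seconds:.0f} s")
print("first drop after print start:", first_drop_after_print_start(events), "s")
```

Run against a longer console capture, the same functions give the interval between drops, which makes it easier to compare one test (fan, shielded cable, no screen) against another.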
The room the printer is in is at 20 °C. This is a shot of the main board. Not sure if this helps. The network daughter board is not hot, around 35 °C. There is no back cover on the ToolChanger yet, so airflow should not be an issue.
Regards
-
What area of the board is 71c?
Have we tried replacing your actual board yet? Is it still under warranty?
-
@phaedrux These are the chips driving the steppers. The board is running an E3D ToolChanger: four extruders, X / Y / Z / Coupler. Yes, the board was replaced some time ago. I believe it is within warranty.
-
@trobison Is there any place where I could look at logging on the Duet 2? The console has many disconnects and that's about it. I hooked a Pi to it, but I can't see how I can investigate further.
Is there a way to programmatically exercise just one stepper at a time? Perhaps that can help narrow it down.
-
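On exercising one stepper at a time: a minimal sketch, assuming RepRapFirmware's individual-motor mode (`G1 H2`) is acceptable for the test. On a CoreXY machine a normal X or Y move turns both motors, whereas `G1 H2 X...` should turn only the motor mapped to X. The helper name `exercise_axis`, the 10 mm distance, and the 600 mm/min feed rate are illustrative choices, not values from this thread; the axis letters come from the `M584` line in the posted config.g. Extruder drives are left out because they also need a selected tool and a nozzle at temperature.

```python
def exercise_axis(axis, distance_mm=10.0, feed_mm_min=600, cycles=3):
    """Build G-code that runs one motor back and forth in relative mode."""
    if axis not in ("X", "Y", "Z", "C"):  # axes defined by M584 in config.g
        raise ValueError(f"unknown axis {axis!r}")
    lines = ["G91 ; relative moves"]
    for _ in range(cycles):
        # H2 = individual motor mode (assumption: available in this firmware)
        lines.append(f"G1 H2 {axis}{distance_mm:g} F{feed_mm_min:g}")
        lines.append(f"G1 H2 {axis}{-distance_mm:g} F{feed_mm_min:g}")
    lines.append("M400 ; wait for moves to finish")
    lines.append("G90 ; back to absolute moves")
    return lines


print("\n".join(exercise_axis("X")))
```

The printed lines can be pasted into the console or saved as a macro, one axis per run, to see whether the disconnects follow a particular driver. Check that the head and bed have clearance first, since individual-motor moves may not be checked against the axis limits.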
Is there any active fan cooling on the Duet board? If not, can you add some? See if that makes a difference?
You can see some info about logging here, but I don't think it will capture anything useful in this case.
https://docs.duet3d.com/en/User_manual/Troubleshooting/Logging
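Since the on-board log may not capture the drops, another option is to log reachability from the other end of the cable. This is a rough sketch, not a Duet tool: it opens a TCP connection to the HTTP port at a fixed interval and yields a timestamp whenever the state flips. The IP address is the one from the `M552` line in the posted config.g; port 80, the 5-second interval, and the names `port_open` and `watch` are assumptions for illustration. The demo at the bottom uses a scripted probe so it runs without a printer attached.

```python
import socket
import time
from datetime import datetime

DUET_IP = "192.168.10.60"  # from the M552 line in the posted config.g
HTTP_PORT = 80             # assumption: default HTTP port


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(probe, samples, interval_s=5.0, sleep=time.sleep):
    """Poll probe() `samples` times and yield (timestamp, state) on each change."""
    last = None
    for i in range(samples):
        up = probe()
        if up != last:
            yield datetime.now(), "up" if up else "down"
            last = up
        if i < samples - 1:
            sleep(interval_s)


# Demo with a scripted probe; no network access is needed to run this part.
scripted = iter([True, True, False, False, True])
for when, state in watch(lambda: next(scripted), samples=5,
                         interval_s=0, sleep=lambda s: None):
    print(f"{when:%H:%M:%S} {state}")
```

For a real run, swap the scripted probe for `lambda: port_open(DUET_IP, HTTP_PORT)` and raise `samples` to cover a whole print. Comparing those timestamps with the DWC console shows whether the board is unreachable at the TCP level or only the browser session is dropping.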
-
@phaedrux I fired off a Benchy print. The stepper drivers hit 74 °C and the network has stopped working. From a Pi, I can get M122:
[18:33:11:449] === Diagnostics ===
[18:33:11:449] RepRapFirmware for Duet 2 WiFi/Ethernet version 3.4.0 (2022-03-15 18:58:31) running on Duet Ethernet 1.02 or later + DueX5
[18:33:11:489] Board ID: 0JD0M-9P6M2-NWNS0-7J9DJ-3SJ6S-K90RJ
[18:33:11:489] Used output buffers: 12 of 24 (24 max)
[18:33:11:489] === RTOS ===
[18:33:11:489] Static ram: 23868
[18:33:11:489] Dynamic ram: 73160 of which 0 recycled
[18:33:11:489] Never used RAM 11236, free system stack 96 words
[18:33:11:489] Tasks: NETWORK(ready,318.8%,216) HEAT(notifyWait,0.9%,307) Move(notifyWait,51.0%,283) DUEX(notifyWait,0.0%,24) MAIN(running,974.3%,442) IDLE(ready,0.2%,30), total 1345.3%
[18:33:11:489] Owned mutexes: USB(MAIN)
[18:33:11:489] === Platform ===
[18:33:11:489] Last reset 01:22:29 ago, cause: power up
[18:33:11:489] Last software reset at 2022-03-27 11:46, reason: User, GCodes spinning, available RAM 14368, slot 2
[18:33:11:489] Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f000 BFAR 0xe000ed38 SP 0x00000000 Task MAIN Freestk 0 n/a
[18:33:11:489] Error status: 0x0c
[18:33:11:489] Aux0 errors 0,1,0
[18:33:11:489] Step timer max interval 0
[18:33:11:489] MCU temperature: min 22.2, current 28.1, max 42.9
[18:33:11:489] Supply voltage: min 24.0, current 24.1, max 24.2, under voltage events: 0, over voltage events: 0, power good: yes
[18:33:11:489] Heap OK, handles allocated/used 99/1, heap memory allocated/used/recyclable 2048/76/0, gc cycles 0
[18:33:11:489] Events: 0 queued, 0 completed
[18:33:11:489] Driver 0: ok, SG min 0
[18:33:11:489] Driver 1: ok, SG min 0
[18:33:11:489] Driver 2: ok, SG min 0
[18:33:11:489] Driver 3: standstill, SG min n/a
[18:33:11:489] Driver 4: ok, SG min 0
[18:33:11:489] Driver 5: standstill, SG min n/a
[18:33:11:489] Driver 6: standstill, SG min n/a
[18:33:11:489] Driver 7: standstill, SG min 0
[18:33:11:489] Driver 8: standstill, SG min n/a
[18:33:11:489] Driver 9: standstill, SG min n/a
[18:33:11:489] Driver 10:
[18:33:11:489] Driver 11:
[18:33:11:489] Date/time: 2022-03-27 18:33:05
[18:33:11:489] Cache data hit count 4294967295
[18:33:11:489] Slowest loop: 172.26ms; fastest: 0.16ms
[18:33:11:489] I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
[18:33:11:489] === Storage ===
[18:33:11:489] Free file entries: 9
[18:33:11:489] SD card 0 detected, interface speed: 20.0MBytes/sec
[18:33:11:489] SD card longest read time 6.2ms, write time 15.5ms, max retries 0
[18:33:11:489] === Move ===
[18:33:11:489] DMs created 83, segments created 40, maxWait 161965ms, bed compensation in use: mesh, comp offset 0.000
[18:33:11:489] === MainDDARing ===
[18:33:11:489] Scheduled moves 170745, completed 170705, hiccups 0, stepErrors 0, LaErrors 0, Underruns [22, 0, 1], CDDA state 3
[18:33:11:489] === AuxDDARing ===
[18:33:11:489] Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
[18:33:11:489] === Heat ===
[18:33:11:489] Bed heaters 0 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
[18:33:11:489] Heater 0 is on, I-accum = 0.1
[18:33:11:489] Heater 2 is on, I-accum = 0.7
[18:33:11:489] === GCodes ===
[18:33:11:489] Segments left: 1
[18:33:11:489] Movement lock held by null
[18:33:11:489] HTTP is idle in state(s) 0
[18:33:11:489] Telnet is idle in state(s) 0
[18:33:11:489] File is doing "G1 X144.864 Y107.353 E0.28781" in state(s) 0
[18:33:11:489] USB is ready with "m122" in state(s) 0
[18:33:11:489] Aux is idle in state(s) 0
[18:33:11:489] Trigger is idle in state(s) 0
[18:33:11:489] Queue is idle in state(s) 0
[18:33:11:489] LCD is idle in state(s) 0
[18:33:11:489] Daemon is idle in state(s) 0
[18:33:11:489] Autopause is idle in state(s) 0
[18:33:11:489] Code queue is empty
[18:33:11:489] === DueX ===
[18:33:11:489] Read count 1, 0.01 reads/min
[18:33:11:489] === Network ===
[18:33:11:489] Slowest loop: 119.43ms; fastest: 0.00ms
[18:33:11:489] Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions
[18:33:11:489] HTTP sessions: 0 of 8
[18:33:11:489] Interface state disabled, link down
[18:33:11:489] ok
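For comparing several M122 captures, a small parser helps pick out the lines that matter here. The sketch below groups lines by their `=== Section ===` headings and extracts the interface and link state from the Network section. The sample text is a handful of lines copied from the report above; the function names are made up for this example, and the regular expressions only assume the line layout visible in that report.

```python
import re

# A few lines copied from the M122 report in this post.
REPORT = """\
[18:33:11:489] MCU temperature: min 22.2, current 28.1, max 42.9
[18:33:11:489] Slowest loop: 172.26ms; fastest: 0.16ms
[18:33:11:489] === Network ===
[18:33:11:489] Slowest loop: 119.43ms; fastest: 0.00ms
[18:33:11:489] HTTP sessions: 0 of 8
[18:33:11:489] Interface state disabled, link down
"""

# Serial-terminal timestamp prefix such as "[18:33:11:489] ".
PREFIX = re.compile(r"^\[\d\d:\d\d:\d\d:\d+\]\s*")


def sections(report):
    """Group report lines under their '=== Name ===' headings."""
    out, current = {}, "Diagnostics"
    # Some terminals print a visible line-feed glyph; treat it as a newline.
    for raw in report.replace("\u240a", "\n").splitlines():
        line = PREFIX.sub("", raw.strip()).strip()
        if not line:
            continue
        heading = re.fullmatch(r"=== (.+) ===", line)
        if heading:
            current = heading.group(1)
            continue
        out.setdefault(current, []).append(line)
    return out


def network_summary(report):
    """Return the interface and link state from the Network section, or None."""
    for line in sections(report).get("Network", []):
        m = re.match(r"Interface state (\w+), link (\w+)", line)
        if m:
            return {"interface": m.group(1), "link": m.group(2)}
    return None


print(network_summary(REPORT))
```

In this capture the firmware itself reports the interface as disabled with the link down, which suggests the drop is happening on the board side rather than in the browser.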
I have put a large fan blowing on the board, and the temperature has dropped 20C, but the network has failed to come back. Nothing on the console of the screen. I bought the screen because the network has been unreliable. That allows me to cancel or adjust stuff.
After the print, the network failed to come back. I have a reboot macro, and ran that from the screen. Access to the web page returned. This seems to be getting worse.
I started another print with the fan blowing across the Duet 2 from the beginning. A short time into the print, the network is disconnecting again. The only interface is the screen. It will be impossible to load prints until I reboot the Duet. Disabling and enabling the network from the Pi has no effect.
-
The print finished, with not as many crashes. They are around 10-15 min apart with a large pedestal fan blowing on the board. Before the fan, I would have a network disconnect every 30 seconds.
-
@trobison Hey, have you done the tap test? Tap the board with a non-conductive item and see if the vibration causes a drop?
-
@airscapes I can give it a go. There is a print on there now, with...
Connection interrupted, attempting to reconnect...
Operation failed (Reason: Service Unavailable) every 30 seconds.
Once this has started, the only way to recover is to cycle the Duet 2.
-
@trobison
I conducted a few more tests. The first was the tap test as suggested. No effect that I could detect. The next test was a vibration test. I made an apparatus to send vibrations into the printer frame. This had no effect that I could detect.
Then I had the idea of removing the screen, the only interface that allowed me to cancel or control the printer when the network stopped. The network stayed up. Then I thought this must be where the interference was coming from, over the unshielded four-wire connection. I cut up a shielded USB cable and made a shielded replacement cable to run from the Duet 2 to the screen. It still faulted, but with significantly fewer errors, dropping from one every 30 seconds to around 2 per hour.
The next test was to remove the four-wire cable and try the 10-wire ribbon cable. I still get network errors. I didn’t try shielding this cable at this point.
I have no issues with the printer after submitting the print job. But when the network dropouts start, this effectively removes any ability to interact with the printer until I cycle the power via the webpage. I can stop and start the network with a Pi and a telnet session, but that has no effect on correcting the problem. The issue remains until I cycle the Duet.
I still have to determine where the issue is. Is it the screen? Is it a noisy chip on the board? Is it a stepper motor?
I put a clamp on the 24v rail to record the current draw. 1.8 amps while printing. I didn't see any significant deviation in the current draw while printing (0.1 amp).
I can leave the printer powered on for hours and I have no issues with the network. It’s only some time into a print that the network becomes a real issue. Can a faulty screen knock the printer’s network out? I have experienced the same results with the shielded and unshielded four-wire connection, and with the 10-wire ribbon connection.
I have not been able to get my servo running for cleaning nozzles since I removed everything to simplify testing. The configuration has not changed. The servo definition was the same as I had before upgrading to version 3.4 from 3.3. It worked for months under 3.3, but stopped under 3.4.
After I reconnected the wires, the servo did not work. I tried with a hobby servo in its place as a test and this did not work either.
I tried another port by changing my configuration:
M950 S0 C"duex.pwm5" to M950 S0 C"duex.pwm3"
After the change, the servo works using duex.pwm3. I tried to restore the original configuration, "duex.pwm5", the original port with the original setting that worked for months, but I can't get it to work now. I also tried C"duex.pwm4" and that does not drive the servo either. Is there a way to test "duex.pwm5" and "duex.pwm4"?
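On testing the ports: one hedged approach is to remap the same servo index to each DueX PWM pin in turn and sweep it through a few angles. The sketch below only generates the G-code text; it assumes `M950 S0 C"nil"` releases the pin currently assigned to servo 0 and that `M280 P0 S<angle>` drives it, as in RepRapFirmware 3. The helper name, angles, and dwell time are illustrative.

```python
def servo_port_test(port, servo=0, angles=(30, 90, 150), dwell_ms=1000):
    """Build G-code that maps servo index `servo` to `port` and sweeps it."""
    lines = [
        f'M950 S{servo} C"nil" ; free the servo index before remapping',
        f'M950 S{servo} C"{port}" ; map servo {servo} to {port}',
    ]
    for angle in angles:
        lines.append(f"M280 P{servo} S{angle} ; move to {angle} degrees")
        lines.append(f"G4 P{dwell_ms} ; wait {dwell_ms} ms")
    return lines


# Print one test sequence per port mentioned in the thread.
for port in ("duex.pwm3", "duex.pwm4", "duex.pwm5"):
    print("\n".join(servo_port_test(port)))
    print()
```

If the servo sweeps on `duex.pwm3` but stays still on `duex.pwm4` and `duex.pwm5` with identical commands, that points at the outputs rather than the configuration, though it does not say why they stopped.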
Without the Panel/Screen connected, the network chip is still warm. Is this within spec? What is considered too hot? I have included a photo.
This is the network chip in the image.
Sorry for the long post.
```
M98 P"config.g"
HTTP is enabled on port 80
FTP is disabled
TELNET is enabled on port 23
Warning: Heater 0 predicted maximum temperature at full power is 551°C
Warning: Heater 1 predicted maximum temperature at full power is 495°C
Warning: Heater 2 predicted maximum temperature at full power is 470°C
Warning: Heater 4 predicted maximum temperature at full power is 542°C
```
![ChipTemp.jpg](/assets/uploads/files/1648775193199-chiptemp.jpg)
-
52 °C doesn't seem too bad. Usually if there's a chip with a fault it would be noticeably hot.
I'll ping DC42 to see if he has any insight.
I don't think 3.4 has changed anything for servo support that I can see from the change log. If you roll back to 3.3 does it become functional again?
-
@phaedrux Ok, but I'm not keen on rolling it back. It is printing nicely. Do I just update to version 3.3, and that downgrades it?
-
@trobison I downgraded to version 3.3. I then reinstalled the SD card with my version 3.3 config files. This version had a working servo on PWM5 on the DueX5 board. After the downgrade, I still could not get the servo working on PWM5. I ran M122 and all looked good: it was running 3.3 with no errors.
I performed another upgrade to 3.4 and ran M122 again. This verified that I was now running version 3.4. An interesting observation is that M122 causes a network disconnect and reconnect. I did not get this with version 3.3. Perhaps M122 now cycles the network.
I have two ports that are not driving my servos (PWM4 and PWM5). Is there a way to test these ports? I had not connected a servo to PWM4 until PWM5 stopped working. I have a functioning servo on PWM3 used to clean nozzles on tool changes.
-
Have you tested for network drops with the Duex disconnected? Not sure how feasible that would be for printing given the tool changer. I'm just wondering if there's an interaction.
-
@phaedrux I'm not sure if I can even print without the Duex. Functionality is spread across the two boards. What is concerning is that the PWM4 and PWM5 ports can't drive a servo. PWM5 worked before, but PWM4 was never tested until I was looking for a functioning port. Can these be tested? I tried downgrading and that did not work. I am back at version 3.4 and have switched my servo to PWM3. PWM1 and PWM2 are in use.
-
Can you describe the servo in use and show your config? Are you saying it works in one port, but not another?