Duet 2 Ethernet WC 3.3.0 crashes, have to reset to reconnect
-
@trobison Is there any place where I could look at logging on the Duet 2? The console has many disconnects and that's about it. I hooked a PI to it, but I can't see how I can investigate further.
Is there a way to programmatically exercise just one stepper at a time? Perhaps that can help narrow it down.
-
Is there any active fan cooling on the Duet board? If not, can you add some? See if that makes a difference?
You can see some info about logging here, but I don't think it will capture anything useful in this case.
https://docs.duet3d.com/en/User_manual/Troubleshooting/Logging
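For reference, event logging in RepRapFirmware is switched on and off with M929. A minimal sketch follows. The file name `eventlog.txt` and the `S3` (debug) level are choices made here for illustration, and older firmware may only accept `S1`. As noted above, the log may not capture anything useful for a network dropout.

```
M929 P"eventlog.txt" S3   ; start logging to the SD card (S3 = debug level, assumed to be supported by the installed firmware)
; ... run the print and wait for the network dropout ...
M929 S0                   ; stop logging
```

The log file should then be readable from the SD card once the web interface is reachable again, or by removing the card.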
-
@phaedrux I started a Benchy print. The stepper drivers hit 74C and the network stopped working. From a Pi, I can still get M122:
[18:33:11:449] === Diagnostics ===
[18:33:11:449] RepRapFirmware for Duet 2 WiFi/Ethernet version 3.4.0 (2022-03-15 18:58:31) running on Duet Ethernet 1.02 or later + DueX5
[18:33:11:489] Board ID: 0JD0M-9P6M2-NWNS0-7J9DJ-3SJ6S-K90RJ
[18:33:11:489] Used output buffers: 12 of 24 (24 max)
[18:33:11:489] === RTOS ===
[18:33:11:489] Static ram: 23868
[18:33:11:489] Dynamic ram: 73160 of which 0 recycled
[18:33:11:489] Never used RAM 11236, free system stack 96 words
[18:33:11:489] Tasks: NETWORK(ready,318.8%,216) HEAT(notifyWait,0.9%,307) Move(notifyWait,51.0%,283) DUEX(notifyWait,0.0%,24) MAIN(running,974.3%,442) IDLE(ready,0.2%,30), total 1345.3%
[18:33:11:489] Owned mutexes: USB(MAIN)
[18:33:11:489] === Platform ===
[18:33:11:489] Last reset 01:22:29 ago, cause: power up
[18:33:11:489] Last software reset at 2022-03-27 11:46, reason: User, GCodes spinning, available RAM 14368, slot 2
[18:33:11:489] Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f000 BFAR 0xe000ed38 SP 0x00000000 Task MAIN Freestk 0 n/a
[18:33:11:489] Error status: 0x0c
[18:33:11:489] Aux0 errors 0,1,0
[18:33:11:489] Step timer max interval 0
[18:33:11:489] MCU temperature: min 22.2, current 28.1, max 42.9
[18:33:11:489] Supply voltage: min 24.0, current 24.1, max 24.2, under voltage events: 0, over voltage events: 0, power good: yes
[18:33:11:489] Heap OK, handles allocated/used 99/1, heap memory allocated/used/recyclable 2048/76/0, gc cycles 0
[18:33:11:489] Events: 0 queued, 0 completed
[18:33:11:489] Driver 0: ok, SG min 0
[18:33:11:489] Driver 1: ok, SG min 0
[18:33:11:489] Driver 2: ok, SG min 0
[18:33:11:489] Driver 3: standstill, SG min n/a
[18:33:11:489] Driver 4: ok, SG min 0
[18:33:11:489] Driver 5: standstill, SG min n/a
[18:33:11:489] Driver 6: standstill, SG min n/a
[18:33:11:489] Driver 7: standstill, SG min 0
[18:33:11:489] Driver 8: standstill, SG min n/a
[18:33:11:489] Driver 9: standstill, SG min n/a
[18:33:11:489] Driver 10:
[18:33:11:489] Driver 11:
[18:33:11:489] Date/time: 2022-03-27 18:33:05
[18:33:11:489] Cache data hit count 4294967295
[18:33:11:489] Slowest loop: 172.26ms; fastest: 0.16ms
[18:33:11:489] I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
[18:33:11:489] === Storage ===
[18:33:11:489] Free file entries: 9
[18:33:11:489] SD card 0 detected, interface speed: 20.0MBytes/sec
[18:33:11:489] SD card longest read time 6.2ms, write time 15.5ms, max retries 0
[18:33:11:489] === Move ===
[18:33:11:489] DMs created 83, segments created 40, maxWait 161965ms, bed compensation in use: mesh, comp offset 0.000
[18:33:11:489] === MainDDARing ===
[18:33:11:489] Scheduled moves 170745, completed 170705, hiccups 0, stepErrors 0, LaErrors 0, Underruns [22, 0, 1], CDDA state 3
[18:33:11:489] === AuxDDARing ===
[18:33:11:489] Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
[18:33:11:489] === Heat ===
[18:33:11:489] Bed heaters 0 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
[18:33:11:489] Heater 0 is on, I-accum = 0.1
[18:33:11:489] Heater 2 is on, I-accum = 0.7
[18:33:11:489] === GCodes ===
[18:33:11:489] Segments left: 1
[18:33:11:489] Movement lock held by null
[18:33:11:489] HTTP is idle in state(s) 0
[18:33:11:489] Telnet is idle in state(s) 0
[18:33:11:489] File is doing "G1 X144.864 Y107.353 E0.28781" in state(s) 0
[18:33:11:489] USB is ready with "m122" in state(s) 0
[18:33:11:489] Aux is idle in state(s) 0
[18:33:11:489] Trigger is idle in state(s) 0
[18:33:11:489] Queue is idle in state(s) 0
[18:33:11:489] LCD is idle in state(s) 0
[18:33:11:489] Daemon is idle in state(s) 0
[18:33:11:489] Autopause is idle in state(s) 0
[18:33:11:489] Code queue is empty
[18:33:11:489] === DueX ===
[18:33:11:489] Read count 1, 0.01 reads/min
[18:33:11:489] === Network ===
[18:33:11:489] Slowest loop: 119.43ms; fastest: 0.00ms
[18:33:11:489] Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions
[18:33:11:489] HTTP sessions: 0 of 8
[18:33:11:489] Interface state disabled, link down
[18:33:11:489] ok
I have put a large fan blowing on the board, and the temperature has dropped 20C, but the network has failed to come back. Nothing on the console of the screen. I bought the screen because the network has been unreliable. That allows me to cancel or adjust stuff.
After the print, the network failed to come back. I have a reboot macro, and ran that from the screen. Access to the web page returned again. This seems to be getting worse.
I started another print with the fan blowing across the Duet 2 from the beginning. A short time into the print, the network is disconnecting again. The only interface is the screen. It will be impossible to load prints until I reboot the Duet. Disabling and enabling the network from the Pi has no effect.
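The disable/enable cycle described here can be sent as plain G-code over the USB serial connection. A sketch, assuming the Ethernet interface is controlled with M552 as on a standard Duet 2 Ethernet. The 5 second pause is an arbitrary choice:

```
M552        ; report the current network state
M552 S0     ; disable the network interface
G4 S5       ; wait 5 seconds (arbitrary pause)
M552 S1     ; enable the network interface again
M552        ; report the state again to see whether the link came back
```

If the second report still shows the interface as disabled or the link as down, that matches the behaviour described above, where only a full reset restores the connection.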
-
The print finished, but not without many crashes. They are around 10-15 min apart with a large pedestal fan blowing on the board. Before the fan, I would have a network disconnect every 30 seconds.
-
@trobison Hey, have you done the tap test? Tap the board with a non-conductive item and see if the vibration causes a drop?
-
@airscapes I can give it a go. There is a print on there now, with...
Connection interrupted, attempting to reconnect...
Operation failed (Reason: Service Unavailable) every 30 seconds.
Once this has started, the only way to recover is to cycle the Duet 2.
-
@trobison
I conducted a few more tests. The first was the tap test as suggested. No effect that I could detect. The next test was a vibration test. I made an apparatus to send vibrations into the printer frame. This had no effect that I could detect.
Then I had the idea of removing the screen, the only interface that allowed me to cancel or control the printer when the network stopped. The network stayed up. Then I thought this must be where the interference was coming from, over the unshielded four-wire connection. I cut up a shielded USB cable and made a shielded cable to run from the Duet 2 to the screen. It still faulted, but with significantly fewer errors: down from one every 30 seconds to around 2 per hour.
The next test was to remove the four-wire cable and try the 10 wire ribbon cable. I still get network errors. I didn’t try shielding this cable at this point.
I have no issues with the printer after submitting the print job. But when the network dropouts start, this effectively removes any ability to interact with the printer until I cycle the power via the webpage. I can stop and start the network with a Pi and a telnet session, but that has no effect on correcting the problem. The issue remains until I cycle the Duet.
I still have to determine where the issue is. Is it the screen? Is it a noisy chip on the board? Is it a stepper motor?
I put a clamp on the 24v rail to record the current draw. 1.8 amps while printing. I didn't see any significant deviation in the current draw while printing (0.1 amp).
I can leave the printer powered on for hours and I have no issues with the network. It’s only after some time into a print the network becomes a real issue. Can a faulty screen knock the printer’s network out? I have experienced the same results with shielded and unshielded four wire connection, and with the 10 wire ribbon connection.
I have not been able to get my servo running for cleaning nozzles since I removed everything to simplify testing. The configuration has not changed. The servo definition was the same as I had before upgrading to version 3.4 from 3.3. It worked for months under 3.3, but stopped under 3.4.
After I reconnected the wires, the servo did not work. I tried with a hobby servo in its place as a test and this did not work either.
I tried another port by changing my configuration:
M950 S0 C"duex.pwm5" to M950 S0 C"duex.pwm3"
After the change, the servo works using duex.pwm3. I tried to restore the original configuration with "duex.pwm5", the original port with the original setting that worked for months. I can't get it to work now. I also tried C"duex.pwm4" and that does not drive the servo either. Is there a way to test "duex.pwm5" and "duex.pwm4"?
Without the Panel/Screen connected, the network chip is still warm. Is this within spec? What is considered too hot? I have included a photo.
This is the network chip in the image. Sorry for the long post.
```
M98 P"config.g"
HTTP is enabled on port 80
FTP is disabled
TELNET is enabled on port 23
Warning: Heater 0 predicted maximum temperature at full power is 551°C
Warning: Heater 1 predicted maximum temperature at full power is 495°C
Warning: Heater 2 predicted maximum temperature at full power is 470°C
Warning: Heater 4 predicted maximum temperature at full power is 542°C
```
![ChipTemp.jpg](/assets/uploads/files/1648775193199-chiptemp.jpg)
-
52C doesn't seem too bad. Usually if there's a chip with a fault it would be noticeably hot.
I'll ping DC42 to see if he has any insight.
I don't think 3.4 has changed anything for servo support that I can see from the change log. If you roll back to 3.3 does it become functional again?
-
@phaedrux Ok, but I'm not keen on rolling it back. It is printing nicely. Do I just update to version 3.3, and that downgrades it?
-
@trobison I downgraded to version 3.3. I then reinstalled the SD card with my version 3.3 config files. This version had a working servo on PWM5 on the DueX5 board. After the downgrade, I could not get it working on PWM5. I ran an M122 and all looked good: it was running 3.3 with no errors.
I performed another upgrade to 3.4 and ran another M122. This verified that I was now running version 3.4. An interesting observation is that M122 causes a network disconnect and reconnect. I did not get this with version 3.3. Perhaps M122 now cycles the network.
I have two ports that are not driving my servos (PWM4 and PWM5). Is there a way to test these ports? I had not connected a servo to PWM4 until PWM5 stopped working. I have a functioning servo on PWM3 used to clean nozzles on tool changes.
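One possible way to check a suspect port without a servo is to configure it as a plain GPIO output and measure its signal pin with a multimeter or scope. This is only a sketch under assumptions: port number 5 is a hypothetical free GPIO number, `duex.pwm5` must not be assigned to anything else in config.g at the time, and the voltage seen on the pin depends on the board wiring.

```
M950 P5 C"duex.pwm5"   ; hypothetical GPIO number 5 mapped to the port under test
M42 P5 S1              ; drive the output fully on, then measure the signal pin
G4 S10                 ; hold for 10 seconds while measuring (arbitrary)
M42 P5 S0              ; drive the output fully off, then measure again
M950 P5 C"nil"         ; free the pin so it can be reassigned to a servo
```

If the pin level does not change between the two M42 commands, that would point towards the hardware rather than the servo code, though it does not rule out a firmware issue specific to that port type.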
-
Have you tested for network drops with the Duex disconnected? Not sure how feasible that would be for printing given the tool changer. I'm just wondering if there's an interaction.
-
@phaedrux I'm not sure if I can even print without the Duex. Functionality is spread across the two boards. What is concerning is that the PWM4 and PWM5 ports can't drive a servo. PWM5 worked before, but PWM4 was never tested until I was looking for a functioning port. Can these be tested? I tried downgrading and that did not work. I am back at version 3.4 and have switched my servo to PWM3. PWM1 and PWM2 are in use.
-
Can you describe the servo in use and show your config? Are you saying it works in one port, but not another?
-
@phaedrux This was my working config.g with Version 3.3
; Servo Config
M950 S0 C"duex.pwm5"
and to exercise the servo:
M280 P0 S180 ; Set Servo Position for Wipe to 160 degrees
G4 P600 ; Pause 600 ms
M280 P0 S1-0 ; Set Servo Position for Wipe to 0 degrees
G4 P600 ; Pause 600 ms
M280 P0 S180 ; Return to Servo Position for Wipe
After the update to version 3.4, I couldn't get the servo working. I have moved to duex.pwm3. duex.pwm4 did not work either, and that port never had anything connected to it previously.
Current config.g has this definition.
M950 S0 C"duex.pwm3" ; Port 0
and to exercise the servo:
M280 P0 S180 ; Set Servo Position for Wipe to 160 degrees
G4 P600 ; Pause 600 ms
M280 P0 S1-0 ; Set Servo Position for Wipe to 0 degrees
G4 P600 ; Pause 600 ms
M280 P0 S180 ; Return to Servo Position for Wipe
I have been testing with a small hobby 9G servo. It works in duex.pwm3 but not in duex.pwm4 or duex.pwm5.
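The port-by-port comparison described here can be run from the console without editing config.g each time. A sketch, assuming servo number 1 is unused (a hypothetical choice) and the test servo is plugged into the port being checked:

```
M950 S1 C"duex.pwm4"   ; hypothetical spare servo number 1 on the port under test
M280 P1 S0             ; move the servo to 0 degrees
G4 P600                ; pause 600 ms
M280 P1 S180           ; move the servo to 180 degrees
G4 P600                ; pause 600 ms
M950 S1 C"nil"         ; release the pin before testing the next port
```

Repeating the same sequence with `duex.pwm5` and then `duex.pwm3` keeps the servo, the commands and the firmware constant, so only the port changes between runs.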
-
@Phaedrux Just wanted to update my issue. With the replacement board and replacement Ethernet, the network has been very stable. I still cannot reproduce this on demand. However, yesterday I ran a job that did not have the code to extend the build plate for part removal, so I slowly pulled the bed out. Everything was fine. I then reran the job, forgetting to upload the new gcode after I put the end script code in. Once that job finished I moved the bed in the same slow manner, and this time, after returning to the desk, the network was disconnected. After reconnecting, I tried to reproduce this physically by moving the bed slow, fast, in starts and stops, but could not get the problem to reproduce. Very odd. BTW, when I tried the disable/enable I had to do it 2 times before it worked. Something very odd is going on, but at least I can live with it like this. Have a good one!
-
@airscapes Since I removed the screen, the network has been stable.
I can send an M122 and that will cause the network to disconnect and reconnect at any time. I can work with that. It was convenient to have a screen without a PC near to control the printer if required.
My most pressing issue is the ports used to control the servos. I can't get them to function and have no idea how to test this on the expansion board, DueX5.
-
@trobison there are two types of port used to generate PWM on the Duet and DueX. The code to drive servos was changed in RRF3.4 to support variable servo refresh frequency. So I am wondering whether a bug has crept in that affects just one of the two types of port.
-
@dc42 G'day dc42. Can you provide a way to test the ports on the DueX? I can move the servo to the next port, reconfigure the config.g and send a command to move the servo, but nothing happens. When I configure them from the console, no errors are present. It looks like hardware, but you are correct. It could be a bug.
-
@trobison I have tested this on my setup and can't see any difference between 3.3 and 3.4.
As a complete test you could try temporarily downgrading to 3.3 and see if it changes the result.
-
@t3p3tony I completed this task earlier, on 4 April. There was no difference with the servos, but the network does not disconnect and reconnect with an M122. I still can't get ports PWM4 and PWM5 to function.