Connection dropping using Duet Web API from python script
-
I guess this might be a question best addressed to @chrishamm
I'm the current keeper of Danal's TAMV/DuetWebAPI project for the Jubilee community, and I've run into some sort of buffer error constantly while using the python libraries we've got up and running.
I'm testing them out on my Duet2/Duex5 setup, running RRF3.2 and the latest DWC/etc. Our python library that runs calls to rr_gcode is now causing buffer errors on the controller constantly, and I've been getting dropped connections to the board (not the Wifi, but to DWC itself) as a result too. These issues never used to happen on the RRF2.0.5 and the old DWC at all, so I'm assuming something has changed under the hood with how API calls are processed that leads to the error.
Wifi reception is stable around -57dB, wifi module doesn't drop at all when I see these errors, and no errors are reported on a browser concurrently running a DWC session to the print, nor are they appearing anywhere on my PanelDue (using a 4 wire connector, no ribbon cable).
I've tried mitigating this issue by running the following sequence for each gcode being issued:
- start a session using rr_connect and a blank password
- call rr_gcode with no parameters and check the buffer response (usually almost always at 255)
- call rr_gcode with the command to be sent
- check for "200" status reply
- call rr_disconnect
In previous iterations of the code, we simply used a fire & forget call to rr_gcode. With or without the buffer check I'm now running, I'm getting 0x0c error codes thrown by the board, and I'm getting dropped connections to DWC (again, controller is fine, not rebooting at all). FWIW I have AJAX retries set to 5 in DWC (although I'm not sure this affects the code I'm running from python at all)
Any ideas what may be going on, or how I can troubleshoot this further?
The python procedure issuing the call can be found at [this repo file on line 191].(https://github.com/HaythamB/TAMV/blob/master/DuetWebAPI.py)
An M122 dump right after a disconnect, if it may be helpful:
m122 === Diagnostics === RepRapFirmware for Duet 2 WiFi/Ethernet version 3.2 running on Duet WiFi 1.02 or later + DueX5 Board ID: 08DGM-917NK-F2MS4-7J1F0-3S46R-KGS4D Used output buffers: 11 of 24 (24 max) === RTOS === Static ram: 23460 Dynamic ram: 75628 of which 60 recycled Never used RAM 12932, free system stack 99 words Tasks: NETWORK(ready,163) HEAT(blocked,309) DUEX(blocked,35) MAIN(running,291) IDLE(ready,19) Owned mutexes: === Platform === Last reset 00:16:49 ago, cause: power up Last software reset at 2021-02-15 13:55, reason: User, GCodes spinning, available RAM 12852, slot 1 Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f000 BFAR 0xe000ed38 SP 0x00000000 Task MAIN Freestk 0 n/a Error status: 0x0c Aux0 errors 0,0,0 MCU temperature: min 34.0, current 34.3, max 34.6 Supply voltage: min 24.2, current 24.3, max 24.4, under voltage events: 0, over voltage events: 0, power good: yes Driver 0: position 39845, standstill, SG min/max not available Driver 1: position 3486, standstill, SG min/max not available Driver 2: position 115646, standstill, SG min/max not available Driver 3: position 0, standstill, SG min/max not available Driver 4: position 0, standstill, SG min/max not available Driver 5: position 0, standstill, SG min/max not available Driver 6: position 0, standstill, SG min/max not available Driver 7: position 0, standstill, SG min/max not available Driver 8: position 0, standstill, SG min/max not available Driver 9: position 0, standstill, SG min/max not available Driver 10: position 0 Driver 11: position 0 Date/time: 2021-02-15 20:39:08 Cache data hit count 1228322176 Slowest loop: 6.12ms; fastest: 0.21ms I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0 === Storage === Free file entries: 9 SD card 0 detected, interface speed: 20.0MBytes/sec SD card longest read time 3.3ms, write time 0.0ms, max retries 0 === Move === DMs created 83, maxWait 0ms, bed compensation in use: none, comp offset 0.000 === MainDDARing === Scheduled moves 401, completed moves 401, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === AuxDDARing === Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 Heater 3 is on, I-accum = 0.0 === GCodes === Segments left: 0 Movement lock held by null HTTP is idle in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger is idle in state(s) 0 Queue is idle in state(s) 0 LCD is idle in state(s) 0 Daemon is idle in state(s) 0 Autopause is idle in state(s) 0 Code queue is empty. === Network === Slowest loop: 201.98ms; fastest: 0.11ms Responder states: HTTP(2) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions HTTP sessions: 2 of 8 - WiFi - Network state is active WiFi module is connected to access point Failed messages: pending 0, notready 0, noresp 4 WiFi firmware version 1.25 WiFi MAC address 60:01:94:2e:9f:97 WiFi Vcc 3.39, reset reason Turned on by main processor WiFi flash size 4194304, free heap 26208 WiFi IP address 192.168.1.202 WiFi signal strength -57dBm, mode 802.11n, reconnections 0, sleep mode modem Clock register 00002002 Socket states: 0 0 0 0 0 0 0 0 === DueX === Read count 0, 0.00 reads/min
-
Also, M115 output:
FIRMWARE_NAME: RepRapFirmware for Duet 2 WiFi/Ethernet FIRMWARE_VERSION: 3.2 ELECTRONICS: Duet WiFi 1.02 or later + DueX5 FIRMWARE_DATE: 2021-01-05
And the event log is clearly showing the session start and end events from the pi I'm running the python code on:
2021-02-15 20:25:46 [warn] HTTP client 192.168.1.204 login succeeded 2021-02-15 20:25:48 [warn] HTTP client 192.168.1.204 disconnected 2021-02-15 20:26:36 [warn] HTTP client 192.168.1.204 login succeeded 2021-02-15 20:26:38 [warn] HTTP client 192.168.1.204 disconnected 2021-02-15 20:26:38 [warn] HTTP client 192.168.1.204 login succeeded 2021-02-15 20:26:39 [warn] HTTP client 192.168.1.204 disconnected
-
And one last interesting thing to note: if I run an M122 from the DWC console while the script is running, instant disconnection from my python script...
-
And just found that the panelDue console displays "slowing down poll rate"
-
As color commentary ....
Some users reported the same type of issues with DuetLapse3. It only uses http GET calls so I ended up restructuring the code to use a single calling function with retry logic around it.
This works well since these connection issues ( assuming they are the same as described here) are transient - maybe lasting ~ 1 sec.
It also has the advantage of being resilient to emergency stops and the like.
Just a workaround though .... -
@stuartofmt that's a good idea, I'll give it a shot as well. I do have random hangs now and then in the code that requires stopping the session and restarting the script. Thanks for the tip!
-
@H2B said in Connection dropping using Duet Web API from python script:
@stuartofmt that's a good idea, I'll give it a shot as well. I do have random hangs now and then in the code that requires stopping the session and restarting the script. Thanks for the tip!
Here is my code. Understand that I am not a Python (or professional) programmer :-). In the calling code, depending on what it is - I decide whether to retry again or not (hence the loop counter is only 2 attempts).
I tried to simulated errors by dropping the network, turning duet off, hitting emergency stop multiple ties from DWC etc. I have only seen the ConnectionError with the above tests (understandable). I have not tried loading the Duet with many other requests because this was "good enough".
This link may be useful too:https://stackoverflow.com/questions/16511337/correct-way-to-try-except-using-python-requests-module
def urlCall(url,timelimit): loop = 0 while (loop < 2): try: r = requests.get(url, timeout=timelimit) break except requests.ConnectionError as e: logger.info('') logger.info('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') logger.info('There was a network failure: '+str(e)) logger.info('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') logger.info('') loop += 1 error = 'Connection Error' except requests.exceptions.Timeout as e: logger.info('') logger.info('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') logger.info('There was a timeout failure: '+str(e)) logger.info('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') logger.info('') loop += 1 error = 'Timed Out' time.sleep(1) if (loop >= 2): #Create dummy response class r: ok = False status_code = 9999 reason = error return r
-
Whenever you call rr_connect or rr_gcode without a set password you create a new HTTP session which implies that you must fetch rr_reply in regular intervals. However once you call rr_disconnect the previously created session should be cleared and the saved G-code reply should be freed.
Can you explain what you mean by 0x0c error codes? If it's actually rr_reply not being released in time, you should see HTTP 503 for the duration of roughly 8 seconds.
-
So yes the code is fetching rr_reply after each call before disconnecting.
And the 0x0c error is what shows up for the Duet Error code in the M122 output.
I've been able to work around this issue by waiting in the code until the buffer size is greater than 150, which seems to work so far in my testing, but I'm still not clear on why I'm having these issues.
What do you recommend I try next?