guild icon
Toit
#Exception in TCP/MQTT context
Thread channel in help
kaxori
kaxori 01/31/2025 12:17 PM
Application rises an exception after some runtime.

A measurement loop reads data from sensor and report them to mqtt broker.

The longer the loop period, the longer it last until exception. In order to archieve it as quickly as possible, the prtiod was set to 10ms.


Has anyone an idea how I could find the source of the problem ?
kaxori
kaxori 01/31/2025 12:18 PM
exception log
floitsch
floitsch 01/31/2025 12:44 PM
How frequent are these disconnects?
DEBUG: closing connection {reason: null} DEBUG: attempting to (re)connect DEBUG: connected to broker DEBUG: connection established INFO: disconnect from server
floitsch
floitsch 01/31/2025 12:45 PM
From looking at the stacktraces it seems like there is a 'write' while the connection is disconnected.
kaxori
kaxori 01/31/2025 12:52 PM
it takes
loop period | run time until exception:
2s | 235 min (~4h)
100 ms | 37 min Stresstest
floitsch
floitsch 01/31/2025 12:53 PM
But these DEBUG... lines are more frequent. right?
kaxori
kaxori 01/31/2025 12:54 PM
no there appear before the exception
floitsch
floitsch 01/31/2025 12:58 PM
Let me rephrase: I see some "DEBUG closing connection ..." before the actual stack trace.
I can also see that there are some writes of the temperature between these DEBUG logs.
So it looks like these DEBUG prints are within a very short period of time.
Is that correct?

Are there other "DEBUG closing connection" prints before the ones that you sent?
kaxori
kaxori 01/31/2025 12:58 PM
new log with more debug
floitsch
floitsch 01/31/2025 01:00 PM
These are good, but the individual DEBUG lines don't have a timestamp, so I can't tell how frequent they are.
Also, I would like to know if these DEBUG lines only happen when things go south, or if they appear frequently before the crash.
kaxori
kaxori 01/31/2025 01:04 PM
The debug lines appear only before crash
floitsch
floitsch 01/31/2025 01:12 PM
ok. So what seems to happen is that the connection breaks.
It's not clear to me why. And normally there are attempts to reconnect (as can be seen).
There might be two things going wrong: the reconnection attempts seem to be without delays. There are no timestamps on the logs, but I think I have seen you print the temperature with a time and there were several attempts within a second.
That's probably not good.
Also, if it tries to reconnect, then it should not crash.

I'm currently travelling, and might not have time to look into this over the next few days, but I will try to do so next week.
In the meantime try to catch the exception (at the publish end, but probably also at the start of the MQTT client with --on-error). Hopefully you can just restart the client.
kaxori
kaxori 01/31/2025 01:37 PM
Thank you, I have set up a test client to find the problem.
(The working sensor- clients use watchdog, I observed restarts < 24h)
kaxori
kaxori 01/31/2025 08:14 PM
after loosing the connection it crashes in line 124 the "background task"
floitsch
floitsch 01/31/2025 08:17 PM
Ok. That's not fatal. The catch --trace means that it will print the stacktrace, but it will still catch it.
It then calls the on-error callback in the next line. However, it will shut down the task. So the on-error callback would need to restart the connection.
Also, this doesn't really help for the other task that is trying to send something.
kaxori
kaxori 01/31/2025 09:09 PM
What stands "AS_CHECK_FAILED" for ?
floitsch
floitsch 01/31/2025 09:10 PM
Either an explicit as X failed, or a type was written with foo/Type and the automatic type check failed.
This can also happen when the type was Foo (not Foo?) and the variable was null.
kaxori
kaxori 01/31/2025 09:12 PM
aha remembers me to the 'guru meditation' error of the late 80ties :πŸ˜…:
😁1
kaxori
kaxori 01/31/2025 10:39 PM
starting a monitor ala 'jag monitor --port COM11 -a' work,
but if I close the monitor window the target restarts - why?
floitsch
floitsch 01/31/2025 10:40 PM
Interesting. I have never encountered that. Could be a bug on windows (or I just never noticed)
floitsch
floitsch 02/03/2025 02:46 PM
I just released a new version of the MQTT package. It probably doesn't solve your issue, but it should avoid constant connect/disconnects as we are seeing. If the client is disconnected within a short period of time after it managed to connect, then the client will back off and not try to connect immediately.
kaxori
kaxori 02/04/2025 10:42 AM
I think my clumsy coding causes the trouble. Are there function to get the state of current consumed resources (ram, heap, stack, ...) ?
kaxoriOPkaxori
I think my clumsy coding causes the trouble. Are there function to get the state of current consumed resources (ram, heap, stack, ...) ?
kaxori
kaxori 02/06/2025 09:07 AM
I created a test app without much own code - and have same problems.
` loop:
print "loop"
while true:
loop-counter += 1
publish-key "loops" "$loop-counter"
print "$loop-counter"



//mqtt-client.publish "mqttNet/stateDisplay/led" "{"led":$MQTT-STATUS-LED"rgb":$MQTT-STATUS-COLOR-ON}"
publish-led MQTT-STATUS-COLOR-ACTIVE


sleep --ms=510

//mqtt-client.publish "mqttNet/stateDisplay/led" "{"led":$MQTT-STATUS-LED"rgb":$MQTT-STATUS-COLOR-OFF}"

publish-led MQTT-STATUS-COLOR-ON
sleep --ms=5
100
floitsch
floitsch 02/06/2025 09:08 AM
Is this with the latest package?
kaxori
kaxori 02/06/2025 09:09 AM
Version: v1.47.0
SDK version: v2.0.0-alpha.174
Build date: 2024-12-10T10:55:47Z
floitsch
floitsch 02/06/2025 09:09 AM
I don't expect the latest version to fix you problems, but I would expect a different output with it.
floitsch
floitsch 02/06/2025 09:09 AM
I'm talking about the mqtt package.
floitsch
floitsch 02/06/2025 09:09 AM
Run jag pkg update to get the latest versions of the packages.(edited)
floitsch
floitsch 02/06/2025 09:10 AM
You can see the used version in the package.lock file.
kaxori
kaxori 02/06/2025 09:10 AM
mqtt - 2.10.0
floitsch
floitsch 02/06/2025 09:10 AM
Yes. that's the latest version.
floitsch
floitsch 02/06/2025 09:10 AM
The given output was with that version?
floitsch
floitsch 02/06/2025 09:12 AM
That said, I just noticed the sleep of 5s in your code. That could work. Within the 5 seconds it tries to reconnect 3 times, which is acceptable. (It used to just try to reconnect without delay).
floitsch
floitsch 02/06/2025 09:13 AM
Can you check on your mqtt server why the client is kicked all the time? There must be some way to log it.
kaxori
kaxori 02/06/2025 09:13 AM
=> stress test with 500ms !
kaxori
kaxori 02/06/2025 09:15 AM
server is a mqtt broker running on Raspi, ... will have a look later ...
floitsch
floitsch 02/06/2025 09:16 AM
The mqtt package should deal better with the disconnect, but there is still something flaky/bad with the server, since it kicks the client.
kaxori
kaxori 02/07/2025 09:53 AM
the stress test above (local mqtt server, 500ms loop) crashed after (201844 loops) 3 hours.

started a long term test on "test.mosquitto.org" ...
(edited)
kaxoriOPkaxori
the stress test above (local mqtt server, 500ms loop) crashed after (201844 loops) 3 hours. started a long term test on "test.mosquitto.org" ...(edited)
floitsch
floitsch 02/07/2025 12:15 PM
Maybe run with a local version of Mosquitto? This way you can have the server logs too.
kaxori
kaxori 02/09/2025 01:45 PM
```
1739102223: New client connected from 172.17.0.1:53318 as AIR-F2222C (p2, c0, k60).
1739102223: Client AIR-F2222C disconnected due to protocol error.
1739102230: New connection from 172.17.0.1:53322 on port 1883.
1739102230: New client connected from 172.17.0.1:53322 as AIR-F2222C (p2, c0, k60).
1739102230: Client AIR-F2222C disconnected due to protocol error.

=> happened after 5957 measurement loops
(edited)
floitsch
floitsch 02/09/2025 03:04 PM
Hmmm. interesting
floitsch
floitsch 02/09/2025 03:04 PM
I will try to find the time and try to reproduce
kaxori
kaxori 02/10/2025 04:49 PM
I installed MQTT brokers (besides raspi, on PC and NAS). the toit mqtt test app produces connection errors on all brokers.
2025-02-10T13:27:11: Sending PUBACK to AIR-F2222C (m65535, rc0) 2025-02-10T13:27:11: Client AIR-F2222C disconnected due to protocol error. 2025-02-10T13:31:22: New connection from 172.17.0.1:41168 on port 1883. 2025-02-10T13:31:22: New client connected from 172.17.0.1:41168 as AIR-F2222C (p2, c0, k60). 2025-02-10T13:31:22: Will message specified (28 bytes) (r1, q1). 2025-02-10T13:31:22: mqttNet/stateDisplay/led 2025-02-10T13:31:22: Sending CONNACK to AIR-F2222C (1, 0) 2025-02-10T13:31:22: Client AIR-F2222C disconnected due to protocol error.
the first error occurs, after the loop counter reaches 65535.
(edited)
floitsch
floitsch 02/10/2025 05:14 PM
Interesting. So maybe we are not correctly resetting a counter. Thanks for the debugging
floitsch
floitsch 02/10/2025 05:46 PM
Once I knew what to look for, it was easy to fix. PR under review.
Thanks!
πŸ‘1
floitsch
floitsch 02/10/2025 06:02 PM
New version with the fix is in the process of being published.
kaxori
kaxori 02/10/2025 06:15 PM
:πŸ‘πŸ»: Wie immer ultraschnelle Reaktion und Reparatur !
Vielen Dank dafΓΌr. Installiert - Tests laufen ...
πŸ™2
kaxori
kaxori 02/10/2025 09:56 PM
:🀩: previous thresholds of the loop-counters without disconnects exceeded => looks very good :πŸ‘πŸ»:
πŸ₯³2
kaxori
kaxori 02/24/2025 09:58 AM
=> total improvement of wifi/mqtt link stability !
since > 12 days no further loss of connection and no watchdog bites
πŸ₯³1
50 messages in total