Toit
#MQTT client doesn't realize that it lost its connection to the broker
Thread channel in help
tplux
tplux 11/11/2022 02:04 PM
What happens if the ESP doesn't lose its wifi connection but the wifi router is no longer connected to the internet?
When we unplug our router from the internet, the ESP is still happy about its connection and calls to mqtt.publish don't give any errors. After about 10 seconds and several calls to publish, it seems to hang in the publish call, and when we reconnect the router, the ESP unhangs and resumes normally after a similar period. It's like the mqtt client doesn't realize that it is not online, which is supported by the fact that when it finally comes back online it doesn't need to call mqtt.connect to be able to publish normally.
(We are using version 2.0.4 of the mqtt client.)
floitsch
floitsch 11/13/2022 05:03 PM
I did a bit of work to improve the stability of the MQTT package on Friday.
I will continue with it tomorrow (Monday), and will have a look at your scenario.
👍3
tplux
tplux 11/14/2022 06:35 AM
Thanks 🙂
floitsch
floitsch 11/14/2022 01:09 PM
Fyi: I'm feeling under the weather, so I won't make any progress on this today.
👍1
tplux
tplux 11/21/2022 07:59 AM
Hi Florian. Any progress on this?
floitsch
floitsch 11/21/2022 10:18 AM
I'm going to work on this today.
🙂1
floitsch
floitsch 11/21/2022 03:47 PM
I just looked into this.
floitsch
floitsch 11/21/2022 03:51 PM
TCP connections have a really long delay before they consider a connection dead.
I tried on my Linux machine and on an ESP32, and both took 15-30 minutes to realize that something was wrong.
I googled a bit to find out whether one can reduce that delay, but didn't yet find anything for the ESP. If there is, we can try to expose it, so that users can set it.

Independently, I just implemented (but haven't tested yet) a max_inflight option. For QoS=1 packets, the client would block a publish if there were too many outstanding packets.
Together with a with_timeout on the publish, this could yield a faster detection of dropped connections.
👍1
tplux
tplux 11/22/2022 06:43 AM
That could explain it.
What about QoS=0? (which is what we are using)
floitsch
floitsch 11/22/2022 08:27 AM
QoS=0 would just send into the void.
So there wouldn't be any difference to the current behavior.
The window size on the ESP32 is relatively small, so you can probably still detect it quite early with a with_timeout.
floitsch
floitsch 11/22/2022 08:29 AM
(at least I think it's the window size that makes the call blocking at some point)
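The idea of wrapping a possibly blocking publish in a timeout can be sketched as follows. This is a minimal Python illustration, not Toit code and not the mqtt package's API: `publish_with_timeout` and the `publish` callable are hypothetical names, and the worker thread is simply abandoned on timeout, which is an assumption made for brevity.

```python
import threading

def publish_with_timeout(publish, topic, payload, timeout_s):
    """Calls publish(topic, payload) in a worker thread.

    Returns True if the call finished without error within timeout_s,
    and False if it stalled (for example on a dead TCP connection) or raised.
    """
    done = threading.Event()
    errors = []

    def worker():
        try:
            publish(topic, payload)
        except Exception as e:  # Record the failure instead of losing it in the thread.
            errors.append(e)
        finally:
            done.set()

    # Daemon thread: a publish that never returns must not keep the program alive.
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout_s)
    return finished and not errors
```

A caller would treat a False result as "connection probably lost" and close the client before reconnecting.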
tplux
tplux 11/29/2022 08:58 AM
Hi Florian
with_timeout solved the issue, but we could probably benefit from the changes you made to the mqtt package. Do you plan to make a new package release with the changes?
/Tommy
floitsch
floitsch 11/29/2022 08:58 AM
Yes. Thanks for reminding me.
tplux
tplux 11/29/2022 08:59 AM
No problem 😄
floitsch
floitsch 11/29/2022 11:50 AM
Released
tplux
tplux 11/29/2022 11:51 AM
Thanks 🙂
Rikke
Rikke 11/29/2022 01:04 PM
It looks like the new mqtt package requires 2.0.0-alpha.41, do you remember what exactly is required from it? 🙂
floitsch
floitsch 11/29/2022 01:05 PM
Mostly minor things.
floitsch
floitsch 11/29/2022 01:05 PM
Like an implementation of monitor.Signal and similar.
floitsch
floitsch 11/29/2022 01:06 PM
I could "backport" it and release another version that still runs on v1.
floitsch
floitsch 11/29/2022 01:07 PM
Which version do you have?
Rikke
Rikke 11/29/2022 01:07 PM
We are using 2.0.0-alpha.35 currently, but we do need to upgrade in the near future anyway, so we are just discussing atm 🙂
floitsch
floitsch 11/29/2022 01:08 PM
I'm pretty sure it should just work if there aren't any warnings during compilation.
floitsch
floitsch 11/29/2022 01:08 PM
Maybe 35 is recent enough.
floitsch
floitsch 11/29/2022 01:08 PM
I just picked the one I was testing with.
Rikke
Rikke 11/29/2022 01:08 PM
I am getting "The SDK constraint defined in the package.lock file is not satisfied: v2.0.0-alpha.35 < ^2.0.0-alpha.41" when I try to build
Rikke
Rikke 11/29/2022 01:09 PM
Or when it compiles*
floitsch
floitsch 11/29/2022 01:10 PM
I uploaded a branch with alpha.35 as requirement. Let's see if the build goes green.
floitsch
floitsch 11/29/2022 01:10 PM
If it does, then I can just release a new version with lowered constraints.
Rikke
Rikke 11/29/2022 01:11 PM
I'm not sure how to use a package from a specific branch 🤔
floitsch
floitsch 11/29/2022 01:11 PM
Looks clean.
Rikke
Rikke 11/29/2022 01:11 PM
Oh okay, you were trying it
floitsch
floitsch 11/29/2022 01:12 PM
Just fyi: if you want to experiment with a modified version of a package, the easiest is generally to check the package out, and then to install it with jag pkg install --local --name mqtt ../mqtt
floitsch
floitsch 11/29/2022 01:12 PM
where mqtt is the name under which it would be used in your project, and ../mqtt is the path to the checked out version.
👍1
floitsch
floitsch 11/29/2022 01:13 PM
Since I just uploaded a branch with lower constraints that worked, I will request a review and then release a new version for the package manager soon after.
floitsch
floitsch 11/29/2022 01:13 PM
so no need to do this on your side
Rikke
Rikke 11/29/2022 01:13 PM
Okay thank you very much! 🙂
👍1
floitsch
floitsch 11/29/2022 01:18 PM
v2.1.1 should be live.
floitsch
floitsch 11/29/2022 01:18 PM
Please let me know if you see weird things.
Rikke
Rikke 11/29/2022 01:20 PM
I will start testing now 🙂
Rikke
Rikke 11/30/2022 07:38 AM
I noticed our fix with with_timeout around publish does not work anymore with v2.1.1: it never times out and the ESP never recovers, not even when the internet is back 🤔 I haven't looked further into this yet, but I will today.
floitsch
floitsch 11/30/2022 10:33 AM
Thanks for testing.
Interesting. If you have the time to look into it a bit more, that would be great.
Let me know when you want me to have a look.
Rikke
Rikke 11/30/2022 12:12 PM
I will have to push it to tomorrow, but here's some info:
I am blocking the mqtt port 1883 to disconnect the esp from mqtt, and when I unblock it again, it reports this:
2022-11-30 11:48:39,808 DEBUG: Connected to broker
2022-11-30 11:48:39,814 DEBUG: Attempting to (re)connect
2022-11-30 11:48:39,825 Heap report @ out of memory:
2022-11-30 11:48:39,838 ┌───────────┬─────────┬───────────────────────┐
2022-11-30 11:48:39,843 │   Bytes   │  Count  │  Type                 │
2022-11-30 11:48:39,855 ├───────────┼─────────┼───────────────────────┤
2022-11-30 11:48:39,860 │    5256   │     4   │  external byte array  │
2022-11-30 11:48:39,866 │  102400   │    19   │  toit                 │
2022-11-30 11:48:39,871 │    8952   │    37   │  lwip                 │
2022-11-30 11:48:39,876 │    7992   │   723   │  heap overhead        │
2022-11-30 11:48:39,881 │    7864   │    51   │  event source         │
2022-11-30 11:48:39,886 │   32872   │   313   │  thread/other         │
2022-11-30 11:48:39,891 │   22304   │    21   │  thread/spawn         │
2022-11-30 11:48:39,896 │   23280   │   159   │  untagged             │
2022-11-30 11:48:39,901 │   33864   │    81   │  wifi                 │
2022-11-30 11:48:39,914 └───────────┴─────────┴───────────────────────┘
2022-11-30 11:48:39,921 Total: 244784 bytes in 685 allocations (82%), largest free 52k, total free 54k
2022-11-30 11:48:39,924 DEBUG: Connection established
2022-11-30 11:48:39,926 INFO: disconnect from server
2022-11-30 11:48:39,929 DEBUG: Attempting to (re)connect
2022-11-30 11:48:39,937 DEBUG: Attempting to (re)connect
It keeps giving the "Heap report @ out of memory", so I think it tries to reconnect multiple times with the same client, and multiple clients with the same name will disconnect each other.
floitsch
floitsch 11/30/2022 12:14 PM
The new client is probably more aggressive in trying to reconnect.
floitsch
floitsch 11/30/2022 12:14 PM
If you don't close it explicitly, it will continue to do so.
floitsch
floitsch 11/30/2022 12:14 PM
The old one gave up after some tries.
floitsch
floitsch 11/30/2022 12:14 PM
So if you create a new client each time the publish doesn't work but don't close the old one, you might end up with many clients trying to run in parallel.
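One way to avoid piling up parallel clients is to always close the previous client before creating its replacement. The sketch below is a generic Python illustration under that assumption; `ClientHolder` and `factory` are hypothetical names, and the only thing assumed about a client is that it has a `close()` method.

```python
class ClientHolder:
    """Keeps at most one live client; the previous one is closed before it is replaced."""

    def __init__(self, factory):
        self._factory = factory  # Hypothetical callable that creates and connects a client.
        self.client = None

    def replace(self):
        # Close the old client first so that it stops its own reconnection attempts.
        if self.client is not None:
            self.client.close()
        self.client = self._factory()
        return self.client
```

With this shape, a failed publish leads to `holder.replace()` instead of a bare constructor call, so there is never more than one client trying to reach the broker.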
Rikke
Rikke 11/30/2022 12:15 PM
Yup, the old one completely disconnects after some tries
Rikke
Rikke 11/30/2022 12:15 PM
We don't run publish if we are disconnected from mqtt
Rikke
Rikke 11/30/2022 12:17 PM
But I think the problem lies with the catch around publish; it does not seem to work anymore. It didn't catch the with_timeout, but I didn't have time to look further into this. So if we can't detect a disconnect with catch, we may be using publish when disconnected, and creating multiple clients.
floitsch
floitsch 11/30/2022 12:19 PM
There are a few things here:
- the client uses a reconnection-strategy. That one is configurable, but is now (by default) the "tenacious" strategy, which will never give up. It just tries over and over again, incrementing the delay between attempts by 1 second.
- as long as the reconnection strategy is trying, a publish call doesn't fail. However, the publish is blocked at that point.
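The behavior described in the two points above can be sketched roughly as follows. This is a Python illustration, not the package's implementation: the names `tenacious_delays` and `reconnect` are hypothetical, and the starting delay of 0 seconds is an assumption (the thread only states that the delay grows by 1 second per attempt).

```python
def tenacious_delays(step_s=1.0):
    """Yields the wait before each reconnect attempt: 0, 1, 2, ... seconds, without end."""
    delay = 0.0  # Assumed starting delay.
    while True:
        yield delay
        delay += step_s

def reconnect(connect, delays, sleep):
    """Tries connect() after each delay; returns the number of the attempt that succeeded.

    With an endless delay sequence this never gives up. With a finite sequence
    (a strategy that gives up after some tries) it returns None once exhausted.
    """
    for attempt, delay in enumerate(delays, start=1):
        sleep(delay)
        if connect():
            return attempt
    return None
```

In real use `sleep` would be `time.sleep`. While such a loop is running, a caller blocked in publish sees no error, which matches the second point.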
floitsch
floitsch 11/30/2022 12:20 PM
I don't know why you are seeing the out-of-memory. If it's related to the mqtt client, then that's clearly something to look into.
floitsch
floitsch 11/30/2022 12:20 PM
How did you catch exceptions from the publish? Was it with a with_timeout or was it a different one?
floitsch
floitsch 11/30/2022 12:21 PM
Note that you can go back to the old reconnection strategy by passing it into the constructor or start function. (I don't remember which one.)
Rikke
Rikke 11/30/2022 12:25 PM
We are now using with_timeout to catch, but I think we will go back to the old strategy for connecting. I will look into the new strategy later, but for now we want the client to close.
floitsch
floitsch 11/30/2022 12:26 PM
In that context: the new client also has an option to set the maximum number of inflight packets.
floitsch
floitsch 11/30/2022 12:26 PM
That is, packets that haven't been acked yet.
floitsch
floitsch 11/30/2022 12:26 PM
This makes the publish block if there are too many inflight messages.
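A bounded number of unacknowledged packets is commonly modeled with a counting semaphore. The following Python sketch shows that idea only; `InflightLimiter`, `acquire`, and `on_ack` are hypothetical names and do not reflect how the mqtt package implements its option.

```python
import threading

class InflightLimiter:
    """Allows at most max_inflight unacknowledged packets at a time."""

    def __init__(self, max_inflight):
        self._slots = threading.Semaphore(max_inflight)

    def acquire(self, timeout_s):
        # Blocks while all slots are taken. Returns False if no ack freed
        # a slot within timeout_s, which hints at a dropped connection.
        return self._slots.acquire(timeout=timeout_s)

    def on_ack(self):
        # Called when the broker acknowledges a packet; frees one slot.
        self._slots.release()
```

A sender would call `acquire` before each QoS=1 publish and `on_ack` from its ack handler. Once the broker stops acknowledging, the next `acquire` times out instead of queueing more packets.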
floitsch
floitsch 11/30/2022 12:27 PM
If you want to, we can schedule a VC tomorrow, and discuss what you need, and how best to achieve it.
Rikke
Rikke 11/30/2022 12:29 PM
Yeah, the inflight packets option is what we are excited for 🙂 but I don't know when I will have time to look at it. A VC next week would be great I think, I will see tomorrow if that's okay
floitsch
floitsch 11/30/2022 12:29 PM
Sure. Just ping me.
Rikke
Rikke 11/30/2022 12:29 PM
Great thanks! 👍
Rikke
Rikke 12/01/2022 01:05 PM
We would like to have a chat on Monday @floitsch 🙂 So around 9:00-10:00 on Monday?
floitsch
floitsch 12/01/2022 01:08 PM
Would it be possible to move it to a bit later?
Rikke
Rikke 12/01/2022 01:09 PM
Sure, when do you have time?
floitsch
floitsch 12/01/2022 01:09 PM
Anything after 10:00 works. Preferred is 11:00+
Rikke
Rikke 12/01/2022 01:12 PM
How about 13:00?
floitsch
floitsch 12/01/2022 01:14 PM
Perfect
👌1