Toit
#Regular DEADLINE_EXCEEDED when using MQTT
Thread channel in help
PeterJ 10/21/2024 06:22 AM
Is anyone else getting a lot of DEADLINE_EXCEEDED errors when using MQTT over cellular, like this:
[cellular] DEBUG: <- +QIRD [1320]
[upload.mqtt] DEBUG: closing connection {reason: DEADLINE_EXCEEDED}
[cellular] DEBUG: -> AT+QICLOSE=0,0
[cellular] DEBUG: -> AT+QICLOSE=1,0

**
Decoding by jag, device has version <2.0.0-alpha.163>
**
EXCEPTION error.
AT_COMMANDTIMEOUT
0: Session.send.<block> <pkg:cellular>\base\at.toit:290:22
1: Session.send <pkg:cellular>\base\at.toit:334:22
2: Session.send <pkg:cellular>\base\at.toit:289:12
3: UdpSocket.close.<block> <pkg:cellular>\modules\quectel\quectel.toit:309:14
4: Locker.do.<monitor-block> <pkg:cellular>\base\at.toit:511:12
floitsch 10/21/2024 10:01 AM
The stacktrace seems to be from this line (not 100% certain, since 'main' and your code seem to be different versions):
https://github.com/toitware/cellular/blob/f93494904352edd71746024a07c0af5c6b1b327b/src/modules/quectel/quectel.toit#L398C28-L398C33

So basically, we don't seem to get the OK back when shutting down the modem.

It's not clear what causes that. Maybe earlier errors brought the modem into a bad state.

My guess is that there is a timeout when writing a message, which leads to the "[upload.mqtt] DEBUG: closing connection ..." line, which in turn tries to shut down the modem, which then hits a timeout too.
So fundamentally it looks like the modem is not responding. The logs here don't give enough information about what the reason could be.

What version of the cellular package are you using? (just to know whether there were bug-fixes that have been committed since then).

Also: is that something new, or did the behavior change with a recent update?
PeterJ 10/21/2024 11:24 AM
I'm using this branch of the cellular package: https://github.com/toitware/cellular/tree/kasperl-merge-upstream/src
This branch has not been updated in the past 5 months.
floitsch 10/21/2024 11:39 AM
@bitphlipphar Do you remember if there were important bug-fixes that were committed to main since then?
PeterJ 11/20/2024 11:08 AM
Hi @floitsch do we have an update on this issue?
floitsch 11/20/2024 11:12 AM
Not really. @bitphlipphar ping.

Also, in order to debug this we need to know whether this is something that is new or if that behavior was always there.

From the information we have here it just looks like some AT commands are timing out.
PeterJ 11/20/2024 11:13 AM
It has always been there. It causes a lot of reconnects on our devices, since they are not able to empty their telemetry and incident channels, which is a bit annoying 🙂
floitsch 11/20/2024 11:17 AM
Does it happen on every device?
Does it depend on the provider?
Do all logs have the same sequence of events that lead to this error?
Is there anything that sticks out that could cause this issue?

Fundamentally we can't really do anything without more information or access to a device that reproduces this. (Unless Kasper has an idea).
PeterJ 11/20/2024 11:23 AM
It happens on all devices regardless of the network provider: US, MY, AUS, EU, all the same.
I can come by and show it to you or @bitphlipphar if you want?
floitsch 11/20/2024 11:29 AM
We just need a device and instructions on how to repro.