Toit
#Jaguar routing problem, followed by container stopping / not restarting
Thread channel in help
addshore 06/16/2025 02:12 PM
I have added a watchdog to the container now, so I expect it won't cause me an issue again. However, I'm confused about what has happened here:

The script was running, as shown by the INFO log line
Then Jaguar seems to panic with a 'Routing problem', and tries to reconnect? Fails? And backs off?
Then I see my container print `udptocsv`, after which it tries to emit this over UDP
Then there are logs to do with WiFi reconnecting? After which Jaguar seems to be back up?

However, the container is then stuck?
I guess it is stuck here:
```
socket.send udp.Datagram row.to-byte-array address
```
addshore 06/16/2025 02:12 PM
Logs for the story above...

```
INFO: Took 43.358ms, Sleeping for additional 1.956642s
[jaguar] WARN: running Jaguar failed due to 'Routing problem' (1/3)
[jaguar.http] INFO: running Jaguar device 'clever-weather' (id: '510b3170-cbd5-4a05-a82a-709b975e45bc') on 'http://192.168.68.71:9000'
[jaguar] WARN: running Jaguar failed due to 'Routing problem' (2/3)
[jaguar.http] INFO: running Jaguar device 'clever-weather' (id: '510b3170-cbd5-4a05-a82a-709b975e45bc') on 'http://192.168.68.71:9000'
[jaguar] WARN: running Jaguar failed due to 'Routing problem' (3/3)
[jaguar] INFO: backing off for 5s
udptocsv,1,2c54204d,2025-06-16 13:51:58,0, 36.799999999999997158, 3.6335000000000001741, 0.17612500000000000377
Decoding by jag, device has version <2.0.0-alpha.174>
EXCEPTION error.
  0: udp-send_          <sdk>\net\modules\udp.toit:167:37
  2: Socket.send        <sdk>\net\modules\udp.toit:88:12
  3: emitData           udpAds.toit:194:10
  4: innerMain          udpAds.toit:170:7
  5: main               udpAds.toit:119:3
******************************************************************************
[wifi] DEBUG: closing
E (7210621) wifi:NAN WiFi stop
[wifi] DEBUG: connecting
[wifi] DEBUG: connected
[wifi] INFO: network address dynamically assigned through dhcp {ip: 192.168.68.71}
[wifi] INFO: dns server address dynamically assigned through dhcp {ip: [8.8.8.8, 1.1.1.1]}
[jaguar.http] INFO: running Jaguar device 'clever-weather' (id: '510b3170-cbd5-4a05-a82a-709b975e45bc') on 'http://192.168.68.71:9000'
```
floitsch 06/16/2025 02:38 PM
My initial guess is that the script also hit a WiFi issue, and then crashed when it couldn't send the UDP.
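One way to keep a WiFi drop from ending the program is to treat a failed UDP send as recoverable. The sketch below is written in Python only as a language-neutral model. In Toit, the rough equivalent would be wrapping the `socket.send` call in a `catch:` block. `send_with_retry`, `FlakySender`, `attempts` and `delay_s` are hypothetical names used for illustration.

```python
import time
from typing import Callable, List


def send_with_retry(send: Callable[[bytes], None], payload: bytes,
                    attempts: int = 3, delay_s: float = 0.5) -> bool:
    """Try to send `payload`; return True on success, False if all attempts fail.

    A network error (OSError, e.g. while WiFi is reconnecting) is caught and
    retried after a short pause instead of propagating and ending the program.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(payload)
            return True
        except OSError as e:
            print(f"send failed ({attempt}/{attempts}): {e}")
            if attempt < attempts:
                time.sleep(delay_s)
    return False


class FlakySender:
    """Hypothetical stand-in for a UDP socket that fails its first `failures` sends."""

    def __init__(self, failures: int):
        self.failures = failures
        self.sent: List[bytes] = []

    def __call__(self, payload: bytes) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise OSError("network unreachable")
        self.sent.append(payload)


sender = FlakySender(failures=2)
ok = send_with_retry(sender, b"udptocsv,1", attempts=3, delay_s=0.0)
print(ok, sender.sent)  # True [b'udptocsv,1']
```

Whether to retry, drop the row, or buffer it is a design choice. Retrying briefly and then giving up keeps the main loop alive without blocking it for long.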
addshore 06/16/2025 09:49 PM
So my initial thought (as this is a container) would have been that the `EXCEPTION error.` would trigger a crash, and that the container would then restart?
floitsch 06/16/2025 09:50 PM
Jaguar doesn't automatically restart containers iirc
addshore 06/17/2025 07:34 AM
oh right 😛
So to achieve that, ultimately I need something like this...?
https://docs.toit.io/tutorials/containers/toit-sdk#starting-containers
And to continually check that my containers are running?

I have potentially been overlooking something this whole time!
addshore 06/17/2025 07:38 AM
And/or this can be done another way, via watchdogs, I guess?
As in, if each container always has a watchdog, then any container failure would always lead to a restart?
floitsch 06/17/2025 07:41 AM
You don't need to continuously check: you are already waiting for it (container.wait), so you can just put that code into a loop.
Alternatively, you can also provide a patch to Jaguar to do that for you. If it's small enough I would accept it.

Watchdogs would work too. If your program is supposed to check in every few seconds/minutes but doesn't, the device will eventually reset.
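The "put `container.wait` into a loop" suggestion can be modelled in a few lines. This Python sketch assumes `start` launches a container and returns an object whose `wait()` blocks until the container exits and returns its exit code, mirroring the start-then-`container.wait` pattern above. `supervise`, `FakeContainer` and `max_runs` are hypothetical. `max_runs` exists only so the example terminates.

```python
import time
from typing import Callable, List, Optional, Protocol


class Container(Protocol):
    def wait(self) -> int: ...


def supervise(start: Callable[[], Container], delay_s: float = 1.0,
              max_runs: Optional[int] = None) -> List[int]:
    """Start a container, wait for it to exit, then start it again.

    Returns the exit codes seen. With max_runs=None this loops forever,
    which is what a real supervisor would do.
    """
    exit_codes: List[int] = []
    while max_runs is None or len(exit_codes) < max_runs:
        container = start()
        code = container.wait()  # blocks until the container terminates
        exit_codes.append(code)
        print(f"container exited with {code}; restarting in {delay_s}s")
        time.sleep(delay_s)
    return exit_codes


class FakeContainer:
    """Hypothetical container that exits immediately with a fixed code."""

    def __init__(self, code: int):
        self.code = code

    def wait(self) -> int:
        return self.code


codes = iter([1, 0, 1])
result = supervise(lambda: FakeContainer(next(codes)), delay_s=0.0, max_runs=3)
print(result)  # [1, 0, 1]
```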
addshore 06/17/2025 07:44 AM
yeah, watchdogs feel a little fragile. For example, if a crash happens in a brief moment when you happen to have stopped the watchdog, that isn't ideal, and I guess the device would then never get back to its expected state unless you have something else checking?
addshore 06/17/2025 07:47 AM
What does the `--critical` part of a container install result in?
floitsch 06/17/2025 07:47 AM
Not sure I understand.
You generally start a watchdog at a good moment, and then let some important part of your code feed the dog. Depending on the needs of your device the interval at which the dog needs to be fed can be relatively large. The main thing is that you want to recover from a stuck device (or a device where your program isn't running).
For example, if you are supposed to offload data every hour, then you can set a watchdog and set it to 5h and feed it whenever data has been offloaded. If that doesn't happen you will reset and get back into a fresh state.
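To make the hourly-offload example concrete, the sketch below models a watchdog as a deadline that each feed pushes forward. It uses explicit clock values instead of real time. `Watchdog`, `feed` and `expired` are hypothetical names, not the Toit watchdog API. If offloads at 1h, 2h and 3h succeed and then stop, a 5h timeout resets the device at hour 8.

```python
HOUR = 3600


class Watchdog:
    """Deadline-style watchdog: it expires if not fed within `timeout_s`."""

    def __init__(self, timeout_s: float, now: float):
        self.timeout_s = timeout_s
        self.last_fed = now  # starting the dog counts as the first feed

    def feed(self, now: float) -> None:
        self.last_fed = now

    def expired(self, now: float) -> bool:
        return now - self.last_fed >= self.timeout_s


dog = Watchdog(timeout_s=5 * HOUR, now=0)
for hour in range(1, 4):  # offloads at 1h, 2h and 3h succeed and feed the dog
    dog.feed(now=hour * HOUR)
# From here on offloading stops working, so nobody feeds the dog.
first_reset = next(h for h in range(4, 24) if dog.expired(now=h * HOUR))
print(f"device resets at hour {first_reset}")  # device resets at hour 8
```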
floitsch 06/17/2025 07:48 AM
If your program crashes and you restart the container, that's fine. Watchdogs are name-based. So if you reopen the same watchdog and feed it again, the dog is happy. It doesn't need to be the same original container that does the feeding.
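Because watchdogs are name-based, a registry keyed by name is a useful mental model. A restarted container that reopens the same name gets the same dog and keeps it fed. This is a Python sketch with hypothetical names (`WatchdogRegistry`, `open`, `"data-offload"`). It does not reproduce the Toit watchdog service.

```python
from typing import Dict


class Dog:
    """Minimal watchdog: remembers when it was last fed."""

    def __init__(self, name: str, now: float):
        self.name = name
        self.last_fed = now

    def feed(self, now: float) -> None:
        self.last_fed = now


class WatchdogRegistry:
    """Hands out watchdogs by name; opening an existing name returns the same dog."""

    def __init__(self) -> None:
        self._dogs: Dict[str, Dog] = {}

    def open(self, name: str, now: float) -> Dog:
        if name not in self._dogs:
            self._dogs[name] = Dog(name, now)
        return self._dogs[name]


registry = WatchdogRegistry()

# The first run of the container creates and feeds the dog, then crashes.
dog_a = registry.open("data-offload", now=0)
dog_a.feed(now=10)

# The restarted container opens the same name and keeps feeding it.
dog_b = registry.open("data-offload", now=20)
dog_b.feed(now=30)
print(dog_a is dog_b, dog_a.last_fed)  # True 30
```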
addshore 06/17/2025 07:49 AM
So far our use cases are more on the side of: a device should be actively working constantly, with multiple things happening every second, and if it breaks for some unexpected reason, we really want it to restart ASAP.
[In reply to addshore (OP): "yeah, watchdogs feel a little fragile, as for example if a crash happens in a brief moment that you happen to have stopped then, that isnt ideal and I guess an expected state would..."]
floitsch 06/17/2025 07:49 AM
iirc: 'critical' means that the program should run all the time.
[In reply to addshore (OP): "So far our usecases are more on the side of, a device should be actively working constantly, multiple things happening a second, and if it breaks for some unexpected reason, we rea..."]
floitsch 06/17/2025 07:49 AM
you shouldn't rely on watchdogs alone.
floitsch 06/17/2025 07:50 AM
in fact they are typically your last recourse.
addshore 06/17/2025 07:50 AM
> If your program crashes and you restart the container, that's fine. Watchdogs are name-based. So if you reopen the same watchdog and feed it again, the dog is happy. It doesn't need to be the same original container that does the feeding.

Right, but that implies that you already have something else watching that restarts the container?
If so, I think I might look at adding that to jag; it seems like it might make sense.
addshore 06/17/2025 07:50 AM
For our use case right now, we count all containers as critical, because when they shouldn't be running, the second processor turns off the ESP anyway 😄
floitsch 06/17/2025 07:51 AM
If you install containers without Jaguar, then --critical should do. I think.
floitsch 06/17/2025 07:51 AM
Otherwise, it's Jaguar that needs to do the loop.
addshore 06/17/2025 07:52 AM
Aaah right, so perhaps there is another thing I misunderstood.
When you said
> Jaguar doesn't automatically restart containers iirc
does that mean they normally do restart on a system without Jaguar? (Or is that when `--critical` comes into play?)
floitsch 06/17/2025 07:52 AM
Note that you sometimes need to be careful: having a hello world container as critical would lead to infinite restarts of the container with "hello world" flooding your output.
[In reply to addshore (OP): "aaah right, so perhaps there is another thing I misunderstood. When you said Jaguar doesn't automatically restart containers iirc Does that mean normally they do restart on a sys..."]
floitsch 06/17/2025 07:53 AM
On other systems (like Artemis), and maybe with the envelope installation, there are other means, but none of them restart containers automatically without additional flags.
floitsch 06/17/2025 07:53 AM
In Artemis I tend to use `interval: 1s`.
floitsch 06/17/2025 07:54 AM
If the container is unable to run (for example, because some peripheral isn't working), then it has a delay of 1s before it tries again, instead of thousands of attempts.
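The effect of the 1s interval can be shown with simple arithmetic. Take a container that fails right away, or the critical "hello world" container mentioned above. Without a delay it is restarted as fast as the device allows. With an interval, it is restarted at most once per interval. This Python sketch models the interval as a minimum gap between consecutive starts, which may differ in detail from Artemis's exact semantics. The 1 ms cost per failing start is an assumed value, and `count_starts` is a hypothetical name.

```python
def count_starts(window_ms: int, run_cost_ms: int, interval_ms: int) -> int:
    """Count how often a container that fails right away is started within
    `window_ms`, if each attempt takes `run_cost_ms` and consecutive starts
    are at least `interval_ms` apart (0 means "restart immediately")."""
    step = max(run_cost_ms, interval_ms)
    if step <= 0:
        raise ValueError("need a positive run cost or interval")
    starts = 0
    now = 0
    while now < window_ms:
        starts += 1
        now += step  # time of the next start attempt
    return starts


# Assumed cost of one failing start: 1 ms. Window: one minute.
print(count_starts(60_000, 1, 0))      # 60000
print(count_starts(60_000, 1, 1_000))  # 60
```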
addshore 06/17/2025 07:54 AM
I'm looking for where this interval goes in Artemis now, as I haven't used it yet, and we also might not!
floitsch 06/17/2025 07:55 AM
It's a trigger.
floitsch 06/17/2025 07:55 AM
Last line of this document:
https://docs.toit.io/getstarted/fleet/pods
addshore 06/17/2025 07:56 AM
Right, yes, so that seems like the kind of behaviour I want, but possibly without needing / using Artemis?
addshore 06/17/2025 07:56 AM
Or perhaps I should be using Artemis even if I am not using it for OTA updates, etc.?
floitsch 06/17/2025 08:00 AM
I have been planning, for a long time, to extract the pod-specification from Artemis and move/copy it to Toit.
If you don't use its updating mechanism, I would probably not use it. Feels like unnecessary resources. But a lot of the ideas of Artemis are really nice.
I will think a bit more about it. Maybe I will change my opinion. My biggest concern is that Artemis connects to the server when it boots. So you kind-of run into the issue that you need some kind of server.
floitsch 06/17/2025 08:00 AM
Btw how do you do firmware updates?
[In reply to floitsch: "Btw how do you do firmware updates?"]
addshore 06/17/2025 08:01 AM
So we have our primary processor put the ESP in bootloader mode, flash the stub, then flash the firmware.
The second processor retrieves it over its cellular connection and sends it to the ESP over I2C.
floitsch 06/17/2025 08:02 AM
I see.
floitsch 06/17/2025 08:02 AM
Funny. Usually I would have expected it to be the other way round.
addshore 06/17/2025 08:05 AM
Yeah, for various reasons the primary processor will, for now, remain in ultimate control, and ultimately it's its job to make sure that connectivity works, and also to manage battery usage, etc.

So a couple of common use cases are:
1) Having the ESP powered on and actively working for, let's say, an 8h period, where it's sending messages back and forth over I2C tens of times a second, including messages that get forwarded over cellular
2) Having the ESP mostly powered off, powering it on at a set interval, having a container start and run a bunch of customer-defined logic within a set time period and power window, and then turning it off again

There will be / is also a base firmware on the ESP, prior to client code living there, that does a baseline set of messaging to do with WiFi scans, BT scans, etc.
addshore 06/17/2025 08:07 AM
So for now, a "good" pattern for me to try might be a management container that is run with `--critical` and in turn just slowly loops, making sure the other containers are running, very similar to what Artemis does with its interval.
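Each tick, such a management container would compare the containers that should be running with those that are, and start the missing ones. This is a rough Python sketch. `reconcile`, the container names and the `start` callback are hypothetical. On a device, the loop around it would sleep for the chosen interval between ticks.

```python
from typing import Callable, Iterable, List, Set


def reconcile(desired: Iterable[str], running: Set[str],
              start: Callable[[str], None]) -> List[str]:
    """Start every desired container that is not currently running.

    Returns the names that were (re)started on this tick, in sorted order.
    """
    missing = sorted(set(desired) - running)
    for name in missing:
        start(name)
        running.add(name)
    return missing


desired = ["sensor-reader", "uplink"]  # hypothetical container names
running: Set[str] = {"sensor-reader"}  # "uplink" has crashed
started = reconcile(desired, running, start=lambda name: print(f"starting {name}"))
print(started)  # ['uplink']
```

A second tick with nothing missing starts nothing, so the loop is cheap when everything is healthy.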
floitsch 06/17/2025 08:08 AM
If you are using Jaguar, I would probably try to add an interval flag.
If you are using envelopes, then I would probably just install the container with --critical.
addshore 06/17/2025 08:09 AM
Is the interval part of the container install command? Or jag.interval?
floitsch 06/17/2025 08:09 AM
I would make it part of the container install
addshore 06/17/2025 08:09 AM
oh, or you mean I add it 😄
addshore 06/17/2025 08:09 AM
yeah, ok!
floitsch 06/17/2025 08:09 AM
If it is too complicated, don't bother.
floitsch 06/17/2025 08:10 AM
But might not be hard to add.
addshore 06/17/2025 08:10 AM
and `--critical` already works with container install, correct? It's just undocumented in the CLI?
floitsch 06/17/2025 08:10 AM
It works for the envelope container install.
[In reply to floitsch: "It works for the envelope container install."]
addshore 06/17/2025 08:10 AM
that explains it, I'm looking at the wrong command again
floitsch 06/17/2025 08:10 AM
it is documented:
floitsch 06/17/2025 08:11 AM
```
ᐅ toit tools firmware container install -h
Add a container to the envelope.

Usage:
  toit tool firmware --envelope=<file> container install [<options>] [--] <name> <image:file>

Aliases:
  add

Options:
      --assets file          Add assets to the container.
      --critical             Reboot system if the container terminates.
  -h, --help                 Show help for this command.
  -o, --output file          Set the output envelope.
      --trigger none|boot    Trigger the container to run automatically. (default: boot)

Rest:
  image file   (required)
  name string  (required)

Global options:
  -e, --envelope file                                  Set the envelope to work on. (required)
      --output-format human|plain|json                 Specify the format used when printing to the console. (default: human)
      --verbose                                        Enable verbose output. Shorthand for --verbosity-level=verbose.
      --verbosity-level debug|info|verbose|quiet|silent  Specify the verbosity level. (default: info)
```
floitsch 06/17/2025 08:11 AM
That said: apparently it's "reboot system if the container terminates".
floitsch 06/17/2025 08:11 AM
Would need to experiment if that's the case or if it just restarts the container.
addshore 06/17/2025 08:16 AM
Yeah, a slightly different option would be another flag entirely
something similar to https://docs.docker.com/engine/containers/start-containers-automatically/#use-a-restart-policy
addshore 06/17/2025 08:16 AM
which feels slightly different to trigger or critical right now
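For comparison, Docker's `--restart` policies (`no`, `on-failure[:max-retries]`, `always`, `unless-stopped`) reduce to a small decision function. The Python sketch below models only that decision, and it simplifies Docker's rules. In particular, it ignores what happens when the Docker daemon itself restarts, which is where `always` and `unless-stopped` differ. `should_restart` is a hypothetical name, not an existing Jaguar option.

```python
from typing import Optional


def should_restart(policy: str, exit_code: int, restarts_so_far: int,
                   manually_stopped: bool,
                   max_retries: Optional[int] = None) -> bool:
    """Decide whether a container that just exited should be started again."""
    if manually_stopped or policy == "no":
        return False
    if policy == "on-failure":
        # Only non-zero exits count as failures; optionally cap the retries.
        within_limit = max_retries is None or restarts_so_far < max_retries
        return exit_code != 0 and within_limit
    if policy in ("always", "unless-stopped"):
        return True
    raise ValueError(f"unknown restart policy: {policy!r}")


print(should_restart("on-failure", exit_code=0, restarts_so_far=0, manually_stopped=False))  # False
print(should_restart("on-failure", exit_code=1, restarts_so_far=2, manually_stopped=False, max_retries=3))  # True
print(should_restart("always", exit_code=0, restarts_so_far=10, manually_stopped=False))  # True
```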
floitsch 06/17/2025 08:25 AM
Would work too. We should probably use that for Artemis to make it more similar.
addshore 06/17/2025 09:47 AM
So, what's the right way to develop jag, both in terms of the CLI and the jag container on a device?
Should I just be updating the container on the ESP by "installing" my modified container over the top of it?
addshore 06/17/2025 09:49 AM
well, I guess that won't work, as I'd be using jag to update jag 🤣
So I guess it is a re-flash each time?
addshore 06/17/2025 09:49 AM
Aaah, so it probably makes the most sense to use the emulated device stuff
floitsch 06/17/2025 09:54 AM
I typically use `jag firmware update`.
Could be that there are better ways.
addshore 06/17/2025 09:58 AM
I assume I need to do a full make for developing both the CLI and the container.
But again I'm running into https://pastebin.com/v9azKjKJ
which I worked around last time by just doing `make jag` instead.
addshore 06/17/2025 10:09 AM
So, that's fixable after this hack?
```
(master) cd toit/tools && jag toit pkg install
(master) cd toit/system/extensions/host && jag toit pkg install
(master) make
```
Then toit makes, and then make works in the jaguar repo again too.
Not sure how to correctly put in a fix for that, or document it, though.
floitsch 06/17/2025 10:24 AM
So we are missing the install-packages when doing make? Sounds easy to fix
addshore 06/17/2025 10:44 AM
Adding --interval to jaguar https://github.com/toitlang/jaguar/pull/624
This mimics the interval options that currently exist
within artemis.
You can pass a time period, which jaguar will use to ensure
that your container is running. If it isn't running its sta...
addshore 06/17/2025 10:45 AM
I'd be happy to implement some sort of restart policy too; however, interval probably works well enough for me right now 😄
floitsch 06/17/2025 11:07 AM
Edit: removed message that was for another thread.
addshore 06/17/2025 11:23 AM
I was about to say that this device isn't doing any I2C stuff, but it is!
The only `E` (error-level) log message we saw is the WiFi one:

```
[wifi] DEBUG: closing
E (7210621) wifi:NAN WiFi stop
[wifi] DEBUG: connecting
```
[In reply to addshore (OP): "I was about to say that this device isn't doing any I2C stuff, but it is! The only E: related message we saw is the wifi one ..."]
floitsch 06/17/2025 12:00 PM
my mistake. I answered to the wrong thread.