Toit
#Constant Out of Memory errors - How to optimize?
Thread channel in help
AmusedGrape
AmusedGrape 03/28/2024 09:24 PM
I keep getting tons of memory crashes. How can I optimize my code? It's already pretty minimal so I'm not sure what to do.

Crash log example:
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│       640 │        1 │ external byte array                                 │
│      2088 │       12 │ tls/bignum                                          │
│     53248 │       15 │ toit processes                                      │
│     16384 │        4 │ system 0 b476094f-1061-696e-a209-3f0ad2a0162d       │
│     24576 │        6 │ current 1 68a2c931-1e01-4f20-cfbd-6a64b500a955      │
│     12288 │        3 │ other 2 836308cc-d8d0-e73d-6648-17a1bcd6d444        │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│      6088 │       21 │ lwip                                                │
│      9608 │      837 │ heap overhead                                       │
│      1944 │       29 │ event source                                        │
│     50280 │      390 │ thread/other                                        │
│     30984 │       25 │ thread/spawn                                        │
│     51368 │      226 │ untagged                                            │
│     11936 │       56 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 238664 bytes in 775 allocations (93%), largest free 2k, total free 19k
floitsch
floitsch 03/28/2024 09:28 PM
@erikcorry might have suggestions.

To me it looks like there is some free memory, but it's not contiguous.

It also looks like the device is in the process of starting a TLS connection. Could be the one for the Artemis update check.

If you still have max-offline set to 0s, consider increasing it, so that you don't need a full TLS connection all the time.
floitsch
floitsch 03/28/2024 09:28 PM
Is this happening at boot time, or later?
floitsch
floitsch 03/28/2024 09:28 PM
If you want to, you can send me your code and I could look whether memory is lost somewhere.
floitsch
floitsch 03/28/2024 09:29 PM
My email is [email protected].
floitsch
floitsch 03/28/2024 09:31 PM
Also. If you don't use Bluetooth you can recover some memory by using a different envelope. (That memory is reserved even if you don't use BLE.)
floitsch
floitsch 03/28/2024 09:32 PM
If you add envelope: esp32-no-ble to your pod specification that would use the envelope without BLE support.
AmusedGrape
AmusedGrape 03/28/2024 09:59 PM
Could be the max-offline. I do need BLE though unfortunately. I'll increase the max-offline time and report back.
AmusedGrape
AmusedGrape 03/29/2024 06:26 PM
I don't mind sending the code here- it'll be open sourced anyways:

import ble
import net
import http
import encoding.hex
import certificate-roots
import artemis

SCAN-DURATION ::= Duration --s=20

main args:
  certificate-roots.install-common-trusted-roots

  adapter := ble.Adapter
  central := adapter.central

  network := net.open
  client := http.Client network
  headers := http.Headers

  if args[1] == null or args[1] == "":
    throw "Authorization header is required"
    exit 1

  headers.add "Authorization" args[1]
  headers.add "Device-ID" artemis.device.id.to-string

  try:
    session := client.web-socket --uri=args[0] --headers=headers // ws://192.168.0.58:8080/ws
    try:
      central.scan --duration=SCAN-DURATION: | device/ble.RemoteScannedDevice |
        session.send (hex.encode device.data.manufacturer-data)
    finally:
      session.close
  finally:
    network.close

I can't think of any reason why I would continue to get OOM errors- maybe the hex encoding?

OOM Error:
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│      2176 │        2 │ external byte array                                 │
│     16936 │       23 │ tls/bignum                                          │
│     49152 │       14 │ toit processes                                      │
│     16384 │        4 │ system 0 b476094f-1061-696e-a209-3f0ad2a0162d       │
│     24576 │        6 │ other 1 68a2c931-1e01-4f20-cfbd-6a64b500a955        │
│      8192 │        2 │ current 4 836308cc-d8d0-e73d-6648-17a1bcd6d444      │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│      8048 │       25 │ lwip                                                │
│      8136 │      657 │ heap overhead                                       │
│      1968 │       30 │ event source                                        │
│     44592 │      186 │ thread/other                                        │
│     36600 │       29 │ thread/spawn                                        │
│     52344 │      228 │ untagged                                            │
│     12232 │       59 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 252664 bytes in 596 allocations (98%), largest free 1k, total free 6k
AmusedGrape
AmusedGrape 03/29/2024 06:27 PM
I'm triggering the container every 60 seconds however the error happens more often than that
AmusedGrape
AmusedGrape 03/29/2024 06:29 PM
And for context- I'm working on a Crowd Tracker for my (small) college, basically just scanning BLE devices and sending the manuf data to the server for data parsing
floitsch
floitsch 03/29/2024 06:30 PM
I'm looking.
🙏1
floitsch
floitsch 03/29/2024 06:32 PM
The code looks benign.
I'm starting to worry that we leak memory with central.scan.
floitsch
floitsch 03/29/2024 06:32 PM
That said. With BLE you never know...
AmusedGrape
AmusedGrape 03/29/2024 06:32 PM
That's what I'm thinking
AmusedGrape
AmusedGrape 03/29/2024 06:32 PM
right hah
floitsch
floitsch 03/29/2024 06:34 PM
The tls/bignum memory (~17k) seems to indicate that Artemis is in the process of connecting to the broker.
AmusedGrape
AmusedGrape 03/29/2024 06:36 PM
If this helps-
[artemis.scheduler] INFO: job started {job: container:tracker}
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│      2176 │        2 │ external byte array                                 │
│     16936 │       23 │ tls/bignum                                          │
│     53248 │       15 │ toit processes                                      │
│     20480 │        5 │ system 0 b476094f-1061-696e-a209-3f0ad2a0162d       │
│     24576 │        6 │ other 1 68a2c931-1e01-4f20-cfbd-6a64b500a955        │
│      8192 │        2 │ current 12 836308cc-d8d0-e73d-6648-17a1bcd6d444     │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│      6080 │       21 │ lwip                                                │
│      8136 │      657 │ heap overhead                                       │
│      1968 │       30 │ event source                                        │
│     45584 │      206 │ thread/other                                        │
│     36600 │       29 │ thread/spawn                                        │
│     50536 │      225 │ untagged                                            │
│     12232 │       59 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 253976 bytes in 610 allocations (99%), largest free 2k, total free 5k
******************************************************************************
Decoding by `jag`, device has version <2.0.0-alpha.142>
******************************************************************************
Allocation failed:.
  0: Session.read-handshake-message_ <sdk>/tls/session.toit:640:15
  1: Session.handshake_.<block> <sdk>/tls/session.toit:338:11
  2: Task_.with-deadline_.<block> <sdk>/core/task.toit:223:16
  3: Task_.with-deadline_ <sdk>/core/task.toit:217:3
  4: with-timeout <sdk>/core/utils.toit:182:24
  5: with-timeout <sdk>/core/utils.toit:165:12
  6: Session.handshake_ <sdk>/tls/session.toit:337:9
  7: Session.handshake.<block> <sdk>/tls/session.toit:281:7
  8: Session.handshake <sdk>/tls/session.toit:227:3
  9: Socket.handshake <sdk>/tls/socket.toit:69:14
 10: Client.try-to-reuse_.<block>.<block> <pkg:pkg-http-2.5.1>/client.toit:651:24
 11: catch.<block> <sdk>/core/exceptions.toit:124:10
 12: catch <sdk>/core/exceptions.toit:122:1
 13: catch <sdk>/core/exceptions.toit:85:10
 14: Client.try-to-reuse_.<block> <pkg:pkg-http-2.5.1>/client.toit:646:9
 15: Client.try-to-reuse_ <pkg:pkg-http-2.5.1>/client.toit:634:3
 16: Client.web-socket_.<block> <pkg:pkg-http-2.5.1>/client.toit:367:7
 17: SmallInteger_.repeat <sdk>/core/numbers.toit:1194:3
 18: Client.web-socket_ <pkg:pkg-http-2.5.1>/client.toit:364:19
 19: Client.web-socket <pkg:pkg-http-2.5.1>/client.toit:336:12
 20: main.<block> Client/tracker.toit:27:23
 21: main Client/tracker.toit:10:1
******************************************************************************
[artemis.scheduler] INFO: job stopped {job: container:tracker}
[artemis.synchronize] INFO: synchronized
[artemis.synchronize] INFO: synchronized
[artemis.synchronize] INFO: synchronized
floitsch
floitsch 03/29/2024 06:37 PM
You could try to disable Artemis-synchronization while running your program:
import artemis

main args:
  with-timeout (Duration --m=2):
    artemis.run --offline:
      actual-main args

actual-main args:
  // Original main
  ...
AmusedGrape
AmusedGrape 03/29/2024 06:38 PM
ill try that
floitsch
floitsch 03/29/2024 06:39 PM
Ah.
I see now that the websocket server you connect to is a TLS server.
floitsch
floitsch 03/29/2024 06:39 PM
So the memory comes from there.
AmusedGrape
AmusedGrape 03/29/2024 06:39 PM
Yeah the production version connects to that, dev connects to an insecure server
floitsch
floitsch 03/29/2024 06:40 PM
Do you know where these names ("other", "current") come from?:
│     24576 │        6 │ other 1 68a2c931-1e01-4f20-cfbd-6a64b500a955        │
│      8192 │        2 │ current 12 836308cc-d8d0-e73d-6648-17a1bcd6d444     │
AmusedGrape
AmusedGrape 03/29/2024 06:40 PM
I don't unfortunately
floitsch
floitsch 03/29/2024 06:40 PM
I'm guessing 68a2c931... is the tracker.toit ?
AmusedGrape
AmusedGrape 03/29/2024 06:40 PM
Might be?
AmusedGrape
AmusedGrape 03/29/2024 06:40 PM
How could I check
floitsch
floitsch 03/29/2024 06:41 PM
good question. I'm not sure we print it when we build the pod.
AmusedGrape
AmusedGrape 03/29/2024 06:41 PM
Ah- current is the app ID
AmusedGrape
AmusedGrape 03/29/2024 06:42 PM
DEBUG: current state is changed {changes: { apps: +{tracker: {id: 836308cc-d8d0-e73d-6648-17a1bcd6d444, triggers: {interval: 60}, arguments: [wss://..., ...]}} }}
floitsch
floitsch 03/29/2024 06:43 PM
I see. So your app is using 8k.
And something else is using 24.5K.
floitsch
floitsch 03/29/2024 06:43 PM
That must be Artemis.
floitsch
floitsch 03/29/2024 06:43 PM
I don't remember it being that memory hungry.
floitsch
floitsch 03/29/2024 06:43 PM
Let me check what I have.
AmusedGrape
AmusedGrape 03/29/2024 06:44 PM
yeah the system is only using 16k so its gotta be artemis
AmusedGrape
AmusedGrape 03/29/2024 06:46 PM
let me try running the same code without artemis
floitsch
floitsch 03/29/2024 06:50 PM
unfortunately I also have ~20K.
floitsch
floitsch 03/29/2024 06:51 PM
I guess we will spend a bit of time over the next weeks to trim that down a bit...
AmusedGrape
AmusedGrape 03/29/2024 06:52 PM
Here's without artemis, still crashing:
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│      1536 │        1 │ external byte array                                 │
│     36864 │       11 │ toit processes                                      │
│     16384 │        4 │ system 0 d76b3d7e-9719-6d20-835d-fac4a92e544c       │
│     12288 │        3 │ current 1 224bbb94-c28a-2f47-fd16-27d72746f31f      │
│      8192 │        2 │ other 2 22d830b6-90fa-da4b-ef98-ea7d9a1659cf        │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│     34752 │       72 │ lwip                                                │
│      8104 │      648 │ heap overhead                                       │
│      2024 │       29 │ event source                                        │
│     41752 │      153 │ thread/other                                        │
│     31000 │       25 │ thread/spawn                                        │
│     49992 │      230 │ untagged                                            │
│     11952 │       56 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 238456 bytes in 577 allocations (93%), largest free 28k, total free 41k
floitsch
floitsch 03/29/2024 06:53 PM
When you say "without Artemis", how do you run the program?
floitsch
floitsch 03/29/2024 06:54 PM
This looks strange, though:
Total: 238456 bytes in 577 allocations (93%), largest free 28k, total free 41k
It looks like something wants to allocate more than 28K.
Replying to floitsch:
When you say "without Artemis", how do you run the program?
AmusedGrape
AmusedGrape 03/29/2024 06:57 PM
jag container install tracker ./Client/tracker.toit
floitsch
floitsch 03/29/2024 06:58 PM
Ok. So it's running the Jaguar container instead of the Artemis container.
floitsch
floitsch 03/29/2024 06:58 PM
can you do a jag container list ?
AmusedGrape
AmusedGrape 03/29/2024 06:59 PM
DEVICE        IMAGE                                 NAME
lucid-silver  22d830b6-90fa-da4b-ef98-ea7d9a1659cf  tracker
lucid-silver  224bbb94-c28a-2f47-fd16-27d72746f31f  jaguar
floitsch
floitsch 03/29/2024 06:59 PM
Ok, so it was Jaguar running out of memory (current == "224...")
floitsch
floitsch 03/29/2024 06:59 PM
I don't understand why it was trying to allocate 28K.
floitsch
floitsch 03/29/2024 07:00 PM
was there a stacktrace?
AmusedGrape
AmusedGrape 03/29/2024 07:00 PM
[jaguar] INFO: container 'tracker' installed and started
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│        56 │        2 │ tls/bignum                                          │
│     45056 │       13 │ toit processes                                      │
│     16384 │        4 │ system 0 d76b3d7e-9719-6d20-835d-fac4a92e544c       │
│     20480 │        5 │ other 1 224bbb94-c28a-2f47-fd16-27d72746f31f        │
│      8192 │        2 │ current 3 22d830b6-90fa-da4b-ef98-ea7d9a1659cf      │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│      6152 │       24 │ lwip                                                │
│      7776 │      613 │ heap overhead                                       │
│      1944 │       29 │ event source                                        │
│     44480 │      173 │ thread/other                                        │
│     36600 │       29 │ thread/spawn                                        │
│     48808 │      223 │ untagged                                            │
│     11960 │       56 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 223312 bytes in 549 allocations (87%), largest free 4k, total free 35k
******************************************************************************
Decoding by `jag`, device has version <2.0.0-alpha.143>
******************************************************************************
MALLOC_FAILED error.
  0: Session.handshake.<block> <sdk>/tls/session.toit:281:7
  1: Session.handshake <sdk>/tls/session.toit:227:3
  2: Socket.handshake <sdk>/tls/socket.toit:69:14
  3: Client.try-to-reuse_.<block>.<block> <pkg:pkg-http-2.5.1>/client.toit:651:24
  4: catch.<block> <sdk>/core/exceptions.toit:124:10
  5: catch <sdk>/core/exceptions.toit:122:1
  6: catch <sdk>/core/exceptions.toit:85:10
  7: Client.try-to-reuse_.<block> <pkg:pkg-http-2.5.1>/client.toit:646:9
  8: Client.try-to-reuse_ <pkg:pkg-http-2.5.1>/client.toit:634:3
  9: Client.web-socket_.<block> <pkg:pkg-http-2.5.1>/client.toit:367:7
 10: SmallInteger_.repeat <sdk>/core/numbers.toit:1194:3
 11: Client.web-socket_ <pkg:pkg-http-2.5.1>/client.toit:364:19
 12: Client.web-socket <pkg:pkg-http-2.5.1>/client.toit:336:12
 13: main.<block> Client/tracker.toit:27:23
 14: main Client/tracker.toit:10:1
******************************************************************************
floitsch
floitsch 03/29/2024 07:00 PM
Interesting. So "other" and "current" might not be related.
floitsch
floitsch 03/29/2024 07:00 PM
But that's not the dump from above.
AmusedGrape
AmusedGrape 03/29/2024 07:01 PM
I can see if I can recreate the issue with a local server (no TLS)
floitsch
floitsch 03/29/2024 07:01 PM
Is the TLS server something we can test again?
Replying to floitsch:
But that's not the dump from above.
AmusedGrape
AmusedGrape 03/29/2024 07:01 PM
Right- there was no stack trace for that one
Replying to floitsch:
Is the TLS server something we can test again?
AmusedGrape
AmusedGrape 03/29/2024 07:01 PM
Sure
floitsch
floitsch 03/29/2024 07:02 PM
Some servers use certificates that are harder to work with.
AmusedGrape
AmusedGrape 03/29/2024 07:03 PM
[jaguar] INFO: container 'tracker' installed and started
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│       224 │        1 │ tls/bignum                                          │
│     45056 │       13 │ toit processes                                      │
│     16384 │        4 │ system 0 d76b3d7e-9719-6d20-835d-fac4a92e544c       │
│     20480 │        5 │ current 1 224bbb94-c28a-2f47-fd16-27d72746f31f      │
│      8192 │        2 │ other 4 b09117fb-db68-76de-da22-b464f69cbfd7        │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│     29464 │       54 │ lwip                                                │
│     12288 │     1171 │ heap overhead                                       │
│      2024 │       29 │ event source                                        │
│     56872 │      704 │ thread/other                                        │
│     31008 │       25 │ thread/spawn                                        │
│     48808 │      223 │ untagged                                            │
│     11952 │       56 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 258176 bytes in 1105 allocations (101%), largest free 0k, total free 0k
Heap report @ out of memory:
┌───────────┬──────────┬─────────────────────────────────────────────────────┐
│   Bytes   │  Count   │ Type                                                │
├───────────┼──────────┼─────────────────────────────────────────────────────┤
│       224 │        1 │ tls/bignum                                          │
│     45056 │       13 │ toit processes                                      │
│     16384 │        4 │ system 0 d76b3d7e-9719-6d20-835d-fac4a92e544c       │
│     20480 │        5 │ current 1 224bbb94-c28a-2f47-fd16-27d72746f31f      │
│      8192 │        2 │ other 4 b09117fb-db68-76de-da22-b464f69cbfd7        │
│     16384 │        1 │ heap metadata                                       │
│      4096 │        1 │ spare new-space                                     │
│     29464 │       54 │ lwip                                                │
│     12360 │     1180 │ heap overhead                                       │
│      2024 │       29 │ event source                                        │
│     57120 │      714 │ thread/other                                        │
│     31008 │       25 │ thread/spawn                                        │
│     48808 │      223 │ untagged                                            │
│     11952 │       56 │ wifi                                                │
└───────────┴──────────┴─────────────────────────────────────────────────────┘
Total: 258496 bytes in 1115 allocations (101%), largest free 0k, total free 0k
[jaguar] WARN: running Jaguar failed due to 'OUT_OF_MEMORY' (1/3)
AmusedGrape
AmusedGrape 03/29/2024 07:03 PM
I'll add some debugging logs, see if I can pinpoint where it crashes
floitsch
floitsch 03/29/2024 07:04 PM
It's almost certainly the ws connection.
floitsch
floitsch 03/29/2024 07:04 PM
Which server are you using?
AmusedGrape
AmusedGrape 03/29/2024 07:06 PM
Like for the websockets?
floitsch
floitsch 03/29/2024 07:06 PM
yes.
AmusedGrape
AmusedGrape 03/29/2024 07:07 PM
The server is using Go, net/http and gorilla/websockets for the websockets
floitsch
floitsch 03/29/2024 07:07 PM
Written by you?
AmusedGrape
AmusedGrape 03/29/2024 07:07 PM
Correct
floitsch
floitsch 03/29/2024 07:10 PM
The TLS/HTTP code was written by @erikcorry , so he is probably best for investigating this.
It's currently holidays here in Denmark, so he might only be able to have a look next week.
If it's possible:
- could you maybe avoid the TLS connection for now
- give us access next week so we can try to replicate
?
AmusedGrape
AmusedGrape 03/29/2024 07:11 PM
I'll see if theres a way I can avoid the TLS connection for now- using Cloudflare so it could be unavoidable unless I move off CF temporarily
floitsch
floitsch 03/29/2024 07:12 PM
hmm.
AmusedGrape
AmusedGrape 03/29/2024 07:12 PM
I'll work on getting you guys access, I'd just need to clean up the Git repo
floitsch
floitsch 03/29/2024 07:12 PM
cloudflare is actually one that we use a lot. We don't usually have problems with it.
floitsch
floitsch 03/29/2024 07:13 PM
Maybe it's the websocket, then.
floitsch
floitsch 03/29/2024 07:13 PM
Do you sometimes get data through, or is it always failing?
AmusedGrape
AmusedGrape 03/29/2024 07:14 PM
I'll see if I get data through
erikcorry
erikcorry 03/29/2024 07:16 PM
"other" is normally Jaguar or Artemis.
Current is likely your program.
System is the system process.
Replying to floitsch:
This looks strange, though: Total: 238456 bytes in 577 allocations (93%), largest free 28k, total free 41k It looks like something wants to allocate more than 28K.
floitsch
floitsch 03/29/2024 07:16 PM
@erikcorry this is the weirdest one so far
floitsch
floitsch 03/29/2024 07:16 PM
That said, some look like they simply run out of memory.
erikcorry
erikcorry 03/29/2024 07:17 PM
It's hard to run TLS and also have bluetooth support if you don't have PSRAM. They just take a lot of space.
floitsch
floitsch 03/29/2024 07:18 PM
I'm doing that with Artemis all the time.
erikcorry
erikcorry 03/29/2024 07:19 PM
With the TLS you can run out of memory while doing the TLS handshake. This causes TLS to fail, and I think it can free all the memory that it was using for the handshake. You don't see the OOM before the handshake has failed, and at that point the memory has been freed that it was using.
AmusedGrape
AmusedGrape 03/29/2024 07:20 PM
Yeah without TLS it seems to work fine
AmusedGrape
AmusedGrape 03/29/2024 07:20 PM
Could it also just be a board limitation? Just not having enough memory 😂
AmusedGrape
AmusedGrape 03/29/2024 07:20 PM
I'm basically running this on some cheap ESP32 dev boards from Aliexpress
floitsch
floitsch 03/29/2024 07:21 PM
All ESP32 boards have the same amount of internal RAM.
erikcorry
erikcorry 03/29/2024 07:21 PM
Well having PSRAM fixes the issue, so in a way that's true. But we do aim to work on non-PSRAM devices too.
floitsch
floitsch 03/29/2024 07:21 PM
Some have additional external RAM.
floitsch
floitsch 03/29/2024 07:22 PM
It's a bit strange:
Artemis has no problem connecting to our broker with TLS (through cloudflare).
But the tracker program seems to be unable to do so (reliably).
floitsch
floitsch 03/29/2024 07:23 PM
@erikcorry does a websocket connection use more memory than a normal http connection?
AmusedGrape
AmusedGrape 03/29/2024 07:23 PM
It probably does- its a constant connection I believe
erikcorry
erikcorry 03/29/2024 07:24 PM
The initial handshake can be quite heavy, depending on the crypto primitives used and the size of the certificates.
Replying to erikcorry:
The initial handshake can be quite heavy, depending on the crypto primitives used and the size of the certificates.
floitsch
floitsch 03/29/2024 07:24 PM
AmusedGrape is using cloudflare as well.
erikcorry
erikcorry 03/29/2024 07:25 PM
After that there's a period where the memory use is not so bad, but the go servers will gradually increase the size of the encrypted blocks up to 16k. That should be manageable, and we no longer require that memory to be contiguous.
Replying to AmusedGrape:
It probably does- its a constant connection I believe
floitsch
floitsch 03/29/2024 07:25 PM
I think the peak usage is during connection, so the fact that the websocket stays alive probably doesn't matter too much here.
erikcorry
erikcorry 03/29/2024 07:26 PM
We can see some pretty big allocations that are not Toit.
│     57120 │      714 │ thread/other                                        │
│     31008 │       25 │ thread/spawn                                        │
│     48808 │      223 │ untagged                                            │
erikcorry
erikcorry 03/29/2024 07:26 PM
We don't really know what these are. Our malloc implementation records them, but we don't have a lot of insight, because it's not Toit. Some of it is certainly BLE.
floitsch
floitsch 03/29/2024 07:28 PM
@AmusedGrape you have set the max-offline to something bigger.
So the tracker application starts up fresh and isn't running with a system where a BLE scan has been done before.
Right?
AmusedGrape
AmusedGrape 03/29/2024 07:28 PM
Yeah max-offline is 90s
floitsch
floitsch 03/29/2024 07:29 PM
Ah. So there is something that could help:
if you move the ble initialization after the connection to the server, the memory for the BLE won't be in use while the connection is established (where the http connection uses the most memory).
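The reordering described here might look roughly like the following. This is only a sketch assembled from the calls already used in the tracker code earlier in the thread; the argument layout (args[0] as the URI, args[1] as the Authorization value) is carried over from that code, and whether the saving is enough on a given board is untested.

```toit
import ble
import certificate-roots
import encoding.hex
import http
import net

SCAN-DURATION ::= Duration --s=20

main args:
  certificate-roots.install-common-trusted-roots

  network := net.open
  try:
    client := http.Client network
    headers := http.Headers
    headers.add "Authorization" args[1]

    // Connect first: the TLS handshake is the memory peak, and at this
    // point the BLE adapter has not been opened by this program yet.
    session := client.web-socket --uri=args[0] --headers=headers
    try:
      // Only now open BLE and start scanning.
      adapter := ble.Adapter
      central := adapter.central
      central.scan --duration=SCAN-DURATION: | device/ble.RemoteScannedDevice |
        session.send (hex.encode device.data.manufacturer-data)
    finally:
      session.close
  finally:
    network.close
```

BLE and the open connection still coexist during the scan in this version; only the handshake is moved ahead of the BLE setup.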
erikcorry
erikcorry 03/29/2024 07:29 PM
Yeah, the BLE grabs a lot of memory at boot time, but it probably gets even more when you start using it.
AmusedGrape
AmusedGrape 03/29/2024 07:30 PM
I'll try that
floitsch
floitsch 03/29/2024 07:31 PM
If that doesn't help (enough), then there is another approach:
- capture the seen IDs in memory
- shut down the BLE
- then only establish a connection to the server.

This way BLE and TLS don't need to be alive at the same time at all.
It would mean that the scanned IDs are sent with a delay (of at most 20s).
AmusedGrape
AmusedGrape 03/29/2024 07:33 PM
I think I initially tried capturing the IDs in memory, but I think it captured so many that it ran out of memory
erikcorry
erikcorry 03/29/2024 07:33 PM
You can get that report at other times than OOM:
import system show serial-print-heap-report

main:
  serial-print-heap-report
erikcorry
erikcorry 03/29/2024 07:33 PM
Would be interesting to see if the memory use rises when you start using BLE.
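One way to check this is to bracket each BLE step with a heap report. The sketch below uses only `serial-print-heap-report`, `ble.Adapter` and `adapter.central` as they appear in this thread; the printed labels are made up for illustration.

```toit
import ble
import system show serial-print-heap-report

main:
  // Baseline before anything BLE-related is touched by this program.
  print "Before BLE"
  serial-print-heap-report

  adapter := ble.Adapter
  print "After ble.Adapter"
  serial-print-heap-report

  central := adapter.central
  print "After adapter.central"
  serial-print-heap-report
```

Comparing the "Total" and "largest free" figures between consecutive reports shows how much each step costs and whether it fragments the heap.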
AmusedGrape
AmusedGrape 03/29/2024 07:34 PM
I'll try that
AmusedGrape
AmusedGrape 03/29/2024 07:35 PM
floitsch
floitsch 03/29/2024 07:37 PM
Maybe one more before opening BLE? (adapter := ble.Adapter)
AmusedGrape
AmusedGrape 03/29/2024 07:40 PM
session := client.web-socket --uri="wss://crowdtracker.hamp.sh/api/report" --headers=headers // ws://192.168.0.58:8080/ws
print "After WS connection, before BLE scan"
serial-print-heap-report
adapter := ble.Adapter
Should be how it is
floitsch
floitsch 03/29/2024 07:41 PM
ah. I see.
AmusedGrape
AmusedGrape 03/29/2024 07:42 PM
Okay this is a long log- this is using the TLS server, but it crashed
AmusedGrape
AmusedGrape 03/29/2024 07:42 PM
Probably helps if I send the manually logged heaps
floitsch
floitsch 03/29/2024 07:44 PM
Weird. In the first log you gave the WS connection didn't add a lot of memory.
In the last one, it used up a lot. Not sure what that could be.

BLE is definitely the worst offender.
floitsch
floitsch 03/29/2024 07:47 PM
Can you send us the code that you are running again?
floitsch
floitsch 03/29/2024 07:48 PM
To me this looks like BLE + WiFi is just not a good idea.
I would run BLE first. Scan for 20 seconds, and then only (after having shut down BLE) use the WiFi connection.
floitsch
floitsch 03/29/2024 07:49 PM
Since it works from time to time, it's probably just at the edge of what could be possible, though.
AmusedGrape
AmusedGrape 03/29/2024 07:49 PM
How would I shut down BLE?
floitsch
floitsch 03/29/2024 07:50 PM
very good question...
AmusedGrape
AmusedGrape 03/29/2024 07:50 PM
Hah, yeah I found nothing in the docs
floitsch
floitsch 03/29/2024 07:54 PM
It looks like we are not exposing the close.
There is a low-level function, though.

Could you try:
ble.ble-close_ adapter.resource-group_
?

If that works, I will add a close to the Adapter next week.
AmusedGrape
AmusedGrape 03/29/2024 07:58 PM
Not sure if it works (it might, just can't tell) but here's the logs using that plus running the scan then wifi
AmusedGrape
AmusedGrape 03/29/2024 07:59 PM
Actually let me get some better heaps
AmusedGrape
AmusedGrape 03/29/2024 08:10 PM
Doesn't seem to like JSON encoding the data...
AmusedGrape
AmusedGrape 03/29/2024 08:10 PM
try:
  try:
    central.scan --duration=SCAN-DURATION: | device/ble.RemoteScannedDevice |
      // session.send (hex.encode device.data.manufacturer-data)
      devices.add (hex.encode device.data.manufacturer-data)
  finally:
    print "After BLE scan"
    serial-print-heap-report
    ble.ble-close_ adapter.resource-group_
    print "After BLE close"
    serial-print-heap-report
    session.send (json.encode { "devices": devices })
    print "After WS send"
    session.close
    print "After WS close"
finally:
  network.close
floitsch
floitsch 03/29/2024 08:14 PM
Hmm.
Could you print how many entries you have?
floitsch
floitsch 03/29/2024 08:14 PM
Fwiw, closing the BLE definitely helps.
AmusedGrape
AmusedGrape 03/29/2024 08:17 PM
804... 😅
AmusedGrape
AmusedGrape 03/29/2024 08:17 PM
I'll see if I can remove duplicates
floitsch
floitsch 03/29/2024 08:22 PM
This is (more or less), how I would try it:
import artemis
import ble
import certificate-roots
import encoding.hex
import http
import net

SCAN-DURATION ::= Duration --s=20

main args:
  with-timeout (Duration --m=2):
    artemis.run --offline:
      actual-main args

actual-main args:
  certificate-roots.install-common-trusted-roots

  adapter := ble.Adapter
  central := adapter.central

  // A set, so duplicate advertisements are stored only once.
  data := {}
  central.scan --duration=SCAN-DURATION: | device/ble.RemoteScannedDevice |
    data.add device.data.manufacturer-data

  // Shut down BLE before bringing up the network.
  ble.ble-close_ adapter.resource-group_

  network := net.open
  client := http.Client network
  headers := http.Headers

  if args[1] == null or args[1] == "":
    throw "Authorization header is required"
    exit 1

  headers.add "Authorization" args[1]
  headers.add "Device-ID" artemis.device.id.to-string

  try:
    session := client.web-socket --uri=args[0] --headers=headers // ws://192.168.0.58:8080/ws
    try:
      data.do: session.send (hex.encode it)
    finally:
      session.close
  finally:
    network.close
Replying to AmusedGrape:
I'll see if I can remove duplicates
AmusedGrape
AmusedGrape 03/29/2024 08:23 PM
14, that definitely helped lol
AmusedGrape
AmusedGrape 03/29/2024 08:23 PM
I'll try that code real quick
👍1
AmusedGrape
AmusedGrape 03/29/2024 09:30 PM
Works after some slight modification!
floitsch
floitsch 03/29/2024 09:30 PM
Nice.
floitsch
floitsch 03/29/2024 09:30 PM
I will add the close to the BLE next week.
AmusedGrape
AmusedGrape 03/29/2024 09:31 PM
Did get this though:
floitsch
floitsch 03/29/2024 09:33 PM
Interesting.
Is your container still running at that point?
AmusedGrape
AmusedGrape 03/29/2024 09:34 PM
I think? I can't really tell
AmusedGrape
AmusedGrape 03/29/2024 09:34 PM
I think it is?
floitsch
floitsch 03/29/2024 09:36 PM
Are you running with artemis.run --offline ?
AmusedGrape
AmusedGrape 03/29/2024 09:39 PM
I disabled that- it doesn't seem to run with that for some reason
floitsch
floitsch 03/29/2024 09:40 PM
hmm.
what does "It doesn't seem to run" mean?
floitsch
floitsch 03/29/2024 09:40 PM
What happened, was that the container did some bluetooth things while Artemis tried to synchronize. -> BLE + TLS at the same time.
floitsch
floitsch 03/29/2024 09:41 PM
And it looks like this was after a "reset", so it couldn't use TLS resume (which is a much cheaper way of establishing the connection).
AmusedGrape
AmusedGrape 03/29/2024 09:44 PM
I'll test again here shortly
AmusedGrape
AmusedGrape 03/29/2024 10:37 PM
Yeah seems like it is working, I just didn't know if it was running.

Running offline shouldn't affect the interval trigger though right?
floitsch
floitsch 03/29/2024 10:38 PM
"offline" just means that Artemis is not allowed to synchronize while the block is active.
AmusedGrape
AmusedGrape 03/29/2024 10:38 PM
Ahh perfect then
floitsch
floitsch 03/29/2024 10:39 PM
In theory, this can make it impossible for Artemis to synchronize, which is why I added the with-timeout. Just to make sure Artemis actually has a chance of synchronizing in case there is a bug in the program.
AmusedGrape
AmusedGrape 03/29/2024 10:41 PM
Oh yeah that was smart
floitsch
floitsch 03/29/2024 10:44 PM
In theory we probably want a watchdog instead, but this was simpler
AmusedGrape
AmusedGrape 03/29/2024 11:05 PM
Kinda unrelated- what should I do in this situation:
[artemis.synchronize] INFO: firmware update initiated
[artemis.synchronize] INFO: firmware update {from: <base64>, to: <base64>}
[artemis.synchronize] INFO: synchronized state to broker
[artemis.scheduler] INFO: runlevel decreasing {runlevel: 1}
[artemis.synchronize] INFO: firmware update {size: 1719648}
E (11105) Toit: Oversized ota_begin args: 1719648-1703936
[artemis.scheduler] INFO: runlevel increasing {runlevel: 3}
[artemis.synchronize] WARN: firmware update failed {error: OUT_OF_BOUNDS}

Would I have to update the boards in person rather than OTA?
Replying to AmusedGrape:
Kinda unrelated- what should I do in this situation: ``` [artemis.synchronize] INFO: firmware update initiated [artemis.synchronize] INFO: firmware update {from: <base64>, to: <bas...
floitsch
floitsch 03/29/2024 11:53 PM
I think this means that the OTA partition isn't big enough
floitsch
floitsch 03/29/2024 11:55 PM
Contrary to Jaguar, Artemis bundles the user code with the firmware. The default partition scheme thus easily runs out of space.
floitsch
floitsch 03/29/2024 11:56 PM
You probably want to use this envelope instead: https://github.com/toitlang/envelopes/tree/main/variants/esp32-ota-1c0000
floitsch
floitsch 03/29/2024 11:56 PM
In your pod spec: envelope: esp32-ota-1c0000.
floitsch
floitsch 03/29/2024 11:57 PM
We should probably warn when users start with OTA partitions that are relatively small.
floitsch
floitsch 03/29/2024 11:58 PM
Note that you can't (easily) change the partition size over the air. So right now the easiest is to reflash.
floitsch
floitsch 03/30/2024 12:00 AM
If really necessary, one can update the partition table OTA, but it's complicated. I wrote a blog post about it: https://blog.toit.io/changing-an-esp32-partition-table-over-the-air-276c86feeba8
AmusedGrape
AmusedGrape 03/30/2024 01:31 AM
Weird, I added envelope: esp32-ota-1c0000 to my pod spec, but getting this error when flashing:
Firmware is too big to fit in designated partition (1719648 > 1703936)
EXCEPTION error.
firmware tool failed with exit code 0
floitsch
floitsch 03/30/2024 01:33 AM
Hmm. The 1703936 is still the old size.
floitsch
floitsch 03/30/2024 01:33 AM
Did you rebuild and upload the pod?
Replying to AmusedGrape:
Weird, I added envelope: esp32-ota-1c0000 to my pod spec, but getting this error when flashing: ``` Firmware is too big to fit in designated partition (1719648 > 1703936) EXCEPTI...
floitsch
floitsch 03/30/2024 01:34 AM
This is happening with artemis serial flash?
AmusedGrape
AmusedGrape 03/30/2024 01:42 AM
Yes and yes
floitsch
floitsch 03/30/2024 01:44 AM
Let me try to reproduce.
floitsch
floitsch 03/30/2024 01:47 AM
While I'm trying to reproduce, could you try to read the partition table from your device?
I have written a small tool that prints it:
https://github.com/toitware/toit-partition-table-esp32
floitsch
floitsch 03/30/2024 01:51 AM
argh.
floitsch
floitsch 03/30/2024 01:51 AM
my mistake. It's firmware-envelope and not just envelope.
floitsch
floitsch 03/30/2024 01:51 AM
I even made the mistake in the json-schema, so the auto-completion suggests envelope.
floitsch
floitsch 03/30/2024 01:52 AM
I will write a patch for Artemis that allows both, and we will gradually migrate towards envelope. I think. (Pending review).
floitsch
floitsch 03/30/2024 01:56 AM
PR is out. If Kasper agrees, we will accept "envelope" and "firmware-envelope" in the next release. (But only document "envelope").
AmusedGrape
AmusedGrape 03/30/2024 02:05 AM
Oh here's a new one... Broker error: duplicate key value violates unique constraint "pods_pkey"
AmusedGrape
AmusedGrape 03/30/2024 02:05 AM
SQL is great isn't it?
floitsch
floitsch 03/30/2024 02:05 AM
checking.
floitsch
floitsch 03/30/2024 02:06 AM
Which operation did you do?
floitsch
floitsch 03/30/2024 02:10 AM
Releases of the Artemis fleet management tool. Contribute to toitware/artemis-releases development by creating an account on GitHub.
Replying to floitsch:
Which operation did you do?
AmusedGrape
AmusedGrape 03/30/2024 02:12 AM
Uploading the pod
floitsch
floitsch 03/30/2024 02:13 AM
From a built pod, or from a spec?
AmusedGrape
AmusedGrape 03/30/2024 02:13 AM
i think built, and i specified the file
floitsch
floitsch 03/30/2024 02:13 AM
ah. ok. I can reproduce.
AmusedGrape
AmusedGrape 03/30/2024 02:13 AM
./artemis pod upload --tag=v1.0.3r1 tracker.pod
AmusedGrape
AmusedGrape 03/30/2024 02:13 AM
Sweet
floitsch
floitsch 03/30/2024 02:13 AM
It looks like you uploaded the same pod again.
floitsch
floitsch 03/30/2024 02:14 AM
-> primary key error (since the pod's id is in the file).
floitsch
floitsch 03/30/2024 02:14 AM
and artemis pod tag is on my TODO list. Doesn't exist yet...
AmusedGrape
AmusedGrape 03/30/2024 02:14 AM
weird, somehow that was the reason
AmusedGrape
AmusedGrape 03/30/2024 02:15 AM
seems to all work now!
floitsch
floitsch 03/30/2024 02:15 AM
nice 🙂
Thanks for sending us all that feedback. My todo list has a few more items now...
AmusedGrape
AmusedGrape 03/30/2024 02:16 AM
of course! best part about beta software 😜
floitsch
floitsch 03/30/2024 02:17 AM
Hopefully not too much left.
We are hoping to get to v2.0.0 (Toit), and v1.0.0 (Artemis) soon.
AmusedGrape
AmusedGrape 03/30/2024 02:20 AM
yall have done a great job though! made my project 10x easier.
🙏1
bitphlipphar
bitphlipphar 03/30/2024 08:51 AM
For what it's worth, artemis.run --offline takes an optional --timeout argument, so you don't need to wrap it in a call to with-timeout.
👍1
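With that option, the wrapper from earlier in the thread could presumably shrink to something like the sketch below. It assumes that `--timeout` accepts a `Duration`, which the message above does not spell out, so check the `artemis` package documentation before relying on it. The body of `actual-main` is a placeholder.

```toit
import artemis

main args:
  // Assumption: --timeout takes a Duration, like with-timeout does.
  artemis.run --offline --timeout=(Duration --m=2):
    actual-main args

actual-main args:
  // Placeholder for the original main.
  print "tracker started with $(args.size) arguments"
```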
floitsch
floitsch 03/30/2024 01:39 PM
Also noteworthy: since cloudflare supports TLS resume, establishing a TLS connection will be significantly cheaper if it happens within 23h of the last connection, and if the RTC memory wasn't deleted (which happens with a hard reset, or a power off, but not with deep sleep).
191 messages in total