Skip to content

BSSL:_connectSSL: OOM error with Wemos D1 Mini (ESP8266) and InfluxDB 2.1.1 #170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pheetr opened this issue Jan 6, 2022 · 15 comments
Closed
Labels
question Further information is requested

Comments

@pheetr
Copy link

pheetr commented Jan 6, 2022

Hi, I'm getting this OOM error when trying to send data to InfluxDB.
Based on the description of this feature I'm just looking for a quick confirmation whether this is the expected behavior in my setup and whether my only options currently are using a reverse proxy or setting up the db without ssl.

Logs (sensitive parts substituted):

Running task: WriteToDatabase
[HTTP-Client][begin] url: https://ip:port/api/v2/write?org=org&bucket=bucket
[HTTP-Client][begin] host: ip port: port url: /api/v2/write?org=org&bucket=bucket
[HTTP-Client][sendRequest] type: 'POST' redirCount: 0
[hostByName] Host: ip is a IP!
:close
:ur 1
:dsrcv 0
:del
:ref 1
BSSL:_connectSSL: start connection
:oom(16709)@abi.cpp:72
:oom(16709)@abi.cpp:72

_iobuf_in:       0x00000000
_iobuf_out:      0x3fff9c34
_iobuf_in_size:  16709
_iobuf_out_size: 597
BSSL:_connectSSL: OOM error
[HTTP-Client] failed connect to ip:port
[HTTP-Client][returnError] error(-1): connection failed
[HTTP-Client][end] tcp is closed
InfluxDB write failed: connection failed
@vlastahajek vlastahajek added the question Further information is requested label Jan 6, 2022
@vlastahajek
Copy link
Collaborator

As ESP8266 has a little free memory for the app (~40KB), successful communication over TLS highly depends on the rest of the application jobs. How much memory and max bock allocation is available before writing?
Try adding such a line:

Serial.printf_P(PSTR("Before: free_heap %d, max_alloc_heap %d, heap_fragmentation  %d\n"), ESP.getFreeHeap(), ESP.getMaxFreeBlockSize(), ESP.getHeapFragmentation());

To know whether MFLN works with your server, please, enable library debug as written here.

@pheetr
Copy link
Author

pheetr commented Jan 7, 2022

Thanks for the quick guidance. Please see below the output after adding these settings.
For additional context, I'm trying to get this AirGradient sketch to work.

Running task: WriteToDatabase
Before: free_heap 25472, max_alloc_heap 24624, heap_fragmentation  4
75.515 [D] writeRecord: bufferPointer: 1, batchPointer: 0, _bufferCeiling: 1
75.534 [D] Flushing buffer: is oversized true, is timeout true, is buffer full false
75.622 [D] Writing batch, batchpointer: 0, size 1
75.675 [D] Init
75.691 [D]  Library version: 3.9.0
75.727 [D]  Device : ESP8266
75.757 [D]  SDK version: 3.0.2
75.790 [D]  Server url: https://ip:port
75.842 [D]  Org: org
75.864 [D]  Bucket: bucket
75.889 [D]  Token: token
76.001 [D]  DB version: 2
76.041 [D] setUrls
76.048 [D]  writeUrl: https://ip:port/api/v2/write?org=org&bucket=bucket
76.132 [D]  queryUrl: https://ip:port/api/v2/query?org=org
76.204 [D] Writing to https://ip:port/api/v2/write?org=org&bucket=bucket
76.287 [D] Sending:
AirGradient pm2.5=0i,co2=752i,humidity=36i,temperature=21.10

[HTTP-Client][begin] url: https://ip:port/api/v2/write?org=org&bucket=bucket
[HTTP-Client][begin] host: ip port: port url: /api/v2/write?org=org&bucket=bucket
[HTTP-Client][sendRequest] type: 'POST' redirCount: 0
BSSL:_connectSSL: start connection
BSSL:_connectSSL: OOM error
[HTTP-Client] failed connect to ip:port
[HTTP-Client][returnError] error(-1): connection failed
[HTTP-Client][end] tcp is closed
76.820 [D] Leaving data in buffer for retry, retryInterval: 0
76.884 [D] Success: 0, _bufferPointer: 1, _batchPointer: 0, _writeBuffer[_bufferPointer]_0
InfluxDB write failed: connection failed

Memory output after the first run (above).
Second run: Before: free_heap 17584, max_alloc_heap 17312, heap_fragmentation 2
Third run: Before: free_heap 17240, max_alloc_heap 16696, heap_fragmentation 4

@vlastahajek
Copy link
Collaborator

Thanks, for the info. I've just remembered that the MFLN part is moved elsewhere. Please, enable debug the same way also in src/HTTPService.cpp.

I've found your issue in AirGradient-InfluxDB. If you use TSL on the InfluxDB OSS server, you need more RAM on the device, which is expected, to answer your original question.
MFLN will not work. Try releasing sth else to achieve around 30KB free memory, just for a test (e.g. disable air quality sensors).

Hint: If you don't write to InfluxDB Cloud, remove the last parameter of calling InfluxDBClient constructor on the line around 99, this will save you some memory.

If you don't manage to update the app to release enough memory, you have several options:

  • If you have your server in your local network, you don't need a TLS.
  • Use some reverse proxy on the server, that will allow MFLN.
  • Use Telegraf as a "data proxy" on your local network. Telegraf can act as an InfluxDB server and will resend data.
  • Use a more powerful device. ESP32 or, smaller, ESP32 C3.

@pheetr
Copy link
Author

pheetr commented Jan 7, 2022

Thanks for the additional guidance. I didn't realize that Telegraf could be used directly as an InfluxDB proxy. It was pretty quick to setup the InfluxDB V2 Listener Input Plugin, which is now working fine with the library through HTTP. :)

As an aside, I did do a test with your recommended settings. This confirmed that there was no MFLN support, but threw another error (I do have influxClient.setInsecure(); set due to self-signed cert).

Output:

Running task: WriteToDatabase
Before: free_heap 31288, max_alloc_heap 30968, heap_fragmentation  2
74.771 [D] writeRecord: bufferPointer: 1, batchPointer: 0, _bufferCeiling: 1
74.820 [D] Flushing buffer: is oversized true, is timeout true, is buffer full false
74.909 [D] Writing batch, batchpointer: 0, size 1
74.961 [D] Init
74.977 [D]  Library version: 3.9.0
75.014 [D]  Device : ESP8266
75.044 [D]  SDK version: 3.0.2
75.076 [D]  Server url: https://ip:port
75.128 [D]  Org: org
75.150 [D]  Bucket: bucket
75.175 [D]  Token: token
75.288 [D]  DB version: 2
75.316 [D] probeMaxFragmentLength to ip:port
75.389 [D]  MFLN:no
75.393 [D] setUrls
75.413 [D]  writeUrl: https://ip:port/api/v2/write?org=org&bucket=bucket
75.496 [D]  queryUrl: https://ip:port/api/v2/query?org=org
75.568 [D] Writing to https://ip:port/api/v2/write?org=org&bucket=bucket
75.651 [D] Sending:
AirGradient co2=782i

75.696 [D] POST request - https://ip:port/api/v2/write?org=org&bucket=bucket, data: 22bytes, type text/plain
[HTTP-Client][begin] url: https://ip:port/api/v2/write?org=org&bucket=bucket
[HTTP-Client][begin] host: ip port: port url: /api/v2/write?org=org&bucket=bucket
[HTTP-Client][sendRequest] type: 'POST' redirCount: 0
BSSL:_connectSSL: start connection
BSSL:CERT: cert 
BSSL:_run_until: Timeout
BSSL:_wait_for_handshake: failed
BSSL:Couldn't connect. Error = 'Unknown error code.'
[HTTP-Client] failed connect to ip:port
[HTTP-Client][returnError] error(-1): connection failed
91.219 [D] HTTP status code - -1
91.253 [E] Error - connection failed
[HTTP-Client][end] tcp is closed
91.326 [D] Leaving data in buffer for retry, retryInterval: 0
91.390 [D] Success: 0, _bufferPointer: 1, _batchPointer: 0, _writeBuffer[_bufferPointer]_0
InfluxDB write failed: connection failed

@vlastahajek
Copy link
Collaborator

vlastahajek commented Jan 7, 2022

So, Telegraf is able to successfully write to your server, ok?
Does validateConnection() function works?
Try also increasing timeout:
client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000));

@pheetr
Copy link
Author

pheetr commented Jan 9, 2022

Yes, Telegraf is working fine. It's generally the main method of how I'm getting most of my data into InfluxDB anyway, but I didn't know it was able to proxy InfluxDB data directly, I've learned something new. :)
This will be the method I'll go with for now, given that I don't really know how I could further decrease memory consumption, even for testing purposes.

I did try the things you've suggested and found that I would likely be running into low memory issues, no matter what.

validateConnection() works and it is able to connect successfully, however it leaves the memory well below 30k once it ran.

client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000)); might be the solution, given the timeout message from BSSL, if not for the fact that adding this single line drops available memory from ~32k to ~24k, thereby leading to OOM situation again:
Without client.setHTTPOptions: free_heap 31760, max_alloc_heap 31440, heap_fragmentation 2
With client.setHTTPOptions: free_heap 24272, max_alloc_heap 23920, heap_fragmentation 2

Thanks for your help and guidance.

@pheetr pheetr closed this as completed Jan 9, 2022
@vlastahajek
Copy link
Collaborator

Successful secured connection takes, as you noticed, around 10K.
After this and other initialization, you can safely run the app even when available RAM is just 10K (one of my long-running projects) if there are no more big allocations (e.g. other secured connections).

And, you should set the client to reuse the connection to avoid reconnections.

Anyway, great it works for you the way it is best for you.

@pheetr
Copy link
Author

pheetr commented Jan 10, 2022

Wait, so you're saying that the connection only requires around 10k? So why would validateConnection() fail with an OOM error when I enable client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000));? I assumed it was because adding that line reduced available memory to below ~30k, but if connection requires only 10k it should still work.

Hmm, I'll guess it may be worth looking into this some more, even just so I can get a better understanding of how things work here. :)

@pheetr
Copy link
Author

pheetr commented Jan 10, 2022

I found where I went wrong before. I didn't realize that the order in which client.setInsecure() and client.setHTTPOptions() are set matters. Apparently when client.setInsecure() is set after client.setHTTPOptions() it gets ignored, which in my case led to an OOM exception, so I assumed memory was insufficient. Looking at the logs closer I now saw that setInsecure was being ignored: BSSL:Connection *will* fail, no authentication method is setup

With these settings in the right order, client.validateConnection() now works both with and without client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000)) set.

The actual db write attempt still doesn't work though, for some reason.
I'm getting more or less the same output whether client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000)) is set or not. The only difference seems to be that with it starting memory is about 8k lower, but the outcome is the same, including "After" memory values.

Starting memory without client.setHTTPOptions: Before: free_heap 31288, max_alloc_heap 30968, heap_fragmentation 2
Starting memory with client.setHTTPOptions: Before: free_heap 24272, max_alloc_heap 23912, heap_fragmentation 2

DB write output:

Running task: WriteToDatabase
Before: free_heap 24272, max_alloc_heap 23912, heap_fragmentation  2
writePoint
46.437 [D] writeRecord: bufferPointer: 1, batchPointer: 0, _bufferCeiling: 1
46.499 [D] Flushing buffer: is oversized true, is timeout false, is buffer full false
46.589 [D] Writing batch, batchpointer: 0, size 1
46.641 [D] Writing to https://ip:port/api/v2/write?org=org&bucket=bucket
46.724 [D] Sending:
AirGradient co2=1449i

46.769 [D] POST request - https://ip:port/api/v2/write?org=org&bucket=bucket, data: 22bytes, type text/plain
[HTTP-Client][begin] url: https://ip:port/api/v2/write?org=org&bucket=bucket
[HTTP-Client][begin] host: ip port: port url: /api/v2/write?org=org&bucket=bucket
[HTTP-Client][sendRequest] type: 'POST' redirCount: 0
BSSL:_connectSSL: start connection
BSSL:CERT: cert data
BSSL:_run_until: Timeout
BSSL:_wait_for_handshake: failed
BSSL:Couldn't connect. Error = 'Unknown error code.'
[HTTP-Client] failed connect to ip:port
[HTTP-Client][returnError] error(-1): connection failed
62.291 [D] HTTP status code - -1
62.325 [E] Error - connection failed
[HTTP-Client][end] tcp is closed
62.398 [D] Leaving data in buffer for retry, retryInterval: 0
62.462 [D] Success: 0, _bufferPointer: 1, _batchPointer: 0, _writeBuffer[_bufferPointer]_0
InfluxDB write failed: connection failed
After: free_heap 2552, max_alloc_heap 2000, heap_fragmentation  21

After this first run the second DB write run in the loop will end in an OOM exception, no doubt due to the extremely low memory state at that point.

One difference between the successful client.validateConnection() and unsuccessful DB write attempts that I can see in the output and that could be interesting is that in the successful case there are four BSSL:CERT entries in the attempt, while the unsuccessful one stops after the first:

BSSL:_connectSSL: start connection
BSSL:CERT: cert data
BSSL:CERT: cert data
BSSL:CERT: cert data
BSSL:CERT: cert data
BSSL:Connected!

vs unsuccessful:

BSSL:_connectSSL: start connection
BSSL:CERT: cert data 
BSSL:_run_until: Timeout
BSSL:_wait_for_handshake: failed
BSSL:Couldn't connect. Error = 'Unknown error code.'

What I find curious is where this timeout happens: right after the first cert segment was processed.
Comparing the successful and unsuccessful connections, the cert data in the first BSSL:CERT segment is the same and fully there in the unsuccessful attempt as well.
However, the unsuccessful attempt only gets to the end of the first BSSL:CERT segment and doesn't start the second one, which is the case both when client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000)) is set and when it is not.

@pheetr pheetr reopened this Jan 10, 2022
@vlastahajek
Copy link
Collaborator

vlastahajek commented Jan 10, 2022

Wait, so you're saying that the connection only requires around 10k? So why would validateConnection() fail with an OOM error when I enable client.setHTTPOptions(HTTPOptions().httpReadTimeout(20000));? I assumed it was because adding that line reduced available memory to below ~30k, but if connection requires only 10k it should still work.

I meant that after a secured connection is established, memory is about ~10K lower, than it started, because of buffers allocated by WiFiClient.
It still requires a large amount of memory at the beginning (30K).

Yes, setInsecure must be called before setting options.

Regarding timeout, I've almost forgotten about hard-coded connection timeout in the secure wifi client, which is 15s and cannot be changed. I'm sorry for the misleading hint.

The core problem you are experiencing is in communication between secure wifi client and server. I don't know why, yet. It is on the SDK code, not in this library.
I have several Wemos D1 minis with a couple of sensors sending data to InfuxDB with a custom certificate without any problem.

I would suggest trying the SecureWrite example from this library to test the most simple app for writing to InfluxDB, where there is as much as possible RAM.

@pheetr
Copy link
Author

pheetr commented Jan 10, 2022

Ah, ok, thanks for the explanation regarding memory requirements.
Relatedly, is there a good example of how you would set up a sketch with the secure connection being established at the beginning (presumably toward the beginning of setup() when there is still enough memory) and the same connection being reused later on? I guess for the second part of the question you would use something like client.setHTTPOptions(HTTPOptions().reuseConnection(true)), but I'm not sure how you would go about establishing the connection in the beginning without a write event.

I'll see if I can test the SecureWrite example later today. I probably won't be able to progress further for about two weeks due to travel.

@pheetr
Copy link
Author

pheetr commented Jan 10, 2022

I've just tested the SecureWrite example.
It is able to connect successfully and write to the db. Interestingly, it is able to do so at a lower "Before" memory level than the sketch I'm trying to get to work (~18k vs ~24/31k).

Writing: wifi_status,device=ESP8266,SSID=ssid rssi=-20i

Before: free_heap 18536, max_alloc_heap 14264, heap_fragmentation  21

275.449 [D] writeRecord: bufferPointer: 1, batchPointer: 0, _bufferCeiling: 1
275.455 [D] Flushing buffer: is oversized true, is timeout false, is buffer full false
275.462 [D] Writing batch, batchpointer: 0, size 1
275.467 [D] Writing to https://ip:port/api/v2/write?org=org&bucket=test
275.474 [D] Sending:
wifi_status,device=ESP8266,SSID=ssid rssi=-20i

275.480 [D] POST request - https://ip:port/api/v2/write?org=org&bucket=test, data: 49bytes, type text/plain
BSSL:_connectSSL: start connection
BSSL:CERT: cert data 
BSSL:CERT: cert data 
BSSL:CERT: cert data 
BSSL:CERT: cert data 
BSSL:Connected!
276.632 [D] HTTP status code - 204
276.638 [D] Dropped batch, batchpointer: 1
276.641 [D] Success: 1, _bufferPointer: 1, _batchPointer: 1, _writeBuffer[_bufferPointer]_0
276.649 [D] Buffer empty

After: free_heap 18536, max_alloc_heap 14064, heap_fragmentation  22

@vlastahajek
Copy link
Collaborator

Thanks for the test.

What is the first row about memory?

Line Before: free_heap 18536, max_alloc_heap 14264, heap_fragmentation 21 after first connection remains almost the same between writes, because, shortly explained, wifi client has allocated buffer and it is reused. Thus, the next write doesn't need allocations and it runs ok .

There must be a problem in the app you have.

@pheetr
Copy link
Author

pheetr commented Jan 10, 2022

Thanks for the explanation.
If I understand your question, you meant to check the delta of "before" available memory between first and subsequent runs.
I have now disabled client.validateConnection() in the WriteSecure sketch, to only see how db writes behave.

First run:

Before: free_heap 47568, max_alloc_heap 47280, heap_fragmentation  1
...
After: free_heap 18976, max_alloc_heap 18168, heap_fragmentation  5

Second run:

Before: free_heap 18456, max_alloc_heap 17648, heap_fragmentation  5
...
After: free_heap 18456, max_alloc_heap 14008, heap_fragmentation  22

I see now what you meant by about 30k being needed for the buffer to be allocated initially.

This may certainly be app related. Unfortunately I'm not at a level when it comes to Arduino or coding in general that I could properly troubleshoot the situation or optimize memory usage.

Btw, I did test WriteSecure with connectionReuse enabled, mainly to see how it behaved. I only saw a difference in the first run:

Before: free_heap 40632, max_alloc_heap 40424, heap_fragmentation  1
...
After: free_heap 19056, max_alloc_heap 18384, heap_fragmentation  4

Subsequent runs were the same as before, with the default false setting:

Before: free_heap 18536, max_alloc_heap 17864, heap_fragmentation  4
...
After: free_heap 18536, max_alloc_heap 17864, heap_fragmentation  4

Relatedly, this option is currently referenced incorrectly in the readme as "reuseConnection", while it is actually the other way around, "connectionReuse".

@vlastahajek
Copy link
Collaborator

I see now what you meant by about 30k being needed for the buffer to be allocated initially.

Well, I don't know details about the internal work of the secured wifi client, especially about the closed source of TLS layer. I suppose that along with caching server certificate info and some other stuff, there is another buffer kept for sending data (encrypting) data.
You can check how it behaves using the stock example for HTPPS.

This may certainly be app related. Unfortunately I'm not at a level when it comes to Arduino or coding in general that I could properly troubleshoot the situation or optimize memory usage.

I recommend trying to keep only SHT3x sensor in the AirQuality app to test this.
I will test the app later as well, but I don't have those air quality sensors. so I will need to emulate them.

Relatedly, this option is currently referenced incorrectly in the readme as "reuseConnection", while it is actually the other way around, "connectionReuse".

Thanks for noticing this and reporting it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants