Skip to content

v1.0.3: Network stack never recovers without a hard reset after receiving a fixed number of UDP packets #3287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ssilverman opened this issue Sep 26, 2019 · 33 comments
Labels
Status: Stale Issue is stale stage (outdated/stuck)

Comments

@ssilverman
Copy link

ssilverman commented Sep 26, 2019

This is the same bug as #2899, but for v1.0.3.

Hardware:

Board: Adafruit ESP32 Feather
Core Installation version: Core v1.0.3 in Arduino 1.8.10
IDE name: Arduino v1.8.10
Flash Frequency: 80MHz
PSRAM enabled: Don't know
Upload Speed: 921600
Computer OS: Mac OSX v10.14.6

Description:

I refer the reader to the main description in #2899, but in summary, here are two programs for testing.

To continuously send closely-spaced packets (in bash):

while true; do echo -n $(printf '.%.0s' {1..1400}) > /dev/udp/192.168.1.9/8000; sleep 0.05; done

[Change the IP address to your board's IP address.]

You'll notice that after about 32 packets, no more packets are received.

Sketch:

#include <AsyncUDP.h>
#include <Esp.h>
#include <WiFi.h>

constexpr char kAPName[] = "ChangeMe";
constexpr char kAPPassword[] = "ChangeMe";
constexpr bool isSoftAP = false;  // Change to true for SoftAP mode

AsyncUDP udp;

void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 4000) {
    // Wait for Serial
  }
  Serial.println("Starting.");

  if (isSoftAP) {
    Serial.println("Starting SoftAP...");
    if (WiFi.softAP(kAPName, kAPPassword)) {
      Serial.print("    IP: ");
      Serial.println(WiFi.softAPIP());
    } else {
      Serial.println("ERROR: Starting SoftAP!");
    }
  } else {
    if (WiFi.begin(kAPName, kAPPassword)) {
      while (!WiFi.isConnected()) {
        delay(500);
      }
      Serial.print("    IP: ");
      Serial.println(WiFi.localIP());
      Serial.print("    Subnet: ");
      Serial.println(WiFi.subnetMask());
      Serial.print("    Gateway: ");
      Serial.println(WiFi.gatewayIP());
    } else {
      Serial.println("    ERROR: Connecting to AP!");
    }
  }

  if (!udp.listen(8000)) {
    Serial.println("ERROR: Starting UDP server!");
  }
  udp.onPacket(onPacket);
}

int counter = 0;

void onPacket(AsyncUDPPacket &packet) {
  Serial.printf("%d: %d\n", ++counter, packet.length());
}

void loop() {
  // Print some status every 5s
  Serial.printf("Free heap: %d\n", ESP.getFreeHeap());
  delay(5000);
}
@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

I did some more testing with different packet sizes, and for me, it always fails at 32 packets.

@ssilverman
Copy link
Author

After even more testing, even for very slow packet rates, eg. 0.5Hz (sleep 2 in the bash script), it always stops receiving packets at 32. Is it overrunning the dynamic buffers? Is it a coincidence that CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM is 32?

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

Yep. When I change this value to something else, the failure always happens at that number of packets. For example, when CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM is 43, it fails at 43 packets, etc.

@stickbreaker
Copy link
Contributor

@ssilverman try calling the packet destructor when you are finished with the packet:

void onPacket(AsyncUDPPacket &packet) {
  Serial.printf("%d: %d\n", ++counter, packet.length());
  delete packet; // all done free resources
}

Chuck.

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

@stickbreaker That doesn't work and causes crashes. (I changed to delete &packet;, BTW.)

@lbernstone
Copy link
Contributor

This is not at about 32 packets, but at 32 packets exactly. Not even closely spaced. Are sockets not getting closed?

@stickbreaker
Copy link
Contributor

@ssilverman

It look like asynchUdp is not freeing the buffer after you use it. I added a call to pbuf_free() after it calls your onPacket() handler.

This is the code where your handler is called:
asyncUdp.cpp

void AsyncUDP::_recv(udp_pcb *upcb, pbuf *pb, const ip_addr_t *addr, uint16_t port, struct netif * netif)
{
    while(pb != NULL) {
        pbuf * this_pb = pb;
        pb = pb->next;
        this_pb->next = NULL;
        if(_handler) {
            AsyncUDPPacket packet(this, this_pb, addr, port, netif);
            _handler(packet);
++            pbuf_free(this_pb); // try adding this line
        } else {
            pbuf_free(this_pb);
        }
    }
}

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

I solved the problem by changing the onPacket method signature to: void onPacket(AsyncUDPPacket packet). i.e. it doesn't use a reference. However, that disagrees with AsyncUDP's definition of AuPacketHandlerFunction using a reference.

(Note that this agrees with the notes inside and linked from #2899.)

@ssilverman
Copy link
Author

@lbernstone You're right. I'm adjusting the issue title...

@ssilverman ssilverman changed the title v1.0.3: Network stack never recovers without a hard reset after receiving closely-spaced UDP packets v1.0.3: Network stack never recovers without a hard reset after receiving a fixed number of UDP packets Sep 26, 2019
@stickbreaker
Copy link
Contributor

@ssilverman so, what does your working code look like?

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

@stickbreaker I changed the onPacket function like so:

void onPacket(AsyncUDPPacket packet) {
  Serial.printf("%d: %d\n", ++counter, packet.length());
}

But, according to your code paste, this_pb still isn't getting freed via pbuf_free, but does it need to? Is there a reason it's not (question for the authors)?

Update: Turns out the AsyncUDPPacket destructor calls this, but I wonder why passing the reference doesn't work, but passing a copy does? I mean, after the handler is called, the AsyncUDPPacket object drops out of scope.

@stickbreaker
Copy link
Contributor

@ssilverman Yes, that was my question.
Can you modify asyncUdp.cpp by addint the pbuf_free() call and rerun your tests?

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

Thinking about why (per my update two comments above) the AsyncUDPPacket destructor gets called when I pass a copy, but not the reference. The AsyncUDPPacket object drops out of scope after the handler is called. Why wouldn't the destructor get called? When passing a copy, wouldn't pbuf_free get called twice? What am I missing?

@stickbreaker
Copy link
Contributor

@ssilverman that buffer needs to be freed somewhere. If I am reading the code correctly it only frees the buffer element if there is no callback? Without a callback how does the data get returned to the user?

my recommendation for delete packet; was in error.

I think the else case is the problem. Can you modify the asyncUDP.cpp and see if your results change?

Or, are you saying by changing the function you can receive continuous packet streams?

Chuck.

@stickbreaker
Copy link
Contributor

@ssilverman

Update: Turns out the AsyncUDPPacket destructor calls this, but I wonder why passing the reference doesn't work, but passing a copy does? I mean, after the handler is called, the AsyncUDPPacket object drops out of scope.

the packet() object's scope is above your onPacket() so it does not loose scope in your function. Scope if from create to deletion. I think packet() is a member of the upd() object.

@lbernstone
Copy link
Contributor

Pass pointers, not references. That makes a copy of the object and holds it open.

@stickbreaker
Copy link
Contributor

@ssilverman I think I figured it out. packets() scope is:

if(_handler) {
            AsyncUDPPacket packet(this, this_pb, addr, port, netif);
            _handler(packet);
        }

yes @lbernstone It finally donned on me! you are correct, the copy object was the guilty party!
It only took an hour to sink through my head.

Chuck.

@lbernstone
Copy link
Contributor

The literal definition of a fnord. It took me 3 looks to see it.

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

I think I get what's happening:

  1. The regular constructor calls pbuf_ref and the destructor calls pbuf_free. This means that when calling the handler with a reference, it's missing a pbuf_free because (and I'm guessing here because I haven't looked at the pbuf_ref code yet) there's still one reference count outstanding and so we have a leak.
  2. The copy constructor doesn't have an extra reference so when we use the handler with the copied object, pbuf_free gets called twice, which would also cause a leak eventually? Once when the copied object gets freed and another time when packet drops out of scope.
    Update: There's no leak in this case because pbuf_free does need to get called twice for the current code.

So:

  1. To use the reference version, we need an additional call to pbuf_free.
  2. To use the copy version, we need an additional call to pbuf_ref in the copy constructor. This way we can still use that extra pbuf_free added from the previous point.

In other words, no matter which version we're using there's a leak. Does my assessment look correct? If not, which part is incorrect?

@stickbreaker
Copy link
Contributor

@lbernstone Thank you, you just gave me a new word fnord I had never heard of it before.
@ssilverman you can close this issue if you satisfied.
Chuck.

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

@stickbreaker looks like we commented at the same time :)
I still believe there's a leak in both the reference and copy-constructor versions.

Update: I'm wrong for the second case. There's no current leak for the copy-constructor version.

@ssilverman
Copy link
Author

ssilverman commented Sep 26, 2019

Here's my proposed fix:

  1. Add a call to pbuf_ref in the AsyncUDPPacket copy constructor, and
  2. Call pbuf_free regardless (in AsyncUDP::_recv), outside the if and not in an else.

This should cover both the pass-as-reference and pass-as-copy cases.

@stickbreaker
Copy link
Contributor

@ssilverman I'm studying your fix.

@ssilverman
Copy link
Author

@stickbreaker the troubling fact is that the code still works even when compiling with a copy constructor for AuPacketHandlerFunction instead of the defined reference version. So I think the attempted fix should accommodate both. It is my belief that my above proposed fix solves the problem, but I'd definitely like other opinions.

When writing the copy constructor, it sucks that we can't just create a default and call pbuf_ref inside. The code must have all the copying done manually. :)

@ssilverman
Copy link
Author

ssilverman commented Sep 27, 2019

@stickbreaker (re. #3287 (comment)). The &packet syntax means "take the address of". It's a little confusing when the & character is used to declare a reference. In other words, the & is used for both reference declarations and "take the address of".

To pass something that takes a reference, just pass the object, not its address. Don't prepend the &.

Update: The comments to which I'm referring were deleted.

@ssilverman
Copy link
Author

ssilverman commented Sep 27, 2019

@stickbreaker message me offline, I'd be happy to chat about this. I don't want to dilute this issue too much. :)

Update: The comments to which I'm referring were deleted.

ssilverman added a commit to ssilverman/arduino-esp32 that referenced this issue Sep 27, 2019
…ack for both pass-as-reference and pass-as-copy versions. See espressif#3287 and espressif#3288.
@ssilverman
Copy link
Author

ssilverman commented Oct 1, 2019

I’m failing to see why the logic in #3288 and the actual fix in #3290 wouldn’t work. Could anyone come up with cases where it doesn’t work?

I’m not saying my solution is not wrong or incorrect, I just don’t see its error.

@stale
Copy link

stale bot commented Nov 30, 2019

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale Issue is stale stage (outdated/stuck) label Nov 30, 2019
@ssilverman
Copy link
Author

There's still no explanation about why the proposed fix in #3290 doesn't work. The pass-by-reference version in AsyncUDP is still broken.

@stale
Copy link

stale bot commented Dec 13, 2019

[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it openin the future.

@stale stale bot removed the Status: Stale Issue is stale stage (outdated/stuck) label Dec 13, 2019
@stale
Copy link

stale bot commented Feb 11, 2020

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale Issue is stale stage (outdated/stuck) label Feb 11, 2020
@stale
Copy link

stale bot commented Feb 26, 2020

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Feb 26, 2020
@ssilverman
Copy link
Author

My fix was taken, but uncredited: #3290 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale Issue is stale stage (outdated/stuck)
Projects
None yet
Development

No branches or pull requests

3 participants