Packet received out-of-order. Expected 1; got 2 while connecting to a MySQL cluster. #267

Closed
ErikXu opened this issue May 31, 2017 · 12 comments

Comments

@ErikXu

ErikXu commented May 31, 2017

I have a MySQL cluster with one primary, two secondaries, and two SQL proxies, plus a virtual IP (VIP) bound to the proxies. When I connect to the cluster through the VIP via Pomelo.EntityFrameworkCore.MySql, I get a MySql.Data.MySqlClient.MySqlProtocolException with the message "Packet received out-of-order. Expected 1; got 2". The official connector and Navicat both work fine. Here is some information:
Pomelo.EntityFrameworkCore.MySql version: 1.1.4 or 1.1.0
MySQL and cluster version: 5.7.18
Application OS: CentOS 7, Windows, or Docker

@caleblloyd
Contributor

This is a setup that we are not familiar with. Is there any way you can take an unencrypted packet dump of the failing TCP connection using Wireshark and post it here in a zip file?

It'd also be helpful to see a capture of a working TCP connection from the official connector or Navicat.

@bgrainger
Member

Can you post the full call stack for this exception?

@simon1144

simon1144 commented Aug 31, 2017

We have a similar setup and an identical problem, which occurs several times a day. We've just started investigating, but what seems to happen is that we get one of two exceptions on a request:

a) System.InvalidOperationException: Read past end of buffer.
b) System.FormatException: Length-encoded integer cannot have 0xFB prefix byte.

The context is inside a using block, so it is disposed when these failures occur.
We then get
MySql.Data.MySqlClient.MySqlProtocolException: Packet received out-of-order. Expected 1; got 28.
on another thread that has just opened a new context. It looks like packet 28 belongs to the previously failed request.
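
For readers unfamiliar with the wire format, here is a minimal Python sketch of why a leftover packet produces that message. Each MySQL packet starts with a 4-byte header: a 3-byte little-endian payload length followed by a 1-byte sequence id, and the first server reply to a new command is expected to carry sequence id 1. The names `parse_header` and `check_sequence` are hypothetical and are not taken from MySqlConnector; the error text is copied from the exception above.

```python
class ProtocolError(Exception):
    """Raised when a packet does not match what the protocol allows next."""


def parse_header(header: bytes) -> tuple[int, int]:
    """Split a 4-byte MySQL packet header into (payload_length, sequence_id)."""
    if len(header) != 4:
        raise ValueError("a MySQL packet header is exactly 4 bytes")
    payload_length = int.from_bytes(header[:3], "little")
    return payload_length, header[3]


def check_sequence(header: bytes, expected: int) -> int:
    """Return the payload length, or raise if the sequence id is not the expected one."""
    payload_length, sequence_id = parse_header(header)
    if sequence_id != expected:
        raise ProtocolError(
            f"Packet received out-of-order. Expected {expected}; got {sequence_id}."
        )
    return payload_length


# A stale header left unread on a reused connection: 5-byte payload, sequence id 28.
stale_header = bytes([0x05, 0x00, 0x00, 28])
try:
    check_sequence(stale_header, expected=1)
except ProtocolError as error:
    print(error)
```

If the connection still has unread packets from a failed request, the next command on that connection reads them first, which would match the "Expected 1; got 28" symptom.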

@bgrainger bgrainger reopened this Aug 31, 2017
@bgrainger
Member

A packet capture (with Wireshark or similar) would be extremely helpful in diagnosing this issue. (You can email it to the address on my GitHub profile.)

@simon1144

Thanks for the quick reply. I should be able to get a capture in the next 24 hours or so. I will email it through as soon as we have it.

@bgrainger
Member

bgrainger commented Sep 7, 2017

I've reviewed @simon1144's packet capture and this is what I see:

(packet number) (direction) (contents)
0 -> Query: CALL sproc(1, 1, 1);\n
1 <- column_count = 21
2..22 <- column definitions
23 <- EOF, more results
24..25 <- rows
26 <- EOF, more results
27..28 <- rows
29 <- EOF, more results
30 <- OK

According to the protocol documentation, the EOF in packet 26 should be followed by a full result set response, which begins with a column_count packet. Instead, packet 27 is a row. MySqlConnector fails to deserialize it and throws an exception. I'm assuming (haven't proven yet) that it keeps this session alive and when it attempts to reuse it, it reads packet 28 and throws the out-of-order exception.
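
To make the violation concrete, here is a rough Python sketch that replays the packet sequence from the capture through a small state machine for the classic (EOF-terminated) text result set. The packet kinds are symbolic tuples, not real wire bytes, and `validate_response` is a hypothetical helper written for this illustration; it does not handle ERR packets or `CLIENT_DEPRECATE_EOF`.

```python
class ProtocolError(Exception):
    """Raised when a packet is not allowed in the current protocol state."""


def validate_response(packets):
    """Check a list of (kind, value) packets; packet numbers start at 1.

    For "column_count" the value is the number of columns; for "eof" it is
    True when the "more results" status flag is set; otherwise it is unused.
    """
    state = "result_start"
    remaining_defs = 0
    for number, (kind, value) in enumerate(packets, start=1):
        if state == "result_start":
            if kind == "column_count":
                remaining_defs, state = value, "column_defs"
            elif kind == "ok":
                state = "done"
            else:
                raise ProtocolError(
                    f"packet {number}: expected column_count or OK, got {kind}"
                )
        elif state == "column_defs":
            if kind != "column_def":
                raise ProtocolError(f"packet {number}: expected column_def, got {kind}")
            remaining_defs -= 1
            if remaining_defs == 0:
                state = "defs_eof"
        elif state == "defs_eof":
            if kind != "eof":
                raise ProtocolError(f"packet {number}: expected EOF, got {kind}")
            state = "rows"
        elif state == "rows":
            if kind == "eof":
                # "More results" means another result set (or a final OK) follows.
                state = "result_start" if value else "done"
            elif kind != "row":
                raise ProtocolError(f"packet {number}: expected row or EOF, got {kind}")
        else:
            raise ProtocolError(f"packet {number}: unexpected {kind} after final packet")
    if state != "done":
        raise ProtocolError("response ended before a final packet")


# Packets 1..30 as listed in the capture summary above.
trace = (
    [("column_count", 21)]
    + [("column_def", None)] * 21
    + [("eof", True)]
    + [("row", None)] * 2
    + [("eof", True)]
    + [("row", None)] * 2
    + [("eof", True), ("ok", None)]
)

try:
    validate_response(trace)
except ProtocolError as error:
    print(error)  # packet 27: expected column_count or OK, got row
```

Dropping packets 27 to 29 from the trace yields a sequence that this sketch accepts, which is what the protocol documentation would lead a client to expect.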

I checked the source code for some other connector libraries and couldn't find any that support reading rows (without the column definition metadata) right after an EOF. Interestingly, I found mysqljs/mysql#867 which has packet dumps showing the exact same thing I saw in the Wireshark trace, and it also uses a Galera cluster.

Action items:

  • Make sure the session/connection is closed or put into an unusable state if a deserialization exception occurs; this should avoid the out-of-order exception.
  • Throw a better exception in this situation; we can detect it because there is extra data at the end of the column_count packet.
  • Investigate other client libraries reporting issues with Galera cluster and see if there's a real fix we can pull in.
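
The first two action items can be sketched roughly as follows, in Python rather than the connector's own C#. The length-encoded integer rules are from the MySQL protocol (a first byte below 0xFB is the value itself; 0xFC, 0xFD, and 0xFE introduce 2-, 3-, and 8-byte integers; 0xFB means NULL in a text row). `Session`, `return_to_pool`, and the trailing-data message are hypothetical names and wording for illustration only, not MySqlConnector's actual implementation.

```python
class ProtocolError(Exception):
    """Raised when a payload cannot be decoded as the expected packet type."""


def read_lenenc_int(payload: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a length-encoded integer; return (value, next_offset)."""
    if offset >= len(payload):
        raise ProtocolError("Read past end of buffer.")
    first = payload[offset]
    if first < 0xFB:
        return first, offset + 1
    widths = {0xFC: 2, 0xFD: 3, 0xFE: 8}
    if first not in widths:
        # 0xFB marks NULL in a text row and 0xFF starts an ERR packet.
        raise ProtocolError(
            f"Length-encoded integer cannot have 0x{first:02X} prefix byte."
        )
    end = offset + 1 + widths[first]
    if end > len(payload):
        raise ProtocolError("Read past end of buffer.")
    return int.from_bytes(payload[offset + 1:end], "little"), end


def parse_column_count(payload: bytes) -> int:
    """Parse a column_count packet, rejecting any bytes after the integer."""
    count, end = read_lenenc_int(payload)
    if end != len(payload):
        raise ProtocolError(
            f"column_count packet has {len(payload) - end} trailing byte(s); "
            "the server may have sent a row where a result set was expected"
        )
    return count


class Session:
    """Hypothetical pooled session that remembers whether decoding ever failed."""

    def __init__(self) -> None:
        self.broken = False

    def read_column_count(self, payload: bytes) -> int:
        try:
            return parse_column_count(payload)
        except ProtocolError:
            self.broken = True  # unread packets may remain on the wire
            raise


def return_to_pool(pool: list, session: Session) -> None:
    """Only sessions that never hit a protocol error are reused."""
    if not session.broken:
        pool.append(session)


pool: list = []
session = Session()
try:
    # A text row ("1", "1", NULL) arriving where column_count was expected.
    session.read_column_count(b"\x011\x011\xfb")
except ProtocolError as error:
    print(error)
return_to_pool(pool, session)
print(len(pool))  # 0: the broken session was discarded
```

A row whose first column is NULL would instead begin with 0xFB, which may be how the "cannot have 0xFB prefix byte" exception reported earlier in this thread arises, though that link is an inference rather than something the capture proves.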

@bgrainger
Member

In sidorares/node-mysql2#113, he concludes that "it seems like MySQL/Galera/MariaDB bug". Given my understanding of the protocol and reading of the Wireshark trace, I'm inclined to agree. However, if this is true, it's a fairly serious Galera bug (IMHO). At best, only some of the rows can be returned before the connector will have to throw an exception due to a protocol error.

@bgrainger
Member

> However, if this is true, it's a fairly serious Galera bug

Which makes it very odd that it seems to be reported extremely infrequently. The only other references I've been able to find are in the JavaScript MySQL client libraries.

Perhaps it's a rare combination of factors, such as a client that doesn't support CLIENT_DEPRECATE_EOF (#322) combined with latent failure to handle malformed packets?

@simon1144

FYI, the underlying issue causing the protocol violation has been confirmed as a Galera bug: codership/mysql-wsrep#313

@bgrainger
Member

Closing this as an external bug.

@bgrainger
Member

@MosheL moved to #612; please comment there.

@MosheL

MosheL commented May 6, 2019

I am still receiving this error, but less frequently.
