Skip to content

"Packet received out-of-order. Expected 1; got 0." happening infrequently #612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bgrainger opened this issue Feb 21, 2019 · 27 comments
Closed
Labels

Comments

@bgrainger
Copy link
Member

I can see the same error message in a ~ 0.001% of my requests, while connecting to server.
I do not us a Galera cluster.

my stack:



[Packet received out-of-order. Expected 1; got 0.]
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Threading.Tasks.ValueTask`1.get_Result()
at MySqlConnector.Protocol.Serialization.ProtocolUtility.DoReadPayloadAsync(BufferedByteReader bufferedByteReader, IByteHandler byteHandler, Func`1 getNextSequenceNumber, ArraySegmentHolder`1 previousPayloads, ProtocolErrorbe‎havior protocolErrorbe‎havior, IObe‎havior iobe‎havior)
at MySqlConnector.Protocol.Serialization.ProtocolUtility.ReadPayloadAsync(BufferedByteReader bufferedByteReader, IByteHandler byteHandler, Func`1 getNextSequenceNumber, ArraySegmentHolder`1 cache, ProtocolErrorbe‎havior protocolErrorbe‎havior, IObe‎havior iobe‎havior)
at MySqlConnector.Protocol.Serialization.StandardPayloadHandler.ReadPayloadAsync(ArraySegmentHolder`1 cache, ProtocolErrorbe‎havior protocolErrorbe‎havior, IObe‎havior iobe‎havior)
at MySqlConnector.Core.ServerSession.ReceiveReplyAsync(IObe‎havior iobe‎havior, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at MySqlConnector.Core.ServerSession.TryAsyncContinuation(Task`1 task)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at MySqlConnector.Core.ServerSession.<TryResetConnectionAsync>d__65.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MySqlConnector.Core.ConnectionPool.<GetSessionAsync>d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at MySqlConnector.Core.ConnectionPool.<GetSessionAsync>d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Threading.Tasks.ValueTask`1.get_Result()
at MySql.Data.MySqlClient.MySqlConnection.<CreateSessionAsync>d__96.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Threading.Tasks.ValueTask`1.get_Result()
at MySql.Data.MySqlClient.MySqlConnection.<OpenAsync>d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at MySql.Data.MySqlClient.MySqlConnection.Open()

Originally posted by @MosheL in #267 (comment)

@bgrainger
Copy link
Member Author

bgrainger commented Feb 21, 2019

@MosheL please provide information on your client-side enviroment (.NET Framework, version, OS, MySqlConnector version) and server-side environment (DB server brand, version, hosting provider).

Are you able to provide a packet capture from when this happens? Are there any errors in your MySQL server logs?

@MosheL
Copy link

MosheL commented Feb 24, 2019

.NET Framework 4.7.2, 4.6.1
MySqlConnector last from nuget (0.38 for now).
Windows 10, Windows 2008 R2, Windows Server 2016 - all of them
MariaDB 10.3, hosted on my private cloud, both on Windows and on Linux.

I am reciving this error on a ~ 0.01% of the requests, while connecting.

@MosheL
Copy link

MosheL commented Feb 24, 2019

about packet capture - I will try to write a test code. it will take time.

@bgrainger
Copy link
Member Author

0.38.0 is fairly old; are you able to update to 0.49.2? I'm not aware of a specific bug fix that has been made that would solve this problem, but it's usually good to be running the latest version.

As best I can tell, the call to Open is reusing a connection from the connection pool. The driver is trying to reset the connection to put it back in a known state. (At least with earlier versions of MariaDB, this must be done by sending a "change user" request; I haven't checked if 10.3 supports a "reset connection" command.) After sending the reset request, the client parses an invalid reply, and throws this exception.

A packet capture would likely show us exactly what's going on (if we could isolate it down to the failing thread easily).

@MosheL
Copy link

MosheL commented Feb 24, 2019

upgraded to the last version. the same error on production (MariaDB 10.3.11).

I tried to repro the bug on my local machine (Windows, MariaDB 10.3.9) and I couldn't.

for now, I am starting from a simple test (sorry for the VB, wrote years ago to explain a bug on Oracle connector.

	Dim t As New Stopwatch()
	t.Start()

	Parallel.For(0, 90000, New ParallelOptions() With {.MaxDegreeOfParallelism = 3}, Function(i)
																						 Using Conn = CreateAndOpenConnection()

																						 End Using
																						 Return 0
																					 End Function)
	Response.Write(t.ElapsedMilliseconds())

I am using this code under IIS, same IIS version like production.

things I want to test:
10.3.13 (the last version)
Linux vs Windows (stupid, but maybe)
IIS@Pro vs IIS@Win Server, as I know there is a limits on this version.

@MosheL
Copy link

MosheL commented Feb 24, 2019

10.3.13: no problems

@MosheL
Copy link

MosheL commented Feb 24, 2019

update: tested 10.2.16. In the test no problem, in a production load - the same problem on the same rate.
for now I was added a hack in production: try, catch, reconnect.

@bgrainger
Copy link
Member Author

I'm not able to build a test harness that reproduces any connection problems with MariaDB 10.2.16; have you had any success?

@MosheL
Copy link

MosheL commented Feb 27, 2019

I didn't see any problem on any test, not with 10.2.16 and not with 10.3.11/12/13.
the problem is on production only :-(

@bgrainger
Copy link
Member Author

Please test again with 0.50.0, which fixes #613.

@MosheL
Copy link

MosheL commented May 6, 2019

Tested on production for a month.
the connection speed (#613) is better
the problem still here, but with less frequently.

@MosheL
Copy link

MosheL commented Jun 3, 2019

Packet received out-of-order. Expected 1; got 84

MySQLConnector 0.56, 10.3.15-MariaDB

@MosheL
Copy link

MosheL commented Jun 13, 2019

I was sent by mail a PCAP with the problem.

@bgrainger
Copy link
Member Author

Here's an updated exception stack:

[MySqlProtocolException: Packet received out-of-order. Expected 1; got 0.]
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
MySqlConnector.Protocol.Serialization.ProtocolUtility.DoReadPayloadAsync(BufferedByteReader bufferedByteReader, IByteHandler byteHandler, Func`1 getNextSequenceNumber, ArraySegmentHolder`1 previousPayloads, ProtocolErrorBehavior protocolErrorBehavior, IOBehavior ioBehavior) +158
MySqlConnector.Protocol.Serialization.StandardPayloadHandler.ReadPayloadAsync(ArraySegmentHolder`1 cache, ProtocolErrorBehavior protocolErrorBehavior, IOBehavior ioBehavior) +80
MySqlConnector.Core.ServerSession.ReceiveReplyAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) +122
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
MySqlConnector.Core.ServerSession.TryAsyncContinuation(Task`1 task) +105
System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() +70
System.Threading.Tasks.Task.Execute() +70
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
System.Runtime.CompilerServices.ConfiguredValueTaskAwaiter.GetResult() +51
MySqlConnector.Core.<TryPingAsync>d__70.MoveNext() +1293
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
System.Runtime.CompilerServices.ConfiguredValueTaskAwaiter.GetResult() +46
MySqlConnector.Core.<GetSessionAsync>d__10.MoveNext() +2193
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
MySqlConnector.Core.<GetSessionAsync>d__10.MoveNext() +4793
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
System.Runtime.CompilerServices.ConfiguredValueTaskAwaiter.GetResult() +51
MySql.Data.MySqlClient.<CreateSessionAsync>d__99.MoveNext() +913
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
System.Runtime.CompilerServices.ConfiguredValueTaskAwaiter.GetResult() +51
MySql.Data.MySqlClient.<OpenAsync>d__27.MoveNext() +1042
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() +31
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +60
MySql.Data.MySqlClient.MySqlConnection.Open() +61

The exception is coming from TryPingAsync. Here's the PING request (from the packet capture):

image

Here's the response:

image

In yellow is the expected "OK" packet that is the response to the ping. In red is an unexpected sequence of bytes that's being interpreted as packet with length 14 (first three bytes) and sequence number 0 (4th byte). I believe this is causing the error.

I'm not yet sure if Wireshark's warnings about "previous segment not captured" and "out of order" are relevant to this problem. Is that just an error with the packet capture that's leading me down a rabbit trail, or is that a symptom of whatever is causing the problem?

@bgrainger
Copy link
Member Author

@herea in the server's response seems very suspicious, since your query contains SET @hereami = .

@bgrainger
Copy link
Member Author

bgrainger commented Jun 13, 2019

On closer inspection, it looks like 11 bytes of garbage data, followed by one OK packet (sequence number 1, flags = AUTOCOMMIT | MORE_RESULTS_EXISTS) followed by another OK packet (sequence number 1 again, flags = AUTOCOMMIT).

image

If this is really what the client is receiving, it really looks like either a server-side bug or an unreliable network that's corrupting packets. One way to diagnose the latter might be to also perform a packet capture from the server-side, then compare the client-side and server-side captures.

@MosheL
Copy link

MosheL commented Jun 13, 2019

the packet you see was captured from server side.

@bgrainger
Copy link
Member Author

Does your comment at #651 (comment) apply to this case, too? I.e., removing the @hereami user variable fixes this issue?

@MosheL
Copy link

MosheL commented Jun 13, 2019

I am testing right now some ideas.

I think the problem is not the SET user variable, but the exectue of two (or more) commands separated by ;;

		var s2 = sql.Split(';');

			for (int i = 0; i < 2200; i++)
				{
				foreach (var current in s2)
				{
					var Cmd = new MySqlCommand(current);
					Cmd.Connection = Conn;
					using (var oReader = Cmd.ExecuteReader())
					{
						oReader.Read();
					}
				}
				}

Is working OK.

@MosheL
Copy link

MosheL commented Jun 13, 2019

When I am stopping to pass two (or more) queries on the same job, the connection problem was disappeared also.
but it a ~ 50-60% slower.

for (int i = 0; i < 2200; i++)
				{

				using (var Conn = new MySqlConnection("server= pooling=true;Connection reset =false; Min Pool Size=5;Max Pool Size=20; Connect Timeout=10;default command timeout=30;UseAffectedRows=true;Allow User Variables=true; Allow User Variables=true; CharacterSet=utf8mb4"))
				{
					Conn.Open();
					foreach (var current in s2)
					{
						var Cmd = new MySqlCommand(current);
						Cmd.Connection = Conn;
						using (var oReader = Cmd.ExecuteReader())
						{
							oReader.Read();
						}
					}
				}
				}

@bgrainger
Copy link
Member Author

I think the problem is not the SET user variable, but the exectue of two (or more) commands separated by ;

I haven't seen that to be a problem on any other MySQL version. (OTOH, I haven't seen user variables cause a problem either...) It's an interesting data point that it may require a combination of the two.

@MosheL
Copy link

MosheL commented Jun 13, 2019

Its the combination.
Two SELECTs didn't make this problem on this server. I was see the same problem on other production machine but could not repro now.

			string sql = @" SET @hereami = POINT(32.08373,34.8965);
select * from search_sites limit 1;  ";
// failed

	string sql = @" SET @hereami =1;
select * from search_sites limit 1; ";

// failed

			string sql = @" /*SET @hereami = POINT(32.08373,34.8965);*/
select * from search_sites limit 1;  SELECT types.id , concat( min(types.name) ,' ‏(', count(st_site) ,')') as name, min(parent.name) parent, min(parent.id) parent_id , min(parent.display_order) display_order1, min(types.display_order) display_order FROM types ,types as parent , sites_types where st_type=types.id and parent.id=types.parent_id and types.id > 0  and types.parent_id=1686 ";

 //working

Same code, working OK on a other 10.3.15, Installed on Windows 10 x64.
Failed on 10.3.13, on linux.

@bgrainger bgrainger added invalid and removed waiting for answer Needs more information from the bug reporter labels Jun 13, 2019
@bgrainger
Copy link
Member Author

I think the problem is not the SET user variable, but the exectue of two (or more) commands separated by ;

From https://jira.mariadb.org/browse/MDEV-16308:

  1. it happens when there is a semicolon-separated "batch". Both cases where it was seen, there is a command that returns OK (SET/UPDATE), followed by semicolon, followed by a SELECT
  2. for one case for which we have network trace, pcap, it manifests in a weird resultset already returned by select command in the batch (resultset rows end with an EOF with "more results exist" flag, which is directly followed by an EOF packet without this flag)
  3. the following OK packet (response to ping) is malformed, and embeds part of another OK packet into message (which is normally reserved for readable string)

The symptoms (two EOF packets, malformed OK packet) and causes (multi-statement batch where first statement is SET) exactly match your situation.

Closing this as an external bug, and duplicate of #651.

@MosheL
Copy link

MosheL commented Jun 13, 2019

Just to document it somewhere: on my dev machine (same version of 10.3.15 @ windows 10) - no problems, with stress test or not.
on production I can see it, in two diffrent setups (linux, 10.2,10.3, windows 2016) and diffrent clients (Oracle on Windows Net Framework 4.5, Oracle on NET Core 2.1/debian, MySqlConnector @ Windows, MySQLConnector NET Core).

@markus456
Copy link

markus456 commented Jun 27, 2019

This seems to be easily reproducible by shutting down the server while doing a query.

Here's a gist describing how I reproduced it: https://gist.github.com/markus456/fee40f72d83b4ee47ba2625fb67e3cf3

Also there is a stacktrace and a network capture showing that the server does send non-sequential packets on shutdown.

Edit: I'm no expect but a look at the code suggests that this could be avoided by adding a check for packets with sequence number 0 inside ReadPacketAfterHeader and postpone the error generation up until the command byte can be read.

@bgrainger
Copy link
Member Author

this could be avoided by adding a check for packets with sequence number 0 inside ReadPacketAfterHeader and postpone the error generation up until the command byte can be read.

Agreed: #662.

@Sandaman2015
Copy link

i got same error with "Packet received out-of-order. Expected 1; got 2.",
here is my using :
Dotnet 5.0 + EntityFrameWork Core 2.0 + Polemo.EntityFrameWork.Mysql 5.0.1, this is a web api project.
Mysql Veresion: 8.0+
Opreation System: windwos 10
My project is open in github, hopefully you can look where is my code warn, my project designed by repository unit and all method was used asynchronous task. i awalys consider about it could be something wrong with my asnchronous task method on BaseRepository.c#
open project url on github: https://github.com/Sandaman2015/ZswBlog.Core

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants