DXP - System.ComponentModel.Win32Exception (258) or Timeout period elapsed prior to obtaining a connection from the pool

Vote:
 

Here's a simple KQL query you can run in AI Logs to summarize exceptions from the Azure SQL calls:

dependencies
| where target contains ".database.windows.net"
| where tostring(customDimensions.Exception) !contains "Violation of"
| summarize sum(itemCount) by substring(tostring(customDimensions.Exception),0,275)
| order by sum_itemCount desc
| project sum_itemCount, Column1

Looking into stability issues and when problems arise we see large amounts of these:

  1. Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258
  2. Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

I'm interested if others are also seeing these two exceptions in some volume for their CMS 12 on DXP?

Here's an issue for SqlClient that suggests this may be a broader issue with dotnet core running in Linux containers in general:
https://github.com/dotnet/SqlClient/issues/1530
 
For the DXP this suggests some tweaks are already in place:
https://support.optimizely.com/hc/en-us/articles/4432366206733-CMS-12-site-crash-due-to-SQL-timeout-error-when-working-in-CMS-edit-mode

#307196
Aug 21, 2023 7:24
Vote:
 

A shot in the dark but maybe this is with MARS enabled? When I was testing performance for CMS 12 MARS does not work well on Linux

#307293
Aug 22, 2023 14:00
Johan Kronberg - Aug 22, 2023 14:37
I got some info on that too; checked with operations and they set the connectionstring without the parameter in it (which should mean False).
We got better stability upgrading SqlClient to 5.1.1 (from 5.0.1 in our case). We are also now testing upgrade to .NET 7 from 6 on Integration to see if that gives us some improvements.
Quan Mai - Aug 22, 2023 14:53
interesting, keep us posted please
Johan Kronberg - Sep 11, 2023 17:54
Unfortunately we have logged occurrences of both exceptions also after going to .NET 7 in production.
Vote:
 

Hi Johan, did you manage to resolve these errors? We're receiving the same error in DXP 😞

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  
The timeout period elapsed prior to completion of the operation or the server is not responding.
Operation cancelled by user.
The statement has been terminated.
System.ComponentModel.Win32Exception (258): Unknown error 258
#311918
Nov 03, 2023 4:58
Vote:
 

We still see them and usually at high volume in proximity of reported downtime. But there can also be tens of millions operations between showings.

Here is another issue with the client package that we are following since we usually see the problem around queries that are doing heavier work. Unfortunately they are not our own queries but if they were this issue thread suggest re-working those calls to be sync and not async.

https://github.com/dotnet/SqlClient/issues/593

#311920
Nov 03, 2023 8:15
Johan Kronberg - Nov 03, 2023 8:20
We recently went live with Microsoft.Data.SqlClient 5.1.2 and no errors reported yet but too early to say something definitive.
Vote:
 

Johan, what is the status of this now? 5.1.2 seems good still?

(BTW, as a side note, seems like 5.2.0-5.2.1 may have other issues, see Errors after upgrading to 5.2.0 from 5.1.5 on Linux · Issue #2378 · dotnet/SqlClient (github.com). 5.2.2 just released supposed to fix those.

#328244
Aug 28, 2024 14:55
Johan Kronberg - Aug 28, 2024 15:01
We rolled back from 5.2.* and have been on 5.1.5 over the summer and had good stability. I think 5.2.2 will be alright too but we probably won't see as good performance on Linux as on Windows before SqlClientX has materialized (haven't read too closely but I think the earliest version will be 6 .0) - https://github.com/dotnet/SqlClient/discussions/2603#discussioncomment-9868402
Magnus Rahl - Aug 28, 2024 15:19
Thanks for the quick update!
Vote:
 

Can someone help explain what Johan means by 'We rolled back from 5.2.* and have been on 5.1.5 over the summer and had good stability' means?

We are on EPiServer.Framework 12.21.6 and that has a dependency on Microsoft.Data.SqlClient (≥ 5.1.4)
So whatever project in my solution that has EPiServer.Framework is then using 5.1.4.

Are you saying you are manually adding the nuget package Microsoft.Data.SqlClient 5.1.5 to projects that have EPiServer.Framework?

Also, if I'm already on 5.1.4 and see these exceptions, is going to 5.1.5 really going to matter?

#329491
Sep 06, 2024 20:37
Vote:
 

Can't recall what the status was between 5.1.4 and 5.1.5... But with the 5.2.0-5.2.1 error count went up and probably from a brand new error cause.

Errors like in my initial post are of course not only affected by the SqlClient-code but could also be turning up in states of port exhaustion for network calls or maxed out thread count.

The environment in question also had various upscaling done around the time of the 5.2.0-5.2.1 roll out, both number of cores, new type of app service tier and bumped up Azure SQL.

I guess after rolling back the upscaling in combination with the rollback to 5.1.5 gave us the good stability we see now.

I have this info endpoint added to see some of what the machine has set:

[Authorize(Roles = "CmsAdmins")]
public class EnvironmentInfoController : Microsoft.AspNetCore.Mvc.Controller
{
	public Microsoft.AspNetCore.Mvc.ActionResult Index()
	{
		ThreadPool.GetMinThreads(out var minWorkerThreads, out var minCompletionPortThreads);

		ThreadPool.GetMaxThreads(out var maxWorkerThreads, out var maxCompletionPortThreads);

		ThreadPool.GetAvailableThreads(out var availableWorkerThreads, out var availableCompletionPortThreads);

		var currentProcess = Process.GetCurrentProcess();

		var sb = new StringBuilder();

		if (currentProcess.Threads.Count > 0)
		{
			foreach (ProcessThread thread in currentProcess.Threads)
			{
				sb.AppendLine($"Start time: {thread.StartTime}, Total processor time: {thread.TotalProcessorTime}, Thread ID: {thread.Id}, State = {thread.ThreadState}");
			}
		}

		var content = $"""
						Instance name: {Environment.MachineName}
						The number of processors on this instance is: {Environment.ProcessorCount}
						Command line: {Environment.CommandLine}
						Current process thread count: {currentProcess.Threads.Count}
						Thread pool thread count: {ThreadPool.ThreadCount}
						Thread pool min worker threads: {minWorkerThreads} / completion: {minCompletionPortThreads}
						Thread pool max worker threads: {maxWorkerThreads} / completion: {maxCompletionPortThreads}
						Thread pool available worker threads: {availableWorkerThreads} / completion: {availableCompletionPortThreads}
						
						Thread info:
						{sb}
						""";

		return this.Content(content, "text/plain");
	}
}
#329527
Sep 07, 2024 8:42
Vote:
 

Mike, you can force a different version of a transitive dependency either by adding it as a top level dependency, or by using central package management and transitive pinning.

Also: Are you using MARS (MultipleActiveResultSets) on your connection string by any chance?

#329601
Sep 09, 2024 6:49
Mike Malloy - Sep 09, 2024 12:04
We are DXP, we are not using MARS.
Vote:
 

Just saw this posted:

Fixed an issue by updating the Microsoft.Data.SqlClient dependency to 5.2.2.
https://world.optimizely.com/support/bug-list/bug/CMS-35781

#329672
Sep 10, 2024 19:36
* You are NOT allowed to include any hyperlinks in the post because your account hasn't associated to your company. User profile should be updated.