Diagnosing an ASP.NET Core hard crash
source link: https://codeblog.jonskeet.uk/2022/02/20/diagnosing-an-asp-net-core-hard-crash/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Diagnosing an ASP.NET Core hard crash
As part of my church A/V system (At Your Service), I run a separate local web server to interact with the Zoom SDK. Initially this was because the Zoom SDK would only run in 32-bit processes and I needed a 64-bit process to handle the memory requirements for the rest of the app. However, it’s also proven useful in terms of keeping the Zoom meeting for a church service alive if At Your Service crashes. Obviously I try hard to avoid that happening, but when interoperating with a lot of native code (LibVLC, NDI, the Stream Deck, PowerPoint via COM) there are quite a few avenues for crashes. The web server runs ASP.NET Core within a WPF application to make it easy to interact with logs while it’s running, and to give the Zoom SDK a normal event dispatcher.
Yesterday when trying to change my error handling code significantly, I found that the web server was crashing hard, with no obvious trace of what’s going on. I’ve already spent a little time trying to figure out what’s going on, but I couldn’t get to the bottom of it. I know the immediate cause of the crash, and I’ve fixed that fairly easily – but I want to harden the web server against any further bugs I might introduce. I figured it would be useful to blog about that process as I went along.
What I know so far
The immediate crash was due to an exception being thrown in an async void
method.
Relevant threading aspects:
- I start the ASP.NET Core app in a separate thread (although that’s probably unnecessary anyway, now that I think about it) calling
IHost.Start
- I have handlers for
Dispatcher.UnhandledException
andTaskScheduler.UnobservedTaskException
- I execute all Zoom-specific code on the WPF Dispatcher thread
The immediate error came from code like the following. You can ignore the way this effectively reproduces Web API to some extent… it’s the method body that’s important.
public
abstract
class
CommandBase<TRequest, TResponse> : CommandBase
{
public
override
async
Task HandleRequest(HttpContext context)
{
var
reader =
new
StreamReader(context.Request.Body);
var
text =
await
reader.ReadToEndAsync();
var
request = JsonUtilities.Parse<TRequest>(text);
var
dispatcher = Application.Current.Dispatcher;
try
{
var
response =
await
dispatcher.Invoke(() => ExecuteAsync(request));
var
responseJson = JsonUtilities.ToJson(response);
await
context.Response.WriteAsync(responseJson);
}
catch
(ZoomSdkException ex)
{
SetExceptionResponse(
new
ZoomExceptionResponse {
/* More info here */
});
}
async
void
SetExceptionResponse(ZoomExceptionResponse response)
{
var
responseJson = JsonUtilities.ToJson(response);
await
context.Response.WriteAsync(responseJson);
context.Response.StatusCode = 500;
}
}
public
abstract
Task<TResponse> ExecuteAsync(TRequest request);
}
There are at least three problems here:
- I’m trying to set
HttpResponse.StatusCode
after writing the body - The
SetExceptionResponse
method isasync void
(generally a bad idea) - I’m not awaiting the call to
SetExceptionResponse
(which I can’t, due to it returningvoid
)
(It’s also a bit pointless having a local method there. This code could do with being rewritten when I don’t have Covid brain fog, but hey…)
The first of these causes an InvalidOperationException
to be thrown. The second and third, between them, cause the app to crash. The debugger has been no help here in working out what’s going on.
Create with a console app to start ASP.NET Core
It feels like this should be really easy to demonstrate in a simple console app that does nothing but start a web server which fails in this particular way.
At this stage I should say how much I love the new top-level statements in C# 10. They make simple complete examples an absolute joy. So let’s create a console app, change the Sdk
attribute in the project file to Microsoft.NET.Sdk.Web
, and see what we can do. I’m aware that with ASP.NET Core 6 there are probably even simpler ways of starting the server, but this will do for now:
using
System.Net;
var
host = Host.CreateDefaultBuilder()
.ConfigureWebHostDefaults(builder => builder
.ConfigureKestrel((context, options) => options.Listen(IPAddress.Loopback, 8080))
.Configure(application => application.Run(HandleRequest)))
.Build();
host.Start();
host.WaitForShutdown();
async
Task HandleRequest(HttpContext context)
{
await
context.Response.WriteAsync(
"Testing"
);
}
Trying to run that initially brought up prompts about IIS Express and trusting SSL certificates – all very normal for a regular web app, but not what I want here. After editing launchSettings.json
to a simpler set of settings:
{
"profiles": {
"AspNetCoreCrash": {
"commandName": "Project"
}
}
}
… I can now start the debugger, then open up localhost:8080 and get the testing page. Great.
Reproduce the exception
Next step: make sure I can throw the InvalidOperationException
in the same way as the original code. This is easy, just replacing the body of the HandlRequest
method:
async
Task HandleRequest(HttpContext context)
{
await
context.Response.WriteAsync(
"Testing"
);
context.Response.StatusCode = 500;
}
Sure enough the console logs show that it’s failed as expected:
System.InvalidOperationException: StatusCode cannot be set because the response has already started.
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ThrowResponseAlreadyStartedException(String value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.set_StatusCode(Int32 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.Microsoft.AspNetCore.Http.Features.IHttpResponseFeature.set_StatusCode(Int32 value)
at Microsoft.AspNetCore.Http.DefaultHttpResponse.set_StatusCode(Int32 value)
at Program.<<Main>$>g__HandleRequest|0_1(HttpContext context) in C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\Program.cs:line 19
at Microsoft.WebTools.BrowserLink.Net.BrowserLinkMiddleware.ExecuteWithFilterAsync(IHttpSocketAdapter injectScriptSocket, String requestId, HttpContext httpContext)
at Microsoft.AspNetCore.Watch.BrowserRefresh.BrowserRefreshMiddleware.InvokeAsync(HttpContext context)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
… but (again, as expected) the server is still running. It’s interesting that BrowserLink
occurs in the stack trace – I suspect that wouldn’t be the case in my
Let’s try making the failure occur in the same way as in At Your Service:
async
Task HandleRequest(HttpContext context)
{
// In AYS we await executing code in the dispatcher;
// Task.Yield should take us off the synchronous path.
await
Task.Yield();
WriteError();
async
void
WriteError()
{
await
context.Response.WriteAsync(
"Testing"
);
context.Response.StatusCode = 500;
}
}
This time we get a longer stack trace, and the process quits, just like in AYS:
System.InvalidOperationException: StatusCode cannot be set because the response has already started.
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ThrowResponseAlreadyStartedException(String value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.set_StatusCode(Int32 value)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.Microsoft.AspNetCore.Http.Features.IHttpResponseFeature.set_StatusCode(Int32 value)
at Microsoft.AspNetCore.Http.DefaultHttpResponse.set_StatusCode(Int32 value)
at Program.c__DisplayClass0_0.<<<Main>$>g__WriteError|4>d.MoveNext() in C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\Program.cs:line 19
--- End of stack trace from previous location ---
at System.Threading.Tasks.Task.c.b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallback.c.b__6_0(QueueUserWorkItemCallback quwi)
at System.Threading.ExecutionContext.RunForThreadPoolUnsafe[TState](ExecutionContext executionContext, Action`1 callback, TState& state)
at System.Threading.QueueUserWorkItemCallback.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
This happens in both the debugger and when running from the command line.
Setting a break point in the WriteError
method shows a stack trace like this:
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Program.c__DisplayClass0_0.<<Main>$>g__WriteError|4()
at Program.<<Main>$>g__HandleRequest|0_1(HttpContext context) in C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\Program.cs:line 14
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Runtime.CompilerServices.YieldAwaitable.YieldAwaiter.c.b__6_0(Action innerContinuation, Task continuationIdTask)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
at System.Runtime.CompilerServices.YieldAwaitable.YieldAwaiter.RunAction(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
There’s nothing about ASP.NET Core in there at all… so I wonder if we can take that out of the equation too?
Reproducing the crash in a pure console app
To recap, I’m expecting at this stage that to reproduce the crash I should:
- Write an
async void
method that throws an exception - Call that method from a regular async method
Let’s try:
#pragma warning disable CS1998 // Async method lacks 'await' operators and will run synchronously
await
NormalAsyncMethod();
Console.WriteLine(
"Done"
);
async
Task NormalAsyncMethod()
{
await
Task.Yield();
Console.WriteLine(
"Start ofNormalAsyncMethod"
);
BrokenAsyncMethod();
Console.WriteLine(
"End of NormalAsyncMethod"
);
}
async
void
BrokenAsyncMethod()
{
await
Task.Yield();
throw
new
Exception(
"Bang"
);
}
Hmm. That exits normally:
$ dotnet run
Start ofNormalAsyncMethod
End of NormalAsyncMethod
Done
But maybe there’s a race condition between the main thread finishing and the problem crashing the process? Let’s add a simple sleep:
#pragma warning disable CS1998 // Async method lacks 'await' operators and will run synchronously
await
NormalAsyncMethod();
Thread.Sleep(1000);
Console.WriteLine(
"Done"
);
// Remainder of code as before
Yup, this time it crashes hard:
Start ofNormalAsyncMethod
End of NormalAsyncMethod
Unhandled exception. System.Exception: Bang
at Program.<<Main>$>g__BrokenAsyncMethod|0_1()
in
C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\ConsoleCrash\Program.cs:line 15
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
Interlude: what about async Task?
At this point I’m remembering some of what I’ve learned about how async void
methods handle exceptions. What happens if we turn it into an async Task
method instead? At that point, the Task
returned by the method (which we ignore) will have the exception, and as by default unobserved task exceptions no longer crash the process, maybe we’ll be okay. So just changing BrokenAsyncMethod
to:
async
Task BrokenAsyncMethod()
{
throw
new
Exception(
"Bang"
);
}
(and ignoring the warning at the call site)… the program no longer crashes. (I could subscribe to TaskScheduler.UnobservedTaskException
but I’m not that bothered… I’m pretty convinced it would fire, at least eventually.)
Do all ThreadPool exceptions crash the app?
We don’t need to use async methods to execute code on the thread pool. What happens if we just write a method which throws an exception, and call that from the thread pool?
ThreadPool.QueueUserWorkItem(ThrowException);
Thread.Sleep(1000);
Console.WriteLine(
"Done"
);
void
ThrowException(
object
? state)
{
throw
new
Exception(
"Bang!"
);
}
Yup, that crashes:
Unhandled exception. System.Exception: Bang!
at Program.<<Main>$>g__ThrowException|0_0(Object state) in C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\ConsoleCrash\Program.cs:line 7
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
At this point some readers (if there still are any…) may be surprised that this is a surprise to me. It’s been a long time since I’ve interacted with the thread pool directly, and taking down the process like this feels a little harsh to me. (There are pros and cons, certainly. I’m not trying to argue that Microsoft made the wrong decision here.)
Can we change the ThreadPool behaviour?
Given that we have things like TaskScheduler.UnobservedTaskException
, I’d expect there to be something similar for the thread pool… but I can’t see anything. It looks like this is behaviour that changed with .NET 2.0 – back in 1.x, thread pool exceptions didn’t tear down the application.
After a bit more research, I found AppDomain.UnhandledException
. This allows us to react to an exception that’s about to take down the application, but it doesn’t let us mark it as “handled”.
Here’s an example:
AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
Console.WriteLine($
"Unhandled exception: {((Exception)args.ExceptionObject).Message}"
);
ThreadPool.QueueUserWorkItem(ThrowException);
Thread.Sleep(1000);
Console.WriteLine(
"Done"
);
void
ThrowException(
object
? state) =>
throw
new
Exception(
"Bang!"
);
Running this code a few times, I always get output like this:
Unhandled exception: Bang!
Unhandled exception. System.Exception: Bang!
at Program.<<Main>$>g__ThrowException|0_1(Object state) in C:\users\skeet\GitHub\jskeet\DemoCode\AspNetCoreCrash\ConsoleCrash\Program.cs:line 8
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
… but sometimes “Done” is printed too. So I guess there’s some uncertainty about how quickly the AppDomain
is torn down.
Hardening At Your Service
Given what I know now, I don’t think I can easily stop the web server for Zoom interactions from terminating if I have a bug – but I can make it easier to find that bug afterwards. I already normally write the log out to a text file when the app exits, but that only happens for an orderly shutdown.
Fortunately, it looks like the AppDomain.UnhandledException
is given enough time to write the log out before the process terminates. Temporarily reverting to the broken code allows me to test that – and yes, I get a log with an appropriate critical error.
Conclusion
I knew that async void
methods were generally bad, but I hadn’t quite appreciated how dangerous they are, particularly when executed from a thread pool thread.
While I’m not thrilled with the final result, I at least understand it now, and can find similar errors more easily in the future. The “not understanding” part was the main motivation for this blog post – given that I’d already found the immediate bug, I could have just fixed it and ignored the worrying lack of diagnostic information… but I always find it tremendously unsettling when I can’t explain significant behaviour. It’s not always worth investigating immediately but it’s generally useful to come back to it later on and keep diving deeper until you’ve got to the bottom of it.
I haven’t put the source code for this blog post in GitHub as there are so many iterations – and because it’s all in the post itself. Shout if you would find it valuable, and I’m happy to add it after all.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK