7

一个超经典 WinForm 卡死问题的再反思

 1 year ago
source link: https://www.cnblogs.com/huangxincheng/p/16868486.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

一个超经典 WinForm 卡死问题的再反思

1.讲故事

这篇文章起源于昨天的一位朋友发给我的dump文件,说它的程序出现了卡死,看了下程序的主线程栈,居然又碰到了 OnUserPreferenceChanged 导致的挂死问题,真的是经典中的经典,线程栈如下:


0:000:x86> !clrstack
OS Thread Id: 0x4eb688 (0)
Child SP       IP Call Site
002fed38 0000002b [HelperMethodFrame_1OBJ: 002fed38] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
002fee1c 5cddad21 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
002fee34 5cddace8 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
002fee48 538d876c System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
002fee88 53c5214a System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
002fee8c 538dab4b [InlinedCallFrame: 002fee8c] 
002fef14 538dab4b System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
002fef48 53b03bc6 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
002fef60 5c774708 Microsoft.Win32.SystemEvents+SystemEventInvokeInfo.Invoke(Boolean, System.Object[])
002fef94 5c6616ec Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
002fefe8 5c660cd4 Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
002ff008 5c882c98 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
...

说实话,这种dump从去年看到今年,应该不下五次了,都看烦了,其形成原因是:

  • 未在主线程中生成用户控件,导致用 WindowsFormsSynchronizationContext.Send 跨线程封送时,对方无法响应请求进而挂死

虽然知道原因,但有一个非常大的遗憾就是在 dump 中找不到到底是哪一个控件,只能笼统的告诉朋友,让其洞察下代码是哪里用了工作线程创建了 用户控件, 有些朋友根据这个信息成功的找到,也有朋友因为各种原因没有找到,比较遗憾。

为了不让这些朋友的遗憾延续下去,这一篇做一个系统归纳,希望能助这些朋友一臂之力。

二:解决方案

这个问题的形成详情,我在去年的一篇文章为:记一次 .NET 某新能源汽车锂电池检测程序 UI挂死分析 中已经做过分享,因为 dump 中找不到问题的 Control,所以也留下了一些遗憾,这一篇就做个补充。

2. 问题突破点分析

熟悉 WinForm 底层的朋友应该知道,一旦在 工作线程 上创建了 Control 控件,框架会自动给这个线程配备一个 WindowsFormsSynchronizationContext 和其底层的 MarshalingControl ,这个是有源码支撑的,大家可以找下 Control 的构造函数,简化后的源码如下:


public class Control : Component
{
    internal Control(bool autoInstallSyncContext)
    {
        //***

        if (autoInstallSyncContext)
        {
            WindowsFormsSynchronizationContext.InstallIfNeeded();
        }
    }
}

public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable
{
    private Control controlToSendTo;

    private WeakReference destinationThreadRef;

    public WindowsFormsSynchronizationContext()
    {
        DestinationThread = Thread.CurrentThread;
        Application.ThreadContext threadContext = Application.ThreadContext.FromCurrent();
        if (threadContext != null)
        {
            controlToSendTo = threadContext.MarshalingControl;
        }
    }

    internal static void InstallIfNeeded()
    {
        try
        {
            SynchronizationContext synchronizationContext = AsyncOperationManager.SynchronizationContext;
            if (synchronizationContext == null || synchronizationContext.GetType() == typeof(SynchronizationContext))
            {
                AsyncOperationManager.SynchronizationContext = new WindowsFormsSynchronizationContext();
            }
        }
        finally
        {
            inSyncContextInstallation = false;
        }
    }
}

public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable
{
    public WindowsFormsSynchronizationContext()
    {
        DestinationThread = Thread.CurrentThread;
        Application.ThreadContext threadContext = Application.ThreadContext.FromCurrent();
        if (threadContext != null)
        {
            controlToSendTo = threadContext.MarshalingControl;
        }
    }
}

internal sealed class ThreadContext
{
    internal Control MarshalingControl
    {
        get
        {
            lock (this)
            {
                if (marshalingControl == null)
                {
                    marshalingControl = new MarshalingControl();
                }
                return marshalingControl;
            }
        }
    }
}

这段代码可以挖到下面两点信息。

  1. 一旦 Control 创建在工作线程上,那这个线程就会安装一个 WindowsFormsSynchronizationContext 变量,比如此时就存在两个对象了。

0:000:x86> !dso
OS Thread Id: 0x4eb688 (0)
ESP/REG  Object   Name
002FEC40 025a0fb0 System.Windows.Forms.WindowsFormsSynchronizationContext
...
002FEF44 0260992c System.Object[]    (System.Object[])
002FEF48 02d69164 System.Windows.Forms.WindowsFormsSynchronizationContext
...

  1. 工作线程ID 会记录在内部的 destinationThreadRef 字段中,我们试探下 02d69164

0:000:x86> !do 02d69164
Name:        System.Windows.Forms.WindowsFormsSynchronizationContext
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
...
533c2204  4002522        8 ...ows.Forms.Control  0 instance 02d69218 controlToSendTo
5cef92d0  4002523        c System.WeakReference  0 instance 02d69178 destinationThreadRef

0:000:x86> !DumpObj /d 02d69178
Name:        System.WeakReference
MethodTable: 5cef92d0
EEClass:     5cabf0cc
Size:        12(0xc) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
5cee2bdc  400070a        4        System.IntPtr  1 instance   111828 m_handle

0:000:x86> !do poi(111828)
Name:        System.Threading.Thread
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
5cee1638  40018ca       28         System.Int32  1 instance        9 m_ManagedThreadId 
...

从上面的输出中可以看到,9号线程 曾经创建了不该创建的 Control,所以找出这个 Control 就是解决问题的关键,这也是最难的。

3. 如何找到问题 Control

以我目前的技术实力,从 dump 中确实找不到,但我可以运行时监测,突破点就是一旦这个 Control 在工作线程中创建,底层会安排一个 WindowsFormsSynchronizationContext 以及 MarshalingControl 对象,我们拦截他们的生成构造就好了。

为了方便讲述,先上一段测试代码,在 backgroundWorker1_DoWork 方法中创建一个 Button 控件。


namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
        {
            Button btn = new Button();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }

        private void button1_Click(object sender, EventArgs e)
        {
            backgroundWorker1.RunWorkerAsync();
        }
    }
}

接下来在 MarshalingControl 的构造函数上下一个bp断点来自动化记录,观察 new Button 的时候是否命中。


0:007> !name2ee System_Windows_Forms_ni System.Windows.Forms.Application+MarshalingControl..ctor
Module:      5b9b1000
Assembly:    System.Windows.Forms.dll
Token:       0600554a
MethodDesc:  5b9fe594
Name:        System.Windows.Forms.Application+MarshalingControl..ctor()
JITTED Code Address: 5bb5d1a4
0:007> bp 5bb5d1a4 "!clrstack; gc"
0:007> g
OS Thread Id: 0x249c (9)
Child SP       IP Call Site
067ff2f0 5bb5d1a4 System.Windows.Forms.Application+MarshalingControl..ctor()
067ff2f4 5bb70224 System.Windows.Forms.Application+ThreadContext.get_MarshalingControl()
067ff324 5bb6fe5d System.Windows.Forms.WindowsFormsSynchronizationContext..ctor()
067ff338 5bb6fd4d System.Windows.Forms.WindowsFormsSynchronizationContext.InstallIfNeeded()
067ff364 5bb6e9a0 System.Windows.Forms.Control..ctor(Boolean)
067ff41c 5bbcd5cc System.Windows.Forms.ButtonBase..ctor()
067ff428 5bbcd531 System.Windows.Forms.Button..ctor()
067ff434 02342500 WindowsFormsApp1.Form1.backgroundWorker1_DoWork(System.Object, System.ComponentModel.DoWorkEventArgs)
067ff488 630ee649 System.ComponentModel.BackgroundWorker.OnDoWork(System.ComponentModel.DoWorkEventArgs) [f:\dd\NDP\fx\src\compmod\system\componentmodel\BackgroundWorker.cs @ 107]
067ff49c 630ee55d System.ComponentModel.BackgroundWorker.WorkerThreadStart(System.Object) [f:\dd\NDP\fx\src\compmod\system\componentmodel\BackgroundWorker.cs @ 245]
067ff6a0 7c69f036 [HelperMethodFrame_PROTECTOBJ: 067ff6a0] System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr, System.Object[], System.Object, System.Object[] ByRef)
067ff95c 6197c82c System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Messaging.IMessageSink)
067ff9b0 61978274 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.DoAsyncCall() [f:\dd\ndp\clr\src\BCL\system\runtime\remoting\remotingproxy.cs @ 760]
067ff9bc 61978238 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.ThreadPoolCallBack(System.Object) [f:\dd\ndp\clr\src\BCL\system\runtime\remoting\remotingproxy.cs @ 753]
067ff9c0 6104e7b4 System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object) [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1274]
067ff9c8 61078604 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 980]
067ffa34 61078537 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 928]
067ffa48 6104f445 System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1252]
067ffa5c 6104eb7d System.Threading.ThreadPoolWorkQueue.Dispatch() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 820]
067ffaac 6104e9db System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1161]
067ffccc 7c69f036 [DebuggerU2MCatchHandlerFrame: 067ffccc] 

从线程栈可以清晰的追踪到原来是 backgroundWorker1_DoWork 下的 Button 创建的,这就是问题的根源。。。

在我一百多dump的分析旅程中,这个问题真的太高频了,补充此篇真心希望能帮助这些朋友在焦虑中找到问题Control, 一毫之善,与人方便。

图片名称

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK