A Crash Analysis of a .NET Buried-Line Management System

 1 year ago
source link: https://www.cnblogs.com/huangxincheng/p/17513935.html

1. The story

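Readers often tell me that my articles read like hieroglyphics, and ask whether there are some simpler dumps they could use to get a feel for things first. So today I am bringing you an entry-level case. "Entry-level" here is meant from the WinDbg point of view: if you tried to solve this problem by logging and reading code, you might really never solve it. Don't believe me? Read on!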

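A while ago a friend contacted me on WeChat, saying his program had crashed and he could not find the cause, so I asked him to capture a crash dump for me to look at.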

II. WinDbg analysis

1. Where is the cause of the crash

WinDbg has an automated analysis command, !analyze -v, which can locate the PMINIDUMP_EXCEPTION_INFORMATION data that was passed in when MiniDumpWriteDump wrote the dump. The structure is as follows:


typedef struct _MINIDUMP_EXCEPTION_INFORMATION {
  DWORD               ThreadId;
  PEXCEPTION_POINTERS ExceptionPointers;
  BOOL                ClientPointers;
} MINIDUMP_EXCEPTION_INFORMATION, *PMINIDUMP_EXCEPTION_INFORMATION;

This command can take a long time to run, so wait a moment.


0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************
CONTEXT:  (.ecxr)
rax=0000000000000198 rbx=0000000000000001 rcx=0000000000000002
rdx=0000000039959600 rsi=0000000000000000 rdi=0000000039959600
rip=00007fffe1e4cba4 rsp=00000000010fc050 rbp=00000000010fc150
 r8=0000000000000000  r9=000000003999b640 r10=0000000000000018
r11=00000000010fc020 r12=0000000000000000 r13=00000000010fc370
r14=000000004b727aa0 r15=0000000000000020
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
igxelpicd64+0x1fcba4:
00007fff`e1e4cba4 488b08          mov     rcx,qword ptr [rax] ds:00000000`00000198=????????????????
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007fffe1e4cba4 (igxelpicd64+0x00000000001fcba4)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000198
Attempt to read from address 0000000000000198

PROCESS_NAME:  xxx.exe
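
The EXCEPTION_RECORD fields in the output above can be read mechanically: for an access violation (c0000005), Parameter[0] is the kind of access (0 for a read, 1 for a write, 8 for an execute) and Parameter[1] is the address that was touched. The following is a minimal sketch of that decoding. The helper names describe_access_violation and in_null_page_region are hypothetical, and the 64 KB limit reflects the assumption that the lowest 64 KB of a Windows user-mode address space is never mapped.

```python
# Hypothetical helpers: they mimic how the debugger summarizes an
# access-violation EXCEPTION_RECORD; they are not part of WinDbg.
ACCESS_KINDS = {0: "read from", 1: "write to", 8: "execute"}


def describe_access_violation(code, params):
    """Turn (ExceptionCode, Parameter[0..1]) into a one-line summary."""
    if code != 0xC0000005 or len(params) < 2:
        raise ValueError("not an access-violation record with two parameters")
    kind = ACCESS_KINDS.get(params[0], "access")
    return f"Attempt to {kind} address {params[1]:016x}"


def in_null_page_region(address, limit=0x10000):
    # Assumption: addresses below 64 KB are never valid in a Windows process,
    # so a fault there usually means a small offset from a null pointer.
    return 0 <= address < limit


# Values copied from the !analyze -v output above.
record = {"code": 0xC0000005, "params": [0x0, 0x198]}
summary = describe_access_violation(record["code"], record["params"])
print(summary)
print(in_null_page_region(record["params"][1]))
```

A fault address this small is the typical signature of a field or array element being read through a null base pointer.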

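The faulting instruction above, mov rcx,qword ptr [rax], makes it very clear: reading address 0000000000000198, which lies in the zero-address region, is bound to be an access violation. Next, let's take a quick look at the assembly code.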


0:000> ub igxelpicd64+0x00000000001fcba4
igxelpicd64+0x1fcb80:
00007fff`e1e4cb80 418b09          mov     ecx,dword ptr [r9]
00007fff`e1e4cb83 83f910          cmp     ecx,10h
00007fff`e1e4cb86 0f83bb0a0000    jae     igxelpicd64+0x1fd647 (00007fff`e1e4d647)
00007fff`e1e4cb8c 488d04cd21000000 lea     rax,[rcx*8+21h]
00007fff`e1e4cb94 4803c1          add     rax,rcx
00007fff`e1e4cb97 488d04c6        lea     rax,[rsi+rax*8]
00007fff`e1e4cb9b 4885c0          test    rax,rax
00007fff`e1e4cb9e 0f847c0c0000    je      igxelpicd64+0x1fd820 (00007fff`e1e4d820)
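
The instructions above compute the address that is dereferenced at the crash point, so they can be replayed with the register values from the .ecxr context (rsi = 0, rcx = 2). The sketch below does exactly that. The function name element_address is hypothetical, and reading the result as "a 72-byte element at byte offset 0x108 from a base pointer" is an inference from the arithmetic, not something confirmed by the driver's source.

```python
MASK64 = (1 << 64) - 1  # registers wrap at 64 bits


def element_address(rsi, index):
    """Replay the lea/add/lea sequence from the disassembly."""
    rcx = index & 0xFFFFFFFF            # mov ecx, dword ptr [r9]
    if rcx >= 0x10:                     # cmp ecx, 10h / jae ...
        raise IndexError("index >= 16: the driver branches away here")
    rax = (rcx * 8 + 0x21) & MASK64     # lea rax, [rcx*8+21h]
    rax = (rax + rcx) & MASK64          # add rax, rcx
    rax = (rsi + rax * 8) & MASK64      # lea rax, [rsi+rax*8]
    return rax


# Register values taken from the crash context: rsi = 0, rcx = 2.
crash = element_address(rsi=0x0, index=2)
print(hex(crash))

# The same computation written as base + 0x108 + index * 0x48.
print(hex(0x0 + 0x108 + 2 * 0x48))
```

With rsi equal to zero the result is 0x198, which matches the faulting address. It also shows why the test rax,rax check does not help: the base pointer is null, but the computed element address is not, so the je branch is not taken.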

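Judging from the assembly, this is a piece of array-indexing logic. Walking through assembly is tiring, so let's see who actually wrote the igxelpicd64.dll module, using lmvm.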


0:000> lmvm igxelpicd64
Browse full module list
start             end                 module name
00007fff`e1c50000 00007fff`e2cfe000   igxelpicd64   (export symbols)       igxelpicd64.dll
    Loaded symbol image file: igxelpicd64.dll
    Image path: C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ec5e4cdfcd3a62b8\igxelpicd64.dll
    Image name: igxelpicd64.dll
    Browse all global symbols  functions  data
    Timestamp:        Sat Jul 16 02:54:34 2022 (62D1B7EA)
    CheckSum:         010A00BB
    ImageSize:        010AE000
    File version:     31.0.101.3251
    Product version:  31.0.101.3251
    File flags:       0 (Mask 3F)
    File OS:          10004 DOS Win32
    File type:        2.8 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    Information from resource tables:
        CompanyName:      Intel Corporation
        ProductName:      Intel HD Graphics Drivers for Windows(R)
        InternalName:     OpenGL
        OriginalFilename: ig7icd32
        ProductVersion:   31.0.101.3251
        FileVersion:      31.0.101.3251
        FileDescription:  OpenGL(R) Driver for Intel(R) Graphics Accelerator
        LegalCopyright:   Copyright (c) 1998-2018 Intel Corporation.
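
The Timestamp line in the lmvm output carries the module's build time as a hexadecimal count of seconds since 1970 (UTC), which helps in judging how old the driver is. A small sketch of the decoding follows. The function name pe_timestamp_to_utc is hypothetical, and the UTC+8 offset is an assumption inferred from the eight-hour gap between the decoded UTC value and the time the debugger printed.

```python
from datetime import datetime, timedelta, timezone


def pe_timestamp_to_utc(hex_stamp):
    """Decode the hex timestamp shown by lmvm into an aware UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


built_utc = pe_timestamp_to_utc("62D1B7EA")

# Assumption: the debugger machine was set to UTC+8.
local = built_utc.astimezone(timezone(timedelta(hours=8)))

print(built_utc.isoformat())
print(local.isoformat())
```

The decoded value falls in mid-July 2022, consistent with the date printed next to it in the lmvm output.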

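Judging from the description, OpenGL(R) Driver for Intel(R) Graphics Accelerator, this is a toolkit for rendering 2D and 3D vector graphics. It is too low-level for me and I have never worked with it, but one thing is certain: this dll belongs to Intel. So why is this rendering functionality being called? To answer that we need to look at the thread stack.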

2. Who is calling the renderer

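A crash has two contexts: the state before the crash and the state after it. To see the thread stack from before the crash we need the state at that moment, so switch to it with the .ecxr command. The simplified output is as follows: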


0:000> .ecxr ; k
rax=0000000000000198 rbx=0000000000000001 rcx=0000000000000002
rdx=0000000039959600 rsi=0000000000000000 rdi=0000000039959600
rip=00007fffe1e4cba4 rsp=00000000010fc050 rbp=00000000010fc150
 r8=0000000000000000  r9=000000003999b640 r10=0000000000000018
r11=00000000010fc020 r12=0000000000000000 r13=00000000010fc370
r14=000000004b727aa0 r15=0000000000000020
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
igxelpicd64+0x1fcba4:
00007fff`e1e4cba4 488b08          mov     rcx,qword ptr [rax] ds:00000000`00000198=????????????????
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 00000000`010fc050 00007fff`e1e4c500     igxelpicd64+0x1fcba4
...
07 00000000`010fd430 00007fff`e503b788     igxelpicd64!DumpRegistryKeyDefinitions+0x11865
08 00000000`010fd490 00000000`324147f6     opengl32!glReadPixels+0x88
...
0c 00000000`010fd6d0 00007ff7`f5a3185a     GSGlobeDotNet!GeoScene.Globe.GSOGlobe.ScreenToScene+0xa5
...
0e 00000000`010fe1b0 00007ff8`3285d810     System_Windows_Forms_ni!System.Windows.Forms.Control.OnMouseClick+0x9b
...
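
The frames shown above can be reduced to a module-level call chain, which makes it easier to see how a UI event ends up inside the driver. The following is a rough sketch that parses only the frames printed in the simplified trace. Elided frames (the "..." lines) are skipped rather than guessed, and the names FRAME and module_chain are hypothetical.

```python
import re

# Only the frames that appear in the simplified trace; "..." lines are omitted.
K_OUTPUT = """\
00 00000000`010fc050 00007fff`e1e4c500     igxelpicd64+0x1fcba4
07 00000000`010fd430 00007fff`e503b788     igxelpicd64!DumpRegistryKeyDefinitions+0x11865
08 00000000`010fd490 00000000`324147f6     opengl32!glReadPixels+0x88
0c 00000000`010fd6d0 00007ff7`f5a3185a     GSGlobeDotNet!GeoScene.Globe.GSOGlobe.ScreenToScene+0xa5
0e 00000000`010fe1b0 00007ff8`3285d810     System_Windows_Forms_ni!System.Windows.Forms.Control.OnMouseClick+0x9b
"""

# frame number, Child-SP, RetAddr, call site
FRAME = re.compile(r"^[0-9a-f]{2}\s+\S+\s+\S+\s+(\S+)$")


def module_chain(k_output):
    """Return the modules on the stack, ordered from caller to callee."""
    modules = []
    for line in k_output.splitlines():
        match = FRAME.match(line.strip())
        if not match:
            continue
        # The module name is everything before the first '!' or '+'.
        module = re.split(r"[!+]", match.group(1))[0]
        if not modules or modules[-1] != module:
            modules.append(module)
    return list(reversed(modules))


print(" -> ".join(module_chain(K_OUTPUT)))
```

Read from left to right, the chain runs from the Windows Forms mouse click, through the GSGlobeDotNet globe toolkit and opengl32, down into the Intel driver.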

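The thread stack shows that the user clicked the mouse, execution entered GSGlobeDotNet.dll, and the exception was thrown in the low-level pixel-reading logic (glReadPixels). A quick web search shows that GSGlobeDotNet is a toolkit for drawing a 3D globe, which is impressive.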

[Image: 214741-20230629124456400-1186510347.png]

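With this information in hand, I suggested my friend try upgrading the graphics driver. In the end he changed the graphics card settings and that fixed it. Screenshot below: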

[Image: 214741-20230629124456392-1682796356.png]

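A graphics card problem can crash a program, which is hard to believe. Imagine trying to track this down with nothing but logs and code reading: how could you possibly find it? Haha, this is the value of advanced debugging.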
