5

Fixing empty paths in FileIO events (ETW)

 3 years ago
source link: https://lowleveldesign.org/2020/08/15/fixing-empty-paths-in-fileio-events-etw/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

lowleveldesign.org

Software tracing, debugging, and security

This month marks ten years since I started this blog 🥂🥂🥂. On this occasion, I would like to thank you for being my reader! Let’s celebrate with a new post on ETW 🙂

Empty paths issue in the wtrace output has been bugging me for quite some time. As I started working on a new wtrace release (coming soon!), there came the right moment to fix it. I’ve seen other people struggling with this problem too, so I thought that maybe it’s worth a blog post 🙂 Wtrace uses the TraceEvent library to interact with the ETW API, and in this post, I will use this library as well. Note that this issue affects only the real-time ETW sessions.

Tracing FileIO events

Overall, when it comes to storage I/O tracing, there are two classes of events: DiskIO and FileIO. DiskIO events represent disk driver operations, while FileIO events describe the calls to the file system. I consider wtrace a user-mode troubleshooting tool. Therefore I decided to trace only the file system events. Another benefit of using FileIO events is that they are emitted even when the file system cache handles the I/O request.

Enabling FileIO events in the trace session requires only the FileIOInit flag in the call to TraceEventSession.EnableKernelProvider. You may also add the FileIO flag if you want to track the IO request duration (new wtrace will do that). The next step is to add handlers to the interesting FileIO events, for example:

public void SubscribeToSession(TraceEventSession session)
{
var kernel = session.Source.Kernel;
kernel.FileIOClose += HandleFileIoSimpleOp;
kernel.FileIOFlush += HandleFileIoSimpleOp;
kernel.FileIOCreate += HandleFileIoCreate;
kernel.FileIODelete += HandleFileIoInfo;
kernel.FileIORename += HandleFileIoInfo;
kernel.FileIOFileCreate += HandleFileIoName;
kernel.FileIOFileDelete += HandleFileIoName;
kernel.FileIOFileRundown += HandleFileIoName;
kernel.FileIOName += HandleFileIoName;
kernel.FileIORead += HandleFileIoReadWrite;
kernel.FileIOWrite += HandleFileIoReadWrite;
kernel.FileIOMapFile += HandleFileIoMapFile;
}

However, you will shortly discover that the FileName property will sometimes return an empty string. An example wtrace output with this problem:

1136,1873 (1072) FileIO/Close 'C:\' (0xFFFFFA801D789CA0)
1136,2049 (1072) FileIO/Create 'C:\Windows\SYSTEM32\wow64win.dll' (0xFFFFFA801D789CA0) rw-
1363,8894 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x173400 32768b
1364,7208 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x117400 32768b
1365,6873 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x1CD400 32768b
1375,6284 (1072) FileIO/Create 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) rw-
1375,6702 (1072) FileIO/Read 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) 0x0 516b
1375,7369 (1072) FileIO/Create 'C:\Windows\SysWOW64\MAPI32.DLL' (0xFFFFFA8023E50710) rw-
1375,7585 (1072) FileIO/Close 'C:\Windows\SysWOW64\msxml6r.dll' (0xFFFFFA8023E50710)
1384,8796 (1072) FileIO/Read '' (0xFFFFFA801FDBFCD0) 0x58200 16384b

What’s causing this? Before we answer this question, let’s check how the file name resolution works internally.

File names resolution

Each ETW event emitted by the kernel is represented by the EVENT_RECORD structure. ETWTraceEventSource decodes this structure into a TraceEvent class, which is much easier to process in the managed code. As you can see in the structure definition, ETW events have some common properties, such as ProcessId, TimeStamp, or ThreadId. The data specific to a given event goes into the UserData field, which, without a proper manifest, is just a block of bytes. Thus, to process the UserData, we need a parser. The TraceEvent library already contains parsers for all the Kernel events, and it could even generate parsers dynamically. The Kernel parsers are located in a huge KernelTraceEventParser file (13K lines). By scanning through the FileIO*TraceData classes defined there, we can see that almost all of them have the FileName property. However, the way how this property is implemented differs between the parsers. For example, in FileIOCreateTraceData, we have the following code:

public string FileName { get { return state.KernelToUser(GetUnicodeStringAt(LayoutVersion <= 2 ? HostOffset(24, 3) : HostOffset(24, 2))); } }

While FileIOSimpleOpTraceData, for example, has the following implemention:

public string FileName { get { return state.FileIDToName(FileKey, FileObject, TimeStampQPC); } }

The difference is that FileIOCreateTraceData extracts the FileName from its payload (the string is in the UserData field). FileIOSimpleOpTraceData, on the other hand, queries an instance of the KernelTraceEventParserState class to find the name by either FileObject or FileKey. It keeps a private _fileIDToName dictionary for this purpose. As you may have guessed, we receive an empty path when the FileObject/FileKey is missing in this dictionary.

_fileIDToName dictionary and FileIO events

From the previous paragraph, we know that an empty path signifies that there is no record in the _fileIDToName dictionary for a given FileObject/FileKey. If we look for code that modifies this dictionary, we should quickly locate the hooks in the KernelTraceEventParser constructor:

AddCallbackForEvents(delegate (FileIONameTraceData data)
{
// TODO this does now work for DCStarts.  Do DCStarts event exist? 
var isRundown = (data.Opcode == (TraceEventOpcode)36) || (data.Opcode == (TraceEventOpcode)35);        // 36=FileIOFileRundown 35=FileIODelete
Debug.Assert(data.FileName.Length != 0);
state.fileIDToName.Add(data.FileKey, data.TimeStampQPC, data.FileName, isRundown);
});
...
// Because we may not have proper startup rundown, we also remember not only the FileKey but
// also the fileObject (which is per-open file not per fileName).  
FileIOCreate += delegate (FileIOCreateTraceData data)
{
state.fileIDToName.Add(data.FileObject, data.TimeStampQPC, data.FileName);
};
if (source.IsRealTime)
{
// Keep the table under control
Action onNameDeath = delegate (FileIONameTraceData data)
{
state.fileIDToName.Remove(data.FileKey);
};
FileIOFileDelete += onNameDeath;
FileIOFileRundown += onNameDeath;
FileIOCleanup += delegate (FileIOSimpleOpTraceData data)
{
// Keep the table under control remove unneeded entries. 
state.fileIDToName.Remove(data.FileObject);
};
}

As handlers registered here will always execute before our handlers, delete, rundown, or cleanup events won’t have the file name as it just got removed from the _fileIDToName dictionary :). A simple fix is to keep your own FileObject/FileKey/FileName map, and that’s what the current version of wtrace is doing. However, as you remember, that does not always work. After some debugging, I discovered another problem. The FileIORundown events were never emitted, so the _fileIDToName contained only FileObjects from the FileCreate events. That explains why there are almost no empty paths for processes that started while the trace session was running. After checking the documentation and some more debugging, I learned that FileIORundown events appear in the trace when you enable DiskFileIO and DiskIO keywords. And, what’s important, they are issued at the end of the trace session. Although the documentation states they contain a FileObject, it is in fact a FileKey. Fortunately, TraceEvent has the property correctly named to FileKey in the FileIONameTraceData class.

Working example

My final approach is to open a “rundown” session for two seconds, prepare my fileIDToName dictionary, and then pass it to the actual tracing session with only the FileIO events:

open System
open System.Threading
open Microsoft.Diagnostics.Tracing
open Microsoft.Diagnostics.Tracing.Parsers
open Microsoft.Diagnostics.Tracing.Session
open System.Collections.Generic
open Microsoft.Diagnostics.Tracing.Parsers.Kernel
type NtKeywords = KernelTraceEventParser.Keywords
type ClrKeywords = ClrTraceEventParser.Keywords
let rundownFileKeyMap sessionName =
use session = new TraceEventSession(sessionName)
let traceFlags = NtKeywords.DiskFileIO ||| NtKeywords.DiskIO
session.EnableKernelProvider(traceFlags, NtKeywords.None) |> ignore
let fileKeyMap = Dictionary<uint64, string>()
session.Source.Kernel.add_FileIOFileRundown(fun ev -> fileKeyMap.[ev.FileKey] <- ev.FileName)
use cts = new CancellationTokenSource(TimeSpan.FromSeconds(2.0))
use _r = cts.Token.Register(fun _ -> session.Stop() |> ignore)
session.Source.Process() |> ignore
fileKeyMap
let processEvent (fileObjectAndKeyMap : Dictionary<uint64, string>) (ev : TraceEvent) =
let opcode = int32(ev.Opcode)
let fileKey =
let i = ev.PayloadIndex("FileKey")
if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL
let fileObject =
let i = ev.PayloadIndex("FileObject")
if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL
let path =
if opcode = 64 (* create *) then
let ev = ev :?> FileIOCreateTraceData
let fileName = ev.FileName
fileObjectAndKeyMap.[fileObject] <- fileName
fileName
else
let chooser k =
match fileObjectAndKeyMap.TryGetValue(k) with
| (true, s) -> Some s
| (false, _) -> None
seq { fileKey; fileObject }
|> Seq.tryPick chooser
|> Option.defaultValue "n/a"
printfn "%d %s (%d) (%s) [%d.%d] (%s) key: 0x%X object: 0x%X '%s'"
(uint32(ev.EventIndex)) ev.EventName opcode (ev.GetType().Name)
ev.ProcessID ev.ThreadID ev.ProcessName fileKey fileObject path
[<EntryPoint>]
let main _ =
let sessionName = sprintf "mytrace-%d" DateTime.UtcNow.Ticks
use session = new TraceEventSession(sessionName)
let traceFlags = NtKeywords.FileIOInit
let stackFlags = NtKeywords.None
session.EnableKernelProvider(traceFlags, stackFlags) |> ignore
printf "Collecting rundown events..."
let fileObjectAndKeyMap = rundownFileKeyMap (sprintf "%s_rundown" sessionName)
printfn "done"
printfn "Key map size: %d" fileObjectAndKeyMap.Count
use _sub = session.Source.Kernel.Observe(fun s -> s.StartsWith("FileIO", StringComparison.Ordinal))
|> Observable.subscribe (processEvent fileObjectAndKeyMap)
Console.CancelKeyPress.Add(fun ev -> ev.Cancel <- true; session.Stop() |> ignore)
session.Source.Process() |> ignore
printfn "Key map size: %d" fileObjectAndKeyMap.Count
0

Notice, I am not removing anything from the dictionary, so you may want to add logic for that if you plan to run the trace session for a long time.

[BONUS] What are FileObject and FileKey

I’m unsure why some FileIO events have only the FileObject field, while others have both, FileKey and FileObject. To check what these fields represent, I created a simple app in LINQPad that keeps a file open while running:

do
use f = new StreamWriter("D:\\temp\\test.txt")
let rec readAndWrite () =
let r = Console.ReadLine()
if "" then
f.WriteLine(r)
f.Flush()
readAndWrite ()
readAndWrite ()

Then I started a tracing app and recorded few writes:

6243 12415615435 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
6244 12415615551 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17566 12449509307 FileIO/Write (68) (FileIOReadWriteTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17567 12449509409 FileIO/SetInfo (69) (FileIOInfoTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'

Next, I attached WinDbg to the kernel locally and checked the FileObject address:

lkd> !fileobj 0xFFFF970FC37949F0
\temp\test.txt
Device Object: 0xffff970f418ebb90   \Driver\volmgr
Vpb: 0xffff970f41ae1640
Event signalled
Access: Write SharedRead
Flags:  0x40062
Synchronous IO
Sequential Only
Cache Supported
Handle Created
FsContext: 0xffffae0c9b0eb7c0    FsContext2: 0xffffae0c9b0eba20
Private Cache Map: 0xffff970fbccf4b18
CurrentByteOffset: e
Cache Data:
Section Object Pointers: ffff970f4be27838
Shared Cache Map: ffff970fbccf49a0         File Offset: e in VACB number 0
Vacb: ffff970f3fcdfe78
Your data is at: ffffd98630f0000e

As the !fileobj command worked, FileObject is a Kernel Object representing a file. From the WinDbg output, we can also learn that FileKey denotes the FsContext field of the FILE_OBJECT structure. As we can see in the API documentation, for the file system drivers, it must point to the FSRTL_ADVANCED_FCB_HEADER. This does not answer the initial question why some events miss this field, but at least we know its meaning.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK