
Indexing Windows header files with the PowerShell Scour module

source link: https://lowleveldesign.org/2019/02/08/indexing-windows-header-files-with-the-powershell-scour-module/

lowleveldesign.org

Software tracing, debugging, and security

Developing system applications in C# requires a lot of PInvoking. Although there are many great PInvoke NuGet libraries, for smaller projects I still prefer to import only the definitions I use. The pinvoke.net site is an excellent source of stub definitions. However, an online definition sometimes lacks some of the needed constants or is incomplete in other ways. In such a case, you have to look into the Windows headers (which, by the way, contain not only definitions but also a lot of interesting comments). I used to search through those files using the Total Commander “Find Files” dialog, but it was slow and inefficient. So I switched to Sublime Text and created a project for the Windows headers folder (C:\Program Files (x86)\Windows Kits\10\Include\10.0.x.x). Once the folder index is cached, Sublime becomes a great tool for analyzing source code (not only C++!). However, when you read a lot of code and switch between various projects, Sublime replaces the old cached projects with new ones to keep the cache at a reasonable size. That triggers a cache rebuild when you open the “old” project again, which takes time and makes your searches inefficient again.

I then started looking for a way to build a permanent index for the folders I regularly scan (such as the Windows headers directory). At first, I thought about running a local instance of Elasticsearch or Apache Solr, but that seemed like overkill. I wanted something simpler: a thin wrapper over the Apache Lucene library, which is the core engine of both of those servers. Then I stumbled upon Lee Holmes's article about Scour, a PowerShell module that wraps the Lucene.Net library and provides cmdlets to create full-text indexes for your folders. I have been using it for some time now and I am happy with the results, so I decided to share my simple setup with you.

Indexing

The Scour module is available in the PowerShell Gallery and you may install it with the following command:

Install-Module Scour -Scope CurrentUser

Then, run the PowerShell console as an Administrator, go to C:\Program Files (x86)\Windows Kits\10\Include\10.0.x.x (replace x.x with the version you have installed) and execute:

Initialize-ScourIndex -Path *.h

This command creates a __scour directory in the headers folder, which contains the Lucene index for the .h files. Alternatively, if you don’t like writing to the Program Files folders and running PowerShell with administrative rights, you may make a copy of the headers folder and perform the indexing there.
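If you prefer the copy-based approach, the copy and the indexing can be wrapped in a small function. This is a minimal sketch: New-HeaderIndexCopy is a hypothetical name, and the paths in the usage example below (the SDK version folder and the destination under your profile) are assumptions to adjust to your machine. It needs no administrative rights as long as the destination is writable.

```powershell
# Hypothetical helper: copies the SDK include folder to a writable location
# and builds the Scour (Lucene) index for the .h files in the copy.
function New-HeaderIndexCopy {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )
    # Copy the whole include tree; stop here if the copy fails
    Copy-Item -LiteralPath $Source -Destination $Destination -Recurse -ErrorAction Stop
    Push-Location -LiteralPath $Destination
    try {
        # Creates the __scour index folder inside the copy, not under Program Files
        Initialize-ScourIndex -Path *.h
    }
    finally {
        # Always return to the original location, even if indexing fails
        Pop-Location
    }
}
```

For example:

New-HeaderIndexCopy -Source 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0' -Destination "$env:USERPROFILE\WindowsHeaders"

Pass a destination that does not exist yet; if it already exists, Copy-Item places the source folder inside it instead of copying its contents.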

Searching

The cmdlet for searching the indexed files is Search-ScourContent, but to use it you first need to go to the indexed folder. Its -Query parameter expects the Lucene query syntax, and the cmdlet returns a list of files that match the query. To also display the matching lines, you need to use the -RegularExpression parameter or pipe the results through the Select-String cmdlet (which is what -RegularExpression does for you). As my searches are simple, I added the following function to my PowerShell user profile (at $env:USERPROFILE\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1):

function Search-Headers {
    param(
        [Parameter(Mandatory = $True, ValueFromPipeline = $True)]
        [string]$Query
    )
    process {
        # Requires the Scour module
        # Make sure you created the index using the Initialize-ScourIndex -Path *.h command
        Push-Location "C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\"
        try {
            # Turn Lucene * wildcards into regex .* so matching lines get displayed
            $RgxQuery = $Query -replace '\*','.*'
            Search-ScourContent -Query $Query -RegularExpression $RgxQuery
        }
        finally {
            # Restore the original location even if the search fails
            Pop-Location
        }
    }
}

After restarting the PowerShell console, so that the updated profile is loaded, I can query the index, for example:

[Screenshot: sample output of the Search-Headers function]
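The $Query -replace '\*','.*' line in Search-Headers translates only the * wildcard. Lucene queries may also contain the ? wildcard, boolean operators, or characters such as . that have a special meaning in regular expressions. If you run such queries, a slightly more careful conversion could help. The sketch below is a hypothetical helper, not part of Scour. It assumes simple, whitespace-separated queries and ignores features such as field names, fuzzy queries, or range queries.

```powershell
# Hypothetical helper (not part of Scour): derives a highlighting regex
# from a simple Lucene-style query such as 'FILE_FLAG_* AND CreateFile?'.
function ConvertTo-HighlightPattern {
    param([Parameter(Mandatory = $true)][string]$Query)

    $terms = $Query -split '\s+' |
        Where-Object {
            # Skip empty tokens and the (uppercase) Lucene boolean operators
            $_ -and $_ -cnotin 'AND', 'OR', 'NOT'
        } |
        ForEach-Object {
            # Drop +/- prefixes and phrase quotes: they are query syntax, not text
            $_.TrimStart([char[]]'+-').Trim([char]'"')
        } |
        Where-Object { $_ } |
        ForEach-Object {
            # Escape regex metacharacters, then restore the Lucene wildcards:
            # '*' matches any run of characters, '?' exactly one character
            [regex]::Escape($_) -replace '\\\*', '.*' -replace '\\\?', '.'
        }

    # A line is highlighted if it matches any of the remaining terms
    $terms -join '|'
}
```

In Search-Headers, you could then pass (ConvertTo-HighlightPattern $Query) to -RegularExpression instead of $RgxQuery.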

Final words

You may find various arguments that full-text search is not the right way to index source code. I agree with them if we consider only symbol indexing; however, full-text search works well enough for basic code searches. If you look at the Scour module source code, you will see that Lee does not create any custom analyzers but uses the StandardAnalyzer. So, if you are unhappy with the results, you can always choose a different analyzer or write your own. If you are a PowerShell developer, check how the indexing is parallelized (great code to reuse). Finally, Lee wasn’t the first person to combine PowerShell with Lucene.Net. You may watch a very interesting presentation by Bruce Payette and download the code he wrote. Doug Finke even created a WPF UI over Bruce’s code, so if you want to search files in a GUI window, give it a try.
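One way to confirm which analyzer Scour uses, or to find the place where you would plug in a different one, is to search the installed module's script files. A minimal sketch follows. Find-ModuleSource is a hypothetical helper, and it assumes the module's logic lives in .ps1 or .psm1 files under its installation folder.

```powershell
# Hypothetical helper: searches the script files of an installed module.
function Find-ModuleSource {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][string]$Pattern
    )
    # Pick the newest installed version of the module
    $module = Get-Module -ListAvailable -Name $Name |
        Sort-Object Version -Descending |
        Select-Object -First 1
    if (-not $module) { throw "Module '$Name' is not installed." }

    # Search only PowerShell script and script-module files
    Get-ChildItem -LiteralPath $module.ModuleBase -Recurse -File |
        Where-Object { $_.Extension -in '.ps1', '.psm1' } |
        Select-String -Pattern $Pattern
}
```

For example:

Find-ModuleSource -Name Scour -Pattern 'Analyzer'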

