WiredTiger File Forensics Part 1: Building "wt" - Percona Database Per...

 3 years ago
source link: https://www.percona.com/blog/2021/05/18/wiredtiger-file-forensics-part-1-building-wt/
WiredTiger File Forensics Part 1: Building "wt"

Most of the files in a data directory of a MongoDB server are made by the WiredTiger storage engine. If you want to look at the content inside them you can use the tool “wt” from the WiredTiger library:

https://github.com/wiredtiger/wiredtiger/

http://source.wiredtiger.com/10.0.0/command_line.html

Inspection of the WiredTiger files is not an essential MongoDB DBA skill – it’s just for the curious. Or maybe post-crash data recovery, if you weren’t using a replica set. But for a certain type of person this is fun, and if you voluntarily clicked a page link to “file forensics” then it seems that’s you. 😛

Build and Install the “wt” Executable

Documentation: http://source.wiredtiger.com/develop/install.html

  1. git clone https://github.com/wiredtiger/wiredtiger.git
  2. cd wiredtiger
  3. git branch --all | grep mongodb  #find a branch matching the major version of mongodb you are using
  4. git checkout mongodb-4.4  #Using the WT code matching mongodb 4.4 as an example
  5. sh autogen.sh
  6. ./configure --disable-shared --with-builtins=lz4,snappy,zlib,zstd
  7. make -j $(nproc) # -j $(nproc) to parallelize build to number of cores
  8. sudo make install

Points about the make configuration above:

Use --disable-shared to force the build of a statically linked “wt” binary.

If you do not disable shared library mode, the final build product will be somewhat different. The “wt” target executable at the top of the source directory (and installed to /usr/local/bin/) will be a script, not a compiled binary. That script will execute a compiled binary built at .libs/wt, but before it can it has to fix the LD_LIBRARY_PATH environment variable so the binary can find the libwiredtiger.so, also in .libs/. Much hacking is required if you then try to move these files elsewhere.
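To check which of the two build products you ended up with, a quick look at the first bytes of the file is enough. The sketch below is only an illustration: the function name wt_kind is made up for this example, and it assumes a Linux system where wrapper scripts start with "#!" and compiled binaries start with the ELF magic bytes.

```shell
# Hypothetical helper (not part of WiredTiger): classify a "wt" file by its first bytes.
# Assumption: Linux, where compiled executables begin with the 4-byte ELF magic.
wt_kind() {
  # Read the first 4 bytes and render them as printable characters.
  magic=$(head -c 4 "$1" | od -An -c | tr -d ' \n')
  case "$magic" in
    '#!'*)    echo "script" ;;   # libtool-style wrapper script
    '177ELF') echo "binary" ;;   # \177 E L F = ELF executable
    *)        echo "unknown" ;;
  esac
}

# Example use (paths depend on where you built or installed wt):
#   wt_kind ./wt
#   wt_kind /usr/local/bin/wt
```

If this reports "script" for the file you installed, the build was most likely done without --disable-shared.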

Also use --with-builtins=lz4,snappy,zlib,zstd to match what normal mongod binaries expect. But feel free to drop “lz4”, “zlib” and/or “zstd” if there is a hassle about installing those libraries’ dev packages. Only snappy is used in 99%+ of MongoDB deployments.

--with-builtins lz4,snappy,zlib,zstd, i.e. without a “=” between option name and values, will fail with the message “checking build system type… Invalid configuration `snappy': machine `snappy' not recognized”. Put the “=” in.

You can use configuration option --enable-XXX (eg. --enable-snappy) to build extensions as standalone libraries instead, but that’s the hard way. You’ll have to continually specify the full path to the standalone extension library as an extra extension in config every time you run a wt program.

Versions

You’ll need to build a different branch of wiredtiger depending on which mongodb version of data you’re looking at. “Log” in the table below means the transaction log, a.k.a. the journal in mongodb.

* Table of MongoDB<->WiredTiger<->Log version numbers:
* |                MongoDB | WiredTiger | Log |
* |------------------------+------------+-----|
* |                 3.0.15 |      2.5.3 |   1 |
* |                 3.2.20 |      2.9.2 |   1 |
* |                 3.4.15 |      2.9.2 |   1 |
* |                  3.6.4 |      3.0.1 |   2 |
* |                 4.0.16 |      3.1.1 |   3 |
* |                  4.2.1 |      3.2.2 |   3 |
* |                  4.2.6 |      3.3.0 |   3 |
* | 4.2.6 (blessed by 4.4) |      3.3.0 |   4 |
* |                  4.4.0 |     10.0.0 |   5 |

(This useful table is currently found as a comment in mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp.)

To build WT matching 4.4 do “git checkout mongodb-4.4”, for the latest 4.2 minor versions “git checkout mongodb-4.2”, etc. There are also tags (rather than branches) for a lot of minor versions. Use “git branch -a” and “git tag --list” to see.
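If you script this, the branch name can be derived from the mongod version string. The following is a minimal sketch, assuming the “mongodb-<major>.<minor>” branch naming shown above holds for the version you care about. The function name wt_branch_for is invented for this example, and you should still confirm the branch exists with “git branch -a”.

```shell
# Hypothetical helper: map a MongoDB version (e.g. 4.4.6 or v4.2.12)
# to the WiredTiger branch name pattern "mongodb-<major>.<minor>".
wt_branch_for() {
  ver=${1#v}            # drop an optional leading "v"
  major=${ver%%.*}      # text before the first dot
  rest=${ver#*.}        # text after the first dot
  minor=${rest%%.*}     # text before the next dot, if any
  echo "mongodb-${major}.${minor}"
}

# Example use inside a wiredtiger clone:
#   git checkout "$(wt_branch_for 4.4.6)"
```

This only builds the name. It does not check the MongoDB<->WiredTiger version table, so treat the result as a starting point rather than a guarantee.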

A “wt” Build Shortcut

If you already have a compilable source code directory of MongoDB Community or Percona Server for MongoDB, a slightly quicker way to build and install wt is to run scons with “wt” as the target (instead of “mongod”, or “install-core”, etc.). The WiredTiger library code in the subdirectory src/third_party/wiredtiger/ is already cherry-picked to match whichever version of MongoDB you have checked out at the moment.

The built binary will be created at build/opt/third_party/wiredtiger/wt if building with legacy (<=v4.2 style) install mode. Otherwise, I believe it is installed in “bin/” under the directory given by the scons --prefix option.

Executing the wt Command

WiredTiger Data Engine (version 10.0)
MongoDB wiredtiger_open configuration: ...
global_options:
    -B  maintain release 3.3 log file compatibility
    -C config
        wiredtiger_open configuration
    -E key
        secret encryption key
    -h home
        database directory
    -L  turn logging off for debug-mode
    -m  run verify on metadata
    -R  run recovery (if recovery configured)
    -r  access the database via a readonly connection
    -S  run salvage recovery (if recovery configured)
    -V  display library version and exit
    -v  verbose
commands:
    alter
        alter an object
    backup
        database backup
    compact
        compact an object
    copyright
        display copyright information
    create
        create an object
    downgrade
        downgrade a database
    drop
        drop an object
    dump
        dump an object

The options above are the global ones, and each subcommand has other options not shown above. Eg. the list subcommand has two options -c and -v (see help output below). The subcommand-specific options (-c or -v in this case) must appear after the subcommand (“list”, “dump”, etc.) on the command line, and the global options (-LmRrSVv, -C, -E, -h) must come before it.

$ wt list --help
wt: illegal option -- -
usage: wt [-LmRrSVv] [-C config] [-E secretkey] [-h home] list [-cv] [uri]
options:
    -c  display checkpoints in human-readable format (by default checkpoints are not displayed)
    -v  display the complete schema table (by default only a subset is displayed)

Set the wiredtiger_open Config String

MongoDB configures WiredTiger so that (by default) the snappy compressor is used, and the WT transaction log files are placed under the “journal/” subdirectory.

Some wt subcommands don’t need to read the transaction logs, so you might survive without setting this. But if a subcommand does need them and can’t find them, “wt: __wt_log_scan, 2114: no log files found: Operation not supported” is the error message you will see, and you should follow the advice in this section.

Neither the “journal” transaction log subdirectory nor snappy compression is a default of the WT library. The current v10.0.0 wt’s help output shows “MongoDB wiredtiger_open configuration: log=(enabled=true,path=journal,compressor=snappy)” as the second or third line of output, but this is misleading. It is just a hint message, not the config string actually in effect 🙁

So set this value in a config string so the wiredtiger connection API knows where and how to open the wiredtiger transaction log files. (Option setting method is explained in the “How to put the configuration into effect” section just a little bit further down the page.)

log=(compressor=snappy,path=journal)

As of May 2021: I do not include the log=(enabled=true,…) value because it doesn’t seem to be necessary, and it leads to a “WT_LOGSCAN_FIRST not set” error when using the wt printlog subcommand at least.

Config String Syntax

See http://source.wiredtiger.com/10.0.0/config_strings.html for notes about the supposedly “compatible with JavaScript Object Notation (JSON)” config string format that WT uses.

In the config strings you’ll see in WT or MongoDB code, and the ones used in this article, there are no “{}” brackets or the “:” key-value delimiter. Instead it will be “object=(key1=val1,key2=val2,..),array_val=[val1,val2,val3]” style. It’s weird JSON, but presumably also perfectly conforming to some standard of JSON written somewhere.
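To get a feel for the nesting, it can help to split a config string into its top-level items. The sketch below is my own illustration, not WiredTiger’s parser: the function name wt_config_top_level is made up, and it only tracks “()” and “[]” depth, ignoring quoting and any other syntax the real format allows.

```shell
# Hypothetical helper: print each top-level item of a WT-style config string
# on its own line. Commas inside (...) or [...] do not split items.
# Assumption: no quoted strings containing brackets or commas.
wt_config_top_level() {
  printf '%s\n' "$1" | awk '{
    depth = 0; item = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "(" || c == "[") depth++
      else if (c == ")" || c == "]") depth--
      if (c == "," && depth == 0) {
        if (item != "") print item   # skip empty items, e.g. from a trailing comma
        item = ""
      } else {
        item = item c
      }
    }
    if (item != "") print item
  }'
}

# Example use:
#   wt_config_top_level 'create,log=(compressor=snappy,path=journal),verbose=[metadata]'
```

Each printed line is either a bare flag such as “create”, a key=value pair, or a key with a nested “(…)” object or “[…]” list.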

These config options are for the WT C API’s wiredtiger_open function which initializes a WT_CONNECTION object. (grep for “wiredtiger_open_configuration” in util_main.c if you want to confirm the code.)

How to Put the Configuration into Effect

Lowest precedence: WiredTiger.config file

This is probably the most comfortable option. Put the config string in a file WiredTiger.config in the data directory (i.e. alongside WiredTiger.wt, WiredTiger.turtle, all the collection and index *.wt files, etc.)

$ cat WiredTiger.config 
log=(compressor=snappy,path=journal)

http://source.wiredtiger.com/1.4.2/database_config.html#config_file
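Creating the file can be scripted in a couple of lines. This is a sketch only: the function name write_wt_config is invented here, and the refusal to overwrite an existing file is my own cautious choice rather than anything WiredTiger requires.

```shell
# Hypothetical helper: create WiredTiger.config in the given data directory
# with the log settings discussed above. Refuses to overwrite an existing file.
write_wt_config() {
  dir=$1
  cfg="$dir/WiredTiger.config"
  if [ -e "$cfg" ]; then
    echo "refusing to overwrite $cfg" >&2
    return 1
  fi
  printf '%s\n' 'log=(compressor=snappy,path=journal)' > "$cfg"
}

# Example use (the path is a placeholder for your own data directory):
#   write_wt_config /path/to/datadir
```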

Middle precedence: WIREDTIGER_CONFIG environment variable

Settings here will override those coming from a WiredTiger.config file.

Eg. this setting will enable verbose log messages from places in the code where __wt_verbose(session, WT_VERB_METADATA, …) is called.

export WIREDTIGER_CONFIG="verbose=[metadata]"

http://source.wiredtiger.com/1.4.2/database_config.html#config_file

Highest precedence: Use the -C option

This will also include values merged from the previous two sources, but otherwise is the final say in what config string is passed into wiredtiger_open(.., .., config_string, &conn).

N.b. The -C value has to be before the subcommand. Eg.

wt -C "log=(compressor=snappy,path=journal)" list

Some other options of the “wt” util will add to this config string as well. Eg. -m adds “verify_metadata=true”, -R adds “log=(recover=on)”, etc.
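Because the global options always have to come before the subcommand, a small wrapper function saves some typing and mistakes. This is a sketch under assumptions: the names wt_mongo and WT_BIN are made up for this example, and the -h option is taken from the usage line in the help output shown earlier.

```shell
# Hypothetical wrapper: run wt against a MongoDB data directory with the
# log config string always placed before the subcommand.
# WT_BIN is an invented override so a different wt build can be selected.
WT_BIN=${WT_BIN:-wt}

wt_mongo() {
  home=$1   # data directory, passed to the global -h option
  shift     # the rest is the subcommand plus its own options
  "$WT_BIN" -h "$home" -C "log=(compressor=snappy,path=journal)" "$@"
}

# Example use (the path is a placeholder for your own data directory):
#   wt_mongo /path/to/datadir list -v
```

Everything after the data directory argument is passed through untouched, so subcommand options such as -c or -v land after the subcommand where wt expects them.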

Config merging rules

… aren’t clear to me. In practical testing, it seems to override sometimes but merge at other times. See the code of wiredtiger_open() if you want to figure it out.

Q. What wiredtiger_open() config does MongoDB use?

As it happens you can easily see the config string mongod uses as it initializes an internal connection to the WiredTiger library. Just search for “Opening WiredTiger” in mongod’s diagnostic log file. In MongoDB 4.4+, with the structured json-style log file, you can alternatively grep for a unique log message identifier number 22315.

$ grep -m 1 'Opening WiredTiger' /data/node_s1n1/mongod.log

{"t":{"$date":"2021-04-27T21:58:09.580+09:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=7375M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}

You could use this entire config string when starting the wt utility command, but you don’t need to. Some settings are for asynchronous processes, eg. eviction and file manager timers, which the wt util won’t run. And some I wouldn’t use: Eg. log=(archive=true) will discard old log files after recovery and checkpoint if you somehow manage to make that happen with just the wt command.
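If you want just the config string rather than the whole log line, sed can pull it out of the structured (4.4+ style) log. This is a rough sketch: the function name extract_wt_config is invented for this example, and it assumes the “msg” and “config” fields appear on one line in the order shown in the sample above.

```shell
# Hypothetical helper: print the wiredtiger_open config string from the first
# "Opening WiredTiger" line of a structured mongod log file.
# Assumption: "msg" precedes "config" on the same line, as in the sample above.
extract_wt_config() {
  sed -n 's/.*"msg":"Opening WiredTiger".*"config":"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example use:
#   extract_wt_config /data/node_s1n1/mongod.log
```

From the printed string you can then pick out only the parts wt needs, such as the path and compressor values inside log=(…).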

Summary

This article was a relatively brief guide on how to build the “wt” utility of the WiredTiger library; plus information regarding the runtime config values it needs when inspecting a data directory created by MongoDB.

Please see the next article “WiredTiger File Forensics (Part 2: wt dump)” to see how to use this tool with a mongod node’s data directory files.

