High-speed Order-Preserving Encoder (HOPE)

HOPE is a fast dictionary-based compressor that encodes arbitrary byte-strings while preserving their order. It is optimized for compressing database index keys. A detailed description can be found in our SIGMOD paper.
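To make the order-preserving property concrete, here is a minimal Python sketch. It is not HOPE's algorithm or API: the function names `build_codes` and `encode` are hypothetical, and the weight-balanced split used to assign codes is a simple stand-in chosen for brevity. The sketch gives every byte value a prefix-free bit code whose lexicographic order matches byte order, so comparing two encoded keys gives the same result as comparing the original keys.

```python
from collections import Counter


def build_codes(sample_keys):
    """Give each of the 256 byte values a prefix-free bit code in byte order.

    Illustrative only: a weight-balanced split, not HOPE's dictionary scheme.
    """
    # Count byte frequencies in the sample; add 1 so unseen bytes still get a code.
    counts = Counter(b for key in sample_keys for b in key)
    weights = [counts[b] + 1 for b in range(256)]
    codes = {}

    def split(lo, hi, prefix):
        # Byte values lo..hi-1 all share the bit prefix built so far.
        if hi - lo == 1:
            codes[lo] = prefix
            return
        total = sum(weights[lo:hi])
        # Choose the cut that best balances total weight between the halves.
        running, best_cut, best_gap = 0, lo + 1, None
        for cut in range(lo + 1, hi):
            running += weights[cut - 1]
            gap = abs(total - 2 * running)
            if best_gap is None or gap < best_gap:
                best_cut, best_gap = cut, gap
        split(lo, best_cut, prefix + "0")  # smaller bytes get the smaller code
        split(best_cut, hi, prefix + "1")

    split(0, 256, "")
    return codes


def encode(key, codes):
    """Concatenate per-byte codes; a '0'/'1' string is used for readability."""
    return "".join(codes[b] for b in key)


if __name__ == "__main__":
    keys = [b"org.sample", b"com.example/b", b"com.example", b"com.example/a"]
    codes = build_codes(keys)
    by_encoding = sorted(keys, key=lambda k: encode(k, codes))
    # Sorting by encoded form should agree with sorting the raw keys.
    print(by_encoding == sorted(keys))
```

Because the codes are prefix-free and assigned in byte order, the first differing byte of two keys decides the comparison of their encodings as well, and a key that is a prefix of another still encodes to a prefix of the other's encoding. Bytes that are frequent in the sample tend to receive shorter codes, which is where the compression would come from in this simplified setting.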

Install Dependencies

sudo apt-get install build-essential cmake libgtest-dev
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
sudo cp *.a /usr/lib

Build

mkdir build
cd build
cmake ..
make -j

Usage Example

A simple example is included in the repository. To run it after building:

cd build
./example

Unit Tests

make test

Benchmark

./scripts/run_experiment.sh [OPTION]

This repository includes samples of the Wiki and URL datasets. To reproduce the results in our paper, please download the full datasets (download links are in the paper) and replace the samples with them. The Email dataset is private, so you need to provide your own email list (email.txt) to run the corresponding experiments. The options below let you run a subset of the full benchmark:

Options
  -r, --repeat_times=N
    Run each experiment N times and report the average measurements. Default: 1.
  --email, --wiki, --url
    Run the benchmark using the Email/Wiki/URL dataset.
    If unspecified, the script includes the Wiki and URL experiments.
  --alldatasets
    Include benchmarks for all three datasets.
  --alm
    Include the ALM-based encoders. The other encoders (Single, Double, 3-gram, 4-gram) are enabled by default.
  --surf, --art, --hot, --btree, --prefixbtree
    Run the SuRF/ART/HOT/B+tree/prefix B+tree benchmark suite.
  --all
    Run the full benchmark. If unspecified, the script only runs the microbenchmarks for Wiki and URL.
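As an illustration of how the options above combine, the following Python sketch assembles a command line for the benchmark script. The helper `build_command` and its parameters are hypothetical and not part of this repository. It only uses the flags listed above and assumes it is run from the repository root.

```python
DATASETS = {"email", "wiki", "url"}
SUITES = {"surf", "art", "hot", "btree", "prefixbtree"}


def build_command(repeat_times=1, datasets=(), alm=False, suites=(), run_all=False):
    """Return the argument list for scripts/run_experiment.sh (hypothetical helper)."""
    cmd = ["./scripts/run_experiment.sh"]
    if repeat_times != 1:
        # The script's documented default is 1, so only pass the flag when it differs.
        cmd.append(f"--repeat_times={repeat_times}")
    for name in datasets:
        if name not in DATASETS:
            raise ValueError(f"unknown dataset: {name}")
        cmd.append(f"--{name}")
    if alm:
        cmd.append("--alm")
    for name in suites:
        if name not in SUITES:
            raise ValueError(f"unknown benchmark suite: {name}")
        cmd.append(f"--{name}")
    if run_all:
        cmd.append("--all")
    return cmd


if __name__ == "__main__":
    # Wiki dataset only, three repetitions, with the ALM-based encoders included.
    cmd = build_command(repeat_times=3, datasets=["wiki"], alm=True)
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the benchmark.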

The above script records benchmark measurements under "results/". The master plotting script is under "scripts/". The individual scripts are under "plots/". Generated figures will be under "figures/". Make sure you run the benchmark with the --alm option enabled before using the plotting scripts.

License

Licensed under the Apache License 2.0.
