7

Fixes namespaced UUID generation for namespace IDs represented as strings by eri...

 2 years ago
source link: https://github.com/rails/rails/pull/37682
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Summary

Updates the Digest::UUID.uuid_from_hash implementation to return the correct UUID value if the uuid_namespace parameter is provided as a String instead of using one of the pre-defined constants.

Other Information

The patch is backwards-compatible with the previous behaviour of using the constants defined in Digest::UUID as the namespace ID.

With the new behaviour, all the calls below should return the same value:

require "digest"
require "active_support/core_ext/digest"

Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "www.widgets.com")
=> "21f7f8de-8051-5b89-8680-0195ef798b6a"
Digest::UUID.uuid_v5("6BA7B810-9DAD-11D1-80B4-00C04FD430C8", "www.widgets.com")
=> "21f7f8de-8051-5b89-8680-0195ef798b6a"
Digest::UUID.uuid_v5("6ba7b810-9dad-11d1-80b4-00c04fd430c8", "www.widgets.com")
=> "21f7f8de-8051-5b89-8680-0195ef798b6a"

Benchmarks

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "activesupport", "6.0.0"
  gem "benchmark-ips"
end

require "digest"
require "active_support/core_ext/digest"

# Your patch goes here.
module Digest
  module UUID
    def self.new_uuid_from_hash(hash_class, uuid_namespace, name)
      if hash_class == Digest::MD5
        version = 3
      elsif hash_class == Digest::SHA1
        version = 5
      else
        raise ArgumentError, "Expected Digest::SHA1 or Digest::MD5, got #{hash_class.name}."
      end

      packed_namespace_uuid = uuid_namespace.length == 36 ? pack_uuid(uuid_namespace) : uuid_namespace

      hash = hash_class.new
      hash.update(packed_namespace_uuid)
      hash.update(name)

      ary = hash.digest.unpack("NnnnnN")
      ary[2] = (ary[2] & 0x0FFF) | (version << 12)
      ary[3] = (ary[3] & 0x3FFF) | 0x8000

      "%08x-%04x-%04x-%04x-%04x%08x" % ary
    end

    def self.pack_uuid(uuid)
      uuid.scan(/(\h{8})-(\h{4})-(\h{4})-(\h{4})-(\h{4})(\h{8})/).flatten.map { |s| s.to_i(16) }.pack("NnnnnN")
    end
  end
end

# Enumerate some representative scenarios here.
#
# It is very easy to make an optimization that improves performance for a
# specific scenario you care about but regresses on other common cases.
# Therefore, you should test your change against a list of representative
# scenarios. Ideally, they should be based on real-world scenarios extracted
# from production applications.
SCENARIOS = {
  "Binary namespace ID"      => Digest::UUID::DNS_NAMESPACE,
  "Lower-cased namespace ID" => "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "Upper-cased namespace ID" => "6BA7B810-9DAD-11D1-80B4-00C04FD430C8"
}

VALUE = "www.widgets.com"

SCENARIOS.each_pair do |name, uuid_namespace|
  puts
  puts " #{name} ".center(80, "=")
  puts

  Benchmark.ips do |x|
    x.report("old_uuid_from_hash_md5") { Digest::UUID.uuid_from_hash(Digest::MD5, uuid_namespace, VALUE) }
    x.report("new_uuid_from_hash_md5") { Digest::UUID.new_uuid_from_hash(Digest::MD5, uuid_namespace, VALUE) }
    x.compare!
  end

  Benchmark.ips do |x|
    x.report("old_uuid_from_hash_sha1") { Digest::UUID.uuid_from_hash(Digest::SHA1, uuid_namespace, VALUE) }
    x.report("new_uuid_from_hash_sha1") { Digest::UUID.new_uuid_from_hash(Digest::SHA1, uuid_namespace, VALUE) }
    x.compare!
  end
end

Results

ruby benchmark.rb
Fetching gem metadata from https://rubygems.org/..............
Resolving dependencies...
Using concurrent-ruby 1.1.5
Using i18n 1.7.0
Using minitest 5.13.0
Using thread_safe 0.3.6
Using tzinfo 1.2.5
Using zeitwerk 2.2.1
Using activesupport 6.0.0
Using benchmark-ips 2.7.2
Using bundler 2.0.2

============================= Binary namespace ID ==============================

Warming up --------------------------------------
old_uuid_from_hash_md5
                        14.605k i/100ms
new_uuid_from_hash_md5
                        14.328k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_md5
                        166.129k (±10.3%) i/s -    832.485k in   5.074024s
new_uuid_from_hash_md5
                        163.218k (±10.5%) i/s -    816.696k in   5.080808s

Comparison:
old_uuid_from_hash_md5:   166128.7 i/s
new_uuid_from_hash_md5:   163218.5 i/s - same-ish: difference falls within error

Warming up --------------------------------------
old_uuid_from_hash_sha1
                        14.660k i/100ms
new_uuid_from_hash_sha1
                        14.050k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_sha1
                        163.423k (±11.6%) i/s -    806.300k in   5.021001s
new_uuid_from_hash_sha1
                        159.354k (± 9.9%) i/s -    800.850k in   5.090466s

Comparison:
old_uuid_from_hash_sha1:   163422.6 i/s
new_uuid_from_hash_sha1:   159353.9 i/s - same-ish: difference falls within error


=========================== Lower-cased namespace ID ===========================

Warming up --------------------------------------
old_uuid_from_hash_md5
                        14.681k i/100ms
new_uuid_from_hash_md5
                         6.007k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_md5
                        168.224k (± 8.2%) i/s -    836.817k in   5.012074s
new_uuid_from_hash_md5
                         68.417k (±19.1%) i/s -    324.378k in   5.033972s

Comparison:
old_uuid_from_hash_md5:   168224.3 i/s
new_uuid_from_hash_md5:    68417.4 i/s - 2.46x  slower

Warming up --------------------------------------
old_uuid_from_hash_sha1
                        14.321k i/100ms
new_uuid_from_hash_sha1
                         5.992k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_sha1
                        165.424k (± 7.6%) i/s -    830.618k in   5.055094s
new_uuid_from_hash_sha1
                         67.677k (±19.0%) i/s -    323.568k in   5.070097s

Comparison:
old_uuid_from_hash_sha1:   165423.8 i/s
new_uuid_from_hash_sha1:    67676.6 i/s - 2.44x  slower


=========================== Upper-cased namespace ID ===========================

Warming up --------------------------------------
old_uuid_from_hash_md5
                        14.970k i/100ms
new_uuid_from_hash_md5
                         6.060k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_md5
                        168.793k (± 8.5%) i/s -    838.320k in   5.012564s
new_uuid_from_hash_md5
                         68.246k (±19.0%) i/s -    327.240k in   5.082545s

Comparison:
old_uuid_from_hash_md5:   168793.3 i/s
new_uuid_from_hash_md5:    68245.5 i/s - 2.47x  slower

Warming up --------------------------------------
old_uuid_from_hash_sha1
                        14.683k i/100ms
new_uuid_from_hash_sha1
                         5.972k i/100ms
Calculating -------------------------------------
old_uuid_from_hash_sha1
                        164.037k (± 9.5%) i/s -    822.248k in   5.063807s
new_uuid_from_hash_sha1
                         67.914k (±18.9%) i/s -    322.488k in   5.033210s

Comparison:
old_uuid_from_hash_sha1:   164037.2 i/s
new_uuid_from_hash_sha1:    67913.9 i/s - 2.42x  slower

As can be seen from the benchmarks above, performance remains roughly the same for binary representations of namespace IDs, so no impact should be noted in this case.

For the String version of namespace IDs, there's a performance impact due to the extra work required to process them, but given that the current implementation returns the wrong value back in those cases, it seems like it's a fair trade-off.

Closes #37681


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK