12

Sweet Snippet 之 字符串编辑距离

 3 years ago
source link: https://blog.csdn.net/tkokof1/article/details/100709721
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Sweet Snippet 之 字符串编辑距离

字符串编辑距离的简单实现

字符串编辑距离应该是动态规划中的代表问题了:

给定两个字符串 a a a 与 b b b,求解将 a a a 编辑至 b b b 的操作步数(距离),编辑包含以下两种操作:

  • 删除某一字符
  • 增加某一字符

(这里我们不允许变更某一字符,注意一下)

求解方法则是根据子问题的结果"递推"出原问题的结果:

设字符串 a a a 的长度为 m m m, 字符串 b b b 的长度为 n n n, 我们定义问题 C ( i , j ) C(i, j) C(i,j)

C ( i , j ) C(i, j) C(i,j) : a a a 的(前缀)子串(长度为 i i i) 与 b b b 的(前缀)子串(长度为 j j j) 的字符串编辑距离.

接着就是 C ( i , j ) C(i, j) C(i,j) 的递推公式了(这里就不做细节的讲述了,不熟悉的朋友可以参考进一步的资料)

C ( i , j ) = { i , i f   j = 0 j , i f   i = 0 C ( i − 1 , j − 1 ) , i f   a [ i ] = b [ j ] m i n ( C ( i − 1 , j ) , C ( i , j − 1 ) ) + 1 , o t h e r w i s e C(i, j) = \left\{

\begin{aligned} % & 0, & if \ i = 0\ and\ j = 0 \\ & i, & if \ j = 0 \\ & j, & if \ i = 0 \\ & C(i - 1, j - 1), & if\ a[i] = b[j] \\ & min(C(i - 1, j), C(i, j - 1)) + 1, & otherwise \end{aligned}\begin{aligned} % & 0, & if \ i = 0\ and\ j = 0 \\ & i, & if \ j = 0 \\ & j, & if \ i = 0 \\ & C(i - 1, j - 1), & if\ a[i] = b[j] \\ & min(C(i - 1, j), C(i, j - 1)) + 1, & otherwise \end{aligned}
\right. C(i,j)=⎩⎪⎪⎪⎪⎨⎪⎪⎪⎪⎧​​i,j,C(i−1,j−1),min(C(i−1,j),C(i,j−1))+1,​if j=0if i=0if a[i]=b[j]otherwise​

下面简单列份实现(Lua):

-- get key from two index
function get_key(m, n)
    return m .. "_" .. n
end

function edit_dist_iter(a, b, m, n)
    local edit_dist_buffer = {}
    
    edit_dist_buffer[get_key(0, 0)] = 0
    
    for i = 1, m do
        edit_dist_buffer[get_key(i, 0)] = i
    end
    
    for i = 1, n do
        edit_dist_buffer[get_key(0, i)] = i
    end
    
    for i = 1, m do
        for j = 1, n do
            local ac = a:sub(i, i)
            local bc = b:sub(j, j)
            if ac == bc then
                edit_dist_buffer[get_key(i, j)] = edit_dist_buffer[get_key(i - 1, j - 1)]
            else
                local d1 = edit_dist_buffer[get_key(i - 1, j)]
                local d2 = edit_dist_buffer[get_key(i, j - 1)]
                edit_dist_buffer[get_key(i, j)] = math.min(d1, d2) + 1
            end
        end
    end
    
    return edit_dist_buffer[get_key(m, n)]
end

function edit_dist(a, b)
    return edit_dist_iter(a, b, #a, #b)
end

以上的代码使用了迭代形式,我们也可以用递归形式(来编写代码),只是递归会引起不少的重复计算,所以(工程)实现上,我们需要使用缓存来记录计算过的子问题结果(迭代版本也使用了缓存,作用上和递归版本其实也是一致的,记录的也是子问题的结果):

-- get key from two index
function get_key(m, n)
    return m .. "_" .. n
end

function edit_dist_recur(a, b, m, n, buffer)
    if m <= 0 then
        -- result is trivial, do not need buffer
        return n
    elseif n <= 0 then
        -- result is trivial, do not need buffer
        return m
    else
        local ac = a:sub(m, m)
        local bc = b:sub(n, n)
        if ac == bc then
            local d = buffer[get_key(m - 1, n - 1)]
            if d then
                buffer[get_key(m, n)] = d
                return d
            else
                local d = edit_dist_recur(a, b, m - 1, n - 1, buffer)
                buffer[get_key(m, n)] = d
                return d
            end
        else
            local d1 = buffer[get_key(m - 1, n)]
            if not d1 then
                d1 = edit_dist_recur(a, b, m - 1, n, buffer)
            end
            
            local d2 = buffer[get_key(m, n - 1)]
            if not d2 then
                d2 = edit_dist_recur(a, b, m, n - 1, buffer)
            end
            
            local d = math.min(d1, d2) + 1
            buffer[get_key(m, n)] = d
            return d
        end
    end
end

function edit_dist(a, b)
    -- create buffer
    local edit_dist_buffer = {}
    return edit_dist_recur(a, b, #a, #b, edit_dist_buffer)
end

另外还看到一种基于编辑图(Edit Graph)的实现方式,不过思路上仍然和之前的讲述是一致的,实现上则会更复杂些,在此就不列代码了~


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK