12

CMU数据库(15-445)-实验2-B+树索引实现(中)删除

 3 years ago
source link: http://www.cnblogs.com/JayL-zxl/p/14329951.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

3. Delete 实现

附上实验2的第一部分:link: https://www.cnblogs.com/JayL-zxl/p/14324297.html

3. 1 删除算法原理

如果叶子结点中没有相应的key,则删除失败。否则执行下面的步骤

图片来自于这篇博文https://www.programiz.com/dsa/deletion-from-a-b-plus-tree

  • 情况1 要删除的要素就只在叶子结点

    1. 删除叶子结点中对应的key。删除后若结点的key的个数大于等于 \(\frac{m-1}{2}\) ,删除操作结束。

      删除40的例子如下

      AJJj6zV.png!mobile
    2. 若兄弟结点key有多余( \(>\frac{m-1}{2}\) ),向兄弟结点借一个关键字,然后用兄弟结点的median key替代父结点。

      删除5的例子如下

      jq6VZbM.png!mobile
  • 情况2 要删除的元素不仅在叶子结点而且在内部结点出现

    1. 如果结点中的元素个数 \(> \frac{m-1}{2}\) ,只需从叶结点删除key值,同时从内部节点删除键。用key元素的后继元素补充内部节点的空余空间。

      删除45

      2mqEBv7.png!mobile
    2. 如果节点中元素个数等于 \(\frac{m-1}{2}\) ,则删除该键并从其直接兄弟借一个键。用借来的键填充内部结点中所形成的空空间。

      删除35

      yUVjeqF.png!mobile
    3. 和情况2的第一种情况类似。只不过空洞结点是当前结点的祖父结点。

      删除25

      FvUFf26.png!mobile
  • 情况3 这种情况是树的高度会缩小的情况。

    这种情况有点复杂。请看删除55的例子

    YN36Jja.png!mobile

cmu这里给了演示网站 https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

算法描述见下表

rQbEVzn.png!mobile

3.2 删除算法实现

  1. 如果当前是空树则立即返回

  2. 否则先找到要删除的key所在的page

  3. 随后调用 RemoveAndDeleteRecord 在叶page上直接删除key值

    同样还是经典的二分查找

    INDEX_TEMPLATE_ARGUMENTS
    int B_PLUS_TREE_LEAF_PAGE_TYPE::RemoveAndDeleteRecord(const KeyType &key, const KeyComparator &comparator) {
    
      int l=0,r=GetSize()-1;
      if(l>r||comparator(key,array[l].first)<0||comparator(key,array[r].first)>0)return GetSize();
      while(l<=r){
        int mid=(l+r)>>1;
        if(comparator(key, KeyAt(mid)) < 0){
          r=mid;
        }
        else if (comparator(key, KeyAt(mid)) > 0) l=mid+1;
        else{
          memmove(array + mid, array + mid + 1,static_cast<size_t>((GetSize() - mid - 1)*sizeof(MappingType)));
          IncreaseSize(-1);
          break;
        }
      }
    
      return GetSize();
    }
  4. 删除之后的叶子结点有两种情况

叶子结点内关键字个数小于最小值向下执行。否则结束

-- 调用 CoalesceOrRedistribute

​ 1.如果当前结点是根节点则调用 AdjustRoot(node)

INDEX_TEMPLATE_ARGUMENTS
bool BPLUSTREE_TYPE::AdjustRoot(BPlusTreePage *old_root_node) {
  //case 2
  if (old_root_node->IsLeafPage()) {
    if (old_root_node->GetSize() == 0) {
      root_page_id_ = INVALID_PAGE_ID;
      UpdateRootPageId(false);
      return true;
    }
    return false;
  }
  //  case 1
  if (old_root_node->GetSize() == 2) {
    auto root =reinterpret_cast<BPlusTreeInternalPage<KeyType, page_id_t,KeyComparator> *>(old_root_node);
    root_page_id_ = root->ValueAt(1);
    UpdateRootPageId(false);
    auto page = buffer_pool_manager_->FetchPage(root_page_id_);
    if (page == nullptr) {
      throw "no page can used while AdjustRoot";
    }
    auto new_root =reinterpret_cast<BPlusTreeInternalPage<KeyType, page_id_t,KeyComparator> *>(page);
    new_root->SetParentPageId(INVALID_PAGE_ID);
    buffer_pool_manager_->UnpinPage(root_page_id_, true);
    return true;
  }
  return false;
}

2.否则应该找它的兄弟节点

默认都是找它左边的结点。如果当前已经在最左边即第一个我们找右边的结点

调用 CoalesceOrRedistribute

a. 如果兄弟结点的size+当前结点的size大于最大值则需要重新分配

-- 调用 Redistribute 函数

INDEX_TEMPLATE_ARGUMENTS
template <typename N>
bool BPLUSTREE_TYPE::CoalesceOrRedistribute(N *node, Transaction *transaction) {
  if (node->IsRootPage()) {
    return AdjustRoot(node);
  }
  if (node->IsLeafPage()) {
    if (node->GetSize() >= node->GetMinSize()) {
      return false;
    }
  } else {
    if (node->GetSize() > node->GetMinSize()) {
      return false;
    }
  }
  auto page = buffer_pool_manager_->FetchPage(node->GetParentPageId());
  if (page == nullptr) {
    throw "no page can used while CoalesceOrRedistribute";
  }
  auto parent =reinterpret_cast<BPlusTreeInternalPage<KeyType, page_id_t,KeyComparator> *>(page);
  int value_index = parent->ValueIndex(node->GetPageId());
  //sibling page  always find left page
  int sibling_page_id;
  if (value_index == 0) {
    sibling_page_id = parent->ValueAt(value_index + 1);
  } else {
    sibling_page_id = parent->ValueAt(value_index - 1);
  }

  // fetch sibling node
  auto sibling_page = buffer_pool_manager_->FetchPage(sibling_page_id);
  if (sibling_page == nullptr) {
    throw Exception("all page are pinned while CoalesceOrRedistribute");
  }

  // put sibling node to PageSet
  sibling_page->WLatch();
  transaction->AddIntoPageSet(sibling_page);
  auto sibling = reinterpret_cast<N *>(sibling_page);
  bool is_redistribute = false;
  // If sibling's size + input
  //  page's size > page's max size, then redistribute.
  if (sibling->GetSize() + node->GetSize() > node->GetMaxSize()) {
    is_redistribute = true;
    //TODO need to modify parent
    buffer_pool_manager_->UnpinPage(parent->GetPageId(), true);
  }
  // exec  redistribute
  if (is_redistribute) {
    Redistribute<N>(sibling, node, value_index);
    return false;
  }

 //Otherwise, merge.
  bool ret;
  if (value_index == 0) {
    Coalesce<N>(node, sibling, parent, 1, transaction);
    transaction->AddIntoDeletedPageSet(sibling_page_id);
    // node should not be deleted
    ret = false;
  } else {
    Coalesce<N>(sibling, node, parent, value_index, transaction);
    // node should be deleted
    ret = true;
  }
  //TODO unpin parent
  buffer_pool_manager_->UnpinPage(parent->GetPageId(), true);
  return ret;
}

重新分配的时候有两种情况

(1) 移动它左边结点最大的的元素到当前结点的第一个元素---对应 MoveLastToFrontOf 函数

这里20年版本的实验把之前版本的传递index改成了传递key值的引用。并且没有等号可以用,emm为了偷懒我把它改成了和17年实验一样的设置,这里注意对于实验给你的头文件好多需要修改。不然模版类就会报错

注意这里对于 internalPageLeafPage 并不一样

首先看对于 LeafPage 的实现

整体逻辑非常简单

  1. 就是把元素append到末尾
  2. 然后就是修改父亲结点的元素。
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_LEAF_PAGE_TYPE::MoveLastToFrontOf(BPlusTreeLeafPage *recipient,int parentIndex,
                                                   BufferPoolManager *buffer_pool_manager) {
  MappingType pair = GetItem(GetSize() - 1);
  IncreaseSize(-1);
  recipient->CopyFirstFrom(pair, parentIndex, buffer_pool_manager);
}
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_LEAF_PAGE_TYPE::CopyFirstFrom(const MappingType &item, int parentIndex,
                                               BufferPoolManager *buffer_pool_manager) {
  assert(GetSize() + 1 < GetMaxSize());
  memmove(array + 1, array, GetSize()*sizeof(MappingType));
  IncreaseSize(1);
  array[0] = item;

  auto page = buffer_pool_manager->FetchPage(GetParentPageId());
  if (page == nullptr) {
    throw "no page can used while CopyFirstFrom";
  }
  // get parent
  auto parent =reinterpret_cast<BPlusTreeInternalPage<KeyType, decltype(GetPageId()),KeyComparator> *>(page->GetData());

  parent->SetKeyAt(parentIndex, item.first);

  buffer_pool_manager->UnpinPage(GetParentPageId(), true);
}

然后看对于 InternalPage 的实现

  1. 这里和 leafpage 不一样的就是最后一个元素在 GetSize()
  2. 这里要修改移动元素 value 值(所指向的结点)的 parent 结点
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_INTERNAL_PAGE_TYPE::MoveLastToFrontOf(BPlusTreeInternalPage *recipient,  int parent_index,
                                                       BufferPoolManager *buffer_pool_manager) {
  assert(GetSize() > 1);
  IncreaseSize(-1);
  MappingType pair = array[GetSize()];
  page_id_t child_page_id = pair.second;
  
  recipient->CopyFirstFrom(pair,parent_index, buffer_pool_manager);

  // update parent page id
  auto page = buffer_pool_manager->FetchPage(child_page_id);
  if (page == nullptr) {
    throw "no page can used while MoveLastFrontOf";
  }
  //把要移动元素所指向的结点的parent指针修改。
  auto child = reinterpret_cast<BPlusTreePage *>(page->GetData());
  child->SetParentPageId(recipient->GetPageId());

  assert(child->GetParentPageId() == recipient->GetPageId());
  buffer_pool_manager->UnpinPage(child->GetPageId(), true);
}

/* Append an entry at the beginning.
 * Since it is an internal page, the moved entry(page)'s parent needs to be updated.
 * So I need to 'adopt' it by changing its parent page id, which needs to be persisted with BufferPoolManger
 */
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_INTERNAL_PAGE_TYPE::CopyFirstFrom(const MappingType &pair, int parent_index,BufferPoolManager *buffer_pool_manager) {
  assert(GetSize() + 1 < GetMaxSize());

  auto page = buffer_pool_manager->FetchPage(GetParentPageId());
  if (page == nullptr) {
    throw "no page can used while CopyFirstFrom";
  }
  auto parent = reinterpret_cast<BPlusTreeInternalPage *>(page->GetData());

  auto key = parent->KeyAt(parent_index);

  // set parent key to the last of current page
  parent->SetKeyAt(parent_index, pair.first);

  InsertNodeAfter(array[0].second, key, array[0].second);
  array[0].second = pair.second;

  buffer_pool_manager->UnpinPage(parent->GetPageId(), true);
}

(2) 移动它右边结点最小的元素到当前结点的最后一个元素---对应了 MoveFirstToEndOf 函数

注意这里对于 internalPageLeafPage 并不一样

首先看对于 LeafPage 的实现

memmove
CopyLastFrom
node
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_LEAF_PAGE_TYPE::MoveFirstToEndOf(BPlusTreeLeafPage *recipient,BufferPoolManager *buffer_pool_manager) {
  MappingType pair = GetItem(0);
  IncreaseSize(-1);
  memmove(array, array + 1, static_cast<size_t>(GetSize()*sizeof(MappingType)));

  recipient->CopyLastFrom(pair);

  auto page = buffer_pool_manager->FetchPage(GetParentPageId());
  if (page == nullptr) {
    throw "no page can used while MoveFirstToEndOf";
  }
  auto parent =reinterpret_cast<BPlusTreeInternalPage<KeyType, decltype(GetPageId()),KeyComparator> *>(page->GetData());
  parent->SetKeyAt(parent->ValueIndex(GetPageId()), pair.first);
  buffer_pool_manager->UnpinPage(GetParentPageId(), true);
}

/*
 * Copy the item into the end of my item list. (Append item to my array)
 */
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_LEAF_PAGE_TYPE::CopyLastFrom(const MappingType &item) {
  assert(GetSize() + 1 <= GetMaxSize());
  array[GetSize()] = item;
  IncreaseSize(1);
}

然后看对于 InternalPage 的实现

internalPage
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_INTERNAL_PAGE_TYPE::MoveFirstToEndOf(BPlusTreeInternalPage *recipient,BufferPoolManager *buffer_pool_manager) {
  assert(GetSize() > 1);
  MappingType pair{KeyAt(1), ValueAt(0)};
  page_id_t child_page_id = ValueAt(0);
  SetValueAt(0, ValueAt(1));
  Remove(1);

  recipient->CopyLastFrom(pair, buffer_pool_manager);

  // update child parent page id
  auto page = buffer_pool_manager->FetchPage(child_page_id);
  if (page == nullptr) {
    throw "no page can  used while MoveFirstToEndOf";
  }
  auto child = reinterpret_cast<BPlusTreePage *>(page);
  child->SetParentPageId(recipient->GetPageId());

  assert(child->GetParentPageId() == recipient->GetPageId());
  buffer_pool_manager->UnpinPage(child->GetPageId(), true);
}

/* Append an entry at the end.
 * Since it is an internal page, the moved entry(page)'s parent needs to be updated.
 * So I need to 'adopt' it by changing its parent page id, which needs to be persisted with BufferPoolManger
 */
INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_INTERNAL_PAGE_TYPE::CopyLastFrom(const MappingType &pair, BufferPoolManager *buffer_pool_manager) {
  assert(GetSize() + 1 <= GetMaxSize());

  auto page = buffer_pool_manager->FetchPage(GetParentPageId());
  if (page == nullptr) {
    throw Exception("all page are pinned while CopyLastFrom");
  }
  auto parent = reinterpret_cast<BPlusTreeInternalPage *>(page);

  auto index = parent->ValueIndex(GetPageId());
  auto key = parent->KeyAt(index + 1);

  array[GetSize()] = {key, pair.second};
  IncreaseSize(1);
  parent->SetKeyAt(index + 1, pair.first);
  buffer_pool_manager->UnpinPage(parent->GetPageId(), true);
}

b.否则需要进行merge操作

-- 调用 Coalesce 函数

node
CoalesceOrRedistribute
INDEX_TEMPLATE_ARGUMENTS
template <typename N>
void BPLUSTREE_TYPE::Coalesce(N *neighbor_node, N *node,BPlusTreeInternalPage<KeyType, page_id_t, KeyComparator> *parent,int index, Transaction *transaction) {
  // assumption: neighbor_node is predecessor of node
  //LOG_DEBUG("index %d",index);
  node->MoveAllTo(neighbor_node,index,buffer_pool_manager_);
  LOG_DEBUG("size %d",node->GetSize());
  // adjust parent
  parent->Remove(index);

  //recursive
  if (CoalesceOrRedistribute(parent, transaction)) {
    transaction->AddIntoDeletedPageSet(parent->GetPageId());
  }

}

Internal内的 Remove 函数

INDEX_TEMPLATE_ARGUMENTS
void B_PLUS_TREE_INTERNAL_PAGE_TYPE::Remove(int index) {
  assert(0 <= index && index < GetSize());
  memmove(array+index,array+index+1,(GetSize()-index-1)*sizeof(MappingType));
  IncreaseSize(-1);

}

好了删除算法已经实现了。首先我们可以通过test函数

cd build
make b_plus_tree_delete_test
./test/b_plus_tree_delete_test --gtest_also_run_disabled_tests
zAV7v2Z.png!mobile

然后我们自己做一些test。这里我就拿一个例子来看

插入10、5、7、4、9得到下图是正确的:star2:

QVf6Rju.png!mobile

然后删除元素7

7zYbyq7.png!mobile

可以发现是完全正确的好了。第二部分就完成了。下面就是最后一部分对于:lock:的实现和迭代器的实现


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK