c++ - Implementing Disjoint Sets (Union Find) in C++

Question

I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:

    class DisjointSet
    {
    public:
        class   Node 
        {
        public:
            int         data;
            int         rank;

            Node*       parent;

            Node() : data(0), 
                     rank(0),
                     parent(this) { } // the constructor does MakeSet
        };

        Node*   find(Node*);
        Node*   merge(Node*, Node*); // Union
    };


    DisjointSet::Node*   DisjointSet::find(DisjointSet::Node* n)
    {
        if (n != n->parent) {
            n->parent = find(n->parent);
        }
        return n->parent;
    }

    DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
                                          DisjointSet::Node* y)
    {
        x = find(x);
        y = find(y);

        if (x->rank > y->rank) {
            y->parent = x;
        } else {
            x->parent = y;
            if (x->rank == y->rank) {
                ++(y->rank);
            }
        }
    }

I am pretty sure this is incomplete though and that I am missing something.

Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q ) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).

Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?

I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.

score 3 · Accepted Answer

无论如何都不是一个完美的实现（毕竟我确实写了它！），但这有帮助吗？

/***
 * millipede: DisjointSetForest.h
 * Copyright Stuart Golodetz, 2009. All rights reserved.
 ***/

#ifndef H_MILLIPEDE_DISJOINTSETFOREST
#define H_MILLIPEDE_DISJOINTSETFOREST

#include <map>

#include <common/exceptions/Exception.h>
#include <common/io/util/OSSWrapper.h>
#include <common/util/NullType.h>

namespace mp {

/**
@brief  A disjoint set forest is a fairly standard data structure used to represent the partition of
        a set of elements into disjoint sets in such a way that common operations such as merging two
        sets together are computationally efficient.

This implementation uses the well-known union-by-rank and path compression optimizations, which together
yield an amortised complexity for key operations of O(a(n)), where a is the (extremely slow-growing)
inverse of the Ackermann function.

The implementation also allows clients to attach arbitrary data to each element, which can be useful for
some algorithms.

@tparam T   The type of data to attach to each element (arbitrary)
*/
template <typename T = NullType>
class DisjointSetForest
{
    //#################### NESTED CLASSES ####################
private:
    struct Element
    {
        T m_value;
        int m_parent;
        int m_rank;

        Element(const T& value, int parent)
        :   m_value(value), m_parent(parent), m_rank(0)
        {}
    };

    //#################### PRIVATE VARIABLES ####################
private:
    mutable std::map<int,Element> m_elements;
    int m_setCount;

    //#################### CONSTRUCTORS ####################
public:
    /**
    @brief  Constructs an empty disjoint set forest.
    */
    DisjointSetForest()
    :   m_setCount(0)
    {}

    /**
    @brief  Constructs a disjoint set forest from an initial set of elements and their associated values.

    @param[in]  initialElements     A map from the initial elements to their associated values
    */
    explicit DisjointSetForest(const std::map<int,T>& initialElements)
    :   m_setCount(0)
    {
        add_elements(initialElements);
    }

    //#################### PUBLIC METHODS ####################
public:
    /**
    @brief  Adds a single element x (and its associated value) to the disjoint set forest.

    @param[in]  x       The index of the element
    @param[in]  value   The value to initially associate with the element
    @pre
        -   x must not already be in the disjoint set forest
    */
    void add_element(int x, const T& value = T())
    {
        m_elements.insert(std::make_pair(x, Element(value, x)));
        ++m_setCount;
    }

    /**
    @brief  Adds multiple elements (and their associated values) to the disjoint set forest.

    @param[in]  elements    A map from the elements to add to their associated values
    @pre
        -   None of the elements to be added must already be in the disjoint set forest
    */
    void add_elements(const std::map<int,T>& elements)
    {
        for(typename std::map<int,T>::const_iterator it=elements.begin(), iend=elements.end(); it!=iend; ++it)
        {
            m_elements.insert(std::make_pair(it->first, Element(it->second, it->first)));
        }
        m_setCount += elements.size();
    }

    /**
    @brief  Returns the number of elements in the disjoint set forest.

    @return As described
    */
    int element_count() const
    {
        return static_cast<int>(m_elements.size());
    }

    /**
    @brief  Finds the index of the root element of the tree containing x in the disjoint set forest.

    @param[in]  x   The element whose set to determine
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    int find_set(int x) const
    {
        Element& element = get_element(x);
        int& parent = element.m_parent;
        if(parent != x)
        {
            parent = find_set(parent);
        }
        return parent;
    }

    /**
    @brief  Returns the current number of disjoint sets in the forest (i.e. the current number of trees).

    @return As described
    */
    int set_count() const
    {
        return m_setCount;
    }

    /**
    @brief  Merges the disjoint sets containing elements x and y.

    If both elements are already in the same disjoint set, this is a no-op.

    @param[in]  x   The first element
    @param[in]  y   The second element
    @pre
        -   Both x and y must be elements in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    */
    void union_sets(int x, int y)
    {
        int setX = find_set(x);
        int setY = find_set(y);
        if(setX != setY) link(setX, setY);
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    T& value_of(int x)
    {
        return get_element(x).m_value;
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    const T& value_of(int x) const
    {
        return get_element(x).m_value;
    }

    //#################### PRIVATE METHODS ####################
private:
    Element& get_element(int x) const
    {
        typename std::map<int,Element>::iterator it = m_elements.find(x);
        if(it != m_elements.end()) return it->second;
        else throw Exception(OSSWrapper() << "No such element: " << x);
    }

    void link(int x, int y)
    {
        Element& elementX = get_element(x);
        Element& elementY = get_element(y);
        int& rankX = elementX.m_rank;
        int& rankY = elementY.m_rank;
        if(rankX > rankY)
        {
            elementY.m_parent = x;
        }
        else
        {
            elementX.m_parent = y;
            if(rankX == rankY) ++rankY;
        }
        --m_setCount;
    }
};

}

#endif

score 3 · Accepted Answer

您的实现看起来不错（除了在函数合并中，您应该声明返回 void 或将返回放在那里，我更喜欢返回 void）。问题是您需要跟踪Nodes*. 你可以通过vector<DisjointSet::Node*>在你的DisjointSet类上使用 a 或在vector其他地方使用它并声明DisjointSetas的方法来做到这一点static。

这是一个运行示例（请注意，我将 merge 更改为 return void 并且没有更改DisjointSetto be的方法static：

int main()
{
    vector<DisjointSet::Node*> v(10);
    DisjointSet ds;

    for (int i = 0; i < 10; ++i) {
        v[i] = new DisjointSet::Node();
        v[i]->data = i;
    }

    int x, y;

    while (cin >> x >> y) {
        ds.merge(v[x], v[y]);
    }

    for (int i = 0; i < 10; ++i) {
        cout << v[i]->data << ' ' << v[i]->parent->data << endl;
    }

    return 0;
}

使用此输入：

它打印预期的：

你的森林是树木的组成：

   7    1       5    6   9
 /    / | \              |
0    2  3  4             8

因此，您的算法很好，据我所知，Union-find 具有最佳复杂性，并且您可以Node在vector. 所以你可以简单地解除分配：

for (int i = 0; i < int(v.size()); ++i) {
    delete v[i];
}

score 3 · Accepted Answer

Boost 有一个不相交的集合实现：

http://www.boost.org/doc/libs/1_54_0/libs/disjoint_sets/disjoint_sets.html

score 2 · Accepted Answer

我不能谈论算法，但是对于内存管理，通常你会使用一个叫做智能指针的东西，它会释放它指向的东西。您可以获得共享所有权和单一所有权智能指针，以及非所有权。正确使用这些将保证没有内存问题。

score 0 · Accepted Answer

看看这段代码

class Node {
    int id,rank,data;
    Node *parent;

public :

    Node(int id,int data) {
        this->id = id;
        this->data = data;
        this->rank =0;
        this->parent = this;
    }

    friend class DisjointSet;
};

class DisjointSet {
    unordered_map<int,Node*> forest;

    Node *find_set_helper(Node *aNode) {
        if( aNode->parent != aNode)
            aNode->parent = find_set_helper(aNode->parent);

        return aNode->parent;
    }

    void link(Node *xNode,Node *yNode) {
        if( xNode->rank > yNode->rank)
            yNode->parent = xNode;
        else if(xNode-> rank < yNode->rank)
            xNode->parent = yNode;
        else {
            xNode->parent = yNode;
            yNode->rank++;
        }
    }

public:
    DisjointSet() {

    }

    void make_set(int id,int data) {
        Node *aNode = new Node(id,data);
        this->forest.insert(make_pair(id,aNode));
    }

    void Union(int xId, int yId) {
        Node *xNode = find_set(xId);
        Node *yNode = find_set(yId);

        if(xNode && yNode)
            link(xNode,yNode);
    }

    Node* find_set(int id) {
        unordered_map<int,Node*> :: iterator itr = this->forest.find(id);
        if(itr == this->forest.end())
            return NULL;

        return this->find_set_helper(itr->second);
    }

    void print() {
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            cout<<"\nid : "<<itr->second->id<<"  parent :"<<itr->second->parent->id;
        }
    }

    ~DisjointSet(){
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            delete (itr->second);
        }
    }

};

score 0 · Accepted Answer

你的实现很好。您现在需要做的就是保留一组不相交的集合节点，以便您可以在它们上调用 union/find 方法。

对于 Kruskal 算法，您需要一个数组，其中每个图顶点包含一个不相交集节点。然后，当您查找要添加到子图中的下一条边时，您将使用 find 方法检查这些节点是否都已经在您的子图中。如果是，那么您可以继续前进到下一个边缘。否则，是时候将该边添加到您的子图中并在由该边连接的两个顶点之间执行联合操作。

score 0 · Accepted Answer

这篇博客文章展示了一个使用路径压缩的 C++ 实现：http: //toughprogramming.blogspot.com/2013/04/implementing-disjoint-sets-in-c.html

score 0 · Accepted Answer

为了从头开始实现不相交集，我强烈推荐阅读Mark A. Weiss所著的Data Structures & Algorithm Analysis in C++一书。

第8章，从基本的find/union开始，逐步按height/depth/rank增加union，find压缩。最后，它提供了 Big-O 分析。

相信我，它有你想知道的关于不相交集及其 C++ 实现的一切。

score 0 · Accepted Answer

对于通过路径压缩实现联合查找不相交集，以下代码似乎很容易理解

int find(int i)
{
    if(parent[i]==i)
    return i;
    else
    return parent[i]=find(parent[i]);
}
void union(int a,int b)
{
    x=find(a);y=find(b);
        if(x!=y)
        {
            if(rank[x]>rank[y])
            parent[y]=x;
            else
            {
            parent[x]=y;
            if(rank[x]==rank[y])
            rank[y]+=1;             
            }
        }
}

score 0 · Accepted Answer

如果您想问哪种样式更适合实施不相交集（矢量或地图（rb 树）），那么我可能要添加一些内容

make_set (int key , node info )：这通常是（1）添加节点和（2）使节点指向自身（父=键）的成员函数，这最初会创建一个不相交的集合。向量 O(n) 的操作时间复杂度，地图 O(n*logn) 。
find_set( int key )：这通常有两个功能，（1）通过给定的密钥找到节点（2）路径压缩。我无法真正计算路径压缩，但对于简单地搜索节点，(1) 向量 O(1) 和 (2) 映射 O(log(n)) 的时间复杂度。

最后，我想说，虽然看这里，向量实现看起来更好，两者的时间复杂度都是 O(M*α(n)) ≈ O(M*5) 或者我读过。

附言。请验证我写的内容，尽管我很确定它是正确的。

c++ - Implementing Disjoint Sets (Union Find) in C++

10 回答 10

Related

Reference