python - kosaraju 使用迭代 dfs 查找完成时间

Question

这是我为 Kosaraju 算法所做的代码的第一部分。

###### reading the data #####
with open('data.txt') as req_file:
        ori_data = []
        for line in req_file:
            line = line.split()
            if line:
                line = [int(i) for i in line]
                ori_data.append(line)

###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])

print revscc_dic        

######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])

print scc_dic        

##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
          path.append(v)
          q=revscc_dic[v]+q
print path

代码读取数据并正确形成 Grev 和 G。我已经为迭代 dfs 编写了代码。我怎样才能找到完成时间？我了解使用纸和笔查找完成时间，但我不理解完成时间作为代码的部分？？我该如何实现它。只有在此之后，我才能继续我的下一部分代码。请帮忙。提前致谢。

data.txt 文件包含：

请另存为data.txt。

score 30 · Accepted Answer

使用递归dfs，很容易看到给定顶点何时“完成”（即，当我们访问了 dfs 树中的所有子节点时）。可以在递归调用返回后计算完成时间。
然而，使用迭代dfs，这并不容易。现在我们正在使用 while 循环迭代处理队列，我们丢失了一些与函数调用相关的嵌套结构。或者更准确地说，我们不知道回溯何时发生。不幸的是，如果不向我们的顶点堆栈添加一些额外的信息，就无法知道何时发生回溯。

将完成时间添加到 dfs 实现的最快方法如下：

##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print path  
print finish_time_dic

这里使用的技巧是，当我们从堆栈中弹出时v，如果这是我们第一次看到它，那么我们再次将它添加回堆栈。这是使用：q = [v] + q. 我们必须先压入v堆栈，然后再压入它的邻居（我们编写在for 循环v 之前v压入' 的邻居的代码）- 否则这个技巧不起作用。最终我们将再次v从堆栈中弹出。至此，v已经完成！我们以前见过，所以，我们进入 else 案例并计算一个新的完成时间。v

对于提供的图表，finish_time_dic给出正确的完成时间：

{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}

请注意，这个 dfs 算法（完成时间修改）仍然具有O(V+E)复杂度，尽管我们将图形的每个节点推入堆栈两次。但是，存在更优雅的解决方案。我推荐阅读 Magnus Lie Hetland 的Python Algorithms: Mastering Basic Algorithms in the Python Language (ISBN: 1430232374, 9781430232377) 的第 5 章。问题 5-6 和 5-7（第 122 页）准确描述了您的问题。作者回答了这些问题，并给出了该问题的替代解决方案。

问题：

5-6 在递归 DFS 中，当您从递归调用之一返回时会发生回溯。但是迭代版本中的回溯到哪里去了？

5-7。编写可以处理确定完成时间的 DFS 的非递归版本。

答案：

5-6 在迭代版本中根本没有真正体现出来。一旦你从堆栈中弹出所有“遍历后代”，它就会隐式发生。

5-7 正如练习 5-6 中所解释的，在迭代 DFS 中回溯发生的代码中没有一点，所以我们不能只将完成时间设置在某个特定位置（如在递归中）。相反，我们需要在堆栈中添加一个标记。例如，我们可以添加表单的边，而不是将 u 的邻居添加到堆栈中(u, v)，并且在所有边之前，我们将 push (u, None)，指示的回溯点u。

score 4 · Accepted Answer

从Wikipedia可以看出，迭代 DFS 本身并不复杂。但是，计算每个节点的完成时间需要对算法进行一些调整。我们仅在第二次遇到该节点时将其从堆栈中弹出。

这是我的实现，我觉得它更清楚地展示了正在发生的事情：

step = 0  # time counter

def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]  # create stack with starting vertex
    while stack:  # while stack is not empty
        step += 1
        v = stack[-1]  # peek top of stack
        if v.color:  # if already seen
            v = stack.pop()  # done with this node, pop it from stack
            if v.color == 1:  # if GRAY, finish this node
                v.time_finish = step
                v.color = 2  # BLACK, done
        else:  # seen for first time
            v.color = 1  # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:  # for all neighbor (v, w)
                if not w.color:  # if not seen
                    stack.append(w)
    return total

def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0  # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)

我遵循CLR 算法手册的约定，并在 DFS 搜索期间使用节点着色来指定其状态。我觉得这比使用单独的列表来跟踪节点状态更容易理解。

所有节点都以白色开始。当它在搜索过程中被发现时，它被标记为gray。当我们完成它时，它被标记为black。

在 while 循环中，如果一个节点是白色的，我们将其保留在堆栈中，并将其颜色更改为灰色。如果它是灰色的，我们将其颜色更改为黑色，并设置其完成时间。如果它是黑色的，我们就忽略它。

堆栈上的节点可能是黑色的（即使在将其添加到堆栈之前进行了颜色检查）。一个白色节点可以两次添加到堆栈中（通过两个不同的邻居）。一个最终会变黑。当我们到达堆栈上的第二个实例时，我们需要确保我们不会更改它已经设置的完成时间。

以下是一些额外的支持代码：

class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []  # children | adjacency list
        self.color = 0  # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None  # DFS
        self.time_finish = None  # DFS

class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)  # list of Nodes
        self.max_heap = []  # nodes in decreasing finish time for SCC

    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)

要在反向图上运行 DFS，您可以在处理边文件时为每个 Node 构建一个类似于子列表的父列表，并在 dfs_visit() 中使用父列表而不是子列表。

要在 SCC 计算的最后一部分以减少完成时间的方式处理节点，您可以构建节点的最大堆，并在 dfs_visit() 中使用该最大堆，而不是简单地使用子列表。

    while g.max_heap:
        v = heapq.heappop(g.max_heap)[1]
        if not v.color:
           size = dfs_visit(g, v)
           scc_size.append(size)

score 2 · Accepted Answer

我对 Lawson 的迭代 DFS 版本产生的顺序有一些问题。这是我的版本的代码，它具有与 DFS 的递归版本的一对一映射。

n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)

# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True

# Loop through vertices in reverse order
for v in xrange(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True

            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:

                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break

            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time

score 1 · Accepted Answer

以下是java中的递归和迭代实现：

int time = 0;
public void dfsRecursive(Vertex vertex) {
        time += 1;
        vertex.setVisited(true);
        vertex.setDiscovered(time);
        for (String neighbour : vertex.getNeighbours()) {
            if (!vertices.get(neighbour).getVisited()) {
                dfsRecursive(vertices.get(neighbour));
            }
        }
        time += 1;
        vertex.setFinished(time);
    }

    public void dfsIterative(Vertex vertex) {
        Stack<Vertex> stack = new Stack<>();
        stack.push(vertex);
        while (!stack.isEmpty()) {
            Vertex current = stack.pop();
            if (!current.getVisited()) {
                time += 1;
                current.setVisited(true);
                current.setDiscovered(time);
                stack.push(current);
                List<String> currentsNeigbours = current.getNeighbours();
                for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                    String currentNeigbour = currentsNeigbours.get(i);
                    Vertex neighBour = vertices.get(currentNeigbour);
                    if (!neighBour.getVisited())
                        stack.push(neighBour);
                }
            } else {
                if (current.getFinished() < 1) {
                    time += 1;
                    current.setFinished(time);
                }
            }
        }
    }

score -1 · Accepted Answer

首先，你应该确切地知道什么是完成时间。在递归 dfs 中，完成时间是节点 v 的所有相邻节点 [V]s 完成的时间，请记住，您需要有额外的数据结构来存储所有信息。

adj[][]  //graph
visited[]=NULL //array of visited node
finished[]=NULL //array of finished node
Stack st=new Stack  //normal stack 
Stack backtrack=new Stack //additional stack
function getFinishedTime(){
for(node i in adj){
     if (!vistied.contains[i]){
         st.push(i);
         visited.add(i)
         while(!st.isEmpty){
              int j=st.pop();
              int[] unvisitedChild= getUnvistedChild(j);
              if(unvisitedChild!=null){
                   for(int c in unvisitedChild){
                        st.push(c);
                        visited.add(c);
                    }
                    backtrack.push([j,unvisitedChild]); //you can store each entry as array with the first index as the parent node j, followed by all the unvisited child node.
              }
              else{ 
                   finished.add(j);
                   while(!backtrack.isEmpty&&finished.containsALL(backtrack.peek())) //all of the child node is finished, then we can set the parent node visited
                   {
                   parent=backtrack.pop()[0];
                   finished.add(parent);
                   }
              }
        }
    }
}

 function getUnvistedChild(int i){
     unvisitedChild[]=null
     for(int child in adj[i]){
        if(!visited.contains(child))
            unvisitedChild.add(child);
     }
     return unvisitedChild;
 }

完成时间应该是 [5, 2, 8, 3, 6, 9, 1, 4, 7]

python - kosaraju 使用迭代 dfs 查找完成时间

5 回答 5

Related

Reference