python - 面试准备：优化 swapLexOrder

Question

面试关于代码战斗的哈希图问题，需要帮助优化我的蛮力解决方案。这是问题所在：

给定一个字符串 str 和指示字符串中哪些索引可以交换的对数组，返回执行允许的交换所产生的字典上最大的字符串。您可以多次交换索引。

例子

For str = "abdc" and pairs = [[1, 4], [3, 4]], the output should be
swapLexOrder(str, pairs) = "dbca".

通过交换给定的索引，您可以获得字符串：“cbda”、“cbad”、“dbac”、“dbca”。此列表中按字典顺序排列的最大字符串是“dbca”。

我目前的解决方案

通过不断添加所有可能性，直到没有新的解决方案的蛮力。这对我来说太慢了swapLexOrder('dznsxamwoj',[[1,2],[3,4],[6,5],[8,10]])，无法在我的机器上完成。任何优化提示？一个更容易通过的测试用例是swapLexOrder('abdc,[[1,4],[3,4]])= dbca

def swapLexOrder(str, pairs):
    d = {}
    d[str]=True
    while True:
        oldlen=len(d)
        for x,y in pairs:
            for s in d.keys():
                d[swp(s,x,y)]=True
        if len(d) == oldlen:
            #no more new combinations.
            return sorted(d)[-1]

def swp(str,x,y):
    x=x-1
    y=y-1
    a=str[x]
    b=str[y]
    return str[0:x]+b+str[x+1:y]+a+str[y+1:]

score 4 · Accepted Answer

我提出的解决方案是首先尝试“链接”尽可能多的对以形成可以互换的索引集 - 例如在您的第一个示例中，[[1, 4], [3, 4]]可以变为[[1, 3, 4]]. 然后可以按字典顺序对这些索引子集中的每一个进行排序以形成输出。实现是这样的：

def build_permitted_subs(pairs):
    perm = []

    for a, b in pairs:
        merged = False
        for ind, sub_perm in enumerate(perm):
            if a in sub_perm or b in sub_perm:
                sub_perm.add(a)
                sub_perm.add(b)
                merged = True
                break

        else:
            perm.append(set([a, b]))

        if merged:
            for merge_perm_ind in reversed(range(ind + 1, len(perm))):
                if perm[merge_perm_ind] & sub_perm:
                    sub_perm.update(perm[merge_perm_ind])
                    perm.pop(merge_perm_ind)

    return list(map(sorted, perm))

def swap_lex_order(swap_str, _pairs):

    pairs = [[a - 1, b - 1] for a, b in _pairs]
    out = list(swap_str)

    perm = build_permitted_subs(pairs)

    for sub_perm in perm:
        sorted_subset = sorted(sub_perm, key=lambda ind: swap_str[ind], reverse=True)

        for sort, targ in zip(sorted_subset, sub_perm):
            out[targ] = swap_str[sort]

    return "".join(out)

print(swap_lex_order("dznsxamwoj", [[1, 2], [3, 4], [6, 5], [8, 10]]))
print(swap_lex_order("abdc", [[1, 4], [3, 4]]))
print(swap_lex_order("acxrabdz",[[1,3], [6,8], [3,8], [2,7]]))

输出：

zdsnxamwoj
dbca
zdxrabca

我还重命名了你的参数 not to use str，这已经是一个非常基本的 Python 内置函数。请注意，我的代码可能不像 Pythonic 一样，但我认为它可以很好地说明算法，并且它不会受到任何主要的性能影响。我怀疑这种方法的复杂性相当低——它通常是“智能的”，因为它不会暴力破解任何东西，并且使用O(n log n)排序。第一个例子似乎是正确的。请注意，这会将每对转换为从 0 开始的，因为这对于 Python 来说更容易。

这有点依赖于能够从相邻的排列（交换对）中形成任何排列（对链接的对进行排序）。这可能并不完全直观，但它可能有助于意识到您可以仅使用列表中的相邻交换来有效地执行插入（通过不断地在方向上交换元素）。使用相邻交换排列列表的一个示例是冒泡排序，您可能会意识到，如果任何排列都可以被冒泡排序，这意味着所有排列都可以通过冒泡排序达到。

如果您有任何问题，或者有任何问题，请告诉我，我将开始详细说明/调试。（截至格林威治标准时间 19:28，我已经注意到一个错误并将其编辑掉：）。错误 #2（在测试用例 3 中有重复的 z）也应该被修复。

关于错误 #1 的更多信息：

我没有对 . 返回的索引进行排序build_permitted_subs，因此无法参考swap_str.

有关错误 #2 的更多信息：

该build_permitted_subs功能无法正常工作 - 具体来说，如果遇到可以分成两组的配对，这意味着这些组也应该加入在一起，这没有发生，现在会有两个不应该分开的组。这会导致 z 重复，因为两组都可以从 z 中提取。我草率地用一个标志和一个追溯 for 循环修复了这个问题。

score 0 · Accepted Answer

我首选的解决方案是使用不相交集来解决这个问题。关键思想是建立一个连接图，有点像链表。这表示由对连接的子串。一旦确定了连接的内容，就可以对子字符串进行排序，然后在构建字符串时从子字符串中挑选出字典顺序最高的字符。

不相交集在这里有很大帮助，因为它让我们能够以极快的速度弄清楚什么是连接的。它实际上比 log 快，它是 log*。我建议阅读维基百科页面以获得解释。通过使用 union 函数，我们可以从给定的对构建“链表”。

import collections

class DisjointSet:
    def __init__(self, string, pairs):
        self.parent = [i for i in range(len(string))]
        self.size = [1] * len(string)
        
        for a, b in pairs:
            self.union(a-1, b-1)
    
    def find_parent(self, idx):
        # O(log*(n))
        if self.parent[idx] == idx:
            return idx
        self.parent[idx] = self.find_parent(self.parent[idx])
        return self.parent[idx]
    
    def union(self, a, b):
        # O(log*(n))
        x = self.find_parent(a)
        y = self.find_parent(b)
        
        if x == y:
            return
        
        if self.size[x] < self.size[y]:
            x, y = y, x
        
        self.parent[y] = x
        self.size[x] = self.size[x] + self.size[y]

def swapLexOrder(string, pairs):
    # O(nlogn) + O(nlog*(n))
    string = list(string)
    # Build the disjoint set to figure out what pairs are connected
    disjoint = DisjointSet(string, pairs)
    graph = collections.defaultdict(list)
    
    # With the disjoint set, build the substrings connected by the pairs
    for i, c in enumerate(string):
        graph[disjoint.find_parent(i)].append(c)
    
    # Sort the substrings
    for i in range(len(string)):
        graph[i].sort()
    
    # Build the answer by picking the most lexicographic out of the substrings
    for i in range(len(string)):
        parent = disjoint.find_parent(i)
        string[i] = graph[parent][-1]
        graph[parent].pop()
    
    return "".join(string

我在这里找到了这个想法。我只是在 Python 中实现了它并添加了注释。

score 0 · Accepted Answer

def swapLexOrder(str, pairs):
    
    if not str or not pairs:
        return ('', str)[not pairs]
    lst = [''] + list(str)
    setted_pairs = list(map(set, pairs))
    while setted_pairs:
        path = setted_pairs.pop(0)
        while True:
            path1 = path.copy()
            for pair in setted_pairs:
                if path1 & pair:
                    path |= pair
                    setted_pairs.remove(pair)
            if path == path1:
                break
        optimal = sorted(lst[i] for i in path)
        for i, v in enumerate(sorted(path, reverse=True)):
            lst[v] = optimal[i]
    return ''.join(lst[1:])

score 0 · Accepted Answer

这个可能效果更好。

def swapLexOrder(str_, pairs):
n = len(str_)
str_ = list(str_)

corr = [set() for _ in range(n)]
nodes = set()
for a, b in pairs:
    corr[a-1].add(b-1)
    corr[b-1].add(a-1)
    nodes.add(a-1)
    nodes.add(b-1)

while nodes:
    active = {nodes.pop()}
    group = set()
    while active:
        group |= active
        nodes -= active
        active = {y for x in active for y in corr[x] if y in nodes}

    chars = iter(sorted((str_[i] for i in group), reverse=True))
    for i in sorted(group):
        str_[i] = next(chars)

return "".join(str_)

python - 面试准备：优化 swapLexOrder

4 回答 4

Related

Reference