algorithm - 如何找到给定集中的多重集的所有子集？

Question

假设我有一组多组D：

D = {
  {d, g, o},
  {a, e, t, t},
  {a, m, t},
}

给定一个 multiset M，比如

M = {a, a, m, t}

我想要一个算法f给我它的所有元素D都是子集（或更准确地说，“子多重集”）M：

f = {{a, m, t}}

如果我们只做一个这样的查询，扫描D（O(#D)及时）的所有元素显然是最优的。但是，如果我们想回答许多相同D和不同的查询M，我们可以通过预处理D一些更智能的数据结构来使其更快。

我们可以把所有的都D扔到一个哈希表中并遍历所有可能的子集M，在哈希表中查找每个子集，但那是O(2^#M). 小型M的很好，中型到大型的就不那么好了M。

是否有可能在多项式时间内做到这一点#M？或者也许将一个已知的 NP 完全问题简化为这个问题，以证明不可能很快？

编辑：我刚刚意识到，在最坏的情况下，我们需要输出所有的D，所以#D仍然会出现时间复杂度。因此，让我们假设输出的大小受某个常数的限制。

score 0 · Accepted Answer

这是 TernarySearchTree (TST) 的快速实现，可以帮助您解决问题。几年前，我受到 DrDobbs 上的一篇文章的启发。您可以在http://www.drdobbs.com/database/ternary-search-trees/184410528阅读更多相关信息。它提供了有关 TST 和性能分析的一些背景知识。

在您的问题描述示例中，D 将是包含“dgo”、“aett”和“amt”键的字典。值与键相同。

M 是您的搜索字符串，它基本上说“给我字典中的所有值，其中的键包含一个子集或所有这些字母”。字符的顺序并不重要。性格 '。' 在搜索中用作通配符。

对于任何给定的 M，这个算法和数据结构不需要你查看 D 的所有元素。所以在这方面它会很快。我还对访问的节点数量进行了一些测试，大多数时候，即使对于未找到的键，访问的节点数量也只是字典中总节点的一小部分。

此算法还允许您输入限制字典返回的键的最小和最大长度。

很抱歉代码冗长，但您可以测试它是完整的。

import java.util.ArrayList;
import java.io.*;

public class TSTTree<T>
{
    private TSTNode<T> root;
    private int size = 0;
    private int totalNodes = 0;

    public int getSize() { return size; }

    public int getTotalNodes() { return totalNodes; }

    public TSTTree()
    {
    }

    public TSTNode<T> getRoot() { return root; }

    public void insert(String key, T value)
    {
        if(key==null || key.length()==0) return;

        char[] keyArray = key.toCharArray();

        if(root==null) root = new TSTNode<T>(keyArray[0]);
        TSTNode<T> currentNode = root;
        TSTNode<T> parentNode = null;

        int d = 0;
        int i = 0;

        while(currentNode!=null)
        {
            parentNode = currentNode;
            d = keyArray[i] - currentNode.getSplitChar();
            if(d==0)
            {
                if((++i) == keyArray.length) // Found the key
                {
                    if(currentNode.getValue()!=null)
                        System.out.println(currentNode.getValue() + " replaced with " + value);
                    else
                        size++;
                    currentNode.setValue(value);        // Replace value at Node
                    return;
                }
                else
                    currentNode = currentNode.getEqChild();
            }
            else if(d<0)
                currentNode = currentNode.getLoChild();
            else
                currentNode = currentNode.getHiChild();
        }

        currentNode = new TSTNode<T>(keyArray[i++]);
        totalNodes++;
        if(d==0)
            parentNode.setEqChild(currentNode);
        else if(d<0)
            parentNode.setLoChild(currentNode);
        else
            parentNode.setHiChild(currentNode);

        for(;i<keyArray.length;i++)
        {
            TSTNode<T> tempNode = new TSTNode<T>(keyArray[i]);
            totalNodes++;
            currentNode.setEqChild(tempNode);
            currentNode = tempNode;
        }

        currentNode.setValue(value);        // Insert value at Node
        size++;
    }

    public ArrayList<T> find(String charsToSearch) {
        return find(charsToSearch,1,charsToSearch.length());
    }

    // Return all values with keys between minLen and maxLen containing "charsToSearch".
    public ArrayList<T> find(String charsToSearch, int minLen, int maxLen) {
        ArrayList<T> list = new ArrayList<T>();
        char[] charArray = charsToSearch.toCharArray();
        int[] charFreq = new int[256];
        for(int i=0;i<charArray.length;i++) charFreq[charArray[i]]++;
        maxLen = charArray.length>maxLen ? maxLen : charArray.length;
        pmsearch(root,charFreq,minLen,maxLen,1, list);
        return list;
    }

    public void pmsearch(TSTNode<T> node, int[] charFreq, int minLen, int maxLen, int depth, ArrayList<T> list) {
        if(node==null) return;

        char c = node.getSplitChar();
        if(isSmaller(charFreq,c))
            pmsearch(node.getLoChild(),charFreq,minLen,maxLen,depth,list);
        if(charFreq[c]>0) {
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq[c]--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq[c]++;
            }
        }
        else if(charFreq['.']>0) { // Wildcard
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq['.']--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq['.']++;
            }
        }            
        if(isGreater(charFreq,c))
            pmsearch(node.getHiChild(),charFreq,minLen,maxLen,depth,list);
    }

    private boolean isGreater(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c+1;i<charFreq.length;i++) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }

    private boolean isSmaller(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c-1;i>-1;i--) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }
}

下面是一个小测试程序。测试程序只是按照确切的顺序在您的示例中插入 4 个键/值对。如果您有一个包含很多元素的 D，那么最好先对其进行排序并以锦标赛方式构建字典（即插入中间元素，然后递归填充左半部分和右半部分）。这将确保树是平衡的。

import org.junit.*;
import org.junit.runner.*;
import java.io.*;
import java.util.*;
import org.junit.runner.notification.Failure;

public class MyTest
{
    static TSTTree<String> dictionary = new TSTTree<String>();

    @BeforeClass
    public static void initialize() {
        dictionary.insert("dgo","dgo");
        dictionary.insert("aett","aett");
        dictionary.insert("amt","amt");
    }

    @Test
    public void testMethod() {
        System.out.println("testMethod");
        ArrayList<String> result = dictionary.find("aamt");
        System.out.println("Results: ");
        for(Iterator it=result.iterator();it.hasNext();) {
            System.out.println(it.next());
        }
    }

    @Test
    // Test with a wildcard which finds "dgo" value
    public void testMethod1() {
        System.out.println("testMethod1");
        ArrayList<String> result = dictionary.find("aamtdo.");
        System.out.println("Results: ");
        for(Iterator it=result.iterator();it.hasNext();) {
            System.out.println(it.next());
        }
    }

    public static void main(String[] args) {
        Result result = JUnitCore.runClasses(MyTest.class);
        for (Failure failure : result.getFailures()) {
        System.out.println(failure.toString());
        }
    }
}

algorithm - 如何找到给定集中的多重集的所有子集？

1 回答 1

Related

Reference