opencv - OpenMP edge detection filter parallelization: takes longer

Question

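I want to apply a Sobel filter to a large image.

I am using OpenMP to parallelize the processing and reduce computation time.

After parallelizing it, I noticed that it takes longer than expected. Here is the code: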

#include<iostream>
#include<cmath>
#include<cstdlib>   // abs(int)
#include<ctime>     // clock(), CLOCKS_PER_SEC
#include<opencv2/imgproc/imgproc.hpp>
#include<opencv2/highgui/highgui.hpp>

using namespace std;
using namespace cv;


// Computes the x component of the gradient vector
// at a given point in an image.
// returns gradient in the x direction
int xGradient(Mat image, int x, int y)
{
    return image.at<uchar>(y-1, x-1) +
                2*image.at<uchar>(y, x-1) +
                 image.at<uchar>(y+1, x-1) -
                  image.at<uchar>(y-1, x+1) -
                   2*image.at<uchar>(y, x+1) -
                    image.at<uchar>(y+1, x+1);
}

// Computes the y component of the gradient vector
// at a given point in an image
// returns gradient in the y direction

int yGradient(Mat image, int x, int y)
{
    return image.at<uchar>(y-1, x-1) +
                2*image.at<uchar>(y-1, x) +
                 image.at<uchar>(y-1, x+1) -
                  image.at<uchar>(y+1, x-1) -
                   2*image.at<uchar>(y+1, x) -
                    image.at<uchar>(y+1, x+1);
}




int main()
{
const clock_t begin_time = clock();
      Mat src, dst;
      int gx, gy, sum;

      // Load an image
      src = imread("/home/cgross/Downloads/pano.jpg", 0);
      dst = src.clone();
      if( !src.data )
      { return -1; }

#pragma omp parallel for private(gx, gy, sum) shared(dst)
        for(int y = 0; y < src.rows; y++)
            for(int x = 0; x < src.cols; x++)
                dst.at<uchar>(y,x) = 0.0;

#pragma omp parallel for private(gx, gy, sum) shared(dst)

        for(int y = 1; y < src.rows - 1; y++){

            for(int x = 1; x < src.cols - 1; x++){
                gx = xGradient(src, x, y);
                gy = yGradient(src, x, y);
                sum = abs(gx) + abs(gy);
                sum = sum > 255 ? 255:sum;
                sum = sum < 0 ? 0 : sum;
                dst.at<uchar>(y,x) = sum;
            }
        }

        namedWindow("final", WINDOW_NORMAL);
        imshow("final", dst);

        namedWindow("initial", WINDOW_NORMAL);
        imshow("initial", src);

std::cout << float( clock () - begin_time ) /  CLOCKS_PER_SEC<<endl;
      waitKey();


    return 0;
}

If I comment out the pragmas (disabling OpenMP), the computation is faster (10 seconds). I can't see where the problem is.

score 0 · Accepted Answer

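Rather than writing your own Sobel, consider using one of these.

1) The built-in OpenCV function: http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel

2) Create a Sobel kernel and use the OpenCV filter2D() function. Other libraries, platforms, etc. have similar functions for passing a kernel over an image, and many are already optimized. For example, I believe iOS has something called vImage.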

Then you can compare those timings against your custom code.
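For the comparison itself, one thing worth keeping in mind is that clock(), as used in the question, reports processor time; on POSIX systems that is summed over all threads, so a multithreaded run can appear slower even when it finishes sooner. A minimal sketch of a wall-clock timer follows; wallSeconds is a hypothetical helper name, not something from OpenCV or OpenMP.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double wallSeconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping only the filtering step in the lambda, and leaving image loading and imshow() outside it, keeps the comparison between variants fair.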

You say you have a "large" image, but that doesn't tell us much: how many pixels are we talking about?

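You could split the image into several parts and filter each part (using threads, etc.), then combine the parts back together into a new image. I've had good success with this.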
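A minimal sketch of the splitting idea, under a few assumptions: the image is a row-major 8-bit grayscale buffer, the pieces are horizontal strips, and the names sobelStrip and sobelInStrips are invented for this example. Each strip reads one extra row above and below from the shared source and writes only its own rows of the shared destination, so "combining" needs no extra copy. The OpenMP pragma is one possible way to run the strips concurrently; without OpenMP enabled the loop simply runs serially and gives the same result.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Sobel magnitude |gx| + |gy| (clamped to 255) for rows [yBegin, yEnd).
// The outermost rows and columns of the image are left untouched.
static void sobelStrip(const unsigned char* src, unsigned char* dst,
                       int rows, int cols, int yBegin, int yEnd)
{
    yBegin = std::max(yBegin, 1);
    yEnd = std::min(yEnd, rows - 1);
    for (int y = yBegin; y < yEnd; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * cols;
        const unsigned char* up = src + row - cols;    // row y - 1
        const unsigned char* mid = src + row;          // row y
        const unsigned char* down = src + row + cols;  // row y + 1
        unsigned char* out = dst + row;
        for (int x = 1; x < cols - 1; ++x) {
            const int gx = up[x + 1] + 2 * mid[x + 1] + down[x + 1]
                         - up[x - 1] - 2 * mid[x - 1] - down[x - 1];
            const int gy = down[x - 1] + 2 * down[x] + down[x + 1]
                         - up[x - 1] - 2 * up[x] - up[x + 1];
            out[x] = static_cast<unsigned char>(
                std::min(255, std::abs(gx) + std::abs(gy)));
        }
    }
}

// Splits the image into `strips` horizontal bands and filters each band
// independently; every band writes a disjoint set of rows in `dst`.
std::vector<unsigned char> sobelInStrips(const std::vector<unsigned char>& src,
                                         int rows, int cols, int strips)
{
    std::vector<unsigned char> dst(src.size(), 0);
    strips = std::max(1, strips);
    const int rowsPerStrip = (rows + strips - 1) / strips;
    #pragma omp parallel for
    for (int s = 0; s < strips; ++s) {
        const int yBegin = s * rowsPerStrip;
        const int yEnd = std::min(rows, yBegin + rowsPerStrip);
        sobelStrip(src.data(), dst.data(), rows, cols, yBegin, yEnd);
    }
    return dst;
}
```

Using a handful of strips, roughly one per core, keeps the per-task overhead small compared with handing out work pixel by pixel.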

I would also read this:

http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/image-convolution-using-opencl/
