perl - Perl challenge - directory iterator

Question

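You sometimes hear it said about Perl that there may be six different ways to solve the same problem. Good Perl developers usually have well-reasoned insight for choosing between the possible implementations.

So, an example Perl problem:

A simple script that recursively walks a directory structure, looking for files that were modified recently (after a certain date, which should be configurable), and saves the results to a file.

The question for Perl developers is: what is your best way to accomplish this?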

score 17 · Accepted Answer

This sounds like a job for File::Find::Rule:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;  # Causes built-ins like open to succeed or die.
              # You can 'use Fatal qw(open)' if autodie is not installed.

use File::Find::Rule;
use Getopt::Std;

use constant SECONDS_IN_DAY => 24 * 60 * 60;

our %option = (
    m => 1,        # -m switch: days ago modified, defaults to 1
    o => undef,    # -o switch: output file, defaults to STDOUT
);

getopts('m:o:', \%option);

# If we haven't been given directories to search, default to the
# current working directory.

if (not @ARGV) {
    @ARGV = ( '.' );
}

print STDERR "Finding files changed in the last $option{m} day(s)\n";


# Convert our time in days into a timestamp in seconds from the epoch.
my $last_modified_timestamp = time() - SECONDS_IN_DAY * $option{m};

# Now find all the regular files, which have been modified in the last
# $option{m} days, looking in all the locations specified in
# @ARGV (our remaining command line arguments).

my @files = File::Find::Rule->file()
                            ->mtime(">= $last_modified_timestamp")
                            ->in(@ARGV);

# $out_fh will store the filehandle where we send the file list.
# It defaults to STDOUT.

my $out_fh = \*STDOUT;

if ($option{o}) {
    open($out_fh, '>', $option{o});
}

# Print our results.

print {$out_fh} join("\n", @files), "\n";
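
The question asks for files modified after a certain date, while the script above takes a number of days. As a minimal sketch of the missing step, a calendar date can be turned into an epoch timestamp with the core Time::Local module. The helper name date_to_epoch and the YYYY-MM-DD input format are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: turn 'YYYY-MM-DD' into epoch seconds at local midnight.
sub date_to_epoch {
    my ($date) = @_;
    my ($year, $month, $day) = $date =~ /\A(\d{4})-(\d{2})-(\d{2})\z/
        or die "Expected a date like 2024-01-31, got '$date'\n";

    # timelocal expects a zero-based month and dies on impossible dates.
    return timelocal(0, 0, 0, $day, $month - 1, $year);
}

my $cutoff = date_to_epoch('2024-01-31');
print "Cutoff timestamp: $cutoff\n";
```

The resulting value could then be passed to the rule in place of the computed timestamp, for example ->mtime(">= $cutoff").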

score 15 · Accepted Answer

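Where the standard libraries already solve most of the problem, use them.

File::Find works nicely in this case.

There may be many ways to do something in Perl, but where a very standard library exists for the job, it should be used unless it has problems of its own.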

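#!/usr/bin/perl

use strict;
use warnings;
use File::Find ();

# Files modified within this many seconds count as "recent" (5 days).
my $max_age = 5 * 60 * 60 * 24;
my $now     = time();

File::Find::find( { wanted => \&wanted }, "." );

sub wanted {
  my @stat = stat($_) or return;    # skip entries that cannot be stat'ed

  # $stat[9] is the mtime; report regular files that changed recently
  if ( -f _ && ( $now - $stat[9] ) <= $max_age ) {
    print "$File::Find::name\n";
  }
}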
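
The same File::Find approach can collect its matches in a list and then save them to a file, as the question asks. The following is only a sketch under stated assumptions: the helper name files_modified_since and the two command-line arguments (directory and output file) are invented for illustration.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return every regular file under @dirs whose
# modification time is at or after $cutoff (epoch seconds).
sub files_modified_since {
    my ($cutoff, @dirs) = @_;
    my @found;
    File::Find::find(
        {
            no_chdir => 1,    # stay in the current directory; $_ is the full path
            wanted   => sub {
                my @stat = stat($_) or return;    # skip unreadable entries
                push @found, $_ if -f _ && $stat[9] >= $cutoff;
            },
        },
        @dirs
    );
    return @found;
}

# Usage sketch: perl recent.pl DIRECTORY OUTPUT_FILE
if (@ARGV == 2) {
    my ($dir, $outfile) = @ARGV;
    my @recent = files_modified_since(time - 24 * 60 * 60, $dir);
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!\n";
    print {$out} "$_\n" for @recent;
    close($out) or die "Cannot close $outfile: $!\n";
}
```

Using an anonymous sub as the wanted callback lets it push into @found directly, which avoids the global variables that a named callback usually needs.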

score 9 · Accepted Answer

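There aren't six ways to do this; there's the old way and the new way. The old way is File::Find, and you already have a couple of examples of that. File::Find has a pretty awful callback interface. It was cool 20 years ago, but we have moved on since then.

Here is a real-life (lightly amended) program that I use to clear out the cruft on one of my production servers. It uses File::Find::Rule rather than File::Find. File::Find::Rule has a nice declarative interface that reads easily.

Randal Schwartz also wrote File::Finder as a wrapper over File::Find. It is quite nice, but it never really took off.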

#! /usr/bin/perl -w

# delete temp files on agr1

use strict;
use File::Find::Rule;
use File::Path 'rmtree';

for my $file (

    File::Find::Rule->new
        ->mtime( '<' . days_ago(2) )
        ->name( qr/^CGItemp\d+$/ )
        ->file()
        ->in('/tmp'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(20) )
        ->name( qr/^listener-\d{4}-\d{2}-\d{2}-\d{4}\.log$/ )
        ->file()
        ->maxdepth(1)
        ->in('/usr/oracle/ora81/network/log'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(10) )
        ->name( qr/^batch[_-]\d{8}-\d{4}\.run\.txt$/ )
        ->file()
        ->maxdepth(1)
        ->in('/var/log/req'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(20) )
        ->or(
            File::Find::Rule->name( qr/^remove-\d{8}-\d{6}\.txt$/ ),
            File::Find::Rule->name( qr/^insert-tp-\d{8}-\d{4}\.log$/ ),
        )
        ->file()
        ->maxdepth(1)
        ->in('/home/agdata/import/logs'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(90) )
        ->or(
            File::Find::Rule->name( qr/^\d{8}-\d{6}\.txt$/ ),
            File::Find::Rule->name( qr/^\d{8}-\d{4}\.report\.txt$/ ),
        )
        ->file()
        ->maxdepth(1)
        ->in('/home/agdata/redo/log'),

) {
    if (unlink $file) {
        print "ok $file\n";
    }
    else {
        print "fail $file: $!\n";
    }
}

{
    my $now;
    sub days_ago {
        # days as number of seconds
        $now ||= time;
        return $now - (86400 * shift);
    }
}

score 8 · Accepted Answer

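File::Find is the right way to solve this problem. Reimplementing something that already exists in other modules is pointless, and reimplementing something that is in a standard module should be actively discouraged.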

score 8 · Accepted Answer

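Others have mentioned File::Find, which is the way I'd go, but you asked for an iterator, which File::Find isn't (nor is File::Find::Rule). You might want to look at File::Next or File::Find::Object, which do have iterator interfaces. Mark Jason Dominus goes over building your own in section 4.2.2 of Higher Order Perl.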
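
To make the idea of an iterator concrete, here is a minimal sketch of a home-grown one built from a closure and core functions only. It is not the code from Higher Order Perl, nor the API of File::Next or File::Find::Object; the name make_file_iterator and its arguments are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one matching file per
# call, and undef once the whole tree has been visited.
sub make_file_iterator {
    my ($cutoff, @queue) = @_;    # @queue holds paths still to be examined
    return sub {
        while (defined(my $path = shift @queue)) {
            next if -l $path;     # do not follow symlinks (avoids loops)
            if (-d $path) {
                opendir(my $dh, $path) or next;    # skip unreadable directories
                for my $entry (readdir $dh) {
                    next if $entry eq '.' || $entry eq '..';
                    push @queue, "$path/$entry";
                }
                closedir($dh);
            }
            elsif (-f $path && (stat($path))[9] >= $cutoff) {
                return $path;
            }
        }
        return;    # nothing left to visit
    };
}

# Usage sketch: the directories to search come from the command line.
my $next_file = make_file_iterator(time - 24 * 60 * 60, @ARGV);
while (defined(my $file = $next_file->())) {
    print "$file\n";
}
```

Because entries are taken from the front of the queue and added at the back, the walk is breadth-first, and the caller can stop after any file without the rest of the tree being read.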

score 4 · Accepted Answer

My preferred method is to use the File::Find module like so:

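use strict;
use warnings;
use File::Find;

# Placeholder value; point this at the tree you want to search.
my $directory_to_check_recursively = '.';

find(\&checkFile, $directory_to_check_recursively);

sub checkFile
{
   # Examine each file in here. The filename is in $_ and you are chdir'ed into its directory.
   # The directory is also available in $File::Find::dir.
}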

score 4 · Accepted Answer

As mentioned earlier, there is my File::Finder, but there is also the iterator-as-a-tied-hash solution from Find Files Incrementally (Linux Magazine).

score 3 · Accepted Answer

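I wrote File::Find::Closures as a set of closures that you can use with File::Find, so you don't have to write your own. There are a couple of mtime functions that should handle this:

use File::Find;
use File::Find::Closures qw(:all);

my( $wanted, $list_reporter ) = find_by_modified_after( time - 86400 );
#my( $wanted, $list_reporter ) = find_by_modified_before( time - 86400 );

File::Find::find( $wanted, @directories );

my @modified = $list_reporter->();

You don't really need to use the module, because I mostly designed it as a way for you to look at the code and steal the parts you want. In this case it is a little trickier, because all the subroutines that deal with stat depend on a second subroutine. You'll quickly get the idea from the code, though.

Good luck,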

score 0 · Accepted Answer

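Using standard modules is indeed a good idea, but out of interest, here is a back-to-basics approach that uses no external modules. I know the code style here might not be to everyone's taste.

It could be improved to use less memory by providing iterator access (the input list could be put on hold temporarily once it reaches a certain size), and the condition check could be extended via a callback reference.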

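sub mfind {
    my ($last_mod, @paths) = @_;
    my %done;    # paths already visited, to guard against symlink loops

    # An anonymous sub is used because a named nested sub would capture
    # only the %done belonging to the first call of mfind.
    my $find;
    $find = sub {
        my $path = shift;

        # determine physical link if symlink
        $path = readlink($path) || $path;

        # return if already processed, otherwise mark path as processed
        return if $done{$path}++;

        # DFS recursion; note that glob splits its pattern on whitespace,
        # so a path containing spaces needs File::Glob's bsd_glob instead
        return -d $path ? ( map { $find->($_) } glob("$path/*") )
             : -f $path && (stat($path))[9] >= $last_mod ? $path
             : ();
    };

    my @found = map { $find->($_) } @paths;
    undef $find;    # break the closure's reference to itself
    return @found;
}

print join "\n", mfind(time - 1 * 86400, "some path");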

score -1 · Accepted Answer

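I wrote a subroutine that reads a directory with readdir, throws out the "." and ".." entries, recurses whenever it finds a new directory, and examines each file for what I'm looking for (in your case, you'll want to use stat, since utime sets timestamps rather than reading them). By the time the recursion finishes, every file should have been examined.

I think all the functions you'd need for this script are described briefly here: http://www.cs.cf.ac.uk/Dave/PERL/node70.html

The semantics of input and output are a fairly simple exercise that I'll leave to you.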
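
The answer above describes the subroutine without showing it, so here is one way it might look. This is a sketch under stated assumptions rather than the author's actual code: the name walk, its argument order, and the choice to skip symlinks are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: recursive readdir walk that returns the regular files
# under $dir modified at or after $cutoff (epoch seconds).
sub walk {
    my ($dir, $cutoff) = @_;
    my @found;

    opendir(my $dh, $dir) or do { warn "Cannot open $dir: $!\n"; return };
    # Throw out the "." and ".." entries, then release the handle before recursing.
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-l $path) {
            next;    # skip symlinks so that a link cycle cannot loop forever
        }
        elsif (-d $path) {
            push @found, walk($path, $cutoff);    # recurse into the new directory
        }
        elsif (-f $path && (stat($path))[9] >= $cutoff) {
            push @found, $path;                   # element 9 of stat is the mtime
        }
    }
    return @found;
}

# Usage sketch: perl walk.pl DIRECTORY
if (@ARGV) {
    print "$_\n" for walk($ARGV[0], time - 24 * 60 * 60);
}
```

Reading the whole directory and closing the handle before recursing keeps only one directory handle open at a time, which matters in deep trees.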

score -2 · Accepted Answer

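I risk being downvoted, but IMHO the 'ls' command (with appropriate parameters) does this in one of the best-known and most performant ways. In this case, piping 'ls' from Perl code through the shell and reading the results back into an array or hash could be a good solution.

Edit: 'find' could also be used, as suggested in the comments.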
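
A rough sketch of the 'find' variant follows. It assumes a find(1) binary on the PATH that supports the standard -type and -mtime tests; the helper name recent_files_via_find is invented for illustration. File names containing newlines would be split incorrectly, and a directory name starting with '-' would be misread by find as an option.

```perl
use strict;
use warnings;

# Hypothetical helper: ask the external find(1) for regular files under $dir
# that were modified less than $days 24-hour periods ago.
sub recent_files_via_find {
    my ($dir, $days) = @_;

    # The list form of a pipe open runs find directly, so $dir is never
    # interpreted by a shell.
    open(my $find, '-|', 'find', $dir, '-type', 'f', '-mtime', "-$days")
        or die "Cannot run find: $!\n";
    chomp(my @files = <$find>);
    close($find) or die "find reported a problem (exit status $?)\n";

    return @files;
}

# Usage sketch: perl viafind.pl DIRECTORY
if (@ARGV) {
    print "$_\n" for recent_files_via_find($ARGV[0], 1);
}
```

This trades portability for brevity: the script then depends on an external program, which the pure-Perl answers above avoid.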
