0

给定一个包含内容的文件:

insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4

我想将其显示如下(使用选项卡来识别子父关系):

J1
    J2
        J3
    J4
        J5
        J6
鲍罗丁的 test_data2:
------------------------------
insert_job: JS11-LR_BaselIII
insert_job:JS11-Check_Batch_Run_Numbers
box_name:JS11-LR_BaselIII
insert_job: 11000000-开始
box_name:JS11-Check_Batch_Run_Numbers
insert_job: 11000000-runbox
box_name:JS11-Check_Batch_Run_Numbers
insert_job: JS11-Load_Session_Date
box_name:JS11-LR_BaselIII
insert_job: JS110000-开始
box_name: JS11-Load_Session_Date
insert_job: JS110000-runbox
box_name: JS11-Load_Session_Date
insert_job: JS11-Start_RiskWatch
box_name:JS11-LR_BaselIII
insert_job: JS110004-开始
box_name:JS11-Start_RiskWatch
insert_job: JS110004-runbox
box_name:JS11-Start_RiskWatch
insert_job: JS11-Start_UDS
box_name:JS11-LR_BaselIII
insert_job: JS110001-开始
box_name: JS11-Start_UDS
insert_job: JS110001-runbox
box_name: JS11-Start_UDS
insert_job: JS11-Pool_Processing
box_name:JS11-LR_BaselIII
insert_job: JS110002-开始
box_name: JS11-Pool_Processing

Ed的解决方案中的语法错误:

sdpvvrsp810{alelai}: gawk -f tst.awk testjobs3
gawk: tst.awk:2: /^box_name/   { box = $2; jobs[box][job] }
gawk: tst.awk:2:                                    ^ syntax error
gawk: tst.awk:9:         for (job in jobs[box])
gawk: tst.awk:9:                         ^ syntax error
4

3 回答 3

1

这是一个较短的 perl 版本,适用于您的示例数据。

sub parse {
  local $/ = undef;
  my $text = <>;
  my ($root) = $text =~ /insert_job:\s*(\S+)/;
  my @links = $text =~ /insert_job:\s*(\S+)\s*box_name:\s*(\S+)/g;
  my $children = {}; 
  while (@links) {
    my $child = shift @links;
    my $parent = shift @links;
    push @{$children->{$parent}}, $child;
  }
  my $print = sub {
    my ($print, $parent, $indent) = @_;
    print "\t" x $indent, $parent, "\n";
    $print->($print, $_, $indent + 1) foreach (@{$children->{$parent} || []});
  };
  $print->($print, $root, 0);
}

parse;
于 2014-05-27T02:50:51.290 回答
1

该程序按您的要求执行。它期望输入文件的路径作为命令行上的参数。

它首先构建一个散列,将每个作业的名称与该框中的所有作业相关联。下一行中没有框名称的作业将被推送到根作业列表中。最后,调用递归子例程print_tree以转储从每个根开始的依赖关系树。

use strict;
use warnings;

my ($curr_job, %jobs, @roots);
while (<>) {
  next unless my ($op, $id) = /(\w+): ([\w-]+)/;
  if ($op eq 'insert_job') {
    push @roots, $curr_job if $curr_job;
    $curr_job = $id;
    $jobs{$id} = [] unless $jobs{$id};
  }
  elsif ($op eq 'box_name') {
    push @{ $jobs{$id} }, $curr_job;
    $curr_job = undef;
  }
}
push @roots, $curr_job if $curr_job;

print_tree($_) for @roots;

sub print_tree {
  my ($root, $indent) = (@_, 0);
  printf "%s%s\n", ' ' x 4 x $indent, $root;
  print_tree($_, $indent + 1) for @{ $jobs{$root} };
}

输出

J1
    J2
        J3
    J4
        J5
        J6

输出 2

JS11-LR_BaselIII
    JS11-Check_Batch_Run_Numbers
        11000000-runbox
        11000000-start
    JS11-Load_Session_Date
        JS110000-runbox
        JS110000-start
    JS11-Pool_Processing
        JS110002-start
    JS11-Start_RiskWatch
        JS110004-runbox
        JS110004-start
    JS11-Start_UDS
        JS110001-runbox
        JS110001-start
于 2014-05-27T03:13:53.080 回答
0

将 GNU awk 用于真正的多维数组:

$ cat tst.awk
/^insert_job/ { job = $2; if (root == "") root = job }
/^box_name/   { box = $2; jobs[box][job] }
END           { prtBox(root) }

function prtBox(box,    job) {
    printf "%*s%s\n", indent, "", box
    indent += 2
    if (box in jobs)
        for (job in jobs[box])
            prtBox(job)
    indent -= 2
}

.

$ awk -f tst.awk file
J1
  J2
    J3
  J4
    J5
    J6
于 2014-05-27T20:42:20.073 回答