bash - Join multiple files (A, B, C) into one, using the same number of lines as File B

Question

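This is a strange question; I have been looking around and could not find anything that matches what I want to do.

What I want to do is:

File A, File B, File C: 5 lines, 3 lines, 2 lines.

Join all the files into one file, matching the number of lines of File B. The output should be

File A, File B, File C: 3 lines, 3 lines, 3 lines.

So in File A I have to remove two lines, and in File C I have to duplicate one line, so that both match the number of lines in File B.

I was thinking of doing a count first to see how many lines each file has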

count1=`wc -l FileA| awk '{print $1}'`
count2=`wc -l FileB| awk '{print $1}'`
count3=`wc -l FileC| awk '{print $1}'`

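Then compare the counts: if a file has more lines than File B, remove lines; otherwise add lines.

But I'm lost, because I'm not sure how to continue from here, and I have never seen anyone try to do this.

Can anyone point me in the right direction?

The output should look like the image below:

[image: Output]

Thanks.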

score 2 · Accepted Answer

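You can get the first n lines of a file with the head (or sed) command, and you can add lines with echo.

sed would allow editing the files in place (so you would not have to deal with temporary files); the script below uses head and echo and prints the result to standard output: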

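#!/bin/bash

# Print exactly $2 lines of file $1:
# cut the file if it is too long, pad with empty lines if it is too short.
fix_numlines() {
  local filename=$1
  local wantlines=$2
  local havelines
  # grep -c '' counts every line, including empty ones
  havelines=$(grep -c '' "${filename}")
  head -n "${wantlines}" "${filename}"
  if [ "${havelines}" -lt "${wantlines}" ]; then
    for i in $(seq $((wantlines-havelines))); do echo; done
  fi
}

lines=$(grep -c '' fileB)
fix_numlines fileA "${lines}"
fix_numlines fileB "${lines}"
fix_numlines fileC "${lines}"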

If you want column output, it is even simpler:

paste fileA fileB fileC | head -n "$(grep -c '' fileB)"
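To see what the column variant produces, here is a small sketch with made-up sample files. The temporary directory, the file contents, and the wrapper function `columns` are assumptions for illustration only. It shows that paste fills the missing lines of a short file with empty fields rather than repeating a line.

```shell
#!/bin/sh
# Throwaway sample files (contents are invented for this demo).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a1\na2\na3\na4\na5\n' > "$dir/fileA"
printf 'b1\nb2\nb3\n'         > "$dir/fileB"
printf 'c1\nc2\n'             > "$dir/fileC"

# columns A B C: hypothetical wrapper around the paste/head one-liner.
# Prints A, B and C side by side (tab-separated), cut to the length of B.
columns() {
    paste "$1" "$2" "$3" | head -n "$(grep -c '' "$2")"
}

columns "$dir/fileA" "$dir/fileB" "$dir/fileC"
```

With this sample data the third row ends in an empty field, because fileC has only two lines.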

score 2 · Accepted Answer

Could you please try the following. I have used @ as the separator; you can change it as needed.

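paste -d'@' file1 file2 file3 |
awk -v file2_lines="$(wc -l < file2)" '
BEGIN{
  FS=OFS="@"
}
FNR<=file2_lines{
  # an empty field means that file has run out: reuse its last seen line
  # (comparing with "" keeps lines such as "0" intact)
  $1=($1!="")?$1:prev_first
  $3=($3!="")?$3:prev_third
  print
  prev_first=$1
  prev_third=$3
}'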

Sample run of the above code:

Let's say the following are the input files:

cat file1
File1_line1
File1_line2
File1_line3
File1_line4
File1_line5

cat file2
File2_line1
File2_line2
File2_line3

cat file3
File3_line1
File3_line2

When I run the above code as a script, the output is:

./script.ksh
File1_line1@File2_line1@File3_line1
File1_line2@File2_line2@File3_line2
File1_line3@File2_line3@File3_line2

score 1 · Accepted Answer

Another one for GNU awk that outputs in columns:

$ gawk  -v seed=$RANDOM -v n=2 '  # n parameter is the file index number 
BEGIN {                           # ... which defines the record count
    srand(seed)                   # random record is printed when not enough records
}
{
    a[ARGIND][c[ARGIND]=FNR]=$0   # hash all data to a first
}
END {
    for(r=1;r<=c[n];r++)          # loop records
        for(f=1;f<=ARGIND;f++)    # and fields for below output
            printf "%s%s",((r in a[f])?a[f][r]:a[f][int(rand()*c[f])+1]),(f==ARGIND?ORS:OFS)
}' a b c                          # -v n=2 means the second file ie. b

Output:

a1 b1 c1
a2 b2 c2
a3 b3 c1

If you don't like the random pick of records, replace int(rand()*c[f])+1 with c[f], so that the last record of the file is repeated instead.

$ gawk '                      # remember GNU awk only
NR==FNR {                     # count given files records
    bnr=FNR
    next
}
{
    print                     # output records of a b c
    if(FNR==bnr)              # ... up to bnr records
        nextfile              # and skip to next file
}
ENDFILE {                     # if you get to the end of the file
    if(bnr>FNR)               # but bnr not big enough
        for(i=FNR;i<bnr;i++)  # loop some
            print             # and duplicate the last record of the file
}' b a b c                    # first the file to count then all the files to print

score 0 · Accepted Answer

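To make a file have n lines, you can use the following function (usage: toLength n file). It omits the trailing lines if the file is too long and repeats the last line if the file is too short.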

toLength() {
    { head -n"$1" "$2"; yes "$(tail -n1 "$2")"; } | head -n"$1"
}

To bring all files to the length of FileB and show them side by side, use

n="$(wc -l < FileB)"
paste <(toLength "$n" FileA) FileB <(toLength "$n" FileC) | column -ts$'\t'
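As a quick check of what toLength does on its own, here is a small sketch that runs it against made-up sample files. The temporary directory and the file contents are assumptions for illustration; the function body is the one from this answer, written with a space after -n.

```shell
#!/bin/sh
# toLength N FILE: print FILE cut or stretched to exactly N lines.
# The inner head limits long files; yes repeats the last line forever,
# and the outer head stops the stream after N lines.
toLength() {
    { head -n "$1" "$2"; yes "$(tail -n 1 "$2")"; } | head -n "$1"
}

# Throwaway sample files (contents are invented for this demo).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a1\na2\na3\na4\na5\n' > "$dir/FileA"
printf 'c1\nc2\n'             > "$dir/FileC"

toLength 3 "$dir/FileA"   # too long: only the first three lines are kept
toLength 3 "$dir/FileC"   # too short: the last line is repeated once
```

Because the outer head closes the pipe after n lines, yes terminates on its own even though it would otherwise print forever.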

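As user umläute observed, side-by-side output makes things even easier. However, they used empty lines to pad the short files. The following solution repeats the last line to make the short files longer.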

stretch() {
    cat "$1"
    yes "$(tail -n1 "$1")"
}
paste <(stretch FileA) FileB <(stretch FileC) | column -ts$'\t' |
head -n"$(wc -l < FileB)"

score 0 · Accepted Answer

Here is a clean way using awk, where we read each file only once:

awk -v n=2 '
     BEGIN{ while(1) {
              for(i=1;i<ARGC;++i) {
                 # getline returns 1 on success, 0 at end of file, -1 on error
                 if ((b[i]=(getline tmp < ARGV[i])) > 0) a[i] = tmp
              }
              if (b[n] > 0) for(i=1;i<ARGC;++i) print a[i] > (ARGV[i] ".new")
              else {break}
            }
          }'  f1 f2 f3 f4 f5 f6

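This works in the following way:

The lead file is defined by the index n. Here we choose the lead file to be f2.
Instead of processing the files sequentially with awk's standard reading of records and fields, we use the BEGIN block to read the files in parallel.
We run an infinite loop while(1), which we break out of once the lead file has no more input.
In each cycle, we read a new line from every file using getline. If file i has a new line, it is stored in a[i], and the result of getline is stored in b[i]. If file i has reached its end, a[i] keeps the last line that was read.
We check the outcome for the lead file using b[n]. If a line was still read, all lines are printed to the files f1.new, f2.new, ...; otherwise we break out of the infinite loop.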
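The steps above can also be sketched as a variation that prints the rows side by side on standard output instead of writing .new files. This is only an illustration under stated assumptions: the wrapper name `lockstep`, the tab separator, and the array names `got` and `last` are hypothetical and not part of the answer.

```shell
#!/bin/sh
# lockstep N FILE...: hypothetical wrapper.
# Reads all FILEs one line per cycle and prints one tab-separated row per
# cycle until the N-th file (the lead file) has no more input.
lockstep() {
    n=$1
    shift
    awk -v n="$n" '
    BEGIN {
        while (1) {
            for (i = 1; i < ARGC; ++i) {
                # got[i]: 1 if a line was read, 0 at end of file, -1 on error
                got[i] = (getline tmp < ARGV[i])
                if (got[i] > 0) last[i] = tmp   # otherwise keep the previous line
            }
            if (got[n] <= 0) break              # lead file exhausted: stop
            row = last[1]
            for (i = 2; i < ARGC; ++i) row = row "\t" last[i]
            print row
        }
    }' "$@"
}
```

Called as `lockstep 2 f1 f2 f3`, it would print as many rows as f2 has lines, repeating the last line of any file that is shorter.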
