bash - 1) Reorder a CSV file based on another file's header, and 2) merge a column from one CSV file into another and remove duplicates

Question

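I have two CSV files. The two files may contain the same or different data. File 2 has only a few of the columns in file 1. Some columns in file 2 may have different headers; for example, file 2 uses Name instead of First name.

File 1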

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
grey07,2070,04ap67,lg2070,Laura,Grey,Depot,London
johnson81,4081,30no86,cj4081,Craig,Johnson,Depot,London
jenkins46,9346,14ju73,mj9346,Mary,Jenkins,Engineering,Manchester
smith79,5079,09ja61,js5079,Jamie,Smith,Engineering,Manchester

File 2

Department,First name,Last name,One-time password
Sales,Rachel,Booker,12se74
Depot,Laura,Grey,04ap67
Depot,Craig,Johnson,30no86
Engineering,Mary,Jenkins,14ju73
Engineering,Jamie,Smith,09ja61

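Question 1: I want to reorder the columns in CSV file 2 so that they match the column order of CSV file 1, based on the headers.

Desired output, with the headers ordered as in file 1: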

One-time password,Name,Last name,Department
12se74,Dash,Bok,Sales
04ap67,Claire,Trans,Accounts
30no86,Shane,Walter,Depot
14ju73,Leon,Jenkins,Engineering
09ja61,Oliver,Den,Engineering

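Question 2: Merge the columns of file 2 into file 1 and remove duplicates based on headers. For example, if the First name, Last name and Department columns are the same, the rows are duplicates and should be removed. The other columns may or may not be the same. In other words, I want condition-based removal of duplicate records.

Question 3: Convert file 2 to the file 1 template, adding the missing columns in order, and finally compare on certain headers and remove the duplicate records. For example, if First name, Last name and One-time password are the same, the rows are duplicates; the other columns may or may not be the same.

Question 4: Copy a specific column from file 2 into file 1, preserving the order. For example, file 2 has a Name column; use it to replace the First name column of file 1.

Tried: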

awk -v FS=, -v OFS=, 'FNR==NR{hash[FNR]=$5; next}{$2 = hash[FNR]}1' file file2

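The above answer is from https://unix.stackexchange.com/questions/674038/replace-a-column-value-in-csv-file-from-another-file

The above seems to work, but it requires specifying the column numbers as $5 and $2. Can anyone help modify the above command so that it takes header names instead of column numbers?

I also tried: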

awk -v FS=, -v OFS=, '{ for (i=1;i<=NF;i++) { if (i=="name") var=$i }; FNR==NR{hash[FNR]=$5; next}{$var = hash[FNR] }' file file2

This does not work.

score 0 · Accepted Answer

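You can do all of this easily with Miller, which is available as a static binary. Put the mlr executable somewhere in your PATH and the installation is done.

To begin with, I assume we are dealing with two CSV files whose column names are consistent with each other: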

cat file1.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester

cat file2.csv

Department,First name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

Rename a given field:

mlr --csv rename 'First name,Name' file2.csv

Department,Name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

Reorder the columns of `file2.csv` according to the header of `file1.csv`:

mlr --csv reorder -f "$(head -n 1 file1.csv)" file2.csv

One-time password,First name,Last name,Department
09ja61,Oliver,Den,Engineering
12se74,Rachel,Booker,Sales
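
If Miller is not available, the same reordering can be sketched in plain awk. This is only a sketch: it assumes simple CSV with no quoted fields containing commas, and `reorder_like` is a helper name I made up for illustration. Columns named in the template header come first, in the template's order; any other columns of the data file keep their original relative order after them.

```sh
# reorder_like TEMPLATE_FILE DATA_FILE   (hypothetical helper, not part of Miller)
reorder_like() {
  awk -v FS=, -v OFS=, '
    # Template file: remember the header order, ignore its data rows.
    FNR == NR {
      if (FNR == 1) nt = split($0, want, FS)
      next
    }
    # Header line of the data file: work out the output column order.
    FNR == 1 {
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = 0
      for (j = 1; j <= nt; j++)
        if (want[j] in pos) { order[++n] = pos[want[j]]; used[pos[want[j]]] = 1 }
      for (i = 1; i <= NF; i++)
        if (!(i in used)) order[++n] = i
    }
    # Every line of the data file (header included): print in the new order.
    {
      line = $(order[1])
      for (k = 2; k <= n; k++) line = line OFS $(order[k])
      print line
    }
  ' "$1" "$2"
}
```

Called as `reorder_like file1.csv file2.csv`, it should print the same four columns in the same order as the Miller command above.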

Add the columns missing from `file2.csv` according to the header of `file1.csv`:

mlr --csv template -t file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,
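
For comparison, here is a rough awk sketch of the same "fill a template" idea. It assumes simple CSV with no quoted commas, and `fill_template` is a hypothetical helper name. Note one simplification of mine: columns of the data file that do not appear in the template header are dropped.

```sh
# fill_template TEMPLATE_FILE DATA_FILE   (hypothetical helper)
fill_template() {
  awk -v FS=, -v OFS=, '
    # Template file: print its header once and remember the column names.
    FNR == NR {
      if (FNR == 1) { nt = split($0, want, FS); print }
      next
    }
    # Header line of the data file: map each column name to its position.
    FNR == 1 {
      for (i = 1; i <= NF; i++) pos[$i] = i
      next
    }
    # Data rows: emit one cell per template column, empty when missing.
    {
      line = ""
      for (j = 1; j <= nt; j++) {
        cell = (want[j] in pos) ? $(pos[want[j]]) : ""
        line = line (j > 1 ? OFS : "") cell
      }
      print line
    }
  ' "$1" "$2"
}
```

With the two sample files, `fill_template file1.csv file2.csv` should give the same three lines as the Miller output above.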

Remove duplicates, keeping one record per unique combination of `One-time password,First name,Last name`:

mlr --csv head -n 1 -g 'One-time password,First name,Last name' fileX.csv
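
The same "keep the first record per key" logic can be sketched in awk with the key columns given by name. This assumes simple CSV with no quoted commas; `dedupe_by` is a made-up helper name, and the error message is my own choice.

```sh
# dedupe_by "Key1,Key2,..." FILE   (hypothetical helper)
dedupe_by() {
  awk -v FS=, -v keys="$1" '
    # Header line: translate the key names into column positions.
    NR == 1 {
      nk = split(keys, want, ",")
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (k = 1; k <= nk; k++) {
        if (!(want[k] in pos)) {
          print "missing column: " want[k] > "/dev/stderr"
          exit 1
        }
        col[k] = pos[want[k]]
      }
      print
      next
    }
    # Data rows: build the composite key and print only its first occurrence.
    {
      key = ""
      for (k = 1; k <= nk; k++) key = key $(col[k]) SUBSEP
      if (!(key in seen)) { seen[key] = 1; print }
    }
  ' "$2"
}
```

Usage would look like `dedupe_by 'One-time password,First name,Last name' fileX.csv`.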

Concatenate `file1.csv` and `file2.csv`:

mlr --csv unsparsify file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,
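
A two-pass awk sketch of this kind of concatenation, for readers without Miller: the first pass collects the union of all header names in first-seen order, the second pass prints every row under that combined header, leaving missing cells empty. It assumes simple CSV with no quoted commas and file names that contain no `=` (awk would treat those as variable assignments); `concat_csv` is a hypothetical helper name.

```sh
# concat_csv FILE...   (hypothetical helper; reads every file twice)
concat_csv() {
  awk -v FS=, -v OFS=, '
    # Pass 1: collect the union of header names, in first-seen order.
    pass == 1 {
      if (FNR == 1)
        for (i = 1; i <= NF; i++)
          if (!($i in seen)) { seen[$i] = 1; name[++n] = $i }
      next
    }
    # Pass 2, header line of each file: remember where its columns are.
    FNR == 1 {
      split("", pos)                      # portable way to clear an array
      for (i = 1; i <= NF; i++) pos[$i] = i
      if (!printed++) {                   # print the combined header once
        header = name[1]
        for (j = 2; j <= n; j++) header = header OFS name[j]
        print header
      }
      next
    }
    # Pass 2, data rows: one cell per combined column, empty when missing.
    {
      line = ""
      for (j = 1; j <= n; j++) {
        cell = (name[j] in pos) ? $(pos[name[j]]) : ""
        line = line (j > 1 ? OFS : "") cell
      }
      print line
    }
  ' pass=1 "$@" pass=2 "$@"
}
```

With the sample files, `concat_csv file1.csv file2.csv` should reproduce the four lines shown above.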

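Concatenate `file1.csv` and `file2.csv` and remove duplicates, keeping one record per unique combination of `One-time password,First name,Last name`:

The command is a chain of operations joined with `then`.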

mlr --csv unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,

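Finally, suppose the First name column is called Name in file2.csv, and you want to concatenate file1.csv and file2.csv and remove duplicates based on the uniqueness of One-time password,First name,Last name.

You can do this by putting a rename operation in front of the previous command: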

mlr --csv rename 'Name,First name' then unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv
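
As for the awk attempt in the question: the column numbers can be looked up from the header lines instead of being hard-coded as $5 and $2. A minimal sketch, assuming simple CSV with no quoted commas and that the rows of the two files line up one-to-one, as the original command also assumes; `copy_col` and its argument order are my own naming.

```sh
# copy_col SRC_FILE SRC_HEADER DST_FILE DST_HEADER   (hypothetical helper)
# Prints DST_FILE with column DST_HEADER replaced, row by row,
# by column SRC_HEADER of SRC_FILE. The destination header is kept.
copy_col() {
  awk -v FS=, -v OFS=, -v src="$2" -v dst="$4" '
    # Source file: locate the source column, then store it per row number.
    FNR == NR {
      if (FNR == 1) {
        for (i = 1; i <= NF; i++) if ($i == src) s = i
      }
      val[FNR] = $s
      next
    }
    # Destination header: locate the target column and print it unchanged.
    FNR == 1 {
      for (i = 1; i <= NF; i++) if ($i == dst) d = i
      print
      next
    }
    # Destination rows: overwrite the target cell when both columns exist.
    {
      if (s && d && (FNR in val)) $d = val[FNR]
      print
    }
  ' "$1" "$3"
}
```

For example, `copy_col file2.csv 'Name' file1.csv 'First name'` would print file1.csv with its First name values taken from the Name column of file2.csv. If either header is not found, the destination file is printed unchanged.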
