
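(This is probably something fairly simple that I'm missing, but I can't seem to figure it out and haven't found an answer in my searches.)

I need to compare two CSV files that have identical columns and output the row differences as follows (final output as Unicode text):

  • If a row exists in FileA but not in FileB, label that row "Good"
  • If a row exists in FileB but not in FileA, label that row "Bad"

Suppose I have the following sample data: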

File A:
Column1,Column2,Column3
Tommy,4133,20180204
Suzie,5200,20210112
Tammy,221,20201010

File B:
Column1,Column2,Column3
Tommy,4133,20180204
Nicky,5200,20190520

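Here is my current code. I borrowed the hash-enabled Compare-Object2 from this site because the built-in Compare-Object was too slow. FYI, I use Get-Content rather than Import-Csv because it is about 50 times faster, and we are comparing entire rows anyway. The $MyHeader variable just preserves the original file's header column values.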

Compare-Object2 (Get-Content $FileA) (Get-Content $FileB) -PassThru |
Select-Object @{l=[string]$MyHeader;e={$_.InputObject}},
              @{n='Row Label'; e={ @{'=>' = 'Bad' ; '<=' = 'Good'}[$_.SideIndicator]}},
              @{n='Placeholder'; e={@{'*'='0'}['*']}} |
Sort-Object 'Row Label' -Descending | Export-Csv "$FinalCSV" -NoType;

#Removing " char to create CSV with original and added columns together
Set-Content "$FinalCSV" ((Get-Content "$FinalCSV") -replace '"');

#Convert csv to tab delimited
Import-Csv "$FinalCSV" | Export-Csv "$FinalTXT"  -NoTypeInformation -Delimiter "`t";

#Remove " char and convert to unicode
Set-Content -Encoding UNICODE "$FinalTXT" ((Get-Content "$FinalTXT") -replace '"')

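This works perfectly and creates one combined Good and Bad output file; two files of 400K rows each take about 40 seconds. (I know some of the steps at the end are redundant, but hey, this is the best I could do. Feel free to fix those parts too!)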

Result File:
Column1  Column2  Column3   Row Label  Placeholder
Suzie    5200     20210112  Good       0
Tammy    221      20201010  Good       0
Nicky    5200     20190520  Bad        0

The problem is that I now need to create these as separate files: one Good file and one Bad file. So the newly required output is:

ResultFileGood:
Column1  Column2  Column3   Row Label  Placeholder
Suzie    5200     20210112  Good       0
Tammy    221      20201010  Good       0

ResultFileBad:
Column1  Column2  Column3   Row Label  Placeholder
Nicky    5200     20190520  Bad        0

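I just know there has to be a way to do this without running the comparison twice, using Where-Object or some kind of loop. I just can't figure it out, so I'm coming to the experts.

Thanks

EDIT: Thanks to postanote, one workable alternative is to output only the combined file and then split it, which is definitely faster than running the whole comparison routine twice. I would still like to see whether there is a way to do it directly in the compare-and-export step, without an intermediate file, but this is a viable option and it is what I am using for now: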

$FinalHeader = Get-Content "$FinalTXT" | Select-Object -First 1
# \t matches the tab between the 'Row Label' and 'Placeholder' columns
$BadOutput = Select-String -Path $FinalTXT -Pattern 'Bad\t0'
$GoodOutput = Select-String -Path $FinalTXT -Pattern 'Good\t0'
@($FinalHeader,$BadOutput.Line) | Out-File "$FinalBadTXT" -Encoding UNICODE;
@($FinalHeader,$GoodOutput.Line) | Out-File "$FinalGoodTXT" -Encoding UNICODE;

1 Answer


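Continuing from my comment.

You have a lot going on here; i.e., some proxy functions, etc.

Mixing these items the way you do, you end up with something like this... (Very simplified, of course: we have to guess at your real input, so I made up my own.)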

psEdit -filenames 'D:\temp\book1.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Main0,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>

psEdit -filenames 'D:\temp\book3.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
Branch3,jjj,kkk,lll
Branch4,mmm,nnn,ooo
#>

Update:

Removed all of the previous stuff, since it was not your cup of tea...

;-}

Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt') | 
Export-Csv -Path 'D:\Temp\CompareObject.csv' -NoTypeInformation -Force

(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\<=') -replace '.*CompareObject.*:\"|\"\,.*' | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force

(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\=>') -replace '.*CompareObject.*:\"|\"\,.*' | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force

$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'

$FileList | 
ForEach-Object {
    "`n********* Getting content $PSItem *********`n"
    Import-Csv -Path  "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********


Site    Dept Office Floor
----    ---- ------ -----
Main0   aaa  bbb    ccc  

********* Getting content DifferenceObject.csv *********

Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo 
#>

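So, as to your last comment:


"Though that approach still uses an intermediate file, I admit I had completely overlooked the simple approach of exporting the combined file and then splitting it."

OK, then, without using an "intermediate file".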

($ComparedObjects = Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt'))
# Results
<#
InputObject         SideIndicator
-----------         -------------
Main0,aaa,bbb,ccc   <=           
Branch3,jjj,kkk,lll =>           
Branch4,mmm,nnn,ooo => 
#>

($ComparedObjects -match '<=').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor 
# Results
<#
Site  Dept Office Floor
----  ---- ------ -----
Main0 aaa  bbb    ccc  
#>

($ComparedObjects -match '=>').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor 
# Results
<#
Site    Dept Office Floor
----    ---- ------ -----
Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo 
#>

Then just export to CSV.

($ComparedObjects -match '<=').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force

($ComparedObjects -match '=>').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
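
If you would rather not filter the comparison result twice, here is a minimal sketch of one way to split it in a single pass. It assumes PowerShell 4 or later (for the .Where() method with the 'Split' mode) and uses the built-in Compare-Object on the question's sample rows purely so that it runs anywhere; I have not timed it against Compare-Object2. The variable names are only for illustration.

```powershell
# Sample rows standing in for the two files (header line left out).
$fileA = 'Tommy,4133,20180204', 'Suzie,5200,20210112', 'Tammy,221,20201010'
$fileB = 'Tommy,4133,20180204', 'Nicky,5200,20190520'

# Assumption: built-in Compare-Object used here instead of Compare-Object2.
$compared = Compare-Object -ReferenceObject $fileA -DifferenceObject $fileB

# 'Split' mode returns two collections: the matches first, everything else second.
$good, $bad = $compared.Where({ $_.SideIndicator -eq '<=' }, 'Split')

# Pull out the raw CSV lines for each side.
$goodRows = @($good.InputObject)
$badRows  = @($bad.InputObject)
```

From here, $goodRows and $badRows can each go through the same ConvertFrom-Csv | Export-Csv steps as above.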

Read back as needed:

$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'

$FileList | 
ForEach-Object {
    "`n********* Getting content $PSItem *********`n"
    Import-Csv -Path  "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********


Site    Dept Office Floor
----    ---- ------ -----
Main0   aaa  bbb    ccc  

********* Getting content DifferenceObject.csv *********

Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo  
#>

Update

As per your comment:


"The catch is that the final output needs to be: Unicode tab-delimited text with the additional columns."


(($ComparedObjects -match '<=').InputObject) -replace ',', "`t" | 
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
# -Delimiter "`t" keeps the exported file tab-delimited
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t"
# Results
<#
Site  Dept Office Floor
----  ---- ------ -----
Main0 aaa  bbb    ccc  
#>


(($ComparedObjects -match '=>').InputObject) -replace ',', "`t" | 
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
# -Delimiter "`t" keeps the exported file tab-delimited
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t"
# Results
<#
Site    Dept Office Floor
----    ---- ------ -----
Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo  
#>
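
Since the target is plain tab-delimited text rather than a quoted CSV, another option is to skip Export-Csv and write the lines directly. The following is only a sketch: it assumes no field contains an embedded comma or quote, and the header, label, and temp-folder output path are stand-ins I made up for illustration.

```powershell
# Stand-ins for the original header line and the rows found only in the second file.
$header = 'Site,Dept,Office,Floor'
$rows   = 'Branch3,jjj,kkk,lll', 'Branch4,mmm,nnn,ooo'
$label  = 'Bad'

# Hypothetical output path in the temp folder.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'DifferenceObject.txt'

# Append the two extra columns to the header and to every row.
$lines = @("$header,Row Label,Placeholder") + ($rows | ForEach-Object { "$_,$label,0" })

# Swap commas for tabs and write UTF-16 ("Unicode") text with no quoting involved.
$lines -replace ',', "`t" | Set-Content -Path $outFile -Encoding Unicode
```

This avoids the separate pass that strips the double quotes, because no quotes are ever written.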

Or, for the extra column content, you could do this...

$ComparedObjects -match '<=' | 
Select-Object -Property @{
    Name       = 'Site'
    Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
    Name       = 'Dept'
    Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
    Name       = 'Office'
    Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
    Name       = 'Floor'
    Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
    Name       = 'Label'
    Expression = {'Good'}
}, 
@{
    Name       = 'Placeholder'
    Expression = {0}
} |  
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\ReferenceObject.csv') -replace '"','' -replace ',', "`t" | 
# -Encoding Unicode keeps the rewritten file in Unicode
Set-Content -PassThru 'D:\temp\ReferenceObject.csv' -Encoding Unicode
Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" | 
Format-Table -AutoSize
# Results
<#
Site  Dept Office Floor Label Placeholder
----  ---- ------ ----- ----- -----------
Main0 aaa  bbb    ccc   Good  0 
#>


$ComparedObjects -match '=>' | 
Select-Object -Property @{
    Name       = 'Site'
    Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
    Name       = 'Dept'
    Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
    Name       = 'Office'
    Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
    Name       = 'Floor'
    Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
    Name       = 'Label'
    # Rows found only in the second file ('=>') are the "Bad" ones
    Expression = {'Bad'}
}, 
@{
    Name       = 'Placeholder'
    Expression = {0}
} | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\DifferenceObject.csv') -replace '"','' -replace ',', "`t" | 
# -Encoding Unicode keeps the rewritten file in Unicode
Set-Content -PassThru 'D:\temp\DifferenceObject.csv' -Encoding Unicode
Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" | 
Format-Table -AutoSize
# Results
<#
Site    Dept Office Floor Label Placeholder
----    ---- ------ ----- ----- -----------
Branch3 jjj  kkk    lll   Bad   0          
Branch4 mmm  nnn    ooo   Bad   0 
#>
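
The two Select-Object blocks above differ only in the side indicator and the label, so they could probably be folded into one loop. Here is a rough sketch of that idea: the $compared array is a hand-built stand-in for $ComparedObjects, and $labels and $result are names I made up for illustration.

```powershell
# Hand-built stand-in for the Compare-Object2 output shown earlier.
$compared = @(
    [pscustomobject]@{ InputObject = 'Main0,aaa,bbb,ccc';   SideIndicator = '<=' }
    [pscustomobject]@{ InputObject = 'Branch3,jjj,kkk,lll'; SideIndicator = '=>' }
    [pscustomobject]@{ InputObject = 'Branch4,mmm,nnn,ooo'; SideIndicator = '=>' }
)

# Map each side indicator to the label the question asks for.
$labels = @{ '<=' = 'Good'; '=>' = 'Bad' }

# One pass over the comparison: group by side, then add the extra columns.
$result = @{}
foreach ($group in $compared | Group-Object -Property SideIndicator) {
    $label = $labels[$group.Name]
    $result[$label] = @(
        $group.Group.InputObject |
            ConvertFrom-Csv -Header Site, Dept, Office, Floor |
            Select-Object -Property *,
                @{ Name = 'Label';       Expression = { $label } },
                @{ Name = 'Placeholder'; Expression = { 0 } }
    )
}
```

$result['Good'] and $result['Bad'] then hold the labeled rows for each file and can be exported separately.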
Answered 2021-05-20T04:35:42.903