powershell - Is there a way to iterate through the publicsuffix list in PowerShell?

Question

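I am trying to test a web filtering solution, so I have a PowerShell loop that iterates over a list of URLs and returns the web response. The problem is that you often hit CDNs or other sites that respond with 403 (forbidden) or 404 (not found), and then you need to find the root domain.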

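The only logical solution I have found is to cross-reference the URL with the publicsuffix list. From what I have seen, the only language this does not work well with is PowerShell. I would like to know whether anyone has run into this problem or has a solution.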

score 1 · Accepted Answer

While your solution works, there is a more concise and faster alternative:

$url = 'https://publicsuffix.org/list/public_suffix_list.dat'
(Invoke-RestMethod $url) -split "`n" -match '^[^/\s]' |
  Set-Content .\public_suffix_list.dat

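Invoke-RestMethod $url returns the text file at the specified URL as a single string.
-split "`n" splits that string into an array of lines.
-match '^[^/\s]' matches those lines that start with (^) a character that is (from the set enclosed in [...]) neither (^) a literal / nor a whitespace character (\s), which effectively filters out the comment lines and the (presumed) non-data lines such as blank ones.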

The above saves the data-lines-only array to a file, just like your solution.
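
To make the filtering step concrete, here is a minimal sketch that applies the same -split / -match combination to a small inline sample instead of the downloaded file. The sample text and the variable names $sample and $dataLines are assumptions made up for illustration; they are not part of the real list.

```powershell
# Hypothetical sample that mimics the layout of public_suffix_list.dat:
# a comment line, a blank line, and three data lines.
$sample = "// comment`n`ncom`n*.kobe.jp`n!city.kobe.jp`n"

# With an array on the left-hand side, -match acts as a filter and
# returns only the elements that match the regex.
$dataLines = $sample -split "`n" -match '^[^/\s]'

$dataLines   # com, *.kobe.jp, !city.kobe.jp
```

Note that the wildcard (*) and exception (!) lines survive the filter, since they are data lines too.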

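Note that determining whether a given URL has a public suffix involves more than just suffix-matching against the data lines, because the latter contain wildcard labels (*) and involve exceptions (lines starting with !) - see https://publicsuffix.org/list/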
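
As a rough illustration of what handling wildcards and exceptions could look like, here is a sketch of the matching rules described on publicsuffix.org: an exception rule wins, otherwise the matching rule with the most labels wins, and if nothing matches the last label is treated as the suffix. Get-RegistrableDomain is a hypothetical helper, not part of either answer, and the inline $rules array is a small hand-picked subset of the list. The sketch assumes host names are written the same way as in the list and does no punycode/IDN conversion.

```powershell
# Hypothetical helper: returns the registrable ("root") domain of a host name,
# i.e. the public suffix plus one more label, or $null if the host name is
# itself a public suffix.
function Get-RegistrableDomain {
  param(
    [string]   $HostName,
    [string[]] $Rules      # data lines of the public suffix list
  )

  # Index the rules for fast, case-insensitive lookup.
  $ruleSet = [System.Collections.Generic.HashSet[string]]::new(
    [System.StringComparer]::OrdinalIgnoreCase)
  foreach ($rule in $Rules) {
    if ($rule.Trim()) { [void] $ruleSet.Add($rule.Trim()) }
  }

  $labels = @($HostName.Trim('.').ToLowerInvariant() -split '\.')
  $suffixLength = $null

  # Walk the candidate suffixes from longest to shortest.
  for ($i = 0; $i -lt $labels.Count; $i++) {
    $candidate = $labels[$i..($labels.Count - 1)] -join '.'
    $remaining = $labels.Count - $i

    # An exception rule always prevails; its leftmost label is not part
    # of the public suffix.
    if ($ruleSet.Contains("!$candidate")) {
      $suffixLength = $remaining - 1
      break
    }

    # Otherwise remember the longest exact or wildcard match, but keep
    # looking for an exception rule among the shorter candidates.
    $wildcard = $candidate -replace '^[^.]+', '*'
    if ($null -eq $suffixLength -and
        ($ruleSet.Contains($candidate) -or $ruleSet.Contains($wildcard))) {
      $suffixLength = $remaining
    }
  }

  # No rule matched: fall back to the implicit "*" rule (last label only).
  if ($null -eq $suffixLength) { $suffixLength = 1 }

  # The host name is a public suffix itself, so there is no registrable domain.
  if ($labels.Count -le $suffixLength) { return $null }

  $labels[($labels.Count - $suffixLength - 1)..($labels.Count - 1)] -join '.'
}

# Usage with a hand-picked subset of rules (assumption, for illustration only).
$rules = 'com', 'uk', 'co.uk', 'jp', '*.kobe.jp', '!city.kobe.jp'
$hostName = ([uri] 'https://cdn.assets.example.co.uk/some/path').Host

Get-RegistrableDomain -HostName $hostName -Rules $rules            # example.co.uk
Get-RegistrableDomain -HostName 'www.city.kobe.jp' -Rules $rules   # city.kobe.jp
```

In practice, $rules would presumably come from the filtered file, for example via Get-Content .\public_suffix_list.dat.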

score 0 · Accepted Answer

# You can use whatever directory
$workingdirectory = "C:\"

# Downloads the public suffix list
Invoke-WebRequest -Uri "https://publicsuffix.org/list/public_suffix_list.dat" -OutFile "$workingdirectory\public_suffix_list.dat"

# Gets the content of the file, removes the blank lines, removes all the
# comment lines that contain // and writes the result back to the same file
(gc $workingdirectory\public_suffix_list.dat) |
    ? { $_.Trim() -ne "" } |
    Select-String -Pattern "//" -NotMatch |
    Set-Content "$workingdirectory\public_suffix_list.dat"
