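Once you start working with many columns, it becomes easier to pull the whole file into memory. (This works well even for surprisingly large files.)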
# Read the lines in
set f [open $filename]
set lines [split [string trim [read $f]] "\n"]
close $f
# Do an initial clean up of the data; \S+ matches non-whitespace
set headers [regexp -all -inline {\S+} [lindex $lines 0]]
set data [lmap line [lrange $lines 1 end] {regexp -all -inline {\S+} $line}]
# Properly we'd also validate the data to handle non-numeric junk, but this is just an example...
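The comment above skips validation. One hedged way to fill that in, assuming every column is supposed to be numeric (which the real data may or may not satisfy), is to drop rows that have the wrong number of fields or a field that does not parse as a number. The name `cleanRows` and the sample values are made up for illustration:

```tcl
# Hypothetical helper (not part of the original answer): keep only rows whose
# field count matches the header and whose fields all parse as numbers.
proc cleanRows {headers rows} {
    set width [llength $headers]
    set good {}
    foreach row $rows {
        # Skip rows with missing or extra fields
        if {[llength $row] != $width} continue
        set ok 1
        foreach field $row {
            # -strict makes the empty string count as non-numeric
            if {![string is double -strict $field]} {
                set ok 0
                break
            }
        }
        if {$ok} {
            lappend good $row
        }
    }
    return $good
}

# Made-up sample values: the second row has junk, the third is too short
set sampleHeaders {Time Elec}
set sampleRows {{1 2.5} {2 n/a} {3} {4 1.5}}
set cleaned [cleanRows $sampleHeaders $sampleRows]
puts $cleaned
```

With the real data you would run `set data [cleanRows $headers $data]` right after the clean-up step.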
Now we can define a procedure to get the average of a column by name:
proc columnAverage {name} {
    global headers data
    # Look up which column it is
    set idx [lsearch -exact $headers $name]
    if {$idx < 0} {
        error "no such column \"$name\""
    }
    # Get the data from just that column
    set column [lmap row $data {lindex $row $idx}]
    # Calculate the mean of the column: sum / count
    return [expr {[tcl::mathop::+ {*}$column] / double([llength $column])}]
}
You would call it like this:
puts "average of Elec is [columnAverage Elec]"
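If the globals get awkward, for example when more than one file is loaded at a time, the same calculation can take the table as arguments instead. This is only a sketch: `columnAverageOf`, the extra check for an empty column, and the sample table are my additions, not part of the procedure above. Like the rest of the code here, it relies on `lmap`, which needs Tcl 8.6 or later.

```tcl
# Hypothetical variant of columnAverage that receives headers and rows
# as arguments instead of reading global variables.
proc columnAverageOf {headers data name} {
    # Look up which column it is
    set idx [lsearch -exact $headers $name]
    if {$idx < 0} {
        error "no such column \"$name\""
    }
    # Get the data from just that column
    set column [lmap row $data {lindex $row $idx}]
    # Added guard: refuse to average a column with no rows
    if {[llength $column] == 0} {
        error "column \"$name\" has no data rows"
    }
    # Mean of the column: sum / count
    return [expr {[tcl::mathop::+ {*}$column] / double([llength $column])}]
}

# Made-up sample table, only for illustration
set sampleHeaders {Time Elec}
set sampleData {{1 2.0} {2 4.0} {3 6.0}}
puts "average of Elec is [columnAverageOf $sampleHeaders $sampleData Elec]"
```

For the sample table this prints `average of Elec is 4.0`.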