Let's say I have a complex type:

class Policy
{
    string Name { get; set; }
    DateTime InceptionDate { get; set; }
    DateTime ExpirationDate { get; set; }
    List<Location> Locations { get; set; }
}

class Location
{
    string Street { get; set; }
    string City { get; set; }
    string State { get; set; }
    string PostalCode { get; set; }
}

How can I convert the Locations collection into feature columns that ML.NET can understand?

2 Answers

Arrays of primitive types can be featurized.

If your class looks like this:

class Policy
{
    string Name { get; set; }
    DateTime InceptionDate { get; set; }
    DateTime ExpirationDate { get; set; }
    float[] Locations { get; set; }
}

then Locations will be converted to a Vector of type R4 (the type that float maps to).

Then you create a SchemaDefinition:

var env = new LocalEnvironment();
var schemaDef = SchemaDefinition.Create(typeof(Policy));

If the size of the vector is not known at compile time, you also need:

int vectorSize = 4;
schemaDef["Locations"].ColumnType = new VectorType(NumberType.R4, vectorSize);

If the size of the vector is fixed, you can add the VectorType attribute to the property instead:

class Policy
{
    string Name { get; set; }
    DateTime InceptionDate { get; set; }
    DateTime ExpirationDate { get; set; }

    [VectorType(4)]
    float[] Locations { get; set; }
}
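
A fixed vector size presumably means that every row has to supply exactly that many values. Here is a minimal sketch of padding or truncating an array to a chosen length, assuming zero is an acceptable filler for missing entries. The helper name ToFixedLength is hypothetical and is not part of ML.NET:

```csharp
using System;
using System.Linq;

static class VectorPadding
{
    // Returns a copy of 'values' with exactly 'size' entries:
    // extra entries are dropped, missing entries are filled with 'fill'.
    public static float[] ToFixedLength(float[] values, int size, float fill = 0f)
    {
        var result = Enumerable.Repeat(fill, size).ToArray();
        Array.Copy(values, result, Math.Min(values.Length, size));
        return result;
    }
}
```

For example, VectorPadding.ToFixedLength(new[] { 1f, 2f }, 4) yields an array holding 1, 2, 0, 0. Whether zero padding is meaningful depends on what the values represent.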

Then create the DataView:

var data = new List<Policy>();
var dataView = env.CreateStreamingDataView(data, schemaDef);

In your case, Locations is a class, so I believe you first need to convert it to a primitive array by concatenating the values, as in this example:

public class IrisData
{
    public float Label;
    public float SepalLength;
    public float SepalWidth;
    public float PetalLength;
    public float PetalWidth;
}

public class IrisVectorData
{
    public float Label;
    public float[] Features;
}

static void Main(string[] args)
{
    // Here's a data array that we want to work on.
    var dataArray = new[] {
        new IrisData{Label=1, PetalLength=1, SepalLength=1, PetalWidth=1, SepalWidth=1},
        new IrisData{Label=0, PetalLength=2, SepalLength=2, PetalWidth=2, SepalWidth=2}
    };

    // Create the ML.NET environment.
    var env = new Microsoft.ML.Runtime.Data.TlcEnvironment();

    // Create the data view.
    // This method will use the definition of IrisData to understand what columns there are in the 
    // data view.
    var dv = env.CreateDataView<IrisData>(dataArray);

    // Now let's do something to the data view. For example, concatenate all four non-label columns
    // into 'Features' column.
    dv = new Microsoft.ML.Runtime.Data.ConcatTransform(env, dv, "Features", 
        "SepalLength", "SepalWidth", "PetalLength", "PetalWidth");

    // Read the data into an another array, this time we read the 'Features' and 'Label' columns
    // of the data, and ignore the rest.
    // This method will use the definition of IrisVectorData to understand which columns and of which types
    // are expected to be present in the input data.
    var arr = dv.AsEnumerable<IrisVectorData>(env, reuseRowObject: false)
        .ToArray();
}

However, I haven't actually tried this case myself, so I can't offer more help here.
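
For the Location class from the question, one untested way to get a primitive array is to concatenate the text fields of every location into a single string[]. This is only a sketch in plain C#: the helper name ToFieldArray is hypothetical, the resulting array varies in length with the number of locations, and the strings would still need a text or categorical transform before a trainer can use them.

```csharp
using System.Collections.Generic;
using System.Linq;

class Location
{
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string PostalCode { get; set; }
}

static class LocationFlattening
{
    // Concatenates the four text fields of each location, in order,
    // into one flat array: 4 entries per location.
    public static string[] ToFieldArray(IEnumerable<Location> locations) =>
        locations
            .SelectMany(l => new[] { l.Street, l.City, l.State, l.PostalCode })
            .ToArray();
}
```

A policy with two locations produces an array of eight strings, so this corresponds to the case where the vector size is not known at compile time.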

See also the schema comprehension documentation.

Answered 2018-10-26T13:40:27.980

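There is an example of reading data from memory into an ML pipeline with the new API. I'm copying the relevant code here, although the linked page has some additional useful comments: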

var mlContext = new MLContext();

IEnumerable<CustomerChurnInfo> churnData = GetChurnInfo();

var trainData = mlContext.CreateStreamingDataView(churnData);

var dynamicLearningPipeline = mlContext.Transforms.Categorical.OneHotEncoding("DemographicCategory")
    .Append(new ConcatEstimator(mlContext, "Features", "DemographicCategory", "LastVisits"))
    .Append(mlContext.BinaryClassification.Trainers.FastTree("HasChurned", "Features", numTrees: 20));

var dynamicModel = dynamicLearningPipeline.Fit(trainData);
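
The snippet above does not show CustomerChurnInfo or GetChurnInfo. A minimal sketch of what they might look like, assuming only the three columns the pipeline names; the class layout, the ChurnSource wrapper, and the sample values are all made up for illustration:

```csharp
using System.Collections.Generic;

// Hypothetical row type; property names match the column names used in the pipeline.
class CustomerChurnInfo
{
    public bool HasChurned { get; set; }            // label column
    public string DemographicCategory { get; set; } // categorical column, one-hot encoded
    public float[] LastVisits { get; set; }         // numeric vector; a fixed size would need [VectorType(n)]
}

static class ChurnSource
{
    // Hypothetical stand-in for GetChurnInfo(): lazily yields in-memory rows.
    public static IEnumerable<CustomerChurnInfo> GetChurnInfo()
    {
        yield return new CustomerChurnInfo
        {
            HasChurned = true,
            DemographicCategory = "Young adults",
            LastVisits = new float[] { 0, 0, 0, 0, 0 }
        };
        yield return new CustomerChurnInfo
        {
            HasChurned = false,
            DemographicCategory = "Seniors",
            LastVisits = new float[] { 1, 2, 1, 1, 0 }
        };
    }
}
```

Because the rows come from an IEnumerable, they can be produced on demand rather than loaded into a list first.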
Answered 2018-10-19T18:04:31.933