我的 ml.net 控制台应用程序出现问题。这是我第一次在 Visual Studio 中使用 ml.net,所以我按照microsoft.com 的这个教程进行操作,这是一个使用二进制分类的情感分析。
我正在尝试以 tsv 文件的形式处理一些测试数据以获得正面或负面的情绪分析,但在调试时我收到警告,有 1 个格式错误和 2 个错误值。
我决定在 Stack 上向所有伟大的开发人员询问是否有人可以帮助我找到解决方案。
下面是调试的图像:
这是我的测试数据的链接:
wiki-data
wiki-test-data
最后,这是我的代码,供那些重现问题的人使用:
有 2 个 c# 文件:SentimentData.cs 和 Program.cs。
1 - SentimentData.cs:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.ML.Runtime.Api;
namespace MachineLearningTut
{
public class SentimentData
{
[Column(ordinal: "0")]
public string SentimentText;
[Column(ordinal: "1", name: "Label")]
public float Sentiment;
}
public class SentimentPrediction
{
[ColumnName("PredictedLabel")]
public bool Sentiment;
}
}
2 - 程序.cs:
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Threading.Tasks;
namespace MachineLearningTut
{
class Program
{
const string _dataPath = @".\Data\wikipedia-detox-250-line-data.tsv";
const string _testDataPath = @".\Data\wikipedia-detox-250-line-test.tsv";
const string _modelpath = @".\Data\Model.zip";
static async Task Main(string[] args)
{
var model = await TrainAsync();
Evaluate(model);
Predict(model);
}
public static async Task<PredictionModel<SentimentData, SentimentPrediction>> TrainAsync()
{
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader (_dataPath).CreateFrom<SentimentData>());
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
pipeline.Add(new FastForestBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });
PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
await model.WriteAsync(path: _modelpath);
return model;
}
public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
{
var testData = new TextLoader(_testDataPath).CreateFrom<SentimentData>();
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
Console.WriteLine();
Console.WriteLine("PredictionModel quality metrics evaluation");
Console.WriteLine("-------------------------------------");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"Auc: {metrics.Auc:P2}");
Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
}
public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
{
IEnumerable<SentimentData> sentiments = new[]
{
new SentimentData
{
SentimentText = "Please refrain from adding nonsense to Wikipedia."
},
new SentimentData
{
SentimentText = "He is the best, and the article should say that."
}
};
IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);
Console.WriteLine();
Console.WriteLine("Sentiment Predictions");
Console.WriteLine("---------------------");
var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
foreach (var item in sentimentsAndPredictions)
{
Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
}
Console.WriteLine();
}
}
}
如果有人想查看解决方案的代码或更多详细信息,请在聊天中询问我,我会发送。提前致谢!!![竖起大拇指]