
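I want to retrieve some words from a database and then judge whether a paragraph is positive or negative.
The database file format looks like this; some of the keywords have a positive and a negative score: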

Word                Pos_Score           Neg_Score

Able                .324                .834
Country             .987                .213
Love                .378                .734 
agree               .546                .123
industry            .289                .714
guests              .874                .471

The paragraph will look like this:

I agree with you.  It seems an intelligent tourist industry allows its guests to either immerse fully, in part, or not, depending upon the guest.  That is why the ugly American charges have always confused me.  

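Now I compare each word of the paragraph against the database file. If a word is found there, I retrieve that word's Pos_Score and Neg_Score and accumulate them in variables: the Pos_Score values are added up separately and the Neg_Score values are added up separately. Once the whole paragraph has been compared, those two totals are the result.
The code I tried is this: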

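    // Needs: using System; using System.IO; using System.Windows.Forms; using MySql.Data.MySqlClient;
    private void button1_Click(object sender, EventArgs e)
    {
        string myConString = "server=localhost;database=sentiwornet;User Id=root;password=zia;";
        double posTotal = 0, negTotal = 0;

        using (var connection = new MySqlConnection(myConString))
        using (MySqlCommand command = connection.CreateCommand())
        {
            // One query returns both scores; the word is passed as a parameter
            // instead of the literal text 'part'.
            command.CommandText = "SELECT Pos_Score, Neg_Score FROM score WHERE Word = @word";
            MySqlParameter wordParam = command.Parameters.Add("@word", MySqlDbType.VarChar);

            connection.Open(); // open once, not once per word

            foreach (string line in File.ReadLines("D:\\input.txt"))
            {
                // Split on spaces and common punctuation so "you." matches "you".
                string[] parts = line.Split(new[] { ' ', '.', ',', ';', ':', '!', '?' },
                                            StringSplitOptions.RemoveEmptyEntries);
                foreach (string part in parts)
                {
                    wordParam.Value = part;
                    using (MySqlDataReader reader = command.ExecuteReader())
                    {
                        if (reader.Read()) // the word was found in the table
                        {
                            posTotal += Convert.ToDouble(reader["Pos_Score"]);
                            negTotal += Convert.ToDouble(reader["Neg_Score"]);
                        }
                    }
                }
            }
        }

        // Show both totals and a simple verdict (positive if Pos total > Neg total).
        MessageBox.Show(string.Format("Pos_Score total: {0}, Neg_Score total: {1} ({2})",
            posTotal, negTotal, posTotal > negTotal ? "positive" : "negative"));
    }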
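A minimal sketch of the scoring step described in the question (add up Pos_Score and Neg_Score separately over every word of the paragraph that appears in the table), assuming the score table has already been loaded into an in-memory dictionary, for example with a single SELECT Word, Pos_Score, Neg_Score FROM score. The class name ParagraphScorer, the separator list, and the rule "positive if the Pos total exceeds the Neg total" are illustrative assumptions, not part of the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: scores a paragraph against an in-memory copy of the score table.
public static class ParagraphScorer
{
    // Splits the paragraph into words and sums Pos and Neg separately for every
    // word found in the table. Words not in the table are skipped.
    public static (double Pos, double Neg) Score(
        string paragraph,
        IReadOnlyDictionary<string, (double Pos, double Neg)> table)
    {
        double pos = 0, neg = 0;
        char[] separators = { ' ', '.', ',', ';', ':', '!', '?' };

        foreach (string word in paragraph.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (table.TryGetValue(word, out var s))
            {
                pos += s.Pos;
                neg += s.Neg;
            }
        }
        return (pos, neg);
    }

    // Assumed decision rule: positive when the positive total is larger.
    public static bool IsPositive((double Pos, double Neg) totals) => totals.Pos > totals.Neg;
}
```

With the six sample rows in a case-insensitive dictionary (StringComparer.OrdinalIgnoreCase) and the sample paragraph, only agree, industry and guests match (the singular guest is not in the table), so the totals come out to about 1.709 positive and 1.308 negative.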

2 Answers


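First of all, this query is extremely inefficient. Instead, if your paragraphs are small enough, I would do all of the joining inside the database by passing the words as a CSV list parameter and then converting that list to a table in SQL. The following function does that (courtesy of http://codebank.wordpress.com/2007/03/06/simple-sql-csv-to-table-2/):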

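Warning: you will need to strip out all punctuation first, for example by splitting on punctuation as well as spaces with something like line.Split(new[] { ' ', '.', ',' /* etc. */ }, StringSplitOptions.RemoveEmptyEntries), since string.Replace has no overload that takes an array of characters.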
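One way to do the punctuation stripping, sketched here as an assumption rather than the answer's own code: replace every character that char.IsPunctuation reports as punctuation with a space, split on whitespace, and join the words with commas for the @csv parameter of dbo.spGetWordScores. WordTokenizer, Tokenize and ToCsv are hypothetical names. Apostrophes and hyphens also count as punctuation here, so "don't" becomes the two words "don" and "t".

```csharp
using System;
using System.Text;

// Hypothetical helper: turns a paragraph into words and into a CSV word list.
public static class WordTokenizer
{
    // Replaces every punctuation character with a space, then splits on whitespace.
    public static string[] Tokenize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(char.IsPunctuation(c) ? ' ' : c);

        // A null separator array means "split on any whitespace".
        return sb.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Comma-separated list of the words, in order; duplicates are kept so that
    // a word appearing twice in the paragraph is scored twice.
    public static string ToCsv(string text) => string.Join(",", Tokenize(text));
}
```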

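Also, my code may not do exactly what you want, and it might not even compile, but that's the fun of programming. It gives you my general idea of how to solve a fairly complex problem.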

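Edit: I just realized you are using MySql. This code is for MSSQL. I have never used MySql from the CLR, so I don't know whether all the classes have equivalents. You may need to go back to what you were doing before.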

CSV to table

    Create Function dbo.fn_CSVToTable (@CSVList Varchar(MAX))
    Returns @Table Table (ColumnData Varchar(50))
    As
    Begin
        -- Make sure the list ends with a comma so the last item is picked up.
        If Right(@CSVList, 1) <> ','
            Select @CSVList = @CSVList + ','

        -- Int rather than Smallint, so positions past 32767 in a Varchar(MAX) do not overflow.
        Declare @Pos    Int,
                @OldPos Int
        Select  @Pos    = 1,
                @OldPos = 1

        -- Walk the list comma by comma, inserting each trimmed item as a row.
        While @Pos < Len(@CSVList)
        Begin
            Select @Pos = CharIndex(',', @CSVList, @OldPos)
            Insert Into @Table
            Select LTrim(RTrim(SubString(@CSVList, @OldPos, @Pos - @OldPos))) Col001
            Select @OldPos = @Pos + 1
        End

        Return
    End

SQL procedure

    CREATE PROCEDURE dbo.spGetWordScores (@csv varchar(MAX))
    AS
    -- One row per word in @csv that exists in the score table.
    select POS_SCORE, NEG_SCORE, WORD from score
    inner join dbo.fn_CSVToTable(@csv) input
        on input.ColumnData = score.WORD

New C# code

    // Needs: using System; using System.Data; using System.Data.SqlClient; using System.IO;
    var myConString = "server=localhost;database=sentiwornet;User Id=root;password=zia;";

    using (var connection = new SqlConnection(myConString))
    {
        connection.Open();

        // Each line in the array will probably be one paragraph.
        var fileLines = File.ReadAllLines("D:\\input.txt");
        foreach (var line in fileLines)
        {
            // Format your line into words by removing punctuation, then join them into a CSV string.
            var words = line.Split(new[] { ' ', '.', ',', ';', ':', '!', '?' },
                                   StringSplitOptions.RemoveEmptyEntries);
            var csv = string.Join(",", words);

            using (var command = connection.CreateCommand())
            {
                command.CommandType = CommandType.StoredProcedure;
                command.CommandText = "dbo.spGetWordScores";
                command.Parameters.AddWithValue("@csv", csv);
                var ds = command.ExecuteDataSet();

                // Now you have a DataSet with your word scores. Do with them what you will.
            }
        }
    }

Useful extension method

    using System.Data;
    using System.Data.SqlClient;

    public static class Extensions
    {
        // Runs the command and loads every result set it returns into a new DataSet.
        public static DataSet ExecuteDataSet(this SqlCommand command)
        {
            using (var da = new SqlDataAdapter(command))
            {
                var ds = new DataSet();

                // Fill the DataSet using default values for DataTable names, etc.
                da.Fill(ds);

                return ds;
            }
        }
    }
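Once ExecuteDataSet has returned the word scores, a small helper could total the two score columns. This is a sketch assuming the first result table has the POS_SCORE and NEG_SCORE columns selected by dbo.spGetWordScores and that no score is NULL (Convert.ToDouble throws on DBNull); ScoreTotals is a hypothetical name. Because the procedure's join returns one row per matching word in the CSV, a word that occurs twice in the paragraph is counted twice.

```csharp
using System;
using System.Data;

// Hypothetical helper: totals the scores returned by dbo.spGetWordScores.
public static class ScoreTotals
{
    // Sums POS_SCORE and NEG_SCORE over every row of the first table in the DataSet.
    public static (double Pos, double Neg) Sum(DataSet ds)
    {
        double pos = 0, neg = 0;
        if (ds.Tables.Count == 0)
            return (pos, neg); // no result set, nothing matched

        foreach (DataRow row in ds.Tables[0].Rows)
        {
            pos += Convert.ToDouble(row["POS_SCORE"]);
            neg += Convert.ToDouble(row["NEG_SCORE"]);
        }
        return (pos, neg);
    }
}
```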
answered 2013-03-18T18:20:42.017

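Going back and forth to the database will hurt your performance. It is better to write a stored procedure that takes your input string, splits it, and computes the scores. That way all of the processing happens on one machine, and you save a lot of time by not sending partial results back and forth.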
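The stored procedure itself is not shown in this answer. As a client-side alternative with the same goal of avoiding one round trip per word (a different technique from the stored procedure recommended here), the lookups can be batched into a single parameterized query per paragraph. BatchedLookup and Build are hypothetical names, and the query assumes the score table and columns from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: one parameterized query for all words instead of one per word.
public static class BatchedLookup
{
    // Returns the SQL text plus (parameter name, word) pairs to bind on an ADO.NET command.
    // Each distinct word (case-insensitive) gets one parameter: @w0, @w1, ...
    public static (string Sql, List<KeyValuePair<string, string>> Parameters) Build(IEnumerable<string> words)
    {
        var distinct = words.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
        if (distinct.Count == 0)
            throw new ArgumentException("No words to look up.", nameof(words));

        var parameters = distinct
            .Select((w, i) => new KeyValuePair<string, string>("@w" + i, w))
            .ToList();

        string sql = "SELECT Word, Pos_Score, Neg_Score FROM score WHERE Word IN ("
                   + string.Join(", ", parameters.Select(p => p.Key)) + ")";
        return (sql, parameters);
    }
}
```

Each pair can be bound with command.Parameters.AddWithValue(p.Key, p.Value), which exists on both SqlParameterCollection and MySqlParameterCollection. The rows that come back give one score per distinct word, which can then be summed once per occurrence in the paragraph.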

answered 2013-03-18T18:05:03.337