
I have a huge text file made up of about 18,000 text samples, separated by a consistent number of forward slashes: /////////////////

Example:

////////////////////
P11/000568

name: john
age: 18
hobbies: cycling, drawing

////////////////////
P12/000894
name: eva
age: 19
hobbies: football, soccer, baseball

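I want to "mine" this text file for the word "soccer", so that every time the program "sees" the word, a new row is created in Excel containing the subject's name and the code above it: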

      A            B           age
1 P12/000894       eva         19

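I have tried some code (VBA in Excel, reading each block of text into an array), but because the txt file is so large, my computer melts down while the code runs. Any ideas on the best approach? (VBA/Excel preferred; Power Pivot / Power Query?)

Thanks, C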


2 Answers


Here is what you want in Power Query:

let
    // Read the file as a single text column. A delimiter is REQUIRED: the default "," messes up the table.
    Source = Csv.Document(File.Contents("C:\Users\Eugene\Desktop\test.txt"),[Delimiter="`", Columns=1, Encoding=1251, QuoteStyle=QuoteStyle.None]),
    // Split each line at the first ":" into a parameter name and its value
    Split = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false),{"Parameter", "Value"}),
    // Drop blank lines and the slash separator lines
    FilteredExcessiveRows = Table.SelectRows(Split, each ([Parameter] <> "" and [Parameter] <> "////////////////////")),
    // A line without ":" has no value, so it is a record code (e.g. P12/000894)
    AddCode = Table.AddColumn(FilteredExcessiveRows, "Code", each if _[Value] = null then _[Parameter] else null),
    // Carry each code down to the field rows that follow it
    FillDown = Table.FillDown(AddCode,{"Code"}),
    RemoveDuplicates = Table.Distinct(FillDown, {"Code", "Parameter"}),
    // Drop the code-only rows, then turn parameters into columns (one row per code)
    Filtered2 = Table.SelectRows(RemoveDuplicates, each ([Value] <> null)),
    Pivot = Table.Pivot(Filtered2, List.Distinct(Filtered2[Parameter]), "Parameter", "Value"),
    // Keep only the records whose hobbies mention "soccer"
    Filtered3 = Table.SelectRows(Pivot, each Text.Contains([hobbies], "soccer")),
    RemoveHobbies = Table.RemoveColumns(Filtered3,{"hobbies"})
in
    RemoveHobbies
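
To make the fill-down and pivot steps easier to follow, here is a minimal sketch that runs the same idea on a few inline rows instead of a file. The inline table `Rows` is a hypothetical stand-in for the already split "Parameter"/"Value" columns, and only a subset of the fields from the question's sample is used. It illustrates the technique and is not a replacement for the query above.

```
let
    // Hypothetical inline rows standing in for the split "Parameter"/"Value" columns
    Rows = Table.FromRecords({
        [Parameter = "P11/000568", Value = null],
        [Parameter = "name", Value = " john"],
        [Parameter = "hobbies", Value = " cycling, drawing"],
        [Parameter = "P12/000894", Value = null],
        [Parameter = "name", Value = " eva"],
        [Parameter = "hobbies", Value = " football, soccer, baseball"]
    }),
    // A row without a value is a record code; copy it into its own column
    AddCode = Table.AddColumn(Rows, "Code", each if [Value] = null then [Parameter] else null),
    // Fill the code down so every field row knows which record it belongs to
    FillDown = Table.FillDown(AddCode, {"Code"}),
    // Drop the code rows themselves, then turn parameters into columns
    Fields = Table.SelectRows(FillDown, each [Value] <> null),
    Pivot = Table.Pivot(Fields, List.Distinct(Fields[Parameter]), "Parameter", "Value")
in
    Pivot
```

The result should have one row per code, with `name` and `hobbies` as columns. Note that the values keep the space that followed the colon; wrapping them in `Text.Trim` would be one way to clean that up if it matters.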
answered 2016-12-02T02:14:54.790

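Edit: there is now a video that explains two approaches. The fast approach is the one in the code below.

In Excel / Power Query, this can be done with the following code: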

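let
    // Load the raw lines from the workbook table named "Input" (a single "Text" column)
    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
    Typed = Table.TransformColumnTypes(Source,{{"Text", type text}}),
    // Drop blank rows; this assumes every record then has the same five lines: separator, code, name, age, hobbies
    RemovedBlankRows = Table.SelectRows(Typed, each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    // Shift the same column by 1 to 4 rows so that a hobbies line sits next to the code, name and age lines before it
    Combined = Table.FromColumns({List.Skip(RemovedBlankRows[Text],4),List.Skip(RemovedBlankRows[Text],1),List.Skip(RemovedBlankRows[Text],2),List.Skip(RemovedBlankRows[Text],3)},{"Original","Code","Name","Age"}),
    // Keep only the rows whose "Original" line mentions "soccer"
    Filtered = Table.SelectRows(Combined, each Text.Contains([Original], "soccer")),
    Removed = Table.RemoveColumns(Filtered,{"Original"}),
    // Strip the "name: " (6 characters) and "age: " (5 characters) prefixes
    Transformed = Table.TransformColumns(Removed, {{"Name", each Text.RemoveRange(_, 0, 6), type text},{"Age", each Int64.From(Text.RemoveRange(_, 0, 5)), Int64.Type}})
in
    Transformed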
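
The offset trick in the `Combined` step can be hard to picture, so here is a small sketch of it on inline data. The list `Lines` is a hypothetical stand-in for the "Text" column after blank rows are removed, built from the two sample records in the question. The extra `[Original] <> null` check is my own defensive assumption, since the shifted lists have different lengths and the shorter ones are padded with null.

```
let
    // Hypothetical inline sample: five lines per record, blank lines already removed
    Lines = {
        "////////////////////", "P11/000568", "name: john", "age: 18", "hobbies: cycling, drawing",
        "////////////////////", "P12/000894", "name: eva", "age: 19", "hobbies: football, soccer, baseball"
    },
    // Row i pairs line i+4 (Original) with lines i+1, i+2 and i+3 (Code, Name, Age)
    Combined = Table.FromColumns(
        {List.Skip(Lines, 4), List.Skip(Lines, 1), List.Skip(Lines, 2), List.Skip(Lines, 3)},
        {"Original", "Code", "Name", "Age"}),
    // Only a row whose "Original" is a hobbies line can contain the keyword,
    // and on exactly those rows the other three columns are correctly aligned
    Filtered = Table.SelectRows(Combined, each [Original] <> null and Text.Contains([Original], "soccer"))
in
    Filtered
```

Most rows of `Combined` are misaligned nonsense (for example, a separator line sitting in the "Code" column), but the keyword filter discards them, which is why no grouping or pivoting is needed. The trade-off is that the approach breaks if a record has missing or extra lines.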
answered 2016-11-23T06:11:53.327