Questions tagged [unix-text-processing]

21 questions

0 votes

1 answer

175 views

regex - Regex to match an nginx location block?

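I am working on a bash script that takes a URL and adds an nginx location block for it to a file. To prevent duplicates, the script also removes the block if it already exists.

To remove a block that already exists, I came up with the regex below: ^location\s\/${URLGOESHERE} {[\s\S]*?(?=\n{2,})$

The regex needs to match the whole multi-line block, like this:

I want the regex to match everything inside the block up to the closing brace }. The file will contain multiple blocks, for example:

The regex I wrote works, but only if there is a blank line before and after the block. So with my regex, pcgrep will not find URL2, 3 (no newline before or after), or 4 (no newline at the end of the file).

I would like to know whether the regex can be made to match the block exactly, without needing those blank lines.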
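One way to avoid depending on blank lines is to skip the regex lookahead and filter the file line by line instead. The following is only a minimal sketch, not the asker's method: `remove_location_block` is a hypothetical helper name, and it assumes each block starts at the beginning of a line with `location /<URL> {`, contains no nested braces, and ends on a line whose first non-blank character is `}`.

```shell
# Hypothetical helper: reads an nginx config on stdin and prints it without
# the block that opens with "location /<URL> {".
# Assumptions: the opening line starts at column 1, blocks are not nested,
# and the closing brace sits on its own line (optionally indented).
remove_location_block() {
  awk -v start="location /$1 {" '
    # Opening line of the target block: start skipping
    # (unless the whole block is on this one line).
    !skip && index($0, start) == 1 { skip = ($0 !~ /[}]/); next }
    # Inside the block: drop lines up to and including the closing brace.
    skip { if ($0 ~ /^[ \t]*[}]/) skip = 0; next }
    # Everything else is passed through unchanged.
    { print }
  '
}

# Example use (file name is illustrative):
#   remove_location_block url2 < locations.conf > locations.conf.new
```

Because the match is a literal prefix comparison rather than a regex, dots or other special characters in the URL need no escaping, and blank lines around the block are irrelevant.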

regex unix-text-processing

2021-06-22T09:58:22.743

0 votes

2 answers

83 views

awk - Remove lines based on a duplicate first word, ignoring case

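I have 1M word vectors in fasttext format (ignoring the first line, which contains the vocabulary size and the dimension). Each line is a word followed by 300 numbers, all space-separated, e.g.

How can I keep the first line on which a word appears, ignoring case, and delete all other lines for that word? For example, because Word appears first, the lines with WORD are removed, and the output is

I can convert all words to lowercase with tr '[:upper:]' '[:lower:]' < wiki-news-300d-1M.vec, but that destroys the words' original case. I know how to remove all duplicate lines when the whole line, numbers included, matches, but that is no use here. My Python solution is to keep a dictionary that stores the lowercase form of each word and check each line's word against it, but I am curious about an awk/sed (or even grep) solution.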
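The dictionary approach described above maps directly onto an awk associative array. As a minimal sketch (the function name `dedupe_first_word_ci` is hypothetical, and the header line is treated like any other line, so it is kept unless an earlier line starts with the same first field):

```shell
# Hypothetical helper: reads space-separated lines on stdin and keeps only
# the first line for each first field, comparing first fields case-insensitively.
dedupe_first_word_ci() {
  # seen[key]++ evaluates to 0 (false) the first time a key is met, so the
  # negation is true only then, and awk's default action prints the line
  # unmodified, preserving the word's original case.
  awk '!seen[tolower($1)]++'
}

# Example use (output file name is illustrative):
#   dedupe_first_word_ci < wiki-news-300d-1M.vec > deduped.vec
```

Only the lookup key is lowercased, which is why this does not have the drawback of the tr approach. Memory use grows with the number of distinct words, much as it does for the Python dictionary.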

2021-06-23T23:15:46.673

0 votes

5 answers

73 views

sed - Extract substrings between strings

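I have a file containing text like this:

I want to extract ###.

My desired output looks like this:

I tried the following:

This almost works, but it seems to grab only the first instance on each line, so the first line of my output contains only

instead of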

sed grep unix-text-processing

2021-06-24T14:50:58.050

0 votes

2 answers

53 views

csv - Miller - Ignore valid field names when using -N

I'm using miller to process some CSV files like so:

It works well, but some of the CSV files contain field names and some do not, which is why I'm using -N. In the files that have field names, the names get printed in the output. You would think that, with headerless-csv-output bundled into the -N flag, they wouldn't be, but they are. Maybe it's a bug? Anyway, how do I prevent the field names from being printed? If the input needs to be altered somehow and piped in, that's fine, but the output is being uniquely processed.

Here's the documentation I've been referencing:

my.csv

Expected output

Present output

csv miller unix-text-processing

2021-06-24T19:46:26.240

0 votes

1 answer

47 views