shell - Count bytes in a field

Question

I have a file that looks like this:

ASDFGHJ|ASDFEW|ASFEWFEWAFEWASDFWE FEWFDWAEWA FEWDWDFEW|EWFEW|ASKOKJE
IOJIKNH|ASFDFEFW|ASKDFJEO JEWIOFJS IEWOFJEO SJFIEWOF WE|WEFEW|ASFEWAS

I'm having trouble with this file because it's written in Cyrillic and the database complains about number of bytes (vs number of characters). I want to check if, for example, the first field is larger than 10 bytes, the second field is larger than 30 bytes, etc.

I've been trying a lot of different things: awk, wc... I know that with wc -c I can count bytes, but how can I retrieve only the lines that have a field larger than X bytes?

Any idea?

score 3 · Accepted Answer

Here is a Perl one-liner that prints the whole line if any field is longer, in bytes, than the corresponding member of the array @m:

perl -F'\|' -Mbytes -lane '@m=(10,10,30,10); print if grep { bytes::length $_ > shift @m } @F' file

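As the name suggests, bytes::length ignores the encoding and returns the length of each field in bytes. The -a switch enables Perl's auto-split mode, which creates an array @F containing all the fields. I've used the pipe | as the delimiter (it needs to be escaped with a backslash). The -l switch removes the newline from the end of each line, ensuring that your final field is the correct length.

The -n switch tells Perl to loop over each line in the file. grep filters the array @F based on the condition in the block. I use shift to remove and return the first element of @m, so that each field in @F is compared with the corresponding element of @m. The filtered list evaluates as true in this context if it contains any elements (i.e. if any of the fields are longer than their limit).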
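To see why counting bytes rather than characters matters for Cyrillic text, here is a minimal shell sketch. It assumes the text is UTF-8 encoded, and byte_len is a hypothetical helper name used only for illustration.

```shell
# byte_len (hypothetical helper): print the number of bytes in its argument.
# printf '%s' avoids the trailing newline that echo would add to the count;
# tr strips the padding that some wc implementations put before the number.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_len 'ASDFGHJ'   # 7 ASCII letters
byte_len 'Привет'    # 6 Cyrillic letters
```

In UTF-8 each Cyrillic letter in this example takes two bytes, so the second call reports 12 while the first reports 7. That difference is what a byte-based database limit trips over.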

score 3 · Accepted Answer

If you are willing to use perl, then this might help. I have added comments to make it easier to understand:

#!/usr/bin/perl

use strict;
use warnings;
use bytes;

## Change 'file' to the path where your file is located
open my $data, '<', 'file' or die "Cannot open file: $!";

## Define an array with the acceptable size for each field
my @size = qw( 10 30 ... );        

LINE: while(<$data>) {         ## Read one line at a time      
    chomp;                     ## Remove the newline from each line read

    ## Split the line on | and store the fields in an array
    my @fields = split /\|/;

    for my $i ( 0 .. $#fields ) {    ## Iterate over the field indexes

        ## If this field fits within its limit, check the next field
        next unless bytes::length($fields[$i]) > $size[$i];

        ## A field is too long: print the line and move on to the next one
        print "$_\n";
        next LINE;
    }
}

score 1 · Accepted Answer

To get the number of bytes in a particular FIELD on a particular LINE, you can issue the following awk command:

awk -F'|' -v LINE=1 -v FIELD=3 'NR==LINE{printf "%s", $FIELD}' input.txt | wc -c

To print the number of bytes in every field, you can use a small loop:

awk -F'|' '{for(i=1;i<=NF;i++)print $i}' a.txt | \
while IFS= read -r field ; do 
    # printf avoids counting the newline that <<< would append
    nb=$(printf '%s' "$field" | wc -c)
    echo "$field $nb"

    # Check if the field is too long
    if [ "$nb" -gt 40 ] ; then
        echo "field $field is too long"
        exit 1
    fi
done
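
The per-field check can also be done in a single awk pass, without a shell loop. This is only a sketch under stated assumptions: check_fields and its comma-separated limits argument are hypothetical names, and it relies on awk's length() counting bytes in the C locale, which is how gawk and mawk behave as far as I know. Other awk implementations may differ.

```shell
# check_fields (hypothetical helper): read lines on stdin and print every
# line in which some field exceeds its byte limit.
# $1 is a comma-separated list of limits, one per field, e.g. 10,30
check_fields() {
    # LC_ALL=C is assumed to make length() count bytes, not characters
    LC_ALL=C awk -F'|' -v limits="$1" '
        BEGIN { n = split(limits, max, ",") }
        {
            # Fields beyond the end of the limits list are not checked
            for (i = 1; i <= NF && i <= n; i++)
                if (length($i) > max[i] + 0) { print; next }
        }'
}
```

For example, check_fields 10,30 < file would print the lines whose first field is longer than 10 bytes or whose second field is longer than 30 bytes.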
