
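The program below generates random data according to a set of specs (the example here has 2 columns).

It works for a few hundred thousand rows on my PC (the limit probably depends on RAM). I need to scale it to tens of millions of rows.

How can I optimize the program so that it writes directly to disk? And how can I "cache" the execution of the parse rule, given that it repeats exactly the same pattern 50 million times?

Note: to use the program below, run generate-blocks and then save-blocks (for example generate-blocks 1000 followed by save-blocks %db.txt); the output is written to db.txt.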

Rebol[]

specs: [
    [3 digits 4 digits 4 letters]
    [2 letters 2 digits]
]

;====================================================================================================================


digits: charset "0123456789"
letters: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
separator: charset ";"

block-letters: [A B C D E F G H I J K L M N O P Q R S T U V W X Y Z]

blocks: copy []

generate-row: func [] [
    foreach spec specs [
        ; rule is a literal block in the function body, so it is the same block
        ; on every call; only the parse pass itself is repeated for each row
        rule: [
            any [
                set times integer! [
                    ; (random 10) - 1 yields 0-9; "random 9" never produced 0
                    'digits (repeat n times [append block (random 10) - 1])
                    ; block-letters has 26 entries; "random 24" never picked Y or Z
                    | 'letters (repeat n times [append block pick block-letters random 26])
                ]
                |
                {"} any separator {"}
            ]
            to end
        ]
        ; APPEND extends the string in place instead of REJOIN building a new one
        block: copy ""
        parse spec rule
        append blocks block
    ]
]

generate-blocks: func[m][
  repeat num m [  
    generate-row
  ]
]

quote: func[string][
    rejoin [{"} string {"}]
]

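save-blocks: func [file] [
    file: to-rebol-file file
    if exists? file [
        answer: ask rejoin ["delete " file "? (Y/N): "]
        if answer = "Y" [
            delete file    ; was hard-coded to %db.txt
        ]
    ]
    ; blocks holds two fields per row: [field1 field2 field1 field2 ...]
    foreach [field1 field2] blocks [
        ; opens, appends to and closes the file once per row
        write/lines/append file rejoin [quote field1 ";" quote field2]
    ]
]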

1 answer

Use open with the /direct and /lines refinements (plus /write) to write straight to the file without buffering its contents:

file: open/direct/lines/write %myfile.txt
loop 1000 [
  t: random "abcdefghi"   ; shuffles the string in place
  append file t           ; written immediately, one line per append
]
close file

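This writes 1000 random lines with no buffering. You can also build up a block of lines (say 10,000) and write it to the file in one go, which is faster than writing line by line.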

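file: open/direct/lines/write %myfile.txt
loop 100 [
  b: copy []
  ; COPY is needed: RANDOM shuffles the string in place and returns the same
  ; series, so without it all 1000 entries in b would be one identical string
  loop 1000 [append b random copy "abcdef"]
  append file b   ; writes the 1000 lines in a single call
]
close file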

This is much faster: 100,000 lines in under a second. Hope this helps.

Note that you can tune the numbers 100 and 1000 to your PC's memory, and using b: make block! 1000 instead of b: copy [] is faster still.
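The batching idea above can be applied to the question's two-column format. The following is a hypothetical sketch, not the answerer's code, assuming Rebol 2 (which the open/direct usage above targets) and using append on the port exactly as the answer does. Each spec is parsed only once into a flat block of count/character-set pairs (compile-spec), which is one way to "cache" the parse rule, and every row is then built by walking that compiled block. Rows go into one preallocated block that is flushed to the port every batch-size rows and then cleared. The names compile-spec, random-field, write-random-db, digit-chars and letter-chars are illustrative, not from the original program. Digits are drawn from 0-9 and letters from A-Z.

```rebol
REBOL [Title: "Batched random data writer (sketch)"]

; Hypothetical sketch, assuming Rebol 2 file ports opened with /direct/lines.
specs: [
    [3 digits 4 digits 4 letters]
    [2 letters 2 digits]
]

digit-chars: "0123456789"
letter-chars: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

; Parse a spec ONCE into a flat block [count chars count chars ...],
; e.g. [2 letters 2 digits] -> [2 "ABC...Z" 2 "0123456789"]
compile-spec: func [spec [block!] /local out n] [
    out: copy []
    parse spec [
        any [
            set n integer! [
                'digits (append out reduce [n digit-chars])
                | 'letters (append out reduce [n letter-chars])
            ]
        ]
        end
    ]
    out
]

; Build one random field by walking a compiled spec (no PARSE per row).
random-field: func [compiled [block!] /local out] [
    out: make string! 16
    foreach [n chars] compiled [
        loop n [append out pick chars random length? chars]
    ]
    out
]

; Write row-count rows to file, flushing batch-size rows per APPEND.
write-random-db: func [
    file [file!] row-count [integer!] batch-size [integer!]
    /local compiled port batch line
] [
    compiled: copy []
    foreach spec specs [append/only compiled compile-spec spec]

    if exists? file [delete file]
    port: open/direct/lines/new/write file
    batch: make block! batch-size    ; allocated once, emptied with CLEAR

    loop row-count [
        line: make string! 32
        foreach field compiled [
            if not empty? line [append line ";"]
            append line rejoin [{"} random-field field {"}]
        ]
        append batch line
        if batch-size <= length? batch [
            append port batch
            clear batch
        ]
    ]
    if not empty? batch [append port batch]   ; flush the final partial batch
    close port
]
```

A call such as write-random-db %db.txt 50000000 10000 never holds more than one batch of rows in memory; the batch size of 10000 is only a starting point to tune, as the answer suggests, and its speed at that scale has not been measured here.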

answered 2010-05-05T07:10:12.190