Suppose there are two strings, PR-ACT-SOURCE-DETAIL-1 and PR-ACT-SOURCE-DETAIL-2. I want to compare them and find the position at which they differ.

I tried to handle it this way:

 PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000
    IF PR-ACT-SOURCE-DETAIL-1 OF TRANSACTION-RECORD-1(N:1)
        IS NOT EQUAL TO
        PR-ACT-SOURCE-DETAIL-2 OF TRANSACTION-RECORD-2(N:1)

        MOVE 'Y' TO WS-DIFF-FOUND
        DISPLAY 'DIFFERENCE FOUND AT POSITION ' N
    END-IF
 END-PERFORM

The problem with the code above is that the loop runs 5,000 times, so if I need to compare 10,000 such strings the execution time becomes too long.

Is there another way to do the same thing that takes less execution time?

4 Answers

Here are three ideas to explore that may reduce the overall run time of your program.

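The first is to terminate the loop as soon as the first difference is found. Your current code keeps scanning the whole variable even after it has established that the variables differ. If you only need to know that a difference exists and where it starts, you could try the following: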

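 MOVE 'N' TO WS-DIFF-FOUND
 PERFORM VARYING N FROM 1 BY 1
           UNTIL N > LENGTH OF PR-ACT-SOURCE-DETAIL-1
              OR WS-DIFF-FOUND = 'Y'
     *> Compare one byte of each field at the same offset
     IF PR-ACT-SOURCE-DETAIL-1 (N:1)
           NOT = PR-ACT-SOURCE-DETAIL-2 (N:1)
        MOVE 'Y' TO WS-DIFF-FOUND
     END-IF
  END-PERFORM

  IF WS-DIFF-FOUND = 'Y'
     *> N is incremented once more before the UNTIL test ends the
     *> loop, so the first difference is at position N - 1
     do whatever process you need to do
  END-IF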

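Note that in the code above I replaced the hard-coded variable length (5000) with the actual declared length of the variable, obtained through the LENGTH OF special register. That way, if the variable's length changes during future maintenance, the loop limit adjusts "automatically" (one less thing to go wrong).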

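If most of the data you are comparing is in fact equal, with differences being a rare exception, you could first compare the data items directly for equality and run the character-by-character test only when they differ. This may give some improvement, but you need to benchmark it to verify that it really is one. Some compilers generate very efficient code for this kind of comparison and others do not. Give it a try...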

 IF PR-ACT-SOURCE-DETAIL-1 = PR-ACT-SOURCE-DETAIL-2
    MOVE 'N' TO WS-DIFF-FOUND
 ELSE
    use the PERFORM VARYING loop shown above
 END-IF
 IF WS-DIFF-FOUND = 'Y'
 ...
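
The first two ideas can be combined into one small program. This is only a minimal sketch under stated assumptions: WS-FIELD-1, WS-FIELD-2, WS-DIFF-POS, the paragraph FIND-FIRST-DIFFERENCE and the program name FIRSTDIF are hypothetical stand-ins, not names from the question, and the test data is made up. It records the position inside the loop, so there is no need to adjust N afterwards.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIRSTDIF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical stand-ins for the two 5000-byte fields.
       01  WS-FIELD-1        PIC X(5000).
       01  WS-FIELD-2        PIC X(5000).
       01  WS-DIFF-FOUND     PIC X        VALUE 'N'.
       01  N                 PIC 9(9) BINARY.
       01  WS-DIFF-POS       PIC 9(9) BINARY VALUE 0.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *>   Made-up test data: one differing byte at position 1234.
           MOVE SPACES TO WS-FIELD-1 WS-FIELD-2
           MOVE 'X' TO WS-FIELD-2 (1234:1)
           PERFORM FIND-FIRST-DIFFERENCE
           IF WS-DIFF-FOUND = 'Y'
               DISPLAY 'FIRST DIFFERENCE AT POSITION ' WS-DIFF-POS
           ELSE
               DISPLAY 'FIELDS ARE EQUAL'
           END-IF
           STOP RUN.

       FIND-FIRST-DIFFERENCE.
           MOVE 'N' TO WS-DIFF-FOUND
           MOVE 0 TO WS-DIFF-POS
      *>   Whole-field test first; scan byte by byte only if it fails.
           IF WS-FIELD-1 NOT = WS-FIELD-2
               PERFORM VARYING N FROM 1 BY 1
                       UNTIL N > LENGTH OF WS-FIELD-1
                          OR WS-DIFF-FOUND = 'Y'
                   IF WS-FIELD-1 (N:1) NOT = WS-FIELD-2 (N:1)
                       MOVE 'Y' TO WS-DIFF-FOUND
                       MOVE N TO WS-DIFF-POS
                   END-IF
               END-PERFORM
           END-IF.
```

Whether the whole-field test pays off still depends on your compiler and on how often the fields actually differ, so measure before relying on it.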

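The last idea is to look at how N is declared and make sure you are using the most efficient data type for your compiler. For example, if N is declared as: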

 01 N      PIC 9(7).

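the compiler may not generate very efficient code for incrementing it and computing the offsets as shown above. On the other hand, something like: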

 01 N      PIC 9(9) BINARY.

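may result in a more efficient loop. This depends heavily on the compiler you are using and the options you give it. Sometimes these small differences have a major effect on program performance.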

answered 2013-09-17T14:12:44.903

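The program below uses two different approaches, each in two variants:

  1. Sequential search using PIC 9 as the index
  2. Sequential search using PIC S9 COMP-5 as the index
  3. Binary search
  4. Binary search with fewer loop iterations

In the worst case (difference in the last byte):

  • Method 2 is about 1.5 times faster than method 1, because it uses a native data item
  • Method 3 is about 11 times faster than method 1

Notes:

  • When the difference is within the first 250 bytes, the binary search is slower than the sequential search.
  • Methods 3 and 4 are equivalent
  • Performance may be affected by the implementation of the COBOL runtime.

Compatibility:

  • The EXIT PERFORM statement is not defined in ANSI-85 (it was added in the COBOL 2002 standard), although most current compilers support it.
  • COMP-5 is not ANSI (but almost every compiler supports it) and can be replaced with usage BINARY-LONG.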
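
If your compiler lacks EXIT PERFORM, the sequential search can be written with a flag in the UNTIL clause instead. The following is a hedged sketch, not part of the benchmark: the paragraph name COMPARE-PORTABLE and the program name NOEXITP are hypothetical, and USAGE BINARY is swapped in for COMP-5 on the assumption that your compiler handles it reasonably well.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOEXITP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  STR1              PIC X(5000).
       01  STR2              PIC X(5000).
       01  N                 PIC S9(9) BINARY.
       01  WS-DIFF-FOUND     PIC X.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           MOVE SPACES TO STR1 STR2
           MOVE "X" TO STR2 (250:1)
           PERFORM COMPARE-PORTABLE
           IF WS-DIFF-FOUND = 'Y'
      *>       N is not incremented after a mismatch, so it holds
      *>       the position of the first difference.
               DISPLAY "DIFFERENCE AT: " N
           ELSE
               DISPLAY "NO DIFFERENCE"
           END-IF
           STOP RUN.

       COMPARE-PORTABLE.
           MOVE 'N' TO WS-DIFF-FOUND
           MOVE 1 TO N
      *>   The flag in the UNTIL clause replaces EXIT PERFORM.
           PERFORM UNTIL N > 5000 OR WS-DIFF-FOUND = 'Y'
               IF STR1 (N:1) NOT = STR2 (N:1)
                   MOVE 'Y' TO WS-DIFF-FOUND
               ELSE
                   ADD 1 TO N
               END-IF
           END-PERFORM.
```

The extra flag test on every iteration may cost a little time compared with EXIT PERFORM, so it is worth timing both forms on your own compiler.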

Code:

   IDENTIFICATION DIVISION.
   PROGRAM-ID. COMPSTR.
   DATA DIVISION.
   WORKING-STORAGE SECTION.
   01 N PIC S9(9) COMP-5.
   01 N1 PIC 9(9).
   01 STRLEN PIC S9(9) COMP-5.
   01 CMPLEN PIC S9(9) COMP-5.
   01 CHUNK-BASE-OFFSET PIC S9(9) COMP-5.
   01 CHUNK-BASE-LENGTH PIC S9(9) COMP-5.
   01 CHUNK-OFFSET PIC S9(9) COMP-5.
   01 CHUNK-LENGTH PIC S9(9) COMP-5.
   01 STR1 PIC X(5000).
   01 STR2 PIC X(5000).
   01 WS-DIFF-FOUND PIC X.

   01 DIFF-TIME PIC 9(7)V99 COMP-5.
   01 EMPTY-PERFORM-TIME PIC 9(7)V99 COMP-5.
   78 LOOPS VALUE 10000.
   01 START-TIME.
      03 START-H PIC 99.
      03 START-M PIC 99.
      03 START-S PIC 99.
      03 START-T PIC 99.
   01 END-TIME.
      03 END-H PIC 99.
      03 END-M PIC 99.
      03 END-S PIC 99.
      03 END-T PIC 99.
   01 X PIC X.
   PROCEDURE DIVISION.
   MAIN-LOGIC.
       MOVE 5000 TO STRLEN

       ACCEPT START-TIME FROM TIME
       PERFORM LOOPS TIMES
           PERFORM EMPTY-PERFORM
       END-PERFORM
       ACCEPT END-TIME FROM TIME
       PERFORM TIME-DIFF
       MOVE DIFF-TIME TO EMPTY-PERFORM-TIME
       DISPLAY "EMPTY-PERFORM: " EMPTY-PERFORM-TIME

       MOVE ALL SPACES TO STR1 STR2
       MOVE "X" TO STR2(5000:1)
       PERFORM TEST-ALL

       MOVE ALL SPACES TO STR1 STR2
       MOVE "X" TO STR2(1:1)
       PERFORM TEST-ALL

       MOVE ALL SPACES TO STR1 STR2
       MOVE "X" TO STR2(2500:1)
       PERFORM TEST-ALL

       MOVE ALL SPACES TO STR1 STR2
       MOVE "X" TO STR2(250:1)
       PERFORM TEST-ALL

       ACCEPT X
       EXIT PROGRAM
       STOP RUN
       .

   TEST-ALL.
       ACCEPT START-TIME FROM TIME
       PERFORM LOOPS TIMES
           PERFORM COMPARE-1
       END-PERFORM
       ACCEPT END-TIME FROM TIME
       PERFORM TIME-DIFF
       DISPLAY "COMPARE-1: " DIFF-TIME " DIFFERENCE AT: " N1

       ACCEPT START-TIME FROM TIME
       PERFORM LOOPS TIMES
           PERFORM COMPARE-2
       END-PERFORM
       ACCEPT END-TIME FROM TIME
       PERFORM TIME-DIFF
       DISPLAY "COMPARE-2: " DIFF-TIME " DIFFERENCE AT: " N

       ACCEPT START-TIME FROM TIME
       PERFORM LOOPS TIMES
           PERFORM COMPARE-3
       END-PERFORM
       ACCEPT END-TIME FROM TIME
       PERFORM TIME-DIFF
       DISPLAY "COMPARE-3: " DIFF-TIME " DIFFERENCE AT: " N

       ACCEPT START-TIME FROM TIME
       PERFORM LOOPS TIMES
           PERFORM COMPARE-4
       END-PERFORM
       ACCEPT END-TIME FROM TIME
       PERFORM TIME-DIFF
       DISPLAY "COMPARE-4: " DIFF-TIME " DIFFERENCE AT: " N
       .

   EMPTY-PERFORM.
       .

   COMPARE-1.
       PERFORM VARYING N1 FROM 1 BY 1 UNTIL N1 > 5000
           IF STR1(N1:1) IS NOT EQUAL TO STR2(N1:1)
               MOVE 'Y' TO WS-DIFF-FOUND
               EXIT PERFORM
           END-IF
       END-PERFORM
       .

   COMPARE-2.
       PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000
           IF STR1(N:1) IS NOT EQUAL TO STR2(N:1)
               MOVE 'Y' TO WS-DIFF-FOUND
               EXIT PERFORM
           END-IF
       END-PERFORM
       .

   COMPARE-3.
       IF STR1 = STR2
           MOVE 0 TO N
       ELSE
           MOVE 1 TO CMPLEN
           PERFORM UNTIL CMPLEN >= STRLEN
              COMPUTE CMPLEN = CMPLEN * 2
           END-PERFORM
           MOVE 1 TO CHUNK-BASE-OFFSET
           COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
           PERFORM UNTIL 1 = 2
               MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
               MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
               PERFORM 2 TIMES
                   IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                       COMPUTE CHUNK-LENGTH =
                               STRLEN - CHUNK-OFFSET + 1
                   END-IF
                   IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                        IS NOT EQUAL TO
                        STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                       MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                       COMPUTE CHUNK-BASE-LENGTH =
                               CHUNK-BASE-LENGTH / 2
                       EXIT PERFORM
                   ELSE
                       ADD CHUNK-LENGTH TO CHUNK-OFFSET
                   END-IF
               END-PERFORM
               IF CHUNK-BASE-LENGTH = 0
                   EXIT PERFORM
               END-IF
           END-PERFORM
           MOVE CHUNK-OFFSET TO N
       END-IF
       .

   COMPARE-4.
       IF STR1 = STR2
           MOVE 0 TO N
       ELSE
           MOVE 1 TO CMPLEN
           PERFORM UNTIL CMPLEN >= STRLEN
              COMPUTE CMPLEN = CMPLEN * 2
           END-PERFORM
           MOVE 1 TO CHUNK-BASE-OFFSET
           COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
           PERFORM UNTIL 1 = 2
               MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
               MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
               PERFORM 2 TIMES
                   IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                       COMPUTE CHUNK-LENGTH =
                               STRLEN - CHUNK-OFFSET + 1
                   END-IF
                   IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                        IS NOT EQUAL TO
                        STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                       MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                       PERFORM UNTIL CHUNK-BASE-LENGTH <
                                     CHUNK-LENGTH
                           COMPUTE CHUNK-BASE-LENGTH =
                                   CHUNK-BASE-LENGTH / 2
                       END-PERFORM
                       EXIT PERFORM
                   ELSE
                       ADD CHUNK-LENGTH TO CHUNK-OFFSET
                   END-IF
               END-PERFORM
               IF CHUNK-BASE-LENGTH = 0
                   EXIT PERFORM
               END-IF
           END-PERFORM
           MOVE CHUNK-OFFSET TO N
       END-IF
       .

   TIME-DIFF.
       COMPUTE DIFF-TIME = (END-H - START-H) * 3600 +
                           (END-M - START-M) * 60 +
                           (END-S - START-S) +
                           (END-T - START-T) / 100
       .
answered 2013-10-02T13:15:47.443

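Assuming your COBOL compiler generates tight code, what you have is the way to compare strings byte by byte. As the other answers suggest, it really depends on whether you want every difference, or only whether the strings differ and where the first difference is.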

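Personally, I would probably let COBOL compare the complete strings first and go byte by byte only if they are not equal. Chances are the code the compiler generates for a full string comparison is tighter than what it produces from your hand-written loop.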

answered 2013-09-23T12:45:32.173

*> Stop at the end of the field or at the first mismatching byte
PERFORM 
  VARYING N  
   FROM 1 
   BY 1 
     UNTIL ( N GREATER THAN 5000 )
      OR ( byte-field-1 ( N : 1 ) 
          NOT EQUAL TO byte-field-2 ( N : 1 ) )
  *> some compilers require a statement in the loop body
  CONTINUE
END-PERFORM

EVALUATE TRUE
  WHEN N GREATER THAN 5000
    match
  WHEN N LESS THAN 5000
    no match
  WHEN OTHER
    IF ( byte-field-1 ( N : 1 ) 
         EQUAL TO byte-field-2 ( N : 1 ) )
        match
    ELSE
        no match
    END-IF
END-EVALUATE

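This stops the search as soon as a mismatch is found.

It will only really help your performance if many of your 10,000 strings contain a mismatch.

10,000 * 5,000 is only 50,000,000. Why is that such a big problem?

If you describe your data fully, there may be other solutions.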

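You should drop the silly qualification, give N a good name, and replace the 5000 with a field whose value is checked against the length of the fields that contain your strings.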
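
One way to read that advice is sketched below. Every name in it (WS-DETAIL-1, WS-DETAIL-2, WS-COMPARE-LENGTH, WS-BYTE-POSITION, COMPARE-DETAILS, LENCHECK) is hypothetical, and the test data is made up. The limit is held in a field and checked against LENGTH OF both data items before the comparison runs.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LENCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DETAIL-1           PIC X(5000).
       01  WS-DETAIL-2           PIC X(5000).
      *> The limit lives in a field instead of a literal 5000.
       01  WS-COMPARE-LENGTH     PIC S9(9) BINARY VALUE 5000.
       01  WS-BYTE-POSITION      PIC S9(9) BINARY.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           MOVE SPACES TO WS-DETAIL-1 WS-DETAIL-2
           MOVE "X" TO WS-DETAIL-2 (42:1)
      *>   Refuse to compare if the limit and the fields disagree.
           IF WS-COMPARE-LENGTH = LENGTH OF WS-DETAIL-1
              AND WS-COMPARE-LENGTH = LENGTH OF WS-DETAIL-2
               PERFORM COMPARE-DETAILS
           ELSE
               DISPLAY "COMPARE LENGTH DOES NOT MATCH FIELD LENGTH"
           END-IF
           STOP RUN.

       COMPARE-DETAILS.
      *>   Assumes the OR is evaluated left to right, so the byte
      *>   test is skipped once the position passes the limit.
           PERFORM VARYING WS-BYTE-POSITION FROM 1 BY 1
                   UNTIL WS-BYTE-POSITION > WS-COMPARE-LENGTH
                      OR WS-DETAIL-1 (WS-BYTE-POSITION:1)
                         NOT = WS-DETAIL-2 (WS-BYTE-POSITION:1)
               CONTINUE
           END-PERFORM
           IF WS-BYTE-POSITION > WS-COMPARE-LENGTH
               DISPLAY "MATCH"
           ELSE
               DISPLAY "NO MATCH AT POSITION " WS-BYTE-POSITION
           END-IF.
```

With the limit in one checked field, a later change to either PICTURE clause is reported at run time instead of silently comparing the wrong number of bytes.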

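Do you really have a "string", or a block of data? Strings do not exist in COBOL the way they do in other languages.

Know your data, describe your data, and explain why performance is a problem. Which compiler and hardware are you using?

I'm not sure how much the string and comparison tags will get you.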

answered 2013-09-17T13:14:25.803