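I am having a problem with CPU affinity when solving linear integer programs in MOSEK. My program parallelizes the work with Python's multiprocessing module, so MOSEK runs concurrently in every process. The machine has 48 cores, so I use the Pool class to run 48 concurrent processes. Their documentation states that the API is thread safe.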

Below is the output of top after starting the program. It shows that about 50% of the CPU is idle. Only the first 20 lines of the top output are shown.

top - 22:04:42 up 5 days, 14:38,  3 users,  load average: 10.67, 13.65, 6.29
Tasks: 613 total,  47 running, 566 sleeping,   0 stopped,   0 zombie
%Cpu(s): 46.3 us,  3.8 sy,  0.0 ni, 49.2 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
GiB Mem:   503.863 total,  101.613 used,  402.250 free,    0.482 buffers
GiB Swap:   61.035 total,    0.000 used,   61.035 free.   96.250 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
115517 njmeyer   20   0  171752  27912  11632 R  98.7  0.0   0:02.52 python
115522 njmeyer   20   0  171088  27472  11632 R  98.7  0.0   0:02.79 python
115547 njmeyer   20   0  171140  27460  11568 R  98.7  0.0   0:01.82 python
115550 njmeyer   20   0  171784  27880  11568 R  98.7  0.0   0:01.64 python
115540 njmeyer   20   0  171136  27456  11568 R  92.5  0.0   0:01.91 python
115551 njmeyer   20   0  371636  31100  11632 R  92.5  0.0   0:02.93 python
115539 njmeyer   20   0  171132  27452  11568 R  80.2  0.0   0:01.97 python
115515 njmeyer   20   0  171748  27908  11632 R  74.0  0.0   0:03.02 python
115538 njmeyer   20   0  171128  27512  11632 R  74.0  0.0   0:02.51 python
115558 njmeyer   20   0  171144  27528  11632 R  74.0  0.0   0:02.28 python
115554 njmeyer   20   0  527980  28728  11632 R  67.8  0.0   0:02.15 python
115524 njmeyer   20   0  527956  28676  11632 R  61.7  0.0   0:02.34 python
115526 njmeyer   20   0  527956  28704  11632 R  61.7  0.0   0:02.80 python

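I checked the MOSEK parameters section of the documentation and did not see anything related to CPU affinity. There are some flags related to multithreading inside the optimizer. These flags are off by default, and redundantly setting them to off changes nothing.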

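I checked the CPU affinity of the running python jobs, and many of them are bound to the same CPU. The strange part is that I cannot set the CPU affinity, or at least it seems to be changed again soon after I change it.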

I picked one of the jobs and set its CPU affinity by running taskset -p 0xFFFFFFFFFFFF 115526. I did this 10 times, 1 second apart. Here is the CPU affinity mask after each taskset call.

pid 115526's current affinity mask: 10
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 7
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 200000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47

Something seems to keep changing the CPU affinity at runtime.
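For reading the masks above, here is a minimal sketch that decodes a taskset hex mask into CPU indices and polls a process's affinity from Python. It is written for Python 3 (unlike the Python 2 listing in this question), the helper names are made up for illustration, and os.sched_getaffinity is only available on Linux.

```python
import os
import time


def cpus_from_mask(mask_hex):
    """Return the sorted CPU indices whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus):
    """Return the hex mask, in the form taskset prints it, for CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def sample_affinity(pid=0, samples=10, interval=1.0):
    """Poll a process's affinity mask; pid 0 means the calling process.

    Hypothetical helper: it mirrors running taskset -p repeatedly by hand.
    """
    seen = []
    for _ in range(samples):
        seen.append(mask_from_cpus(os.sched_getaffinity(pid)))
        time.sleep(interval)
    return seen


print(cpus_from_mask("800000000000"))  # [47]
print(cpus_from_mask("10"))            # [4]
print(mask_from_cpus(range(48)))       # ffffffffffff
```

Calling sample_affinity(115526) while the pool is running would record whether the mask keeps collapsing to a single CPU, without having to interleave taskset calls manually.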

I also tried setting the CPU affinity of the parent process, but the result is the same.

Here is the code I am running.

import mosek
import sys
import cPickle as pickle
import multiprocessing
import time

def mosekOptim(aCols,aVals,b,c,nCon,nVar,numTrt):
    """Solve the linear integer program.


    Solve the program
    max c' x
    s.t. Ax <= b

    """

    ## setup mosek
    with mosek.Env() as env, env.Task() as task:
        task.appendcons(nCon)
        task.appendvars(nVar)
        inf = float("inf")


        ## c
        for j,cj in enumerate(c):
            task.putcj(j,cj)


        ## bounds on A
        bkc = [mosek.boundkey.fx] + [mosek.boundkey.up
                                     for i in range(nCon-1)]

        blc = [float(numTrt)] + [-inf for i in range(nCon-1)]
        buc = b


        ## bounds on x
        bkx = [mosek.boundkey.ra for i in range(nVar)]
        blx = [0.0]*nVar
        bux = [1.0]*nVar

        for j,a in enumerate(zip(aCols,aVals)):
            task.putarow(j,a[0],a[1])

        for j,bc in enumerate(zip(bkc,blc,buc)):
            task.putconbound(j,bc[0],bc[1],bc[2])

        for j,bx in enumerate(zip(bkx,blx,bux)):
            task.putvarbound(j,bx[0],bx[1],bx[2])

        task.putobjsense(mosek.objsense.maximize)

        ## integer type
        task.putvartypelist(range(nVar),
                            [mosek.variabletype.type_int
                             for i in range(nVar)])

        task.optimize()

        task.solutionsummary(mosek.streamtype.msg)

        prosta = task.getprosta(mosek.soltype.itg)
        solsta = task.getsolsta(mosek.soltype.itg)

        xx = mosek.array.zeros(nVar,float)
        task.getxx(mosek.soltype.itg,xx)

    if solsta not in [ mosek.solsta.integer_optimal,
                   mosek.solsta.near_integer_optimal ]:
        ## mosekMsg was never defined here, so report the status directly
        raise ValueError("Non optimal or infeasible: solsta=%s" % solsta)
    else:
        return xx


def reps(secs,*args):
    start = time.time()
    while time.time() - start < secs:
        for i in range(100):
            mosekOptim(*args)


def main():
    with open("data.txt","r") as f:
        data = pickle.loads(f.read())

    args = (60,) + data

    pool = multiprocessing.Pool()
    jobs = []
    for i in range(multiprocessing.cpu_count()):
        jobs.append(pool.apply_async(reps,args=args))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()

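The code unpickles data that I precomputed. These objects are the constraints and coefficients of the linear program. I have the code and this data file hosted in this repository.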

Has anyone else run into this behavior with MOSEK? Any suggestions on how to proceed?

1 Answer

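I contacted support, and they suggested setting MSK_IPAR_NUM_THREADS to 1. My problems take a fraction of a second to solve, so it never looked like MOSEK was using multiple cores. I should have checked the documentation for the default value.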

In my code, I added task.putintparam(mosek.iparam.num_threads,1) right after the with statement. This fixed the problem.
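As a rough sketch of where that call goes: the wrapper name solve_single_threaded and the build_and_solve callback below are hypothetical and only stand in for the body of mosekOptim. The putintparam call and mosek.iparam.num_threads are the ones named above. Running it requires the mosek package and a license.

```python
def solve_single_threaded(build_and_solve):
    """Create a MOSEK task limited to one thread and hand it to a callback.

    build_and_solve is a hypothetical callable that takes the task, adds the
    problem data, calls task.optimize(), and returns whatever it needs.
    """
    # Imported here so the sketch can be read without MOSEK installed;
    # calling the function still requires the mosek package.
    import mosek

    with mosek.Env() as env, env.Task() as task:
        # Set the thread limit first, before any problem data is added.
        # The multiprocessing pool already provides the parallelism.
        task.putintparam(mosek.iparam.num_threads, 1)
        return build_and_solve(task)
```

In the question's code the same effect comes from placing the putintparam line directly under the with statement in mosekOptim, before task.appendcons(nCon).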

Answered 2016-03-07T15:31:46.793