我在MOSEK中遇到 CPU 亲和力和线性整数编程问题。我的程序使用 中的multiprocessing
模块并行化Python
,因此 MOSEK 在每个进程上同时运行。这台机器有 48 个内核,所以我使用Pool
该类运行 48 个并发进程。他们的文档声明API 是线程安全的。
启动程序后,以下是top
. 它表明大约 50% 的 CPU 处于空闲状态。仅显示顶部输出的前 20 行。
top - 22:04:42 up 5 days, 14:38, 3 users, load average: 10.67, 13.65, 6.29
Tasks: 613 total, 47 running, 566 sleeping, 0 stopped, 0 zombie
%Cpu(s): 46.3 us, 3.8 sy, 0.0 ni, 49.2 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
GiB Mem: 503.863 total, 101.613 used, 402.250 free, 0.482 buffers
GiB Swap: 61.035 total, 0.000 used, 61.035 free. 96.250 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
115517 njmeyer 20 0 171752 27912 11632 R 98.7 0.0 0:02.52 python
115522 njmeyer 20 0 171088 27472 11632 R 98.7 0.0 0:02.79 python
115547 njmeyer 20 0 171140 27460 11568 R 98.7 0.0 0:01.82 python
115550 njmeyer 20 0 171784 27880 11568 R 98.7 0.0 0:01.64 python
115540 njmeyer 20 0 171136 27456 11568 R 92.5 0.0 0:01.91 python
115551 njmeyer 20 0 371636 31100 11632 R 92.5 0.0 0:02.93 python
115539 njmeyer 20 0 171132 27452 11568 R 80.2 0.0 0:01.97 python
115515 njmeyer 20 0 171748 27908 11632 R 74.0 0.0 0:03.02 python
115538 njmeyer 20 0 171128 27512 11632 R 74.0 0.0 0:02.51 python
115558 njmeyer 20 0 171144 27528 11632 R 74.0 0.0 0:02.28 python
115554 njmeyer 20 0 527980 28728 11632 R 67.8 0.0 0:02.15 python
115524 njmeyer 20 0 527956 28676 11632 R 61.7 0.0 0:02.34 python
115526 njmeyer 20 0 527956 28704 11632 R 61.7 0.0 0:02.80 python
我检查了文档的MOSEK 参数部分,没有看到任何与 CPU 亲和性相关的内容。它们有一些与优化器中的多线程相关的标志。这些标志被设置off
为默认值,并且在冗余设置时off
没有任何变化。
我检查了正在运行的 python 作业的 cpu 亲和力,其中许多都绑定到同一个 cpu。但是,奇怪的部分是我无法设置 cpu 亲和力,或者至少在我更改它之后它似乎很快又被更改了。
我选择了其中一项工作并通过运行来设置 cpu 亲和力taskset -p 0xFFFFFFFFFFFF 115526
。我这样做了 10 次,中间间隔 1 秒。这是每次taskset
调用后的 cpu 亲和性掩码。
pid 115526's current affinity mask: 10
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 7
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 200000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
似乎某些东西在运行时不断改变 CPU 亲和力。
我也试过设置父进程的cpu亲和度,但效果一样。
这是我正在运行的代码。
import mosek
import sys
import cPickle as pickle
import multiprocessing
import time
def mosekOptim(aCols,aVals,b,c,nCon,nVar,numTrt):
"""Solve the linear integer program.
Solve the program
max c' x
s.t. Ax <= b
"""
## setup mosek
with mosek.Env() as env, env.Task() as task:
task.appendcons(nCon)
task.appendvars(nVar)
inf = float("inf")
## c
for j,cj in enumerate(c):
task.putcj(j,cj)
## bounds on A
bkc = [mosek.boundkey.fx] + [mosek.boundkey.up
for i in range(nCon-1)]
blc = [float(numTrt)] + [-inf for i in range(nCon-1)]
buc = b
## bounds on x
bkx = [mosek.boundkey.ra for i in range(nVar)]
blx = [0.0]*nVar
bux = [1.0]*nVar
for j,a in enumerate(zip(aCols,aVals)):
task.putarow(j,a[0],a[1])
for j,bc in enumerate(zip(bkc,blc,buc)):
task.putconbound(j,bc[0],bc[1],bc[2])
for j,bx in enumerate(zip(bkx,blx,bux)):
task.putvarbound(j,bx[0],bx[1],bx[2])
task.putobjsense(mosek.objsense.maximize)
## integer type
task.putvartypelist(range(nVar),
[mosek.variabletype.type_int
for i in range(nVar)])
task.optimize()
task.solutionsummary(mosek.streamtype.msg)
prosta = task.getprosta(mosek.soltype.itg)
solsta = task.getsolsta(mosek.soltype.itg)
xx = mosek.array.zeros(nVar,float)
task.getxx(mosek.soltype.itg,xx)
if solsta not in [ mosek.solsta.integer_optimal,
mosek.solsta.near_integer_optimal ]:
print "".join(mosekMsg)
raise ValueError("Non optimal or infeasible.")
else:
return xx
def reps(secs,*args):
start = time.time()
while time.time() - start < secs:
for i in range(100):
mosekOptim(*args)
def main():
with open("data.txt","r") as f:
data = pickle.loads(f.read())
args = (60,) + data
pool = multiprocessing.Pool()
jobs = []
for i in range(multiprocessing.cpu_count()):
jobs.append(pool.apply_async(reps,args=args))
pool.close()
pool.join()
if __name__ == "__main__":
main()
代码解开我预先计算的数据。这些对象是线性规划的约束和系数。我有代码和这个数据文件托管在这个存储库中。
有没有其他人在 MOSEK 上遇到过这种行为?有关如何进行的任何建议?