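I am working on a project that uses TensorRT from Python. Don't worry, the question probably doesn't involve any GPU computation. I think that for this purpose we can simply treat TensorRT like NumPy, since both packages release the GIL when they leave Python. I have already done a lot of work to get the TensorRT part running, so I hope this question has little to do with TensorRT itself.

First of all, multithreading does work with TensorRT + Python, because TensorRT releases the GIL while its main execution function runs. But I still find cases where it doesn't help. The work does run in parallel, but every thread carries a large overhead, so threading only pays off when the execution time is long enough for that overhead to be negligible. Switching to C++ might be the right choice, but I would really like to optimize this further and stay with Python.

I set up an experiment to see how much overhead there is when there is only one child thread, and I used yappi to profile it. Here is some code. It is not my real code, but it should be enough. I haven't shown the run_trt_execution function because I don't think it is necessary; if anyone feels they need to know more about TensorRT to answer, I'm happy to share what little I know.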
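import threading
import yappi

# Case 1: run the execution directly in the main thread.
yappi.start()
run_trt_execution(args)
yappi.stop()

# Case 2: run the same execution in a single child thread.
yappi.start()
# Thread expects a tuple; (args,) passes the same single argument as case 1.
cur_thread = threading.Thread(target=run_trt_execution, args=(args,))
cur_thread.start()
cur_thread.join()
yappi.stop()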
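For anyone who wants to reproduce the comparison without TensorRT, the following is a minimal stand-alone sketch. It assumes `time.sleep` as a stand-in for a call that releases the GIL, and the names `fake_execution`, `time_direct` and `time_threaded` are hypothetical, not taken from the real project. Since `time.sleep` does not burn CPU, it only models the "GIL is released" part, not the actual workload. The profiles that follow come from the real TensorRT run, not from this stand-in.

```python
import threading
import time


def fake_execution(duration_s=0.005):
    # Hypothetical stand-in for run_trt_execution: time.sleep releases the GIL
    # while it waits, much like a native call that drops the GIL.
    time.sleep(duration_s)


def time_direct(repeats=20):
    # Average wall time per call when running in the calling thread.
    start = time.perf_counter()
    for _ in range(repeats):
        fake_execution()
    return (time.perf_counter() - start) / repeats


def time_threaded(repeats=20):
    # Average wall time per call when each call gets its own child thread.
    start = time.perf_counter()
    for _ in range(repeats):
        worker = threading.Thread(target=fake_execution)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    direct = time_direct()
    threaded = time_threaded()
    print(f"direct:     {direct * 1e3:.3f} ms per call")
    print(f"threaded:   {threaded * 1e3:.3f} ms per call")
    print(f"difference: {(threaded - direct) * 1e3:.3f} ms per call")
```

Averaging over several repeats is a deliberate choice here: a single call per setup, as in the experiment above, can be dominated by noise.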
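Here are the profiles. yappi can record either CPU time or wall time, so I ran the experiment once with each clock type for both setups. The logs are below and may be rather long: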
cpu multithread profile ______________________________+++++++++++++++++++++++++++++++++++++++==
name ncall tsub ttot tavg
..hon3.8/threading.py:859 Thread.run 1 0.000005 0.008511 0.008511
..tc_multithread.py:44 run_inference 1 0.006554 0.008506 0.008506
..da/cuda.py:244 DeviceArray.copy_to 1 0.000026 0.001400 0.001400
..rter.py:159 LazyModule.__getattr__ 5 0.000014 0.001319 0.000264
..aphy/mod/importer.py:90 import_mod 5 0.000035 0.001301 0.000260
..r/logger.py:375 Logger.module_info 5 0.000024 0.000893 0.000179
..y:360 Logger._str_from_module_info 5 0.000018 0.000819 0.000164
..hy/logger/logger.py:363 try_append 15 0.000023 0.000802 0.000053
..aphy/logger/logger.py:372 <lambda> 5 0.000017 0.000753 0.000151
..ython3.8/posixpath.py:387 realpath 5 0.000020 0.000726 0.000145
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000013 0.000616 0.000308
..3.8/posixpath.py:396 _joinrealpath 5 0.000093 0.000564 0.000113
../cuda.py:368 DeviceArray.copy_from 1 0.000022 0.000521 0.000521
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000507 0.000512 0.000256
..tlib/__init__.py:109 import_module 5 0.000016 0.000326 0.000065
..rtlib._bootstrap>:1002 _gcd_import 5 0.000013 0.000307 0.000061
..lib._bootstrap>:986 _find_and_load 5 0.000045 0.000283 0.000057
..uda/cuda.py:352 DeviceArray.resize 1 0.000006 0.000281 0.000281
..lib/python3.8/posixpath.py:71 join 30 0.000115 0.000211 0.000007
../python3.8/posixpath.py:164 islink 30 0.000066 0.000209 0.000007
..python3.8/posixpath.py:372 abspath 5 0.000015 0.000140 0.000028
..ython3.8/posixpath.py:334 normpath 5 0.000057 0.000097 0.000019
..b._bootstrap>:157 _get_module_lock 10 0.000045 0.000096 0.000010
..bootstrap>:194 _lock_unlock_module 5 0.000015 0.000083 0.000017
..>:147 _ModuleLockManager.__enter__ 5 0.000013 0.000082 0.000016
..._bootstrap>:1017 _handle_fromlist 15 0.000048 0.000074 0.000005
..python3.8/posixpath.py:41 _get_sep 40 0.000043 0.000064 0.000002
..ib/python3.8/posixpath.py:60 isabs 10 0.000026 0.000053 0.000005
..hy/logger/logger.py:207 Logger.log 5 0.000018 0.000049 0.000010
<frozen importlib._bootstrap>:176 cb 10 0.000027 0.000043 0.000004
..bootstrap>:58 _ModuleLock.__init__ 10 0.000022 0.000033 0.000003
.._bootstrap>:78 _ModuleLock.acquire 10 0.000024 0.000029 0.000003
..bootstrap>:103 _ModuleLock.release 10 0.000020 0.000025 0.000003
..p>:151 _ModuleLockManager.__exit__ 5 0.000008 0.000022 0.000004
..aphy/logger/logger.py:370 <lambda> 5 0.000008 0.000016 0.000003
..tlib._bootstrap>:937 _sanity_check 5 0.000008 0.000011 0.000002
..aphy/logger/logger.py:371 <lambda> 5 0.000006 0.000010 0.000002
../_internal.py:250 _ctypes.__init__ 2 0.000005 0.000005 0.000002
..p>:143 _ModuleLockManager.__init__ 5 0.000004 0.000004 0.000001
..polygraphy/util/util.py:499 volume 3 0.000004 0.000004 0.000001
..hy/logger/logger.py:276 should_log 5 0.000004 0.000004 0.000001
..0 DeviceArray._check_dtype_matches 2 0.000003 0.000003 0.000002
..olygraphy/cuda/cuda.py:24 void_ptr 4 0.000003 0.000003 0.000001
..ygraphy/cuda/cuda.py:59 Cuda.check 2 0.000002 0.000002 0.000001
..core/_internal.py:304 _ctypes.data 2 0.000002 0.000002 0.000001
..olygraphy/cuda/cuda.py:149 wrapper 2 0.000001 0.000001 0.000001
../cuda.py:203 try_get_stream_handle 2 0.000001 0.000001 0.000001
Function stats for (_MainThread) (0)
Clock type: CPU
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..hon3.8/threading.py:859 Thread.run 1 0.000005 0.008511 0.008511
..tc_multithread.py:44 run_inference 1 0.006554 0.008506 0.008506
..da/cuda.py:244 DeviceArray.copy_to 1 0.000026 0.001400 0.001400
..rter.py:159 LazyModule.__getattr__ 5 0.000014 0.001319 0.000264
..aphy/mod/importer.py:90 import_mod 5 0.000035 0.001301 0.000260
..r/logger.py:375 Logger.module_info 5 0.000024 0.000893 0.000179
..y:360 Logger._str_from_module_info 5 0.000018 0.000819 0.000164
..hy/logger/logger.py:363 try_append 15 0.000023 0.000802 0.000053
..aphy/logger/logger.py:372 <lambda> 5 0.000017 0.000753 0.000151
..ython3.8/posixpath.py:387 realpath 5 0.000020 0.000726 0.000145
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000013 0.000616 0.000308
..3.8/posixpath.py:396 _joinrealpath 5 0.000093 0.000564 0.000113
../cuda.py:368 DeviceArray.copy_from 1 0.000022 0.000521 0.000521
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000507 0.000512 0.000256
..tlib/__init__.py:109 import_module 5 0.000016 0.000326 0.000065
..rtlib._bootstrap>:1002 _gcd_import 5 0.000013 0.000307 0.000061
..lib._bootstrap>:986 _find_and_load 5 0.000045 0.000283 0.000057
..uda/cuda.py:352 DeviceArray.resize 1 0.000006 0.000281 0.000281
..lib/python3.8/posixpath.py:71 join 30 0.000115 0.000211 0.000007
../python3.8/posixpath.py:164 islink 30 0.000066 0.000209 0.000007
..python3.8/posixpath.py:372 abspath 5 0.000015 0.000140 0.000028
..n3.8/threading.py:834 Thread.start 1 0.000011 0.000099 0.000099
..ython3.8/posixpath.py:334 normpath 5 0.000057 0.000097 0.000019
..b._bootstrap>:157 _get_module_lock 10 0.000045 0.000096 0.000010
..bootstrap>:194 _lock_unlock_module 5 0.000015 0.000083 0.000017
..>:147 _ModuleLockManager.__enter__ 5 0.000013 0.000082 0.000016
..._bootstrap>:1017 _handle_fromlist 15 0.000048 0.000074 0.000005
..python3.8/posixpath.py:41 _get_sep 40 0.000043 0.000064 0.000002
..on3.8/threading.py:979 Thread.join 1 0.000006 0.000054 0.000054
..hon3.8/threading.py:540 Event.wait 1 0.000008 0.000053 0.000053
..ib/python3.8/posixpath.py:60 isabs 10 0.000026 0.000053 0.000005
..hy/logger/logger.py:207 Logger.log 5 0.000018 0.000049 0.000010
..:1017 Thread._wait_for_tstate_lock 1 0.000012 0.000046 0.000046
<frozen importlib._bootstrap>:176 cb 10 0.000027 0.000043 0.000004
...8/threading.py:270 Condition.wait 1 0.000012 0.000038 0.000038
..8/threading.py:761 Thread.__init__ 1 0.000015 0.000036 0.000036
..bootstrap>:58 _ModuleLock.__init__ 10 0.000022 0.000033 0.000003
.._bootstrap>:78 _ModuleLock.acquire 10 0.000024 0.000029 0.000003
..bootstrap>:103 _ModuleLock.release 10 0.000020 0.000025 0.000003
..p>:151 _ModuleLockManager.__exit__ 5 0.000008 0.000022 0.000004
..n3.8/threading.py:944 Thread._stop 1 0.000016 0.000021 0.000021
..aphy/logger/logger.py:370 <lambda> 5 0.000008 0.000016 0.000003
..tlib._bootstrap>:937 _sanity_check 5 0.000008 0.000011 0.000002
..aphy/logger/logger.py:371 <lambda> 5 0.000006 0.000010 0.000002
...8/threading.py:505 Event.__init__ 1 0.000004 0.000009 0.000009
..ing.py:255 Condition._release_save 1 0.000007 0.000008 0.000008
../_internal.py:250 _ctypes.__init__ 2 0.000005 0.000005 0.000002
..n3.8/_weakrefset.py:81 WeakSet.add 1 0.000004 0.000005 0.000005
..8/threading.py:1306 current_thread 2 0.000004 0.000005 0.000002
..reading.py:246 Condition.__enter__ 1 0.000003 0.000004 0.000004
..p>:143 _ModuleLockManager.__init__ 5 0.000004 0.000004 0.000001
..polygraphy/util/util.py:499 volume 3 0.000004 0.000004 0.000001
..hreading.py:222 Condition.__init__ 1 0.000004 0.000004 0.000004
..hy/logger/logger.py:276 should_log 5 0.000004 0.000004 0.000001
..0 DeviceArray._check_dtype_matches 2 0.000003 0.000003 0.000002
..reading.py:1095 _MainThread.daemon 2 0.000003 0.000003 0.000001
..hreading.py:249 Condition.__exit__ 1 0.000002 0.000003 0.000003
..ython3.8/_weakrefset.py:38 _remove 1 0.000002 0.000003 0.000003
..olygraphy/cuda/cuda.py:24 void_ptr 4 0.000003 0.000003 0.000001
..reading.py:261 Condition._is_owned 1 0.000002 0.000003 0.000003
..ython3.8/threading.py:734 _newname 1 0.000003 0.000003 0.000003
...py:258 Condition._acquire_restore 1 0.000001 0.000002 0.000002
..ygraphy/cuda/cuda.py:59 Cuda.check 2 0.000002 0.000002 0.000001
..core/_internal.py:304 _ctypes.data 2 0.000002 0.000002 0.000001
..ng.py:1177 _make_invoke_excepthook 1 0.000001 0.000001 0.000001
..olygraphy/cuda/cuda.py:149 wrapper 2 0.000001 0.000001 0.000001
../cuda.py:203 try_get_stream_handle 2 0.000001 0.000001 0.000001
..n3.8/threading.py:513 Event.is_set 2 0.000001 0.000001 0.000000
wall multithread profile________________________________++++++++++++++++++++++++++++++++==
Clock type: WALL
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..hon3.8/threading.py:859 Thread.run 1 0.000005 0.009640 0.009640
..tc_multithread.py:44 run_inference 1 0.008012 0.009635 0.009635
..on3.8/threading.py:979 Thread.join 1 0.000004 0.009415 0.009415
..:1017 Thread._wait_for_tstate_lock 1 0.000009 0.009408 0.009408
..rter.py:159 LazyModule.__getattr__ 5 0.000011 0.000975 0.000195
..aphy/mod/importer.py:90 import_mod 5 0.000030 0.000963 0.000193
..da/cuda.py:244 DeviceArray.copy_to 1 0.000022 0.000955 0.000955
..r/logger.py:375 Logger.module_info 5 0.000022 0.000681 0.000136
../cuda.py:368 DeviceArray.copy_from 1 0.000032 0.000628 0.000628
..y:360 Logger._str_from_module_info 5 0.000016 0.000623 0.000125
..hy/logger/logger.py:363 try_append 15 0.000010 0.000607 0.000040
..aphy/logger/logger.py:372 <lambda> 5 0.000016 0.000577 0.000115
..ython3.8/posixpath.py:387 realpath 5 0.000013 0.000550 0.000110
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000513 0.000518 0.000259
..3.8/posixpath.py:396 _joinrealpath 5 0.000062 0.000447 0.000089
..uda/cuda.py:352 DeviceArray.resize 1 0.000008 0.000365 0.000365
..n3.8/threading.py:834 Thread.start 1 0.000015 0.000328 0.000328
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000009 0.000319 0.000160
..hon3.8/threading.py:540 Event.wait 1 0.000009 0.000277 0.000277
...8/threading.py:270 Condition.wait 1 0.000016 0.000260 0.000260
../python3.8/posixpath.py:164 islink 30 0.000038 0.000227 0.000008
..tlib/__init__.py:109 import_module 5 0.000012 0.000221 0.000044
..rtlib._bootstrap>:1002 _gcd_import 5 0.000010 0.000207 0.000041
..lib._bootstrap>:986 _find_and_load 5 0.000033 0.000191 0.000038
..lib/python3.8/posixpath.py:71 join 30 0.000074 0.000126 0.000004
..python3.8/posixpath.py:372 abspath 5 0.000008 0.000087 0.000017
..b._bootstrap>:157 _get_module_lock 10 0.000040 0.000067 0.000007
..>:147 _ModuleLockManager.__enter__ 5 0.000009 0.000065 0.000013
..ython3.8/posixpath.py:334 normpath 5 0.000042 0.000062 0.000012
..bootstrap>:194 _lock_unlock_module 5 0.000009 0.000049 0.000010
..._bootstrap>:1017 _handle_fromlist 15 0.000034 0.000047 0.000003
..8/threading.py:761 Thread.__init__ 1 0.000018 0.000042 0.000042
..python3.8/posixpath.py:41 _get_sep 40 0.000027 0.000037 0.000001
..hy/logger/logger.py:207 Logger.log 5 0.000016 0.000036 0.000007
..ib/python3.8/posixpath.py:60 isabs 10 0.000018 0.000030 0.000003
<frozen importlib._bootstrap>:176 cb 10 0.000018 0.000025 0.000002
.._bootstrap>:78 _ModuleLock.acquire 10 0.000018 0.000023 0.000002
..bootstrap>:58 _ModuleLock.__init__ 10 0.000017 0.000022 0.000002
..bootstrap>:103 _ModuleLock.release 10 0.000014 0.000016 0.000002
..p>:151 _ModuleLockManager.__exit__ 5 0.000006 0.000016 0.000003
..aphy/logger/logger.py:370 <lambda> 5 0.000006 0.000013 0.000003
...8/threading.py:505 Event.__init__ 1 0.000006 0.000011 0.000011
..n3.8/threading.py:944 Thread._stop 1 0.000006 0.000008 0.000008
..aphy/logger/logger.py:371 <lambda> 5 0.000006 0.000007 0.000001
../_internal.py:250 _ctypes.__init__ 2 0.000006 0.000006 0.000003
..tlib._bootstrap>:937 _sanity_check 5 0.000005 0.000006 0.000001
..n3.8/_weakrefset.py:81 WeakSet.add 1 0.000005 0.000005 0.000005
..hreading.py:222 Condition.__init__ 1 0.000005 0.000005 0.000005
..reading.py:246 Condition.__enter__ 1 0.000004 0.000005 0.000005
..8/threading.py:1306 current_thread 2 0.000004 0.000005 0.000002
..hy/logger/logger.py:276 should_log 5 0.000004 0.000004 0.000001
..ython3.8/_weakrefset.py:38 _remove 1 0.000004 0.000004 0.000004
..olygraphy/cuda/cuda.py:24 void_ptr 4 0.000003 0.000003 0.000001
..polygraphy/util/util.py:499 volume 3 0.000003 0.000003 0.000001
..0 DeviceArray._check_dtype_matches 2 0.000003 0.000003 0.000002
..p>:143 _ModuleLockManager.__init__ 5 0.000003 0.000003 0.000001
..ing.py:255 Condition._release_save 1 0.000003 0.000003 0.000003
..reading.py:261 Condition._is_owned 1 0.000002 0.000003 0.000003
..hreading.py:249 Condition.__exit__ 1 0.000003 0.000003 0.000003
..ython3.8/threading.py:734 _newname 1 0.000003 0.000003 0.000003
..ygraphy/cuda/cuda.py:59 Cuda.check 2 0.000002 0.000002 0.000001
...py:258 Condition._acquire_restore 1 0.000001 0.000002 0.000002
..reading.py:1095 _MainThread.daemon 2 0.000002 0.000002 0.000001
..core/_internal.py:304 _ctypes.data 2 0.000001 0.000001 0.000000
..olygraphy/cuda/cuda.py:149 wrapper 2 0.000001 0.000001 0.000000
..n3.8/threading.py:513 Event.is_set 2 0.000001 0.000001 0.000000
..ng.py:1177 _make_invoke_excepthook 1 0.000001 0.000001 0.000001
../cuda.py:203 try_get_stream_handle 2 0.000000 0.000000 0.000000
Function stats for (Thread) (1)
Clock type: WALL
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..hon3.8/threading.py:859 Thread.run 1 0.000005 0.009640 0.009640
..tc_multithread.py:44 run_inference 1 0.008012 0.009635 0.009635
..rter.py:159 LazyModule.__getattr__ 5 0.000011 0.000975 0.000195
..aphy/mod/importer.py:90 import_mod 5 0.000030 0.000963 0.000193
..da/cuda.py:244 DeviceArray.copy_to 1 0.000022 0.000955 0.000955
..r/logger.py:375 Logger.module_info 5 0.000022 0.000681 0.000136
../cuda.py:368 DeviceArray.copy_from 1 0.000032 0.000628 0.000628
..y:360 Logger._str_from_module_info 5 0.000016 0.000623 0.000125
..hy/logger/logger.py:363 try_append 15 0.000010 0.000607 0.000040
..aphy/logger/logger.py:372 <lambda> 5 0.000016 0.000577 0.000115
..ython3.8/posixpath.py:387 realpath 5 0.000013 0.000550 0.000110
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000513 0.000518 0.000259
..3.8/posixpath.py:396 _joinrealpath 5 0.000062 0.000447 0.000089
..uda/cuda.py:352 DeviceArray.resize 1 0.000008 0.000365 0.000365
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000009 0.000319 0.000160
../python3.8/posixpath.py:164 islink 30 0.000038 0.000227 0.000008
..tlib/__init__.py:109 import_module 5 0.000012 0.000221 0.000044
..rtlib._bootstrap>:1002 _gcd_import 5 0.000010 0.000207 0.000041
..lib._bootstrap>:986 _find_and_load 5 0.000033 0.000191 0.000038
..lib/python3.8/posixpath.py:71 join 30 0.000074 0.000126 0.000004
..python3.8/posixpath.py:372 abspath 5 0.000008 0.000087 0.000017
..b._bootstrap>:157 _get_module_lock 10 0.000040 0.000067 0.000007
..>:147 _ModuleLockManager.__enter__ 5 0.000009 0.000065 0.000013
..ython3.8/posixpath.py:334 normpath 5 0.000042 0.000062 0.000012
..bootstrap>:194 _lock_unlock_module 5 0.000009 0.000049 0.000010
..._bootstrap>:1017 _handle_fromlist 15 0.000034 0.000047 0.000003
..python3.8/posixpath.py:41 _get_sep 40 0.000027 0.000037 0.000001
..hy/logger/logger.py:207 Logger.log 5 0.000016 0.000036 0.000007
..ib/python3.8/posixpath.py:60 isabs 10 0.000018 0.000030 0.000003
<frozen importlib._bootstrap>:176 cb 10 0.000018 0.000025 0.000002
.._bootstrap>:78 _ModuleLock.acquire 10 0.000018 0.000023 0.000002
..bootstrap>:58 _ModuleLock.__init__ 10 0.000017 0.000022 0.000002
..bootstrap>:103 _ModuleLock.release 10 0.000014 0.000016 0.000002
..p>:151 _ModuleLockManager.__exit__ 5 0.000006 0.000016 0.000003
..aphy/logger/logger.py:370 <lambda> 5 0.000006 0.000013 0.000003
..aphy/logger/logger.py:371 <lambda> 5 0.000006 0.000007 0.000001
../_internal.py:250 _ctypes.__init__ 2 0.000006 0.000006 0.000003
..tlib._bootstrap>:937 _sanity_check 5 0.000005 0.000006 0.000001
..hy/logger/logger.py:276 should_log 5 0.000004 0.000004 0.000001
..olygraphy/cuda/cuda.py:24 void_ptr 4 0.000003 0.000003 0.000001
..polygraphy/util/util.py:499 volume 3 0.000003 0.000003 0.000001
..0 DeviceArray._check_dtype_matches 2 0.000003 0.000003 0.000002
..p>:143 _ModuleLockManager.__init__ 5 0.000003 0.000003 0.000001
..ygraphy/cuda/cuda.py:59 Cuda.check 2 0.000002 0.000002 0.000001
..core/_internal.py:304 _ctypes.data 2 0.000001 0.000001 0.000000
..olygraphy/cuda/cuda.py:149 wrapper 2 0.000001 0.000001 0.000000
../cuda.py:203 try_get_stream_handle 2 0.000000 0.000000 0.000000
one thread cpu time ____________________________++++++++++++++++++++++++++++++++++++++++++++++++++++
name ncall tsub ttot tavg
..tc_multithread.py:44 run_inference 1 0.005786 0.007297 0.007297
..da/cuda.py:244 DeviceArray.copy_to 1 0.000016 0.001010 0.001010
..rter.py:159 LazyModule.__getattr__ 5 0.000010 0.000940 0.000188
..aphy/mod/importer.py:90 import_mod 5 0.000025 0.000928 0.000186
..r/logger.py:375 Logger.module_info 5 0.000018 0.000621 0.000124
..y:360 Logger._str_from_module_info 5 0.000014 0.000568 0.000114
..hy/logger/logger.py:363 try_append 15 0.000018 0.000555 0.000037
..aphy/logger/logger.py:372 <lambda> 5 0.000013 0.000517 0.000103
..ython3.8/posixpath.py:387 realpath 5 0.000014 0.000497 0.000099
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000475 0.000479 0.000239
../cuda.py:368 DeviceArray.copy_from 1 0.000021 0.000474 0.000474
..3.8/posixpath.py:396 _joinrealpath 5 0.000068 0.000383 0.000077
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000007 0.000377 0.000188
..tlib/__init__.py:109 import_module 5 0.000012 0.000250 0.000050
..uda/cuda.py:352 DeviceArray.resize 1 0.000006 0.000247 0.000247
..rtlib._bootstrap>:1002 _gcd_import 5 0.000016 0.000236 0.000047
..lib._bootstrap>:986 _find_and_load 5 0.000035 0.000213 0.000043
..lib/python3.8/posixpath.py:71 join 30 0.000081 0.000150 0.000005
../python3.8/posixpath.py:164 islink 30 0.000048 0.000129 0.000004
..python3.8/posixpath.py:372 abspath 5 0.000011 0.000097 0.000019
..ython3.8/posixpath.py:334 normpath 5 0.000040 0.000066 0.000013
..>:147 _ModuleLockManager.__enter__ 5 0.000012 0.000066 0.000013
..b._bootstrap>:157 _get_module_lock 10 0.000033 0.000065 0.000006
..bootstrap>:194 _lock_unlock_module 5 0.000011 0.000057 0.000011
..._bootstrap>:1017 _handle_fromlist 15 0.000032 0.000050 0.000003
..python3.8/posixpath.py:41 _get_sep 40 0.000031 0.000046 0.000001
..ib/python3.8/posixpath.py:60 isabs 10 0.000019 0.000038 0.000004
..hy/logger/logger.py:207 Logger.log 5 0.000013 0.000035 0.000007
<frozen importlib._bootstrap>:176 cb 10 0.000021 0.000033 0.000003
.._bootstrap>:78 _ModuleLock.acquire 10 0.000022 0.000026 0.000003
..bootstrap>:58 _ModuleLock.__init__ 10 0.000016 0.000024 0.000002
..bootstrap>:103 _ModuleLock.release 10 0.000016 0.000020 0.000002
..p>:151 _ModuleLockManager.__exit__ 5 0.000007 0.000018 0.000004
..aphy/logger/logger.py:370 <lambda> 5 0.000007 0.000012 0.000002
..uda/cuda.py:196 Stream.synchronize 1 0.000003 0.000008 0.000008
..aphy/logger/logger.py:371 <lambda> 5 0.000005 0.000008 0.000002
..tlib._bootstrap>:937 _sanity_check 5 0.000005 0.000007 0.000001
..cuda.py:76 Cuda.stream_synchronize 1 0.000004 0.000005 0.000005
../_internal.py:250 _ctypes.__init__ 2 0.000004 0.000004 0.000002
..p>:143 _ModuleLockManager.__init__ 5 0.000003 0.000003 0.000001
..olygraphy/cuda/cuda.py:24 void_ptr 5 0.000003 0.000003 0.000001
..hy/logger/logger.py:276 should_log 5 0.000003 0.000003 0.000001
..polygraphy/util/util.py:499 volume 3 0.000002 0.000002 0.000001
..0 DeviceArray._check_dtype_matches 2 0.000002 0.000002 0.000001
..ygraphy/cuda/cuda.py:59 Cuda.check 3 0.000002 0.000002 0.000001
..olygraphy/cuda/cuda.py:149 wrapper 3 0.000001 0.000001 0.000000
..core/_internal.py:304 _ctypes.data 2 0.000001 0.000001 0.000001
../cuda.py:203 try_get_stream_handle 2 0.000001 0.000001 0.000000
one thread wall time__________________++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Clock type: WALL
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..tc_multithread.py:44 run_inference 1 0.005838 0.006989 0.006989
..da/cuda.py:244 DeviceArray.copy_to 1 0.000013 0.000788 0.000788
..rter.py:159 LazyModule.__getattr__ 5 0.000005 0.000601 0.000120
..aphy/mod/importer.py:90 import_mod 5 0.000020 0.000593 0.000119
..raphy/cuda/cuda.py:118 Cuda.memcpy 2 0.000470 0.000472 0.000236
..r/logger.py:375 Logger.module_info 5 0.000013 0.000389 0.000078
..y:360 Logger._str_from_module_info 5 0.000010 0.000355 0.000071
..hy/logger/logger.py:363 try_append 15 0.000008 0.000345 0.000023
../cuda.py:368 DeviceArray.copy_from 1 0.000022 0.000340 0.000340
..aphy/logger/logger.py:372 <lambda> 5 0.000010 0.000325 0.000065
..ython3.8/posixpath.py:387 realpath 5 0.000011 0.000309 0.000062
..3.8/posixpath.py:396 _joinrealpath 5 0.000043 0.000244 0.000049
..uda/cuda.py:237 DeviceArray.nbytes 2 0.000006 0.000227 0.000113
..uda/cuda.py:352 DeviceArray.resize 1 0.000006 0.000184 0.000184
..tlib/__init__.py:109 import_module 5 0.000006 0.000167 0.000033
..rtlib._bootstrap>:1002 _gcd_import 5 0.000010 0.000160 0.000032
..lib._bootstrap>:986 _find_and_load 5 0.000039 0.000149 0.000030
../python3.8/posixpath.py:164 islink 30 0.000027 0.000097 0.000003
..lib/python3.8/posixpath.py:71 join 30 0.000050 0.000078 0.000003
..python3.8/posixpath.py:372 abspath 5 0.000004 0.000052 0.000010
..>:147 _ModuleLockManager.__enter__ 5 0.000008 0.000045 0.000009
..b._bootstrap>:157 _get_module_lock 10 0.000023 0.000042 0.000004
..ython3.8/posixpath.py:334 normpath 5 0.000027 0.000037 0.000007
..bootstrap>:194 _lock_unlock_module 5 0.000006 0.000033 0.000007
..._bootstrap>:1017 _handle_fromlist 15 0.000018 0.000027 0.000002
..ib/python3.8/posixpath.py:60 isabs 10 0.000012 0.000021 0.000002
..python3.8/posixpath.py:41 _get_sep 40 0.000016 0.000021 0.000001
..hy/logger/logger.py:207 Logger.log 5 0.000009 0.000021 0.000004
.._bootstrap>:78 _ModuleLock.acquire 10 0.000015 0.000017 0.000002
..bootstrap>:58 _ModuleLock.__init__ 10 0.000012 0.000016 0.000002
<frozen importlib._bootstrap>:176 cb 10 0.000009 0.000015 0.000001
..bootstrap>:103 _ModuleLock.release 10 0.000010 0.000013 0.000001
..p>:151 _ModuleLockManager.__exit__ 5 0.000004 0.000012 0.000002
..aphy/logger/logger.py:370 <lambda> 5 0.000002 0.000007 0.000001
..aphy/logger/logger.py:371 <lambda> 5 0.000004 0.000005 0.000001
..uda/cuda.py:196 Stream.synchronize 1 0.000001 0.000005 0.000005
..cuda.py:76 Cuda.stream_synchronize 1 0.000004 0.000004 0.000004
..p>:143 _ModuleLockManager.__init__ 5 0.000003 0.000003 0.000001
../_internal.py:250 _ctypes.__init__ 2 0.000002 0.000002 0.000001
..olygraphy/cuda/cuda.py:24 void_ptr 5 0.000002 0.000002 0.000000
..hy/logger/logger.py:276 should_log 5 0.000002 0.000002 0.000000
..polygraphy/util/util.py:499 volume 3 0.000001 0.000001 0.000000
..0 DeviceArray._check_dtype_matches 2 0.000001 0.000001 0.000000
..tlib._bootstrap>:937 _sanity_check 5 0.000001 0.000001 0.000000
..ygraphy/cuda/cuda.py:59 Cuda.check 3 0.000000 0.000000 0.000000
..core/_internal.py:304 _ctypes.data 2 0.000000 0.000000 0.000000
../cuda.py:203 try_get_stream_handle 2 0.000000 0.000000 0.000000
..olygraphy/cuda/cuda.py:149 wrapper 3 0.000000 0.000000 0.000000
As you can see, everything is slower in the threaded version. I would just like to know why that is.
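To put a number on "slower", the `run_inference` totals (`ttot`) from the logs above can be compared directly. This is only a small arithmetic sketch: the dictionary layout and the `overhead` helper are made up for illustration, and the four values are copied from the profiles.

```python
# ttot of run_inference in seconds, copied from the yappi logs above.
ttot = {
    "cpu": {"direct": 0.007297, "threaded": 0.008506},
    "wall": {"direct": 0.006989, "threaded": 0.009635},
}


def overhead(direct, threaded):
    """Return (absolute overhead in ms, relative overhead in percent)."""
    extra = threaded - direct
    return extra * 1e3, extra / direct * 100.0


for clock, t in ttot.items():
    extra_ms, extra_pct = overhead(t["direct"], t["threaded"])
    print(f"{clock}: +{extra_ms:.3f} ms ({extra_pct:.1f}% over the direct call)")
```

With these numbers the child thread adds roughly 1.2 ms of CPU time (about 17%) and 2.6 ms of wall time (about 38%) to a single call. Each profile is a single run (`ncall` is 1), so the figures should be read as rough indications rather than stable measurements.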