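I have trained a Bayesian network using the pgmpy library. I want to find the joint probability of a new event, computed as the product of the probability of each variable given its parents (if it has any).
Currently I am doing: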
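from pgmpy.inference import VariableElimination

infer = VariableElimination(model)  # model is the fitted Bayesian network
evidence = dict(x_test.iloc[0])     # first test row as {variable: value}
result = infer.query(variables=[], evidence=evidence, joint=True)
print(result)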
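Here x_test is the test DataFrame.
The result is a very large output that contains every combination of values from the training data together with its probability: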
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype | data_username | data_applicationtype | event_type | servicename | data_applicationname | tenantname | data_origin | geoip_country_name | phi(data_devicetype,data_username,data_applicationtype,event_type,servicename,data_applicationname,tenantname,data_origin,geoip_country_name) |
+==============================================================================================================================================+====================================+==========================================+=================+===========================+=========================================+==============================+========================+===========================+=================================================================================================================================================+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(christofer) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(GD) | tenantname(amx-sni-ksll0) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0326 |
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(marty) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(VAULT) | tenantname(login_pqr_com) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0156 |
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(lincon) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(apps_think4ch_com) | tenantname(login_abc_com) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0113 |
......contd
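Please help me understand how to find the probability of a single new event (i.e., one row of the test data). The probability expression is P(data_devicetype, data_username, data_applicationtype, event_type, servicename, data_applicationname, tenantname, data_origin, geoip_country_name).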
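
To make the quantity being asked for concrete, here is a minimal, library-independent sketch of the factorization P(x_1, ..., x_n) = product over i of P(x_i | parents(x_i)). It does not use pgmpy. The three-variable structure, the value "auth", and every probability in it are hypothetical, invented only for illustration; only the variable names are taken from the output above.

```python
from math import prod

# Hypothetical toy network (NOT the trained model):
#   geoip_country_name -> data_username -> event_type
# Each variable maps to (parents, CPT). CPT keys are
# (value_of_variable, value_of_parent_1, ...). All numbers are made up.
network = {
    "geoip_country_name": ((), {("Japan",): 0.7, ("India",): 0.3}),
    "data_username": (
        ("geoip_country_name",),
        {
            ("marty", "Japan"): 0.4, ("lincon", "Japan"): 0.6,
            ("marty", "India"): 0.9, ("lincon", "India"): 0.1,
        },
    ),
    "event_type": (
        ("data_username",),
        {
            ("sso", "marty"): 0.8, ("auth", "marty"): 0.2,
            ("sso", "lincon"): 0.5, ("auth", "lincon"): 0.5,
        },
    ),
}


def joint_probability(network, event):
    """Return P(event) as the product of P(variable | parents) over all variables."""
    factors = []
    for variable, (parents, cpt) in network.items():
        # Look up the row for this variable's value given its parents' values.
        key = (event[variable],) + tuple(event[p] for p in parents)
        # A combination with no CPT entry (never seen) is treated as probability 0.
        factors.append(cpt.get(key, 0.0))
    return prod(factors)


event = {
    "geoip_country_name": "Japan",
    "data_username": "marty",
    "event_type": "sso",
}
p = joint_probability(network, event)  # 0.7 * 0.4 * 0.8
print(p)
```

The result is a single number for one fully specified row, rather than a table over all combinations. With a fitted pgmpy model, the same per-variable factors would presumably come from the model's CPDs (for example via model.get_cpds()) instead of hand-written dictionaries.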