machine-learning - Sequence learning using Conditional Random Fields?

Question

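I am new to sequence learning (and machine learning), and I am trying to understand how to use Conditional Random Fields to solve my problem.

I have a dataset that is a sequential log of when and where the end users of my application worked. For example, the following dataset contains only the values for User1: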

User   Facility   Weekday
User1  FacilityA  Monday
User1  FacilityB  Tuesday
User1  FacilityC  Wednesday
 ...     ...         ...

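I am trying to solve the following problem: given the weekdays and facilities a user has worked at, which facility and weekday will they work at next?

To solve this problem, I started looking into Conditional Random Fields, but I am having a hard time getting any library to work.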

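I tried the following libraries:

1. PyStruct ( https://pystruct.github.io/ ), but it did not work for me because of this issue: Index out of bounds: Fitting SSVM using Pystruct
2. CRFSuite ( http://www.chokkan.org/software/crfsuite/ ). This depends on libLBFGS. Although libLBFGS installed on my Ubuntu machine without any errors, running "make install" for CRFSuite still fails, saying it cannot find libLBFGS.

So I moved on to another library:

3. CRF++ ( https://taku910.github.io/crfpp/ )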

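I was able to install CRF++ and run the examples shipped with the distribution. However, I need some help understanding how to modify the template file to fit my use case.

Also, I am thinking that my label would be the concatenated string of facility + weekday from the dataset above.

I am new to sequence learning and am currently struggling to work out how to approach this problem.

Any suggestions would be very helpful, as I seem to be a bit stuck here.

Thanks!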

score 1 · Accepted Answer

1. Yes, since you are trying to predict two labels (Facility and Day), concatenating the labels will be required. Alternatively, you can learn two separate models, one for each label (see point 3).
2. I think you should look into regression models for this problem rather than CRFs.
3. I think the data should be arranged so that a user's log history can be learned easily. Can you tell me the minimum history you have for any user (last 3 logins? 5 logins? 7 logins?)?

Assuming you have the last 3 logins of every user, if I were in your place I would arrange the data differently and learn two separate models, one to predict the day and another to predict the facility. An example of the data arrangement and template file for predicting the day is here. Similarly, replace the names of the days of the week with facility names and learn a model for predicting the facility. You can also think of more features to add to the ones I have suggested. If you have more user data (say, occupation or age), you should definitely try adding more columns to the training data and adding these columns as features in the template file. Remember, the test file should be arranged in the same way as the training file (except that the last column can be empty or missing, because it is the label the model predicts during testing).
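As a rough illustration of the arrangement described above, here is a minimal Python sketch that turns one user's log into CRF++-style rows: the previous 3 values as feature columns and the next value as the label in the last column. The helper names (`build_rows`, `to_crfpp`), the choice of exactly these columns, and the `TEMPLATE` contents are assumptions made for illustration, not the linked example itself. It relies only on the CRF++ conventions that columns are whitespace-separated, the label is the last column, and a blank line ends a sequence.

```python
from typing import List, Sequence, Tuple

# One log entry per row of the question's table: (facility, weekday).
LogEntry = Tuple[str, str]


def build_rows(log: Sequence[LogEntry], field: int = 1, history: int = 3) -> List[List[str]]:
    """Build rows of `history` previous values plus the next value as label.

    field=1 selects the weekday (day model), field=0 the facility
    (facility model). Values must not contain whitespace.
    """
    rows = []
    for i in range(history, len(log)):
        previous = [entry[field] for entry in log[i - history:i]]
        label = log[i][field]
        rows.append(previous + [label])
    return rows


def to_crfpp(rows: List[List[str]]) -> str:
    """Join rows into one sequence: one row per line, blank line at the end."""
    return "\n".join(" ".join(row) for row in rows) + "\n\n"


# Hypothetical template: one unigram feature per history column (columns
# 0..2 of the current row) plus one feature combining all three.
TEMPLATE = """# Unigram
U00:%x[0,0]
U01:%x[0,1]
U02:%x[0,2]
U03:%x[0,0]/%x[0,1]/%x[0,2]
"""

user1_log = [
    ("FacilityA", "Monday"),
    ("FacilityB", "Tuesday"),
    ("FacilityC", "Wednesday"),
    ("FacilityA", "Thursday"),
    ("FacilityB", "Friday"),
]

day_rows = build_rows(user1_log, field=1)       # training rows for the day model
facility_rows = build_rows(user1_log, field=0)  # training rows for the facility model
print(to_crfpp(day_rows), end="")
```

Extra user attributes such as occupation or age would become additional columns before the label, each with its own line in the template.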

If you want to go ahead and predict both labels in one model, you can try concatenation (in the example I have given, day would become day_facility).
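A small sketch of the concatenation idea, assuming an underscore separator and hypothetical helper names (`make_label`, `split_label`). Because the weekday comes first and never contains an underscore, splitting on the first underscore recovers both parts even if a facility name has one.

```python
from typing import Tuple


def make_label(facility: str, weekday: str) -> str:
    """Combine day and facility into a single label such as 'Monday_FacilityA'."""
    return f"{weekday}_{facility}"


def split_label(label: str) -> Tuple[str, str]:
    """Recover (facility, weekday) from a predicted combined label."""
    weekday, facility = label.split("_", 1)
    return facility, weekday


combined = make_label("FacilityA", "Monday")
print(combined, split_label(combined))
```

Note that the combined label set can grow to 7 times the number of facilities, so each label is seen less often in training than in the two-model setup.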
