I am trying to tune my call to the Carrot2 REST API:

        use GuzzleHttp\Client;
        use GuzzleHttp\Exception\GuzzleException;

        $client = new Client();
        try {
            // Each multipart entry is sent as one form field of the DCS request.
            $params = [
                'multipart' => [
                    ['name' => 'dcs.c2stream', 'contents' => $xml],
                    ['name' => 'dcs.algorithm', 'contents' => 'lingo'],
                    ['name' => 'dcs.output.format', 'contents' => 'JSON'],
                    ['name' => 'dcs.clusters.only', 'contents' => 'true'],
                    ['name' => 'MultilingualClustering.defaultLanguage', 'contents' => 'FRENCH'],
                    ['name' => 'preprocessing.labelFilters.minLengthLabelFilter.minLength', 'contents' => 5],
                    ['name' => 'preprocessing.documentAssigner.minClusterSize', 'contents' => 4]
                ],
                'debug' => false
            ];
            $response = $client->request('POST', 'http://devbox:8080/dcs/rest', $params);
        } catch (GuzzleException $e) {
            // The original snippet stops before the catch block; this handler is an assumption.
            error_log($e->getMessage());
        }

The parameters `preprocessing.labelFilters.minLengthLabelFilter.minLength` and `preprocessing.documentAssigner.minClusterSize` have no effect on the request.

I found them in the documentation for the Lingo algorithm.

Thanks for your help!

1 Answer

With the right Docker image, everything works fine (docker pull touane/carrot2):

        $c2Payload = [
            'algorithm' => 'Lingo',
            'language' => 'French',
            'parameters' => [
                'preprocessing' => [
                    'documentAssigner' => [
                        'minClusterSize' => 4
                    ],
                    'labelFilters' => [
                        'minLengthLabelFilter' => [
                            'minLength' => 8
                        ],
                        'completeLabelFilter' => [
                            'labelOverrideThreshold' => 0.35
                        ]
                    ]
                ],
                'scoreWeight' => 1, // Sort by score
                'clusterBuilder' => [
                    'phraseLabelBoost' => 2.5
                ],
                'dictionaries' => [
                    'wordFilters' => [
                        ['exact' => $this->getParameter('carrot2')['stop_words']]
                    ]
                ],
                'matrixBuilder' => [
                    'termWeighting' => [
                        '@type' => 'LinearTfIdfTermWeighting'
                    ],
                    'boostFields' => ['title']
                ]
            ],
            'documents' => []
        ];

        $client = new Client();
        $params = [
            'body' => json_encode($c2Payload),
            'debug' => false
        ];
        $response = $client->request('POST', $this->getParameter('carrot2')['url'], $params);
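The payload above leaves `documents` empty. As a rough sketch only, assuming the Carrot2 4.x JSON format in which each document is a map of field names to text and the response carries a `clusters` array whose entries have `labels`, the list could be filled and the result read roughly like this. The helper names `buildDocuments` and `clusterLabels` and the `snippet` field are hypothetical; only `title` appears above (in `boostFields`).

```php
<?php
// Hypothetical helper: turn application rows into Carrot2 document maps.
// The field names 'title' and 'snippet' are assumptions for illustration.
function buildDocuments(array $rows): array
{
    return array_map(
        static fn(array $row): array => [
            'title' => (string) ($row['title'] ?? ''),
            'snippet' => (string) ($row['snippet'] ?? ''),
        ],
        array_values($rows)
    );
}

// Hypothetical helper: extract one label string per top-level cluster.
// Assumes a response shaped like {"clusters": [{"labels": [...], ...}]}.
function clusterLabels(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $labels = [];
    foreach ($data['clusters'] ?? [] as $cluster) {
        $labels[] = implode(', ', $cluster['labels'] ?? []);
    }
    return $labels;
}
```

With these in place, `$c2Payload['documents'] = buildDocuments($rows);` would run before `json_encode`, and `clusterLabels((string) $response->getBody())` would read the labels back. Keeping both helpers free of HTTP calls makes them easy to check without a running Carrot2 instance.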
Answered 2021-07-27T12:14:19.993