machine-learning - I want to build an interactive machine learning system that classifies items, using as much off-the-shelf software as possible

Question

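I have a complex problem that I want to solve with machine learning, but since I am new to machine learning, I will first practice on a much simpler problem. The basic question is: how much of this can I create by downloading and customizing existing (free or commercial) software?

Imagine a system that understands the animals in a zoo. I will track some simple characteristics for each one: What does it eat? Is it nocturnal or diurnal? How does it move (slither, swim, fly, walk, crawl)? How many legs does it have? And so on. Each animal might have a dozen or so attributes, and probably all of them are known for the training set.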

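I want to prime the system with some facts about a group of animals, which I will tell it are true. Then I basically want to say "OK, I have a new animal" and have it ask me for information about that animal. Based on what it knows from the data it has, I want it to prioritize its questions (i.e., ask the most useful question first). As it learns about the new animal, I want it to start guessing the answers. For example: "Is it nocturnal? I think, with 68% confidence, that it is," and I would tell it "Yes, you're right."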
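The guess-with-confidence behavior described above could be approximated, as a minimal sketch, by counting how often each value occurs among the known animals that agree with the answers given so far. The function name `guess`, the dict-per-animal layout, and the sample data are assumptions made for illustration; a real system would more likely use a probabilistic classifier such as naive Bayes.

```python
from collections import Counter

def guess(animals, known, target):
    """Return (value, confidence) for `target`, or (None, 0.0) if no data."""
    # Keep animals that record the target and do not contradict any known answer.
    # An animal that lacks an attribute is treated as compatible (an assumption).
    matches = [
        a for a in animals
        if target in a and all(k not in a or a[k] == v for k, v in known.items())
    ]
    if not matches:
        return None, 0.0
    counts = Counter(a[target] for a in matches)
    value, n = counts.most_common(1)[0]
    # Confidence is the plain relative frequency among the matching animals.
    return value, n / len(matches)

# Hypothetical training facts.
animals = [
    {"nocturnal": True, "moves": "fly", "legs": 2},
    {"nocturnal": True, "moves": "fly", "legs": 2},
    {"nocturnal": False, "moves": "fly", "legs": 2},
    {"nocturnal": False, "moves": "walk", "legs": 4},
]

value, confidence = guess(animals, {"moves": "fly"}, "nocturnal")
print(f"Is it nocturnal? I think, with {confidence:.0%} confidence, that the answer is {value}")
```

With so few examples the frequency is a crude estimate; smoothing the counts would keep the confidence from jumping to 100% on a single matching animal.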

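I also want to add new attributes about the animals, perhaps whether they are predators. Obviously I would have to supply the data for a new attribute from scratch, but I want the system to be adaptable enough to accept such an attribute and to gradually build confidence about its correlations as data comes in.

An interesting similar system is at 20q.net, which plays "20 Questions" and is very good at it. I am not trying to play that game, but that is the kind of interactivity I am looking for. My hard problem looks a bit like a 20 Questions problem: there will be hundreds or thousands of known attributes. For any given "thing", perhaps only a few dozen answers will be known, and the answers to the other hundreds will be unknown. Based on what is known, the system solving the problem will have to choose which questions to ask to get more information.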
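One way to picture the "choose which question to ask" step is an entropy heuristic: among the things still consistent with the answers so far, prefer the attribute whose recorded values are most evenly split, scaled down when few candidates have that attribute recorded at all. This is only a sketch under those assumptions; `next_question`, the coverage weighting, and the sample data are hypothetical and are not how 20q.net or Weka work.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of observed values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def next_question(things, known):
    """Pick the unanswered attribute expected to be most informative, or None."""
    # Candidates are things that do not contradict any answer given so far;
    # a missing attribute counts as "unknown", not as a mismatch.
    candidates = [
        t for t in things
        if all(k not in t or t[k] == v for k, v in known.items())
    ]
    values_by_attr = {}
    for t in candidates:
        for attr, value in t.items():
            if attr not in known:
                values_by_attr.setdefault(attr, []).append(value)
    if not values_by_attr:
        return None

    def score(attr):
        vals = values_by_attr[attr]
        # Entropy of the recorded values, weighted by how many candidates record them.
        return entropy(vals) * len(vals) / len(candidates)

    return max(values_by_attr, key=score)

# Hypothetical sparse data: not every thing has every attribute.
things = [
    {"legs": 4, "nocturnal": True, "swims": False},
    {"legs": 4, "nocturnal": False, "swims": False},
    {"legs": 4, "nocturnal": True},
    {"legs": 2, "nocturnal": False, "swims": False},
]
print(next_question(things, {}))
```

Decision-tree learners use the same idea (information gain) to order their splits, which is one reason a toolkit with tree classifiers is a plausible starting point for the interactive part.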

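I have seen Weka, and I have even loaded a few datasets into it. Does it seem like the right engine for building my interactive system? Is there a toolkit (Weka or otherwise) that would make building such a system simple? That is, how much can I download (or buy) and customize, and how much will I have to build myself?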

score 0 · Accepted Answer

Hello :) Nice to meet you. I'm trying to do something similar.

I have researched the data types a program needs in order to understand, learn and, therefore, ask questions (to learn some more).

My approach is to teach the program just like a little baby. But a little baby gets bored, distracted and doesn't get it the first time. So a computer program would act the same, until it has the basic tools for understanding, like language.

My current model is this:

Create a (blank) database with tables such as words, sentences, actions, etc.
Create a program that can receive input (sentences). It parses each sentence, saves the words, and tries to make sense of the sentence (see Python's NLTK). If the sentence requires output, it searches the database for saved actions. If there are no actions, the program asks the user for the result or for hints. If you build a script that searches for the answer on the web (good luck), the program asks the user whether it is allowed to look up the answer online. If the program cannot resolve the user's request, it builds up interest in that request.
I'm also trying to create a tree of synapses based on the input-output process. E.g., the table of synapses has these fields: 1. id; 2. id_object (more on that later); 3. id_parent (the id of the parent synapse). This would be useful if, say, you wanted the program to multitask and remember the discussion, so that you would not have to repeat every command over and over.
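The synapse table could be sketched with SQLite roughly as follows. Only the three field names (id, id_object, id_parent) come from the description above; the `objects` table, its `label` column, and the helper functions are assumptions added so the example runs on its own.

```python
import sqlite3

def open_memory():
    """Create an in-memory database with a hypothetical objects table and the synapse tree."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE objects (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE synapses (
            id        INTEGER PRIMARY KEY,
            id_object INTEGER NOT NULL REFERENCES objects(id),
            id_parent INTEGER REFERENCES synapses(id)  -- NULL marks a root synapse
        );
    """)
    return conn

def add_synapse(conn, label, id_parent=None):
    """Store an object and a synapse pointing at it; return the new synapse id."""
    obj_id = conn.execute(
        "INSERT INTO objects (label) VALUES (?)", (label,)
    ).lastrowid
    return conn.execute(
        "INSERT INTO synapses (id_object, id_parent) VALUES (?, ?)",
        (obj_id, id_parent),
    ).lastrowid

def history(conn, synapse_id):
    """Walk id_parent links up to the root and return labels, oldest first."""
    labels = []
    while synapse_id is not None:
        label, synapse_id = conn.execute(
            "SELECT o.label, s.id_parent FROM synapses s "
            "JOIN objects o ON o.id = s.id_object WHERE s.id = ?",
            (synapse_id,),
        ).fetchone()
        labels.append(label)
    return labels[::-1]

conn = open_memory()
first = add_synapse(conn, "search THING")
second = add_synapse(conn, "perform ACTION", id_parent=first)
print(history(conn, second))
```

Following `id_parent` back to the root is what would let the program recall the earlier steps of a discussion instead of asking for each command again.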

For instance: I'll prompt the program to search for a "THING" for me and "PERFORM" an "ACTION" with that "THING". This requires RAM-like learning (temporary memory) in order to first search for the "THING" and then, if it finds it, "PERFORM" the best-matched "ACTION" it can find (heh, based on the program's synapse history).

This is not an answer to your question because you need to understand databases and NLTK before you can continue. This is my concept (or approach) for creating a machine that literally learns.

Cheers!
