algorithm - How can we use a machine learning algorithm on this type of data?

Question

Here is the scenario:

We have a website with the capability for students to create an e-portfolio, which is like a profile page combined with your projects you can add to it.

For each student portfolio we are going to have an educator review the portfolio and give it a set of scores based on the content of the portfolio. So a set of scores which will be summed to a total score will be associated with each students portfolio.

So we have score data, associated with portfolio data and we want to use this data as supervised training data for a machine learning algorithm. So then the computer can examine thousands of these cases, looks for patterns, provide insight and be able to predict scores for other portfolios.

Here is the data we are collecting for each person:

**Portfolio data:**

About: 'Text paragraph data written by the student about themselves'
Skills: 'Text Bullet list of skills'
Career Interests: 'Text Bullet list of career interests'
Work Experience: 'Text paragraph'
Education History: 'Student fills out Universities, majors, gpa, and dates attended'
Courses: 'Text bullet list of courses'
Interests: 'Text paragraph data written by student about interests'
Works: 'Each student adds works to there portfolio and enter the following data'
   Work Title: 'Text title'
   Attachments: 'File and documents attached to the portfolio (jpg, doc, pdf, youtube, dropbox, etc.)
   Work description: 'Text Description of work'
   category of works: 'Selected from list of categories'
   tags: 'list of test tags student adds to work'
   My contribution: 'Text description of students contribution to project'


**Score data we are collecting for each portfolio, each key area rated from 1-100:**

Content completeness:
Selection of Works:
Reflection:
Academic Concepts:
Presentation and Appearance:
Layout and Readability:
Use of Multimedia:
Audience:
Organization of content:
Written Communication:
TOTAL SCORE:

We plan to collect thousands of students portfolios and scores over time. What kind of algorithm could we use to analyze this data to find correlation between portfolios that received similar scores? Then use this data to predict how successful a portfolio will be once a student has filled it out. Please let me know if any of this is confusing or if you need more information, thanks so much!

score 1 · Accepted Answer

你在这里试图解决很多问题。

首先想到的是进行特征提取，然后应用回归来预测分数。现在，由于您使用的不仅仅是投资组合中的文本信息，因此您需要的不仅仅是文本功能。我不知道哪些功能可以帮助您将作品集的“展示和外观”与他们的分数相关联。一种方法是获取颜色、字体、字体大小信息并将它们表示为特征。为了从文本中获得洞察力，您可以使用向量空间模型来表示您的文本。

我会尽快回来写一个详细的答案。如果现在所有这些听起来都太模糊，我很抱歉。

algorithm - How can we use a machine learning algorithm on this type of data?

1 回答 1

Related

Reference