Note: I have completely changed the original question!

I have several texts, each made up of many words. The words are sorted into difficulty levels from 1 to 6, 1 being the easiest and 6 the hardest (in other words, from most common to least common). Obviously, not every word can be put into one of these categories, because there are countless words in the English language.

Each level contains twice as many words in total as the level before:

  1. Level: 100 words in total (100 new)
  2. Level: 200 words in total (100 new)
  3. Level: 400 words in total (200 new)
  4. Level: 800 words in total (400 new)
  5. Level: 1600 words in total (800 new)
  6. Level: 3200 words in total (1600 new)
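The table follows a simple rule: the cumulative total doubles at every level, so level 1 introduces 100 words and each level n ≥ 2 introduces 100 · 2^(n-2) new words. A short Python sketch that regenerates the table from that rule (the function names are illustrative only):

```python
def total_words(level: int) -> int:
    """Cumulative number of words known up to and including `level` (1-6)."""
    return 100 * 2 ** (level - 1)


def new_words(level: int) -> int:
    """Number of words first introduced at `level`."""
    if level == 1:
        return total_words(1)
    return total_words(level) - total_words(level - 1)


for level in range(1, 7):
    print(f"{level}. Level: {total_words(level)} words in total ({new_words(level)} new)")
```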

When I say "level 6" below, I mean a word introduced in level 6: it is one of the 1600 new words and cannot be found among the 1600 words up to level 5.
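One way to find the level a word was introduced in, assuming each level's list is available as a cumulative set of words: scan the lists from level 1 upwards and return the first level that contains the word, or `None` for unlisted words. The function name and the toy word lists are hypothetical:

```python
from typing import Optional, Sequence, Set


def introduction_level(word: str, cumulative_lists: Sequence[Set[str]]) -> Optional[int]:
    """Return the level (1-based) that introduced `word`, or None if it is unlisted.

    Assumes cumulative_lists[i] holds every word up to and including level i + 1,
    so the first list that contains the word is the level that introduced it.
    """
    w = word.lower()
    for level, words in enumerate(cumulative_lists, start=1):
        if w in words:
            return level
    return None


# Toy data only: the real lists would hold 100, 200, ..., 3200 words.
level_1 = {"i", "a", "drive", "car"}
level_2 = level_1 | {"gas"}
print(introduction_level("gas", [level_1, level_2]))      # 2
print(introduction_level("guzzler", [level_1, level_2]))  # None
```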

How can I rate the difficulty of an individual text? Compare these texts:

An easy one

would only consist of very basic vocabulary:

I drive a car.

Let's say these are 4 level 1 words.

A medium one

This old man is cretinous.

This is a very basic sentence that contains only one difficult word.

A hard one

would have some advanced vocabulary in there too:

I steer a gas guzzler.

So how much more difficult is the second or the third text than the first one? Let's compare texts 1 and 3: "I" and "a" are still level 1 words, "gas" might be level 2, "steer" is level 4, and "guzzler" is not in the list at all. "Cretinous" (text 2) would be level 6. Now that I've classified the vocabulary, how do I calculate the difficulty of these texts?
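A minimal sketch of two simple per-text scores, the mean word level and the maximum word level. It uses the levels stated above plus two labeled guesses: the remaining words of texts 1 and 2 are taken to be level 1, and an unlisted word such as "guzzler" is counted as a hypothetical level 7. `LEVELS`, `UNLISTED_LEVEL`, `word_levels` and `score` are illustrative names:

```python
from typing import Dict, List

# Levels taken from the question; "drive", "car", "this", "old", "man" and "is"
# are assumed to be level 1. "guzzler" is left out on purpose (it is unlisted).
LEVELS: Dict[str, int] = {
    "i": 1, "a": 1, "drive": 1, "car": 1,
    "this": 1, "old": 1, "man": 1, "is": 1,
    "gas": 2, "steer": 4, "cretinous": 6,
}

UNLISTED_LEVEL = 7  # hypothetical: an unlisted word counts as one level above 6


def word_levels(text: str) -> List[int]:
    """Map each word of `text` to its level; unlisted words get UNLISTED_LEVEL."""
    words = text.lower().replace(".", "").split()
    return [LEVELS.get(w, UNLISTED_LEVEL) for w in words]


def score(text: str) -> Dict[str, float]:
    """Mean level (overall vocabulary load) and max level (hardest single word)."""
    levels = word_levels(text)
    return {"mean": sum(levels) / len(levels), "max": max(levels)}


for text in ["I drive a car.", "This old man is cretinous.", "I steer a gas guzzler."]:
    print(text, score(text))
```

Under these assumptions the three texts get means of 1.0, 2.0 and 3.0 and maxima of 1, 6 and 7. The mean understates the single hard word in text 2, while the max ignores how many hard words a text contains, so reporting both may be more informative than either alone.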

I hope it is now clearer what I want to do.

1 Answer

The problem you are trying to solve is how to quantify your qualitative data.

Searching for the term "quantifying qualitative data" may help you.

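There is no general-purpose algorithm for this. The best approach depends on what you want to use the metric for, and on how the rating of each task actually affects the factors you care about for the project as a whole.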

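For example, if the hardest tasks are usually impossible to solve, then a project may become unsolvable as soon as it involves a single type 6 task, and your metric needs to reflect that.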
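A minimal Python sketch of such a metric, not the answerer's own code: `BLOCKING_LEVEL` and `project_difficulty` are hypothetical names, and falling back to the mean level when nothing blocks is just one possible choice. In terms of the question, a "task" would roughly correspond to a word and a "project" to a text.

```python
import math
from typing import List

BLOCKING_LEVEL = 6  # hypothetical: a task rated 6 is assumed to be unsolvable


def project_difficulty(task_levels: List[int]) -> float:
    """Mean task level, unless one blocking task makes the whole project unsolvable."""
    if not task_levels:
        raise ValueError("project has no rated tasks")
    if any(level >= BLOCKING_LEVEL for level in task_levels):
        return math.inf  # a single blocking task dominates every other task
    return sum(task_levels) / len(task_levels)


print(project_difficulty([1, 2, 3]))  # 2.0
print(project_difficulty([1, 1, 6]))  # inf
```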

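You also need to find some way to deal with missing data (unrated tasks). A single-number metric may not capture everything you want to know about these projects.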
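One way to keep missing data visible, sketched under the assumption that unrated tasks are stored as `None`: report the mean of the rated tasks and the share of unrated tasks as two separate numbers instead of folding them into one score. The function name `summarize` is hypothetical.

```python
from typing import Dict, List, Optional


def summarize(task_levels: List[Optional[int]]) -> Dict[str, Optional[float]]:
    """Mean of the rated tasks plus the fraction of unrated (None) tasks.

    Keeping the two numbers separate avoids hiding missing data inside one score.
    """
    if not task_levels:
        raise ValueError("no tasks to summarize")
    rated = [level for level in task_levels if level is not None]
    return {
        "rated_mean": sum(rated) / len(rated) if rated else None,
        "unrated_fraction": (len(task_levels) - len(rated)) / len(task_levels),
    }


print(summarize([1, 4, 1, 2, None]))  # {'rated_mean': 2.0, 'unrated_fraction': 0.2}
```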

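Once you understand what the metric is for, and how the task ratings relate to each other (a linear increase in difficulty versus categorical distinctions), there are plenty of simple metrics you can use for this kind of analysis.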

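For example, you could assess a project's risk based on the number of unknown tasks and the number of tasks whose difficulty is above some threshold. Or you could score the project's duration using a weighted sum of task difficulties, with a default or estimated difficulty for unknown tasks.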
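Both suggestions can be sketched in a few lines of Python. Everything here is an assumption for illustration: unrated tasks are `None`, the threshold and default difficulty are arbitrary, and the example weights simply double per level (mirroring the doubling word lists) rather than coming from any measured data.

```python
from typing import Dict, List, Optional

RISK_THRESHOLD = 4      # hypothetical: tasks rated above 4 count as high-difficulty
DEFAULT_DIFFICULTY = 4  # hypothetical: estimated difficulty used for unrated tasks


def risk_profile(task_levels: List[Optional[int]]) -> Dict[str, int]:
    """Count unknown tasks and tasks whose difficulty is above RISK_THRESHOLD."""
    return {
        "unknown": sum(1 for level in task_levels if level is None),
        "high": sum(1 for level in task_levels if level is not None and level > RISK_THRESHOLD),
    }


def duration_score(task_levels: List[Optional[int]], weights: Dict[int, float]) -> float:
    """Weighted sum of task difficulties; unrated tasks use DEFAULT_DIFFICULTY."""
    return sum(weights[level if level is not None else DEFAULT_DIFFICULTY] for level in task_levels)


# Hypothetical weights: each level counts twice as much as the one before.
weights = {n: 2 ** (n - 1) for n in range(1, 7)}
tasks = [1, 1, 3, None, 6]
print(risk_profile(tasks))             # {'unknown': 1, 'high': 1}
print(duration_score(tasks, weights))  # 46
```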

answered 2013-08-20T01:40:55.763