我有一个一维数组,我用它来存储我的数据集的分类特征,如下所示:(其中每个数据实例属于许多类别,类别用逗号分隔)
Administration Oral ,Aged ,Area Under Curve ,Cholinergic Antagonists/adverse effects/*pharmacokinetics/therapeutic use ,Circadian Rhythm/physiology ,Cross-Over Studies ,Delayed-Action Preparations ,Dose-Response Relationship Drug ,Drug Administration Schedule ,Female ,Humans ,Mandelic Acids/adverse effects/blood/*pharmacokinetics/therapeutic use ,Metabolic Clearance Rate ,Middle Aged ,Urinary Incontinence/drug therapy ,Xerostomia/chemically induced ,
Adult ,Anti-Ulcer Agents/metabolism ,Antihypertensive Agents/metabolism ,Benzhydryl Compounds/administration & dosage/blood/*pharmacology ,Caffeine/*metabolism ,Central Nervous System Stimulants/metabolism ,Cresols/administration & dosage/blood/*pharmacology ,Cross-Over Studies ,Cytochromes/*pharmacology ,Debrisoquin/*metabolism ,Drug Interactions ,Humans ,Male ,Muscarinic Antagonists/pharmacology ,Omeprazole/*metabolism ,*Phenylpropanolamine ,Polymorphism Genetic ,Tolterodine Tartrate ,Urinary Bladder Diseases/drug therapy ,
...
...
数组的每个元素代表数据实例所属的类别。我需要使用 one-hot 编码,所以我可以使用这些作为一个特征来训练我的算法。我知道这可以使用 scrikit-learn 来实现,但是我不确定如何实现它。(有大约 150 个可能的类别和大约 1,000 个数据实例。)