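I'm a nurse and I know Python, but I'm not an expert; I only use it to process DNA sequences.
We have hospital records written in human language, and I am supposed to insert this data into a database or a CSV file. There are more than 5000 lines, so doing it by hand would be hard. All the data is written in a consistent format. Let me show you an example: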
11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later
From this I should get the following data:
Sex: Male
Symptoms: Nausea
Vomiting
Death: True
Death Time: 11/11/2010 - 01:00pm
Another example:
11/11/2010 - 09:00am : She got heart burn, vomiting of blood and died 1 hours later in the operation room
From this I get:
Sex: Female
Symptoms: Heart burn
Vomiting of blood
Death: True
Death Time: 11/11/2010 - 10:00am
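The order is not always consistent. When I write in ......., in is a keyword, and all the text after it is a place until I reach another keyword.
At the beginning, He or She determines the sex. got ...... is followed by a group of symptoms that I should split on a separator. The separator can be a comma, a hyphen, or anything else, but it is consistent within the same line.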
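A minimal sketch of the sex and symptom step, assuming the text after got has already been cut off at the next keyword. The names `parse_sex` and `split_symptoms` are hypothetical, and the separator set (comma, semicolon, or a hyphen with spaces around it) is an assumption that may need extending for the real records.

```python
import re

# Assumption: only these two pronouns start a record.
SEX_BY_PRONOUN = {"he": "Male", "she": "Female"}

def parse_sex(pronoun):
    """Map the leading He/She to a sex label."""
    return SEX_BY_PRONOUN[pronoun.lower()]

def split_symptoms(text):
    """Split a symptom list on commas, semicolons, or spaced hyphens."""
    # A hyphen only counts as a separator when it has whitespace on both
    # sides, so hyphenated words are not broken apart.
    parts = re.split(r"\s*[,;]\s*|\s+-\s+", text)
    return [p.strip().capitalize() for p in parts if p.strip()]

print(parse_sex("She"))
print(split_symptoms("heart burn, vomiting of blood"))
```

Note that a symptom that itself contains the word and would collide with the and that joins rules, so that case would need special handling.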
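Death is always written as died ..... hours later, and I should also extract the number of hours. Sometimes the patient is still alive and was discharged, etc.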
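The death time in the examples is the record's timestamp plus the stated number of hours. A small sketch of that arithmetic with the standard `datetime` module follows. The function name `event_time` is hypothetical, and the day/month order in the format string is an assumption, because the sample date 11/11/2010 does not show which comes first.

```python
from datetime import datetime, timedelta

# Assumption: dates are day/month/year; swap %d and %m if they are not.
STAMP_FORMAT = "%d/%m/%Y - %I:%M%p"

def event_time(stamp, hours_later):
    """Return the timestamp that lies hours_later hours after stamp."""
    start = datetime.strptime(stamp, STAMP_FORMAT)
    end = start + timedelta(hours=hours_later)
    # strftime usually prints AM/PM in upper case; the records use lower case.
    return end.strftime(STAMP_FORMAT).lower()

print(event_time("11/11/2010 - 09:00am", 4))  # 11/11/2010 - 01:00pm
```

The same function would cover discharged <digit> hours later, since it only needs the starting timestamp and the offset.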
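In other words, we have a lot of conventions, and I think that if I can tokenize the text with keywords and patterns, I can get the job done. So, if you know a useful function/module/tutorial/tool for doing this, preferably in Python (if not Python, then a GUI tool would be nice), please let me know.
Some information: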
There are a lot of rules to express various medical data, but here are a few examples:
- Each line starts with the same date/time format, followed by a space, a colon, a space, He/She, a space, and then rules separated by and
- Rules:
* got <symptoms>,<symptoms>,....
* investigations were done <investigation>,<investigation>,<investigation>,......
* received <drug or procedure>,<drug or procedure>,.....
* discharged <digit> (hour|hours) later
* kept under observation
* died <digit> (hour|hours) later
* died <digit> (hour|hours) later in <place>
Other rules exist, but they follow the same idea.
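The rules listed above could be expressed as one regular expression per rule, tried against each clause after splitting the line on and. This is only a sketch under the stated format, not a complete solution: `LINE_RE`, `RULES`, and `parse_line` are hypothetical names, the item lists are assumed to be comma-separated, and clauses that match no rule are collected under an "unparsed" key so they can be reviewed by hand.

```python
import re

# Date/time, " : ", He/She, then the rule clauses.
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} - \d{2}:\d{2}[ap]m) : "
    r"(?P<pronoun>He|She) (?P<body>.+)$"
)

# One pattern per rule from the list; more rules can be appended here.
RULES = [
    ("symptoms", re.compile(r"^got (?P<items>.+)$")),
    ("investigations", re.compile(r"^investigations were done (?P<items>.+)$")),
    ("received", re.compile(r"^received (?P<items>.+)$")),
    ("discharged", re.compile(r"^discharged (?P<hours>\d+) hours? later$")),
    ("observation", re.compile(r"^kept under observation$")),
    ("died", re.compile(r"^died (?P<hours>\d+) hours? later(?: in (?P<place>.+))?$")),
]

def parse_line(line):
    """Turn one record line into a dict keyed by rule name."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    record = {
        "stamp": m["stamp"],
        "sex": "Male" if m["pronoun"] == "He" else "Female",
    }
    for clause in m["body"].split(" and "):
        for name, pattern in RULES:
            hit = pattern.match(clause.strip())
            if hit is None:
                continue
            fields = hit.groupdict()
            if "items" in fields:
                # Assumption: items inside a rule are comma-separated.
                items = fields["items"].split(",")
                record[name] = [s.strip() for s in items if s.strip()]
            elif "hours" in fields:
                record[name] = {
                    "hours": int(fields["hours"]),
                    "place": fields.get("place"),
                }
            else:
                record[name] = True
            break
        else:
            # No rule matched; keep the clause for manual review.
            record.setdefault("unparsed", []).append(clause)
    return record

print(parse_line(
    "11/11/2010 - 09:00am : She got heart burn, vomiting of blood "
    "and died 1 hours later in the operation room"
))
```

Each resulting dict could then be written out as a row with the standard `csv` module. For a much larger rule set, a grammar-based parsing library might be easier to maintain than a growing list of regular expressions.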