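I am trying to convert the XML file downloaded from DrugBank. Whenever I try to import it into Excel 2007, it says the file cannot be imported, perhaps because of its size. Can anyone suggest another way to open this file so that I can save it as a tab-delimited file? It is the first file (all drugs, including target, transporter, carrier, and enzyme information), in XML format, listed at http://www.drugbank.ca/downloads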

1 Answer

This is a complete rewrite of my original answer.

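For my original answer, I performed only a limited analysis of drugbank.xml. I was a little hesitant, but I said that the structure was too complex to convert to any standard tab-delimited file, by which I meant a tab-delimited file that any standard program could process. I stand by that statement, but it is possible to create a non-standard delimited file that may be useful.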

The table below shows the structure of drugbank.xml.

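The columns are index (Inx), level (Lvl), name, parent (Pnt), and repeats. For the elements drug and partner, Repeats is the actual number of occurrences. For every other element, it is the maximum number of repeats within a single occurrence of its parent.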

Inx Lvl Name------------------------------------ Pnt Repeats
  1   1   drugs                                    0       1
  2   2     drug                                   1    6711
  3   3       drugbank-id                          2       1
  4   3       name                                 2       1
  5   3       description                          2       1
  6   3       cas-number                           2       1
  7   3       general-references                   2       1
  8   3       synthesis-reference                  2       1
  9   3       indication                           2       1
 10   3       pharmacology                         2       1
 11   3       mechanism-of-action                  2       1
 12   3       toxicity                             2       1
 13   3       biotransformation                    2       1
 14   3       absorption                           2       1
 15   3       half-life                            2       1
 16   3       protein-binding                      2       1
 17   3       route-of-elimination                 2       1
 18   3       volume-of-distribution               2       1
 19   3       clearance                            2       1
 20   3       secondary-accession-numbers          2       1
 21   4         secondary-accession-number        20       5
 22   3       groups                               2       1
 23   4         group                             22       3
 24   3       taxonomy                             2       1
 25   4         kingdom                           24       1
 26   4         substructures                     24       1
 27   5           substructure                    26      35
 28   3       synonyms                             2       1
 29   4         synonym                           28      82
 30   3       salts                                2       1
 31   4         salt                              30      17
 32   3       brands                               2       1
 33   4         brand                             32     230
 34   3       mixtures                             2       1
 35   4         mixture                           34     340
 36   5           name                            35       1
 37   5           ingredients                     35       1
 38   3       packagers                            2       1
 39   4         packager                          38     173
 40   5           name                            39       1
 41   5           url                             39       1
 42   3       manufacturers                        2       1
 43   4         manufacturer                      42      91
 44   3       prices                               2       1
 45   4         price                             44     172
 46   5           description                     45       1
 47   5           cost                            45       1
 48   5           unit                            45       1
 49   3       categories                           2       1
 50   4         category                          49      11
 51   3       affected-organisms                   2       1
 52   4         affected-organism                 51       3
 53   3       dosages                              2       1
 54   4         dosage                            53      22
 55   5           form                            54       1
 56   5           route                           54       1
 57   5           strength                        54       1
 58   3       atc-codes                            2       1
 59   4         atc-code                          58      36
 60   3       ahfs-codes                           2       1
 61   4         ahfs-code                         60      11
 62   3       patents                              2       1
 63   4         patent                            62       5
 64   5           number                          63       1
 65   5           country                         63       1
 66   5           approved                        63       1
 67   5           expires                         63       1
 68   3       food-interactions                    2       1
 69   4         food-interaction                  68       6
 70   3       drug-interactions                    2       1
 71   4         drug-interaction                  70     246
 72   5           drug                            71       1
 73   5           name                            71       1
 74   5           description                     71       1
 75   3       protein-sequences                    2       1
 76   4         protein-sequence                  75      10
 77   5           header                          76       1
 78   5           chain                           76       1
 79   3       calculated-properties                2       1
 80   4         property                          79      18
 81   5           kind                            80       1
 82   5           value                           80       1
 83   5           source                          80       1
 84   3       experimental-properties              2       1
 85   4         property                          84       4
 86   5           kind                            85       1
 87   5           value                           85       1
 88   5           source                          85       1
 89   3       external-identifiers                 2       1
 90   4         external-identifier               89      13
 91   5           resource                        90       1
 92   5           identifier                      90       1
 93   3       external-links                       2       1
 94   4         external-link                     93       4
 95   5           resource                        94       1
 96   5           url                             94       1
 97   3       targets                              2       1
 98   4         target                            97     144
 99   5           actions                         98       1
100   6             action                        99       2
101   5           references                      98       1
102   5           known-action                    98       1
103   3       enzymes                              2       1
104   4         enzyme                           103      19
105   5           actions                        104       1
106   6             action                       105       3
107   5           references                     104       1
108   3       transporters                         2       1
109   4         transporter                      108      24
110   5           actions                        109       1
111   6             action                       110       3
112   5           references                     109       1
113   3       carriers                             2       1
114   4         carrier                          113       6
115   5           actions                        114       1
116   6             action                       115       1
117   5           references                     114       1
118   2     partners                               1       1
119   3       partner                            118    4227
120   4         name                             119       1
121   4         general-function                 119       1
122   4         specific-function                119       1
123   4         gene-name                        119       1
124   4         locus                            119       1
125   4         reaction                         119       1
126   4         signals                          119       1
127   4         cellular-location                119       1
128   4         transmembrane-regions            119       1
129   4         theoretical-pi                   119       1
130   4         molecular-weight                 119       1
131   4         chromosome                       119       1
132   4         species                          119       1
133   5           category                       132       1
134   5           name                           132       1
135   5           uniprot-name                   132       1
136   5           uniprot-taxon-id               132       1
137   4         essentiality                     119       1
138   4         references                       119       1
139   4         external-identifiers             119       1
140   5           external-identifier            139       9
141   6             resource                     140       1
142   6             identifier                   140       1
143   4         synonyms                         119       1
144   5           synonym                        143      38
145   4         protein-sequence                 119       1
146   5           header                         145       1
147   5           chain                          145       1
148   4         gene-sequence                    119       1
149   5           header                         148       1
150   5           chain                          148       1
151   4         pfams                            119       1
152   5           pfam                           151      15
153   6             identifier                   152       1
154   6             name                         152       1
155   4         go-classifiers                   119       1
156   5           go-classifier                  155      49
157   6             category                     156       1
158   6             description                  156       1
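
A table like this can be derived by streaming through the file once. The following is a minimal sketch in Python, not the utility used to produce the table above; the names `Node` and `summarize_structure` are hypothetical. It assumes that "repeats" means the maximum number of times a tag occurs within one occurrence of its parent, which matches the counts for drug and partner because their parents occur only once.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


class Node:
    """One distinct element path in the document (hypothetical helper)."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = {}    # tag -> Node, in first-seen order
        self.max_repeats = 0  # max occurrences under a single parent occurrence


def summarize_structure(source):
    """Return rows of (index, level, name, parent_index, repeats).

    `source` is a file name or a binary file object containing XML.
    """
    root_node = None
    stack = []  # (Node, Counter of child tags seen in this occurrence)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            tag = elem.tag.split("}")[-1]  # drop any namespace prefix
            if stack:
                parent_node, counts = stack[-1]
                counts[tag] += 1
                node = parent_node.children.get(tag)
                if node is None:
                    node = Node(tag, parent_node.level + 1)
                    parent_node.children[tag] = node
                node.max_repeats = max(node.max_repeats, counts[tag])
            else:
                node = root_node = Node(tag, 1)
                node.max_repeats = 1
            stack.append((node, Counter()))
        else:
            stack.pop()
            elem.clear()  # release text and children we no longer need

    rows = []

    def visit(node, parent_index):
        # Number the nodes depth-first so children follow their parent.
        index = len(rows) + 1
        rows.append((index, node.level, node.name, parent_index, node.max_repeats))
        for child in node.children.values():
            visit(child, index)

    if root_node is not None:
        visit(root_node, 0)
    return rows


if __name__ == "__main__":
    sample = (b"<drugs><drug><name>A</name><groups><group>x</group>"
              b"<group>y</group></groups></drug></drugs>")
    for row in summarize_structure(io.BytesIO(sample)):
        print(row)
```

Children are numbered in the order in which they are first seen, so an element that is missing from early records may be listed later than its position in the schema would suggest.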

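I have a utility that I developed for a client who could not handle the very large XML documents they were being sent. It extracts selected information into a delimited file. Although those XML documents were huge, their structure was simple, with no repeats within the level 2 elements. I wondered whether I could enhance the utility to accept repeats and output the data to a delimited file, albeit a non-standard one. I now know that I can, although I am not sure how useful the delimited file will be.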

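My output has 97 columns, one per leaf element. There are six header rows, one per level, which list each leaf element and its parents. When an element repeats, each further value is placed on the next available row. I hope that a few columns from the rows for the first three drugs will illustrate this. Note that column 61 has been truncated in this display.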

|Column 1   |Column 2    |Column 18                  |Column 25  |Column 56                   |Column 60 |Column 61                     |Column 62   |
|drugs      |drugs       |drugs                      |drugs      |drugs                       |drugs     |drugs                         |drugs       |
|drug       |drug        |drug                       |drug       |drug                        |drug      |drug                          |drug        |
|drugbank-id|name        |secondary-accession-numbers|mixtures   |external-identifiers        |targets   |targets                       |targets     |
|           |            |secondary-accession-number |mixture    |external-identifier         |target    |target                        |target      |
|           |            |                           |name       |resource                    |actions   |references                    |known-action|
|           |            |                           |           |                            |action    |                              |            |
|DB00001    |Lepirudin   |BIOD00024                  |           |Drugs Product Database (DPD)|inhibitor |# Turpie AG: Anticoagulants in|yes         |
|           |            |BTD00024                   |           |National Drug Code Directory|          |                              |            |
|           |            |                           |           |PharmGKB                    |          |                              |            |
|           |            |                           |           |UniProtKB                   |          |                              |            |
|DB00002    |Cetuximab   |BIOD00071                  |           |National Drug Code Directory|antagonist|# Hosokawa N, Yamamoto S, Ueha|yes         |
|           |            |BTD00071                   |           |GenBank                     |          |# Snyder LC, Astsaturov I, Wei|unknown     |
|           |            |                           |           |PharmGKB                    |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Negri DR, Tosi E, Valota O, |unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|           |            |                           |           |                            |          |# Overington JP, Al-Lazikani B|unknown     |
|DB00003    |Dornase Alfa|BIOD00001                  |Cauterex   |Drugs Product Database (DPD)|          |# Cramer GW, Bosso JA: The rol|yes         |
|           |            |BTD00001                   |Clorfibrase|GenBank                     |          |                              |            |
|           |            |                           |Elase      |PharmGKB                    |          |                              |            |
|           |            |                           |Fibrabene  |UniProtKB                   |          |                              |            |
|           |            |                           |Fibrase SA |                            |          |                              |            |
|           |            |                           |Fibrolan   |                            |          |                              |            |
|           |            |                           |Parkelase  |                            |          |                              |            |
|           |            |                           |Ridasa     |                            |          |                              |            |
|           |            |                           |           |                            |          |                              |            |
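
The layout shown above can be approximated with a short Python sketch. This is not the author's utility; `local_name`, `leaf_values`, `flatten`, and `write_tsv` are hypothetical names. It assumes that each record is a direct child of the root, that empty leaf elements are skipped, and that the whole document fits in memory (a file as large as drugbank.xml would probably need a streaming parser such as `iterparse` instead).

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.split("}")[-1]


def leaf_values(elem, path):
    """Yield (path, text) for each non-empty leaf at or below elem."""
    path = path + (local_name(elem.tag),)
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if text:
            yield path, text
    for child in children:
        yield from leaf_values(child, path)


def flatten(root, record_tag):
    """Return header rows (one per level) followed by data rows."""
    columns = []   # leaf paths, in first-seen order
    records = []   # one dict per record: path -> list of values
    for record in root:
        if local_name(record.tag) != record_tag:
            continue
        per_column = {}
        for path, text in leaf_values(record, (local_name(root.tag),)):
            if path not in columns:
                columns.append(path)
            per_column.setdefault(path, []).append(text)
        records.append(per_column)

    depth = max((len(p) for p in columns), default=0)
    rows = [[p[i] if i < len(p) else "" for p in columns] for i in range(depth)]
    for per_column in records:
        height = max((len(v) for v in per_column.values()), default=0)
        for r in range(height):
            # A repeated value goes on the next free row of its own column.
            rows.append([
                per_column[p][r] if p in per_column and r < len(per_column[p]) else ""
                for p in columns
            ])
    return rows


def write_tsv(rows, stream):
    """Write rows to a text stream as tab-delimited lines."""
    csv.writer(stream, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    sample = ("<drugs><drug><drugbank-id>DB1</drugbank-id><groups>"
              "<group>x</group><group>y</group></groups></drug></drugs>")
    out = io.StringIO()
    write_tsv(flatten(ET.fromstring(sample), "drug"), out)
    print(out.getvalue())
```

Each column is filled independently, so values that share a row are not necessarily related to one another. That is the main reason a file in this layout is non-standard.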

The resulting file has 135,713 rows and is 52,171,387 bytes long. Would this, or some simple variation of it, be useful?

Answered 2012-05-06T10:54:30.603