
I am using the OrientDB ETL tool to import a large amount of data (on the order of gigabytes). The CSV looks like this (I am on OrientDB 2.2):

"101.186.130.130","527225725","233 DJFNSDKJ","0.119836317542"
"125.143.534.148","1122212983","12227 SDFSDFSDFSDFS","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"

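I need to create two vertices: one holding the value from column 1 (the value itself is the key), and another holding the values from columns 2 and 3 (its key is the concatenation of the two values, and both values also appear as properties on this second vertex type). Column 4 should become a property of the edge connecting the two vertices.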
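To make the intended mapping concrete, here is a minimal Python sketch of what I want each CSV row to turn into. It is not ETL configuration and does not talk to OrientDB; it only illustrates the target shape. The class and property names are the ones from my configuration, while the function name `row_to_graph` and the `|` separator used in the concatenated key are hypothetical choices made for illustration.

```python
import csv
import io


def row_to_graph(row, sep="|"):
    """Map one 4-column CSV row to (ip_vertex, location_vertex, edge).

    `sep` is an assumed separator for the concatenated key; any
    character that cannot occur in the data would do.
    """
    ip, dpcb, address, prob = (field.strip() for field in row)

    # First vertex: only the IP address, which is also its key.
    ip_vertex = {"class": "IpAddress", "ip": ip}

    # Second vertex: key is columns 2 and 3 concatenated,
    # and both columns are also kept as separate properties.
    location_vertex = {
        "class": "PhyLocation",
        "loc": f"{dpcb}{sep}{address}",
        "dpcb_number": dpcb,
        "geo_address": address,
    }

    # Edge between the two vertices, carrying column 4.
    edge = {"class": "Located", "confidence": prob}
    return ip_vertex, location_vertex, edge


sample = '"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"\n'
for row in csv.reader(io.StringIO(sample)):
    for element in row_to_graph(row):
        print(element)
```

The point is that the IpAddress vertex should carry nothing but `ip`, and the PhyLocation key should be derived from two columns rather than read from a single one.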

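I used the configuration below. It runs, but it has some problems. First, all the values from each CSV row are stored as properties on the IpAddress vertex; is there a way to store only the IP address there? Second, could you tell me how to concatenate two values read from the CSV?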

{
  "source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
  "extractor": {
    "csv": {
      "columnsOnFirstLine": false,
      "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"]
    }
  },
  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    {
      "edge": {
        "class": "Located",
        "joinFieldName": "address",
        "lookup": "PhyLocation.loc",
        "direction": "out",
        "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
        "edgeFields": { "confidence": "${input.prob}" },
        "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:/localhost/Bulk_Transfer_Test",
      "dbType": "graph",
      "dbUser": "root",
      "dbPassword": "tiger",
      "serverUser": "root",
      "serverPassword": "tiger",
      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": ["ip:string"], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": ["loc:string"], "type": "UNIQUE" }
      ]
    }
  }
}