I'm trying to use Apache Drill (for the first time) on a JSON file that looks like this:
{
"Key1": {
"htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
},
"Key2": {
"htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"
},
"Key3": {
"htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"
}
}
My initial query was the hello world of Drill, SELECT * FROM DataFile.json, and it returned the columns Key1, Key2 and Key3. The result had only one row, and it contained the entry:
"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
[i.e., only the entry Key1.htmltags].
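To see what a record-oriented reader is actually given, here is a small sketch in plain Python (outside Drill) that counts the top-level JSON values in the sample file. It assumes, without confirming it here, that Drill's JSON reader emits one row per top-level object. If so, a file holding a single object would produce a single row. The helpers count_top_level_values and to_one_record_per_key, and the "key" field in the reshaped output, are hypothetical names chosen for illustration.

```python
import json

# The sample file from the question, inlined so the sketch is self-contained.
RAW = """{
  "Key1": {"htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"},
  "Key2": {"htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"},
  "Key3": {"htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}
}"""


def count_top_level_values(text):
    """Count top-level JSON values in a stream of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    text = text.strip()
    idx, count = 0, 0
    while idx < len(text):
        # raw_decode parses one complete value starting at idx and
        # returns (value, index just past that value).
        _, end = decoder.raw_decode(text, idx)
        count += 1
        # Skip whitespace (e.g. newlines) between consecutive documents.
        while end < len(text) and text[end].isspace():
            end += 1
        idx = end
    return count


def to_one_record_per_key(text):
    """Hypothetical reshaping: one newline-delimited object per original key."""
    data = json.loads(text)
    lines = [json.dumps({"key": k, "htmltags": v["htmltags"]}) for k, v in data.items()]
    return "\n".join(lines)
```

Here count_top_level_values(RAW) gives 1, while count_top_level_values(to_one_record_per_key(RAW)) gives 3. Under the one-row-per-object assumption, the reshaped layout would give three rows.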
I have two questions:
- Why was only one row returned, when there are three entries (one per key), each with a different value?
- After using the KVGEN/FLATTEN functions to get at my strings inside "htmltags" above, is there a way to drill further into (analyse and extract data from) the HTML tags?
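For reference, this is the kind of extraction meant in the second question, sketched in plain Python with the standard library's html.parser rather than in Drill. The class TagAttrCollector and the functions extract_attrs and flatten_attrs are hypothetical names for illustration only. Whether Drill can do the same natively (for example with its regular-expression string functions or a custom UDF) is exactly what is being asked.

```python
from html.parser import HTMLParser


class TagAttrCollector(HTMLParser):
    """Collect (tag, attribute, value) triples from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <htmltag ... /> reach this method too:
        # HTMLParser's default handle_startendtag calls handle_starttag.
        for name, value in attrs:
            self.rows.append((tag, name, value))


def extract_attrs(fragment):
    """Parse one htmltags string and return its attribute triples."""
    parser = TagAttrCollector()
    parser.feed(fragment)
    parser.close()
    return parser.rows


def flatten_attrs(tags_by_key):
    """One (key, attribute, value) row per attribute, FLATTEN-style."""
    rows = []
    for key, fragment in tags_by_key.items():
        for _tag, name, value in extract_attrs(fragment):
            rows.append((key, name, value))
    return rows


SAMPLE = {
    "Key1": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />",
    "Key2": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />",
}
```

flatten_attrs(SAMPLE) yields six rows, starting with ("Key1", "attr1", "bravo"). Each attribute becomes its own row, which matches the one-row-per-element shape that FLATTEN produces for arrays.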