JSON Queries in KQL
Almost everyone know the struggle with handling JSON objects in their SQL queries. It is not easy, and in worst case, you need to create a lot of subqueries to fully un-fold the content of a JSON object.
With a Python notebook, we have the possbility to use the “explode” function. This helps a lot with that work.
In the KQL world, we also have a built-in helper around JSON objects. This blog post is a small get-started guide to help you understand the usage and whereabouts around JSON query strutures in KQL.
The setup
For me to show you how KQL handles JSON, I need some demo data to work with. In this case I’ve loaded the Example 5 from the adobe open source demo repository. A deep link can be found right here.
If you want the easy way to play with this locally, you can grab the script below:
.create table demojson (dataexample:dynamic )
.ingest inline into table demojson with (format = "psv") <|
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"DevilsFood"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
.ingest inline into table demojson with (format = "psv") <|
{"id":"0002","type":"donut","name":"Raised","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
.ingest inline into table demojson with (format = "psv") <|
{"id":"0003","type":"donut","name":"OldFashioned","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
The rest of this post, will use the above demo data for examples
Queries to handle the JSON object
There are several ways of querying a JSON object.
1. Parse the data using the parse_json way
The parse_json way parses the json with a new row for each item in the JSON document.
demojson
| mv-expand d=parse_json(dataexample)
This gives me the following output (cut to fit a screen)
You can possibly see that this approach is not the best for every scenario. It can work in some, if the end user (or application) can accept this output format and handle the data.
With the pase_json funtion we can also cann individual names in the dataset like this:
demojson
| mv-expand d=parse_json(dataexample).batters
Now data only returns one row for each row, and we can begin to work with it in human-readable form.
2. Be lazy and just call the names
As any other data developer, I tend to be lazy when I write my code. Using the short-cuts to get things working. So the query from before, can be re-written like this:
demojson
| extend dataexample.batters
Notice that the mv-extend, is now just extend and the name calling is made with a dot.
This gives me possibility to call every name I need in my query, just using a dot-approach. (is that even a name?)
If I want all ingredients in the batter I can then call it like this:
demojson
| extend batter = dataexample.batters.batter
| mv-expand d=parse_json(batter)
As you see now, I need to extend using the mv-extend function as I have more than one item is the nested JSON. I can call each item by its ordinal position in a 1-index array like this:
demojson
| extend batter1 = dataexample.batters.batter[1].type, batter2 = dataexample.batters.batter[2].type,batter3 = dataexample.batters.batter[3].type
Filtering and searching in JSON
I can also use all the normal filter functions on this data using all of the above mentioned ways. By using a simple where clause in the statement, I can seach for a specific item in the JSON object.
demojson
| extend batter1 = dataexample.batters.batter[1].type, batter2 = dataexample.batters.batter[2].type,batter3 = dataexample.batters.batter[3].type
| where batter1 == "Chocolate"
This gives me two rows with data.
But…. I can also search for strings directly in the JSON text.
demojson
| where dataexample contains "Cake"
The contains operator is just one of many string operators you can use. the complete list can be found at Microsoft Learn right here
Try it out yourself
This is just a quick rundown of the possibilities you can get using the KQL engine found in Fabric Eventhouse, Azure Data Explorer and other KQL based services.
Remember to join the newsletter to get notifications directly in your inbox for every new post. You find the sign-up box below.