Updates and Deletes in the KQL engine

28 May, 2024 |
Brian

Brian is the person behind the dcode.bi site. He is keen on helping others be better at what they do around data and intelligence.

Updates and Deletes

Not so long ago, in a galazy far far way, we got two new, long waited functions in the Real-Time Intelligence engine, the Kusto Query Language - also known as KQL.

Update and Delete statements are now a build in part of the engine syntax, and gives us the possibility to manipulate the data in the storage engine.

Normally, in like 98% of all cases, we would like to have the “faulty” data still in the database, as they might describe outliers or other significant elements of the entire dataset. And if we don’t need the data, we can simply just filter them out using the where clause.

Even though the underlying engine for Log Analytics, Sentinel, Graph etc is the Kusto engine, the Update and Delete statement only works in Azure Data Explorer and Real-Time Intelligence in Fabric.

If you are proficient in the SQL language, you will quite fast adobt the KQL way of doing things. It is more or less the same apporach, just using a different language.

Delete statements

The delete functionality has been with the engine for some time now, but we need to know the usage and funtionality in order to get to the more advanced update statements later in this post.

The syntax could be something like the following:

.delete table [tablename] records <| [tablename] | where [column] == "something"

You might wonder wh we need to add the table word in the delete statement. There is a logical explanation to that question - because we can also delete from the results of a materialized view:

.delete materialized-view [view name] records <| [view name] | where [column] == "something else"

As many other languages, you can add joins and filters to the delete statement syntax and get the exact deleted records you need.

The delete statement “only” does a soft delete - meaning that the record is not guaranteed to be deleted from the storage engine. What is soft delete you might ask? Well Microsoft has some good documentation on the topic right here.

Update statements

Any row and any cell, can be updated using the update operator.

Usually you would use the Materialized view option to get the updated row funtionality, but sometimes the view approach is not usable. In these cases the update statement is the way to go. If you source table has high volumes of data coming in to the table, the materialized view might take up more space and compute.

Opposite to materialized views the update statement needs an orchestrator to be executed. A Python Notebook, a Pipeline, Logic apps etc are candidates for this action.

The simple (single table) update statement

A simple example syntax could look something like this:

.update [table] on [column] <|
    [table]
    | where [column] == "something"
    | extend [updateColumn] = "new value"

Notice the notation of the [column] reference in this single table statement. This is needed for the engine to execute and know on what column the filtration is done. Yes, it might seem cumbersome, but that is just the way it is…

The extend function is the statement for the new value. If any row in the table has the same value for the where statement, then each row will be updated with the new value.

Also the where and the extend has different equal signs, the where has two and the extend has only one.

You can add as many where clauses as needed to filter your data and as many extend clauses you need to update the desired columns.

The complex update statement

This statement is a bit more tricky, but still to overcome.

The complex update statement (and update with deletes) can do all the above mentioned things in one go. So you can both update cells and delete rows based on your KQL statement.

This might come in handy when working with live data (as we usually do, in the Real-Time Analytics world) and you need to manipulate the data stored in the storage engine in one go.

Emagine a table with referece data like the one below:

.set-or-replace Reference data <|
  datatable(RowId:string, Identifier:int, Describtion:string)[
    "X", 1, "Ref 1",
    "Y", 1, "Ref 2",
    "Z", 1, "Ref 3",
    "Y", 2, "Ref 4",
    "Y", 3, "Ref 5",
    "Z", 2, "Ref 6"
  ]

Above is creating the data on the fly, the normal situation is that you already has the data stored in your KQL database. We can now use this reference table to update the data - example KQL Statement:

.update table [Tablename] delete A append B <|
    let A = [Tablename] 
        | where [column A] == "something"
        | where [column B] == "something else"
    let B = [Tablename]
        | where [column A] == "something"
        | where [column B] == "something else"
        | extend [updateColumn] = "new value"

Notice the two “sub statements” in the same statement. The first row tells the eninge to delete from the table using reference A and update the table using reference B.

A new possibility, but please don’t build a data warehouse here

These new possibilities gives us even more usage of the KQL engine - but - and this is a huge but - please do not build a full blown data warehouse in the KQL engine.

First of all, this will cost you way to much money to have your entire data estate processed by the KQL engine - and the engine is not build for this purpose.

I hope you got some ideas of how the delete and update statements works.

As always, if you have any comments, ideas or feedback, please leave a comment and don’t forget to subscribe to the newletter below this post.

comments powered by Disqus