Learn Kusto - Join types in KQL

26 Mar, 2024 |
Brian

Brian is the person behind the dcode.bi site. He is keen on helping others be better at what they do around data and intelligence.

Learn Kusto - join types in KQL

Hello - and welcme back to this series of posts around Kusto and learning the language. In this post I’ll guide you through the methods of doing joins and what to be aware of when designing your queries using joins. In Kusto the join method is a bit more complex than compared to T-SQL. But don’t worry - with this guide you’ll know how to use the features.

Join types

As with other data query languages, we can do joins to join two or more tables together for a selection of data across those tables.

As a default, the Kusto engine works with an inner join, if not other join methods are definded.

If you need to define a different join type, then the syntax is:

kind=<jointype>

With jointypes as:

  1. leftanti
  2. leftsemi
  3. leftantisemi
  4. rightanti
  5. rightsemi
  6. rightantisemi
  7. innerunique
  8. inner
  9. leftouter
  10. rightouter
  11. fullouter

Depending on the selected join type, the output will be different. A quick overview of the different output scenarioes are found below:

Join kind Schema or Records Output
leftanti S & R The result table contains columns from the left side only.
leftsemi S The result table contains columns from the left side only
leftantisemi R Returns all the records from the left side that don’t have matches from the right
rightanti S & R The result table contains columns from the right side only.
rightsemi S The result table contains columns from the right side only.
rightantisemi R Returns all the records from the right side that don’t have matches from the left.
innerunique S & R A column for every column in each of the two tables, including the matching keys. The columns of the right side will be automatically renamed if there are name clashes.
inner S Returns all matching records from left and right sides.
leftouter S & R A column for every column in each of the two tables, including the matching keys. The columns of the right side will be automatically renamed if there are name clashes.
rightouter S & R A column for every column in each of the two tables, including the matching keys. The columns of the right side will be automatically renamed if there are name clashes.
fullouter S & R A column for every column in each of the two tables, including the matching keys. The columns of the right side will be automatically renamed if there are name clashes.

The implementation

How to use the join types (kinds) are as follows.

TableA | join <kind=<kindtype>> (TableB) on CommonColumn, $left.Col1 == $right.Col2

Given above, then the easiest join to do is between two tables with the same columnnames on both sides of the join. Then you don’t have to use the “$left” and “$right” arguments as the two columns on both sides have the same name (CommonColumn).

If you need a different join kind than the default inner join then you have to use the optional join kind as stated above. Please note here that you only need one equal sign in this part of the statement.

In the case that the two columns that needs to be joined are different on the two sides of the join, you need to implement the left and right part of the syntax.

Example from using the mentioned free demo environment from the first post in this series.

I’m renaming the columns from the PopulationData table to force the example - I know that I could just use the “normal” join - this is only done for demo purposes.

StormEvents
| join kind=inner (
    PopulationData 
    | project-rename  StateString = State, PopulationString = Population
    ) on $left.State == $right.StateString

You can, as with joins from the SQL world, join on more than one column using “and” as seperator.

A small expert tip

When doing joins in the world of Kusto, it is considered a best practice to start with the smallest tables and join in the largest tables to that. In comparion with T-SQL where the normal approach is to start with the largest table (Ex. the fact table) and then join in the smaller tables (the dimensions), then Kusto is giving you a much better performance if you implement the smaller tables first. This is due to the nature of the underlying distributed compute engine and the way that each table is partitioned in the storage layer.

Summary

With this 3rd edition of the Kusto learning series, you are now ready to most of the work needed to code your own Kusto queries.

If you want more of this, you can read the entire documentation from Microsoft follwing this link

Don’t forget to sign up for news in this series of blogs using the form below.

comments powered by Disqus