Learn Kusto - Clusters for compute

16 Apr, 2024 |
Brian

Brian is the person behind the dcode.bi site. He is keen on helping others be better at what they do around data and intelligence.

In today's post I'll dive into the options you can choose between when creating clusters for Azure Data Explorer. The underlying architecture of Kusto/ADX clusters is built on virtual machines (VMs) with specific configurations.

Dev/test vs production

When creating a cluster from the Azure portal, you are presented with three options when choosing the compute specification.

The compute specification determines how the cluster is sized for the specific workload you are planning to put on the Kusto cluster.

The portal gives you these three options:

[Image: Kusto compute specification]

The first option creates a Data Explorer cluster based on a single VM. This is the best option if you are planning to test the service or build proofs of concept. The single VM (dev/test) option also comes with a few caveats:

  1. It doesn't scale
  2. The markup for ADX is not charged
  3. No SLA from Microsoft

When choosing the Dev/test SKU the Azure portal defaults to a configuration based on Eav4/Easv4 series VMs with 2 vCPUs, 16 GB of memory and 24 GB of cache. If you click the “Select other” text just below the compute specifications, you can choose between the first mentioned configuration or a configuration based on Dv2/DSv2 series VMs. The Dv2 option comes with 14 GB of memory and 78 GB cache.

If you want to learn more about the differences between the two options, you can read more on Microsoft Learn:

  1. Eav4/Easv4 series
  2. Dv2/DSv2 series

Production options

When selecting the production option, you are faced with a choice between storage optimized and compute optimized. Production environments always contain at least two VMs (also called nodes).

From the Microsoft documentation we can read the difference between the two options:

  1. Compute optimized - provides a high core to cache ratio and the lowest cost per core. It uses local SSD disks for low latency I/O
  2. Storage optimized - provides larger storage, ranging from 1 TB to 4 TB per engine node, and with it the lowest cost per gigabyte. Use this option to store large volumes of data (remember that Kusto compresses data when it is ingested into the engine)
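To make the storage figures above a bit more tangible, here is a minimal sketch of how you could estimate the number of storage optimized nodes for a given data volume. It assumes you want all compressed data to fit on the nodes' local storage. The function name `nodes_needed` and the compression ratio are illustrative assumptions and not numbers from Microsoft, so measure the ratio on your own data.

```python
import math

# From this post: a production cluster always has at least two nodes.
MIN_PRODUCTION_NODES = 2


def nodes_needed(raw_data_tb: float, compression_ratio: float,
                 storage_per_node_tb: float) -> int:
    """Estimate how many nodes are needed to hold the compressed data.

    compression_ratio is an assumption you must supply yourself
    (raw size divided by size after ingestion into the engine).
    storage_per_node_tb is between 1 and 4 for the storage optimized option.
    """
    if raw_data_tb < 0 or compression_ratio <= 0 or storage_per_node_tb <= 0:
        raise ValueError("sizes must be non-negative and ratios positive")
    compressed_tb = raw_data_tb / compression_ratio
    by_storage = math.ceil(compressed_tb / storage_per_node_tb)
    return max(MIN_PRODUCTION_NODES, by_storage)


# Hypothetical example: 100 TB of raw data and an assumed 10x compression.
print(nodes_needed(100, 10, storage_per_node_tb=4))
print(nodes_needed(100, 10, storage_per_node_tb=1))
```

The estimate only looks at storage. The compute you need (vCPUs) can push the node count higher.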

To make it easier for you as an end user to select the configuration, you can choose from different sizes of SKUs:

Compute optimized

[Image: Compute optimized options]

Storage optimized

[Image: Storage optimized options]

For even more choices you can, again, click the “Select other” text just below the compute specifications. The choice here depends on the compute power you need (vCPUs) or the cache you need. The most cost-effective versions are clearly the compute optimized ones, which cost from $0.372/h to $5.7/h. The storage optimized versions are priced between $1.624/h and $6.848/h.

The prices are given per hour, and an Azure Data Explorer/Kusto cluster is almost always running 24/7 (for example because it handles streaming data from IoT devices). Over a 30-day month (720 hours), the price is between $268 and $4,104 for the compute optimized versions and between $1,169 and $4,931 for the storage optimized versions.
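The monthly figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes a 30-day month (720 hours), which is what the numbers above correspond to, and the function name `monthly_cost` is my own.

```python
# Assumption: a 30-day month with the cluster running around the clock.
HOURS_PER_MONTH = 24 * 30


def monthly_cost(hourly_price_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the monthly cost in USD for a given hourly price."""
    return hourly_price_usd * hours


# Lowest and highest hourly prices quoted in this post.
hourly_prices = {
    "compute optimized, lowest price": 0.372,
    "compute optimized, highest price": 5.7,
    "storage optimized, lowest price": 1.624,
    "storage optimized, highest price": 6.848,
}

for name, price in hourly_prices.items():
    print(f"{name}: ${monthly_cost(price):,.0f} per month")
```

Pass a different `hours` value if you plan to stop the cluster outside working hours.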

Please remember that you need at least two VMs to run a production environment, so all the prices above need to be multiplied by the actual number of nodes/VMs.
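As a rough illustration of what the node count does to the bill, the sketch below multiplies the hourly price by the number of nodes. It again assumes a 30-day month, and the function name `cluster_monthly_cost` is hypothetical. It only covers the per-node prices quoted in this post, not anything else on your Azure invoice.

```python
# From this post: a production environment needs at least two VMs.
MIN_PRODUCTION_NODES = 2
# Assumption: a 30-day month with the cluster running 24/7.
HOURS_PER_MONTH = 24 * 30


def cluster_monthly_cost(hourly_price_per_node_usd: float, node_count: int) -> float:
    """Return the monthly cost in USD for a production cluster."""
    if node_count < MIN_PRODUCTION_NODES:
        raise ValueError(
            f"a production cluster needs at least {MIN_PRODUCTION_NODES} nodes"
        )
    return hourly_price_per_node_usd * node_count * HOURS_PER_MONTH


# The cheapest compute optimized price from this post on the minimum two nodes.
print(f"${cluster_monthly_cost(0.372, 2):,.2f} per month")
```

So even the smallest production setup costs about twice the single-node monthly price.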

How to choose between the two production options

The choice between the two production setups can be hard to make.

A rule of thumb could be:

  1. If you need to store huge amounts of data in the cluster, go for the storage optimized version.
  2. If you only have a minor amount of data, go for the compute optimized version.

Choosing the storage optimized version does not mean giving up compute. You can get roughly the same compute performance from the storage optimized versions; it is a matter of selecting and configuring the VM that suits your needs.
