Accelerate - option (go fast)

There is a new option in town - the real deal - the thing that makes data loads go really fast. It is called Accelerate, and it gives you speed in shortcuts like you have never seen before.

This post gives you all you need to know to get started with the new Accelerate option in Fabric when creating shortcuts.

The environment

To set this up, I’ve created a workspace which contains the following:

  • A Lakehouse
  • An Eventhouse

I’ve loaded a dataset of 600 million rows based on the standard TPC-H schema.

In order to load the data to the Lakehouse (and to OneLake), I first uploaded the data to a storage account in Azure (1,300 parquet files) and then created a copy activity to land all the parquet files in a Lakehouse table named lineitem600m.

As you can see from the quick stats of the pipeline, I’ve read more than 600,000,000 rows and written the same number of rows to the Lakehouse.

Load of 600m rows

The Accelerate feature

This feature is currently in preview; you can read the docs here.

This feature uses the Kusto engine to preload the data from the shortcut into the engine’s memory, giving incredibly fast data processing.

At first I thought it only worked for data stored in OneLake - and that trying to use the feature with data from a storage account in Azure (or anywhere else) would give you an error. After retesting the setup, it turns out it also works for Delta Parquet files in the storage accounts listed in the documentation. I had missed the part with the “Delta”.
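A quick way to verify what the engine sees is to list the external tables in the KQL database - the shortcuts show up there. A minimal sketch using a standard management command:

// List the external tables (including shortcuts) in the current KQL database
.show external tables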

The option currently has some limitations:

  • The number of columns in the external table can’t exceed 900.
  • Query performance over accelerated external delta tables which have partitions may not be optimal during preview.
  • The feature assumes delta tables with static advanced features, for example column mapping doesn’t change, partitions don’t change, and so on. To change advanced features, first disable the policy, and once the change is made, re-enable the policy (see the sketch after this list).
  • Schema changes on the delta table must also be followed with the respective .alter external delta table schema, which might result in acceleration starting from scratch if there was breaking schema change.
  • Index-based pruning isn’t supported for partitions.
  • Parquet files with a compressed size higher than 6 GB won’t be cached.
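For the limitation about static advanced features, the disable/re-enable dance can be done with the same policy command used later in this post. A minimal sketch, assuming a shortcut named lineitem_accelerate and a one-day hot period:

// Disable acceleration before changing advanced features on the delta table
.alter external table lineitem_accelerate policy query_acceleration '{"IsEnabled": false}'

// ...apply the change to the delta table here...

// Re-enable acceleration once the change is made
.alter external table lineitem_accelerate policy query_acceleration '{"IsEnabled": true, "Hot": "1.00:00:00"}'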

Set up your shortcuts

I’ve created two shortcuts - one with the Accelerate option enabled, and one without the option enabled.

To create a shortcut - go to your Eventhouse database (a KQL database) and select “New” and then “OneLake shortcut”.

Shortcut step 1

Now select the Microsoft OneLake option.

Shortcut step 2

I’ve created a Lakehouse (with the same name, for ease of access), so I will select my Lakehouse where the table is stored.

Shortcut step 3

I will select my table (named lineitem600m) and click the “Next” button.

Now, the next page in the guide is where the new option is revealed. The default behavior for new shortcuts based on OneLake is to have the Accelerate option enabled, as highlighted below.

Shortcut step 4

I have changed the name of the shortcut to lineitems600m_accelerate to tell them apart when I do my select statements.

I then repeated the same process, with the one change that I deselected the Accelerate option in the last step of the guide.

When the shortcut is created, you can alter the caching period of the data. By default, data is kept in memory for 36,500 days. The days are calculated automatically based on the internal storage metadata and the modifiedDate column for each row in the shortcut.

This can be changed by editing the Data policies on each shortcut.

data policies

Or by altering the shortcut’s data policy in KQL script:

.alter external table lineitem_accelerate policy query_acceleration '{"IsEnabled": true, "Hot": "1.00:00:00"}'

The Hot property defines the timespan the data should be kept in the hot cache - here “1.00:00:00”, which is one day (the format is d.hh:mm:ss).
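You can also check which policy is currently applied to a shortcut (again assuming the shortcut name from above):

// Show the current query acceleration policy for the shortcut
.show external table lineitem_accelerate policy query_acceleration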

Make sure the data load has been completed

When you create a shortcut with acceleration enabled, the underlying Kusto engine begins to load the data from the source to memory. Depending on the size of the data, this can take a while.

To get a status of this process, you can execute the following command:

.show external table lineitems600m_accelerate operations query_acceleration statistics

When loading the data to memory, I’ve seen the process load around 200 megabytes per minute. Compared to the load of the data into the Lakehouse from the storage account at around 380 megabytes per second - more than 100 times the rate - I hope this feature will load faster after the preview period.

I have also seen that the engine can only load one table to memory at a time - so no parallel processing of multiple Accelerate tables into memory. At first I guessed this was due to some ongoing development of the engine’s capability to load the data, as the feature is still in preview. On further investigation, it is due to the current scaling setup in the Eventhouse: the engine behind it does not autoscale based on shortcuts and their data structure and size. This will hopefully be solved before GA.

The comparison

I’ve executed the same KQL query 20 times against each of the tables to get an average execution time. I’m using an F64 (not the trial) for this demo; you might get different numbers in your own tests.
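The query was a simple count over each of the external tables, along the lines of this sketch (the shortcut name here is just an example):

// Count all rows through the accelerated shortcut
// (run the same query against the non-accelerated shortcut for comparison)
external_table('lineitem_accelerate')
| count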

To get the execution times, I use the .show queries command and add a render operator to the output to get a chart as below.

.show queries 
| where Text contains "external_table('lineitem_" and Text contains "count"
| sort by StartedOn
| take 40
| sort by Text, StartedOn
| extend rownumber = row_number(1, prev(Text) != Text)
| project Text, Duration, rownumber
| render columnchart with (xcolumn=rownumber, ycolumns=Duration, series=Text)

This gives me the following output:

Shortcut comparison 1

It is barely readable. The red columns (the ones without Accelerate) take up too much space - but if I change the Kusto query just a bit, as below (note the ysplit=panels at the end), then we begin to be able to read the data:

.show queries 
| where Text contains "external_table('lineitem_" and Text contains "count"
| sort by StartedOn
| take 40
| sort by Text, StartedOn
| extend rownumber = row_number(1, prev(Text) != Text)
| project Text, Duration, rownumber
| render columnchart with (xcolumn=rownumber, ycolumns=Duration, series=Text, ysplit=panels)

Shortcut comparison 2

The blue chart above is the one with the Accelerate option enabled, and the red one is without.

If I change the query to give me the average execution time in a bar chart, we can compare the results:

.show queries 
| where Text contains "external_table('lineitem_" and Text contains "count"
| sort by StartedOn
| take 40
| sort by Text, StartedOn
| extend rownumber = row_number(1, prev(Text) != Text)
| summarize AvgDuration=avg(Duration) by Text
| render barchart 

Shortcut comparison 3

The conclusion

When using the Accelerate option on your shortcuts in a KQL database from Real-Time Intelligence in Microsoft Fabric, the average execution time in my test dropped from 0.334 s to 0.003 s. In short: your queries will run around 100 times faster…

The load time from creating the shortcut until the data is this fast needs to be taken into account. I know my setup is extreme, as many companies will have data growing steadily over time rather than, as I did, loading all the data at once.

I would love to see any other data engine perform this well with a few clicks with a mouse.

Disclaimer

This is my own test and not made in a lab, so you might get different results based on your setup. If you have any input on my test, please do not hesitate to reach out. I would gladly hear your thoughts on this. You can always find me on LinkedIn.