Speed up Power BI reports using columns

Did you know that, in some cases, you can add columns to your model to speed up the DAX behind your visuals, making them as much as 100 times faster?

The PBIX helper file for this article can be downloaded here.

I have come across this scenario a few times recently, so I thought I would share it in case it helps you.

CAVEAT

This particular tip should help you if your report/visual meets the following conditions:

  • The report uses SEARCH or FIND as a filter
  • The terms used in SEARCH or FIND are fixed and not designed for end users to modify
  • The filtered column is on a large table

How do I know if my report uses SEARCH or FIND? These DAX functions get used when you have a Report/Page/Visual level advanced filter that uses a variant of the contains predicate.

In the example below, which uses the Wide World Importers database, the visual is filtered so that it only considers rows where the Description column does not contain the text ‘DBA’ or ‘Joke’. This filter gets applied to the DAX query, so the bar chart in the visual will only show values for rows matching this predicate.

The Description column filtered in this case comes from the ‘Fact Sale’ table; the assumption is that it can potentially store a different value in every row and is not merely a duplicate of what may appear in the product dimension table.

Using the Power BI Performance Analyser on this report page, we see the DAX query for this column chart took over a second in the DAX component (1,108 milliseconds).

Clicking the Copy query link in Performance Analyser allows us to review the DAX generated by this visual in another tool.

// DAX Query
DEFINE VAR __DS0FilterTable = 
  FILTER(
    KEEPFILTERS(VALUES('Fact Sale'[Description])),
    OR(
      NOT(SEARCH("DBA", 'Fact Sale'[Description], 1, 0) >= 1),
      NOT(SEARCH("Joke", 'Fact Sale'[Description], 1, 0) >= 1)
    )
  )

EVALUATE
  TOPN(
    1001,
    SUMMARIZECOLUMNS(
      'Dimension Date'[Calendar Year Label],
      __DS0FilterTable,
      "SumQuantity", CALCULATE(SUM('Fact Sale'[Quantity]))
    ),
    [SumQuantity],
    0,
    'Dimension Date'[Calendar Year Label],
    1
  )

ORDER BY
  [SumQuantity] DESC, 'Dimension Date'[Calendar Year Label]

The __DS0FilterTable variable defined at the top of the query is a filter table generated by the filter panel, which is used later in the SUMMARIZECOLUMNS call. The filter, in this case, is converted to use the SEARCH() function to scan every row of the ‘Fact Sale'[Description] column to see if the text patterns “DBA” or “Joke” appear anywhere in the text.

Like most databases, VertiPaq is not super fast when it comes to string searching, so the idea is to convert this work to a Boolean column as follows.

The Tip

Create a calculated column on the ‘Fact Sale’ table using the OR() predicate from the DAX captured by Performance Analyser. The new column contains only two possible values (true or false). The following DAX shows what the new calculated column looks like.

Excluded Products =
    OR (
      NOT(SEARCH("DBA", 'Fact Sale'[Description], 1, 0) >= 1),
      NOT(SEARCH("Joke", 'Fact Sale'[Description], 1, 0) >= 1)
    )

This creates the following column in the database, which ends up adding only 7,016 bytes to the overall size of the database, despite the value being computed for all 228,265 rows in the ‘Fact Sale’ table.

Now we can update the filter panel to use the new column instead of the previous advanced filter: apply a basic filter over the new [Excluded Products] column so that the value must always be True.

The updated DAX query now only takes 10 milliseconds to complete, compared with 1,108 milliseconds previously. The new DAX is over 100 times faster.

How this works

The new calculated column pushes the text-scanning work to the point the data is loaded or refreshed into the model, and removes it from the point the visual is rendered. The new filter applies a much simpler and faster test over a very low-cardinality column.

The visual loses the flexibility to allow an end user to adjust the values considered by the advanced filter, but if this is not something you need to allow, then adding several columns such as this can significantly speed up your report.

The technique is not specific to report filters. If you have a measure that uses the SEARCH or FIND functions to look for hard-coded values, it can also benefit from pushing this work into a calculated column, e.g. a condition that always checks the first (or last) n characters of a text column to direct the flow of the logic.
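As a sketch of that idea (the [Starts With Promo] column name and the “PROMO-” prefix are hypothetical, not from the model above), a hard-coded prefix test can be precomputed once per refresh as a calculated column:

```dax
// Hypothetical calculated column on 'Fact Sale':
// evaluated at refresh time, so visuals never re-run the string comparison
Starts With Promo =
    LEFT ( 'Fact Sale'[Description], 6 ) = "PROMO-"
```

A measure can then filter on the Boolean column, e.g. CALCULATE ( SUM ( 'Fact Sale'[Quantity] ), 'Fact Sale'[Starts With Promo] = TRUE ), instead of re-evaluating LEFT or SEARCH over every row at query time.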

Other improvements

This technique is not just for optimising SEARCH and FIND filters; it can be applied to other types of filter columns as well. Say you have a report filter that applies a relative date range. Adding a column to your date table that uses the same logic to store a 1/0 or true/false value will also provide a speed improvement. Not necessarily to the same degree as the string search, but noticeable enough.
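For example (a hypothetical sketch; the column name and the 30-day window are assumptions), a “last 30 days” relative date filter could be stored on the date table like this:

```dax
// Hypothetical calculated column on 'Dimension Date'
Is Last 30 Days =
    VAR CurrentDate = TODAY ()
    RETURN
        'Dimension Date'[Date] > CurrentDate - 30
            && 'Dimension Date'[Date] <= CurrentDate
```

One caveat with this approach: calculated columns are evaluated at refresh time, so TODAY() here reflects the last refresh rather than the moment the report is viewed.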

Tech Stuff

The VertiPaq Analyser details are as follows:

The total number of rows in the ‘Fact Sale’ table is 228,265, and the total column size for the new column is only 7,016 bytes. Compared with the total size of all columns in the model (68,745,324 bytes), the new column has increased the overall size of the database by only 0.01%.

The number of Run Length Encoding (RLE) runs for the new column was just two, and the number of bits used for the column was just 1.

6 thoughts on “Speed up Power BI reports using columns”

  1. Hi Phil,

    How do you deduce/find-out that the number of Run Length Encoding (RLE) runs for the new calculated column is just two?

    The number of distinct or in this case unique values is two; however, I would consider it very unlikely that a calculated column on a table with 228000 rows would produce only two Run Length Encoding (RLE) runs.
    That would require that the fact sale table is sorted in such a way that the strings returning FALSE are all in the first part and all strings returning TRUE are in the following part or vice versa.

    The internal compressing/arranging of the data is still a little bit of a puzzle to me, so any explanation would help.

    Best regards, Jes.

    1. Hi Jes,

      I’m pleased you noticed that. I have a blog post coming where I will go into this in more detail. But to answer your question in short, I used the following DMV.

      SELECT * FROM $SYSTEM.TMSCHEMA_COLUMN_STORAGES

      Statistics_RLERuns

  2. So, shouldn’t the formula be AND() instead of OR() – you want records where it does not contain JOKE and it also doesn’t contain DBA? It seems to me any records that contain just one of the terms wouldn’t get filtered out.

  3. Hi Wanderlei, You are probably right, however the main thing I was trying to show was the difference in performance before and after. Changing the OR() to an AND() probably wouldn’t change the numbers too much in this case. 🙂
