This is a quick post about getting the most out of Power BI (Fabric) model performance: keeping model sizes under control, refresh times low and live dashboards snappy. Here I am going to focus on the data storage engine used in Power BI.
The same engine, with some tweaks, powers SQL Server's columnstore technology, so most of what I cover here applies there too.
VertiPaq
VertiPaq is the storage engine in Power BI. It's how data is stored and retrieved once you have loaded it into your Power BI semantic model. It is primarily an in-memory, columnar engine with high compression capabilities, and it is sometimes referred to as the xVelocity engine.
Compression, and therefore performance, is largely driven by the number of unique values (the cardinality) in any given column. The fewer unique values a column contains, the better that column will compress.
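If you want to see where cardinality is hurting your own model, DAX can list it per column. Here is a minimal sketch, assuming you run it against your semantic model from the DAX query view or DAX Studio:

```dax
// List every column in the model with its statistics, including cardinality
// (the number of unique values). The highest-cardinality columns are usually
// the biggest compression offenders.
EVALUATE
COLUMNSTATISTICS ()
```

Tools such as VertiPaq Analyzer (available through DAX Studio) give a richer view, including the actual dictionary size per column.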
Date and Time
Let's consider the use of dates and times. Imagine we have a sales table with a row for each sales transaction, storing the date and time to the nearest hour. Which is more performant in Power BI: a single combined date/time column, or two separate Date and Time columns?
There are 365 days in a year, so if our data only covered a single year, a separate Date column would have a maximum of 365 unique values.
There are 24 hours in a day. As we are only storing the time of transactions to the nearest hour, the Time column would have a maximum of 24 unique values.
If we had a combined date/time column, its cardinality is the Cartesian product of the two columns: in our case 365 x 24 = 8,760 potential unique values. That is a multiplicative increase in cardinality, and it translates directly into a larger dictionary and a bigger, slower model.
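You can check this kind of cardinality difference with a simple DAX query. A minimal sketch, assuming a hypothetical Sales table with a combined Sales[SaleDateTime] column alongside split Sales[SaleDate] and Sales[SaleHour] columns (the split itself is best done in Power Query or at the source, before the data lands in the model):

```dax
// Compare the cardinality of the combined column with the separate columns.
// For a single year of hourly data, expect something near 8,760 for the
// combined column, versus 365 and 24 for the split columns.
EVALUATE
ROW (
    "Combined date/time", DISTINCTCOUNT ( Sales[SaleDateTime] ),
    "Date only",          DISTINCTCOUNT ( Sales[SaleDate] ),
    "Hour only",          DISTINCTCOUNT ( Sales[SaleHour] )
)
```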
Unique Identifiers
Unique identifiers in your model will also have a big impact. Let's assume you are reporting over a very large customer base. You want to do things like count distinct customers, sliced by demographics and more. Each row in your customer table represents an individual customer, along with a unique ID. Suddenly you have a column that barely compresses at all: every value is different, so the dictionary is as large as the data itself. Worse still, the customer ID is often not stored as an efficient INT or BIGINT; it's a VARCHAR.
Not a lot you can do about this, right? We need it to count customers, don't we? Well, no, actually. If you know each row represents a customer, and you are not explicitly using the Customer ID for anything else, you can simply count rows and remove the high-cardinality column altogether. The row count will still represent the number of unique customers under whatever filters you apply, but your model will be noticeably smaller and more performant.
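As a sketch of what that looks like in DAX, assuming a hypothetical Customer table with a CustomerID column:

```dax
// Requires keeping the high-cardinality CustomerID column in the model
Customer Count (by ID) = DISTINCTCOUNT ( Customer[CustomerID] )

// Returns the same result in any filter context, because each row is one
// customer, and lets you remove the CustomerID column from the model entirely
Customer Count (by rows) = COUNTROWS ( Customer )
```

Both measures respect whatever slicers and filters are applied; the second one just no longer needs the expensive column to exist.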
Summary
Always consider column uniqueness (cardinality) in your Power BI/Fabric semantic models. Can you split columns and still provide the same functionality, as in the date and time example? Are you explicitly using unique identifiers, or can you get the same result another way?
