Power BI Storage Modes explained
Some time ago we defined the ‘Powerdobs best practices for Power BI‘, and one of them is using Import over DirectQuery mode. Best practices ensure that reports are built to a certain standard and perform optimally. But what about the other storage modes in Power BI? Let’s take a look at the different storage modes available in Power BI and see what the advantages and disadvantages are, as well as which storage mode is best suited to which situation.
What are storage modes?
The term “storage mode” in Power BI refers to how tables are stored and how queries will be sent to a data source. In Power BI desktop you can specify the storage mode on a per-table basis, this lets you control whether or not the table data stored in-memory or not. It is important to understand how the different storage modes function and how they can be utilised to meet certain requirements.
There are a few different storage modes in Power BI (some more well known than others):
- Direct Query
- Live Connection
Factors to consider when choosing a storage mode
Each storage mode has features that makes it more or less suited to certain scenarios. There are four main factors to consider when determining the appropriate storage mode for your requirements are.
- Query Performance
When users interact with visuals in your report, DAX queries are submitted to the dataset. Storage modes can have a big impact on the performance of the submitted query and this in turn can have a big impact on how usable and interactive your report is. Thus it is important to carefully consider the advantages and disadvantages of each mode to understand which mode best fits your data and reporting needs.
- Dataset Size
Dataset size is another factor to consider when choosing storage modes. Larger tables consume more memory when cached and depending on the size of your dataset this could impact performance significantly. Carefully consider which tables need to, and which do not need to be cached for query performance, and choose the storage modes accordingly.
- Refresh Frequency
Tables that aren’t cached in memory do not need to be refreshed, thus you can optimize refresh times by caching only the necessary tables to meet your specific needs. By optimizing refresh times you can also increase the frequency of the refreshes which would provide report users more up to date reports and visuals further improving the quality of reporting.
- Near-real time data consumption
Near-real time requirements require a reduction in data latency and as such these tables will benefit from not being cached. Choosing the appropriate storage mode is of vital importance to meet requirements relating to near-real time data consumption.
As we can see, each of these factors can greatly improve or lower the usability of any given report, thus taking all these factors into account when choosing a storage mode is vital to ensuring that you provide report users with the optimal experience based on their needs.
Different Power BI Storage Modes
We need to explore each of these storage modes in more detail to get a sufficient understanding of the properties of each one and see how each storage mode can affect the factors mentioned above. We start with the most used and also the default storage mode in Power BI:
As mentioned, import mode is the default and most commonly used storage mode to develop datasets in Power BI. It delivers fast performance thanks to in-memory querying and it also supports specific Power BI service features such as Q&A and Quick Insights. In this mode Power BI connects to underlying data sources and downloads the data from the data source. This data is then stored in the in-memory Power BI model and a fresh copy of the data is downloaded each time the data is refreshed. With import you can load data from any of the available data sources in Power BI. With the data being loaded highly compressed into memory, it allows for very fast query performance and the ability to create complex visualisations without suffering loss in performance and usability.
With import you really get the full functionality of Power BI without any limitations on function. You can also combine data from multiple sources in the same report which is very useful if your data is not structured into one central database. This method also supports all of the connection types in Power BI Desktop, giving you a very wide range of connection options. The query performance of Import is also unmatched, because the data is cached in memory, queries can be performed with 0 visible latency leading to a (in general) fast performing report.
The main disadvantage of Import mode is the limitation on the data size, for the Power BI Pro user this will be 1GB per dataset. Depending on the size of the data this can be a deal breaker in terms of using this storage method. One way to overcome this is with Power BI Premium where the limitation is 10GB per dataset, and even more depending on the Premium SKU. But this comes with the added financial implication of Power BI Premium licensing.
In summary, Import is the method that provides the most flexibility in terms of features as well as being the fastest (provided your model has been created using best practices). In most cases this should be the preferred storage mode.
DirectQuery mode does not import data, but rather only consists of metadata (schema) defining the model structure. DirectQuery connects to the data source and queries the data in real-time, without having to import the data into the in-memory data model within Power BI. With this approach there are some features such as Quick Insights which are not available. But depending on your use case, this could be a worthwhile sacrifice for the ability to display near-real time data over large datasets.
DirectQuery does not have the same size limitations as Import, because the data is not cached in memory. This allows for scalability as this method is suitable for small to extremely large datasets. Another big advantage is the ability to have near-real time data available for reporting, which can be a big factor depending on your requirements. In the SAP space another substantial advantage that DirectQuery has over Import is that data permissions can be set with row-level security or BW-permissions as opposed to Import where the permissions can only be set by row-level security.
Unfortunately, the ability to handle extremely large datasets comes with some drawbacks. DirectQuery does not have the full functionality of Power BI, because Power Query and DAX are limited in functions, and calculated tables aren’t supported. Also, you are limited to only certain connection types. This method could also have worse performance than import because the queries are done directly against the data source. Slow connections and poorly constructed data sources will have a detrimental effect on the performance of reports using this method.
In summary, DirectQuery is a viable option for large datasets that Import would simply be unable to handle. But only if the data source is constructed using best practices and able to handle all of the queries that will be sent to it without hampering performance.
Live Connection is a type of data connection that, similarly to DirectQuery, does not store a duplicate copy of the data in the in-memory model in Power BI, but other than DirectQuery it also does not store any metadata data as well. Live Connection is only supported on SQL Server Analysis Services (SSAS) Multi-Dimensional, SSAS Tabular and Power BI Service Dataset data sources. The reason for this is that these three data sources are based on SSAS technology. Live Connection is a connection to a data model, thus all the data modelling is done on the aforementioned model.
Live Connections are perfect for near-real time data consumption. Just like DirectQuery the data source is accessed directly with no need to wait for data refreshes. With this method you also have no size limitations, and depending on your data source, query performance can be very good. Another big advantage is that external-hosted models and Power BI desktop models can enforce row-level security to limit the data that can be retrieved by users.
With Live Connections you will not have access to Power Query Transformations in Power BI Desktop. All of the modelling will be done on the data source side. You do have access to create some report level measures with some limitations, but for consistency it would be best practice to create these calculations on the data source side as well.
In summary, Live Connection is a viable option for Power BI reporting when near-real time reporting is required and it also aligns with the best practices of keeping all the data modelling on the database side, and to separate data model and report in Power BI.
Dual storage mode is used in a composite model. This is a situation where a model includes tables in Import storage mode and DirectQuery storage mode. A table set to Dual storage mode can act either as DirectQuery or Import depending on the relationship to the other tables in the model. For Dual mode to work successfully it needs to have characteristics of both storage modes, meaning that for all Dual mode tables there will be a copy of that table stored in the in-memory of Power BI as well as a version that works through DirectQuery. This will have the same impact on reporting and performance as if you had both versions of that table included in your report, this also needs to be taken into consideration when choosing storage modes.
With dual mode you have the advantage of using both DirectQuery and Import tables in your model. This gives you the flexibility to leverage the advantages of Import for great query performance on smaller tables that do not have near-real time requirements as well as the advantages of DirectQuery for large tables that do have near-real time requirements.
Just as with the advantages, with Dual mode you have the disadvantages of both DirectQuery and Import, because a dual mode table functions as the bridge between an Import and a DirectQuery table and thus contains features and by definition the disadvantages of both.
In summary, Dual mode is the best of both worlds. It provides functionality to combine DirectQuery sources and Import sources in the same dataset, providing the performance of Import and the scalability of DirectQuery in one package.
There are 3 types of real-time datasets in Power BI, where a Push dataset is they only one with a physical dataset in the service. The other 2 real-time datasets, namely Streaming dataset & PubNub streaming dataset, only stream the data or store it in a cache and display that in a dashboard but don’t save it in Power BI. The main difference with a Push dataset is that it functions as a regular dataset that is updated by pushing data into it rather than pulling data from a data source at refresh time. Once the tables are imported into the dataset, new rows are incrementally added to the tables. By increasing the frequency at which the data is pushed you can achieve the same real-time result as you would with a streaming dataset, but with the added ability to display historical data as well.
This method is particularly useful when creating a near real-time report where historical data is also of importance. The low latency and good scalability of the Push dataset makes it a well suited option to achieve these results.
Depending on your requirements a Push dataset would perhaps not be a viable option for you. There is some disadvantages in terms of network resources and the architecture of the dataset being pushed.
Comparisons of Data Storage Modes
The table below provides a comparison of the features of each storage mode:
|Feature||Import||DirectQuery||Live Connection||Dual Mode||Push|
|Query Performance||Excellent performance with no visible latency||Dependant on data source modelling & network connectivity||Dependant on data source modelling & network connectivity||Dependant on related tables||Dependant on data source modelling & network connectivity|
|Dataset Size||1GB with PBI Pro
10GB or more with PBI Premium
|No size limitation, only resource limitations of the server||No size limitation, only resource limitations of the server||Dependant on related tables||5 million rows per table|
|Multiple Data Sources||Yes||No||No||Dependant on related tables||No|
|Data Storage||Data stored in PBI service||Data stored in source, only metadata & schema stored in PBI service||Data stored in source, no data stored in PBI Service||Dependant on related tables||Data stored in PBI service|
|Scheduled Refresh||Maximum 8 schedules per day||Max every 15 mins||Not needed as connection is live to dataset||Dependant on related tables||New rows can be pushed every 30 seconds on average|
|Data Transformations||All transformations supported||Many transformations supported but with some limitations||Limited transformations supported||Dependant on related tables||No transformations supported|
|Quick Insights||Yes||No||No||Dependant on related tables||No|
|Q&A feature in Power BI||Yes||No||No||Dependant on related tables||No|
|Target Audience||For small to medium datasets where performance is important||For large datasets where near-real time requirements are important||For larger datasets where modelling in Power BI is not required||For composite models containing both Import & DirectQuery||For near-real time requirements where historical data is important|
We’ve seen that every storage mode has it’s own use case and advantages. So while our best practice remains that ‘Import should be the default’, it makes sense to carefully check what your specific requirements are for a solution.
So, in case you’re struggling with your use case and you’d like to get a second opinion, don’t hesitate to call us!
References and/or further reading:
- Manage storage mode in Power BI Desktop
- Data refresh in Power BI
- Dataset modes in the Power BI service
- DirectQuery in Power BI
- Use composite models in Power BI Desktop
- Real-time streaming in Power BI