This article describes possible optimization approaches to improve the performance of nested iterators in DAX.
One of the possible causes of slow performance in DAX is the presence of nested iterators in an expression. The real issue is not the presence of an iterator by itself, but the cardinality of the materialization required by the lowest level of context transition. While it is true that moving most of the workload to the storage engine is a good idea, finding the right balance between materialization and storage engine calculation is the ultimate goal.
For example, consider the following data model of a sample Contoso database, which has one million rows in the Sales table.
The measure Gross Amount is defined as follows:
Gross Amount := SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
Both Product and Customer have a discount that should be applied to the Gross Amount measure. Thus, an intuitive way to solve the problem is the following:
Sales Amount Slow :=
SUMX (
    Customer,
    SUMX (
        'Product',
        VAR DiscountedProduct = 1 - 'Product'[Product Discount]
        VAR DiscountedCustomer = 1 - Customer[Customer Discount]
        RETURN
            [Gross Amount] * DiscountedProduct * DiscountedCustomer
    )
)
The measure Sales Amount Slow produces the following result, which is correct.
The Sales Amount Slow measure is easy to read, but it is also the slowest one. Using DAX Studio, we analyzed the query produced by the previous report, isolating only the Sales Amount Slow measure. This calculation generates large materializations of up to 239,684 rows, which is the number of existing combinations of customers and products in the Sales table.
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Date'[Year], "IsGrandTotalRowTotal" ),
    "Sales_Amount", [Sales Amount Slow]
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Date'[Year]
The materialization happens because of the context transition of the Gross Amount measure referenced within the Sales Amount Slow one. In fact, expanding the Gross Amount in the Sales Amount Slow measure would produce this equivalent measure:
Sales Amount Slow Expanded :=
SUMX (
    Customer,
    SUMX (
        'Product',
        VAR DiscountedProduct = 1 - 'Product'[Product Discount]
        VAR DiscountedCustomer = 1 - Customer[Customer Discount]
        RETURN
            CALCULATE ( SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] ) )
                * DiscountedProduct
                * DiscountedCustomer
    )
)
The expanded code of Sales Amount Slow Expanded clarifies that there are actually three nested iterators and not just two. Only the innermost iterator pushes the evaluation to the storage engine, whereas the other ones must be executed by the formula engine. In other words, the formula engine iterates the result of the measure Gross Amount computed by the storage engine, not because it is a measure, but because it is the innermost aggregation that can be performed by the storage engine.
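The size of this materialization can be estimated directly. The following query is a minimal sketch that counts the combinations of customers and products existing in Sales. The column names Sales[CustomerKey] and Sales[ProductKey] are assumptions about the model, not names used by the measures above, so they should be replaced with the actual relationship columns.

```dax
-- Sketch: Sales[CustomerKey] and Sales[ProductKey] are assumed column names
EVALUATE
ROW (
    "Customer-Product combinations",
        COUNTROWS (
            -- One row for each customer/product pair that has at least one sale
            SUMMARIZE ( Sales, Sales[CustomerKey], Sales[ProductKey] )
        )
)
```

The number returned is the one to compare against the rows reported by DAX Studio for the storage engine query of the slow measure.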
An alternative approach is to remove any context transition from the calculation, creating a single iterator and relying on the RELATED function to retrieve the product and customer discounts for each transaction in Sales.
Sales Amount Intermediate :=
SUMX (
    Sales,
    VAR LineAmount = Sales[Quantity] * Sales[Unit Price]
    VAR ProductDiscount = RELATED ( 'Product'[Product Discount] )
    VAR CustomerDiscount = RELATED ( 'Customer'[Customer Discount] )
    RETURN
        LineAmount * ( 1 - ProductDiscount ) * ( 1 - CustomerDiscount )
)
The Sales Amount Intermediate measure produces the same result as Sales Amount Slow as long as the underlying data is a floating point value (Decimal Number in Power BI). If the data model uses the currency data type (Fixed Decimal Number in Power BI), there could be differences in the result, because the decimal part is rounded at different steps of the calculation.
The performance of the Sales Amount Intermediate measure is better, as shown by the following DAX Studio screenshot.
However, this approach has two issues. The biggest one is that the business logic defined in the Gross Amount measure must be duplicated within the Sales Amount Intermediate measure: it is the code assigned to the LineAmount variable. The second issue is that the presence of different data types in the data model, or of some more complex calculations, might generate a CallbackDataID for each row of the Sales table.
For example, the following is the result observed in DAX Studio when the Unit Price data type is currency instead of floating point. This condition can be obtained in Power BI by changing the data type from Decimal Number to Fixed Decimal Number, but keep in mind that in this case the result of the calculation will be different because of rounding differences.
The optimal approach is a balanced one. The materialization caused by the slowest measure depends on the cardinality of the iterators. Whenever possible, the cardinality should correspond to the one required by the terms used in the calculation, instead of simply iterating the tables where those terms are stored. In our calculation, we need only one column from each of the Customer and Product tables. Therefore, it is possible to rewrite the code by materializing the data for all the transactions having the same discount percentage, instead of materializing by products and customers that might have the same discount.
Sales Amount Optimal :=
SUMX (
    VALUES ( Customer[Customer Discount] ),
    SUMX (
        VALUES ( 'Product'[Product Discount] ),
        VAR DiscountedProduct = 1 - 'Product'[Product Discount]
        VAR DiscountedCustomer = 1 - Customer[Customer Discount]
        RETURN
            [Gross Amount] * DiscountedProduct * DiscountedCustomer
    )
)
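To get a feeling for how much this rewrite reduces the iteration space, a query along the following lines can compare the cardinality of the tables with the cardinality of the discount columns. It is only a sketch that uses the tables and columns referenced by the measures above. The product of the two discount counts is an upper bound for the combinations the formula engine has to iterate in Sales Amount Optimal.

```dax
EVALUATE
ROW (
    -- Cardinality of the iterators used by Sales Amount Slow
    "Customers", COUNTROWS ( Customer ),
    "Products", COUNTROWS ( 'Product' ),
    -- Cardinality of the iterators used by Sales Amount Optimal
    "Customer discounts", COUNTROWS ( VALUES ( Customer[Customer Discount] ) ),
    "Product discounts", COUNTROWS ( VALUES ( 'Product'[Product Discount] ) )
)
```

The fewer distinct discount percentages there are compared to the number of customers and products, the larger the benefit of iterating the discount values.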
The Sales Amount Optimal measure might not be as efficient as Sales Amount Intermediate when Unit Price is a floating point number (data type set to Decimal Number in Power BI), because it generates two storage engine queries instead of one. However, its performance is usually very close, and it does not require duplicating the business logic of the Gross Amount measure.
With a currency data type in Unit Price (data type set to Fixed Decimal Number in Power BI), the performance of the Sales Amount Optimal measure does not change, and it is faster than the Sales Amount Intermediate measure for the same data type. Moreover, with the currency data type the Sales Amount Optimal measure also produces a number much closer, if not identical, to the one of Sales Amount Slow, because of the reduction of rounding differences.
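A simple way to check the rounding differences on a specific model is to evaluate the three measures side by side. The following query is a sketch that assumes all three measures are defined in the model with the names used in this article. The difference columns are added here only for illustration.

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    "Slow", [Sales Amount Slow],
    "Intermediate", [Sales Amount Intermediate],
    "Optimal", [Sales Amount Optimal],
    -- Differences against the slow measure, used as the reference result
    "Intermediate vs Slow", [Sales Amount Intermediate] - [Sales Amount Slow],
    "Optimal vs Slow", [Sales Amount Optimal] - [Sales Amount Slow]
)
ORDER BY 'Date'[Year]
```

Running the query once with Unit Price as Decimal Number and once as Fixed Decimal Number shows whether the differences depend on the data type.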
Conclusions
Nested iterators in DAX might have performance issues if the combined cardinality of the iterators is large. Only the innermost iterator can be pushed to the storage engine, and this iterator is often “hidden” in a measure. The best practice is to reduce the cardinality of the iterators that invoke measures, so that they minimize the materialization requested from the storage engine.