Quantcast
Channel: SQLBI
Viewing all articles
Browse latest Browse all 434

Duplicated names in DAX

$
0
0

This article describes how DAX resolves column and measure names in DAX, providing best practices to avoid conflicting names.

A fundamental rule you should follow writing DAX expression is: always specify the table name in a column reference, and never include the table name in a measure reference. In other words, you should always use this syntax:

Column reference: ‘table'[column]
Measure reference: [measure]

This best practice allows you to read a DAX expression in an easier way, because it lets you quickly distinguish between measures and columns. Why is this important? Every measure reference involves a context transition when it is called within a row context (such as in a calculated column, or in an iterator like SUMX or FILTER). Because the context transition changes the way the calculation is made, requires more memory, and impacts performance, it is a good idea to be aware of that in a visual way.

Usually, this is also a safe practice, because you can define a column with the same name in different tables, whereas you cannot define a measure that has the same name of any other column or measure in the entire data model, regardless of the table. If this rule was always true, the table name before a measure reference would be an unnecessary syntax element. The reason why such a syntax exists is because the reality is more complex, and this is the reason for this article.

Duplicated measure names

When I show the difference between column reference and measure reference, and I explain what is the best practice I described above, I always struggle with code generated by IntelliSense, Quick Measures, and Power BI queries, which often include the table name in column reference. We will cover IntelliSense separately, but an automatic DAX code generator probably cannot afford using the best practice, because there are conditions where a model can have multiple measures with the same name, in different tables. I think it is a very bad idea to intentionally design a model in this way, but to avoid breaking a model by importing a new table, this was probably a required design decision made by Microsoft engineers.

For example, consider the following model with two tables, Customers and Partners, having the same column names.

This is common and expected, because it is also a best practice having the same column name for columns in different tables having the same content. For example, columns used in relationships should have the same column name on both tables.

If you try to create a new measure using the name Revenues on any table of the data model (also on a table other than Customers or Partners), you will get the following error message:

The name ‘Revenues’ is already used for a column on table Customers*. Choose a different name.

* The table name could be “Partners” instead of Customers in this specific example.

Thus, you cannot create a new measure using an existing column or measure name defined in any part of the data model. However, you can obtain this naming conflict by creating a new column having the same name of an existing measure.

For example, consider a model that has the following measure defined in the table Customers:

Length := 
SUMX ( 
    Customers, 
    LEN ( Customers[Name] ) 
)

Now, create a calculated column with the same name (Length) in the table Partners:

Length = LEN ( Partners[Name] )

This is allowed, and at this point you have a measure called Length in the Customers table, and a column called Length in the Partners table, as you can see in the following screenshot.

This situation is not good, because at this point using the suggested syntax to reference the measure Length can be ambiguous: if you have a row context active in the Partners tale, then [Length] is a column reference, whereas any other use of [Length] outside of a row context in the Partners table is a measure reference. Moreover, if before creating the Partners[Length] column you had a [Length] measure reference in another measure defined in the Partners table, all of a sudden such a measure will stop working, trying to access the Partners[Length] column even if there is no row context in Partners, breaking the existing measure.

This definition might seem too abstract and technical, so it is a good idea to look at some practical example of what could happen by adding a column that has the same name of an existing measure.

Consider the following data model.

The content of the two tables is the following.

The two measures Margin and Net Margin are defined in the model as follows.

Customers[Margin] := 
SUMX ( 
    Customers, 
    Customers[Revenues] - Customers[Costs] 
)

Countries[Net Margin] := 
SUMX ( 
    Countries, 
    [Margin] * ( 1 - Countries[Tax Rate] ) 
)

The Net Margin measure returns the sum of the margin obtained from customers after removing the percentage of taxes for each country (the numbers we used in this demo do not represent any real-world scenario and actual tax rates in each country are different). The [Margin] measure referenced in the [Net Margin] measure is executed in a filter context containing all the customers of each country during the iteration of Countries. This model creates the following report.

You can see this model with this report in the file Duplicated Demo 03.pbix included in the ZIP file you can download. At this point, you can break the calculation made in Net Margin by importing a column called Margin in the table Countries. This can be done in many ways: by creating a calculated column, by modifying the underlying query in the Query Editor, or by modifying the expression assigned to a calculated table. In our example, we use this latter technique, modifying the definition of the table Countries using the following DAX code.

Countries = DATATABLE (
    "Country", STRING,
    "Tax Rate", DOUBLE,
    "Margin", CURRENCY,
    {
        { "United States", 0.25, 40 },
        { "Canada", 0.30, 22 }
    }
)

Now the content of the table is the following.

The content of the report is now completely different.

As you see in the previous screenshot, the Net Margin measure produces completely different numbers, because its calculation no longer works as expected. We can fix the report by modifying the definition of the Net Margin measure using a measure reference that also includes the Customers table name, as in the following example.

Countries[Net Margin] := 
SUMX ( 
    Countries, 
    Customers[Margin] * ( 1 - Countries[Tax Rate] ) 
)

Now your formula explicitly references the measure Margin from the Customer table, instead of implicitly referencing the column Margin from the Countries table. This will restore the original calculation.

The formula editor in Power BI highlights the Margin name with a purple color, so you can recognize that it is a measure reference instead of a column reference. In this way, you should be able to avoid the confusion between column reference and measure reference regardless of the presence or absence of the table name. However, this is not a perfect solution, because the code highlighting is not always clear and not always available in other editors.

Power BI does not allow you to create a column if a measure with the same name already exists within the same table. Thus, trying to add a Margin column in the table Customers is not possible, unless you remove or rename the existing Margin measure first.

DAX code generators

Considering the behavior of the DAX syntax, it is inevitable that a DAX code generator uses an explicit table name in any measure reference, just because in this way the code obtained is more resilient to column names conflicting with existing measure names.

As a data modeler, you should be aware of the existing names, and adopt a naming convention that should avoid confusing a column name with a measure name. For example, if [Revenues] could be a good name for a measure, the underlying column that is aggregated should have a different name, such as [Line Revenue], or maybe [_Revenues], where the prefix highlight a column name that should be hidden because used only to aggregate the corresponding measure. However, entering in the territory of what should be the right naming convention is an entire different topic, and it’s impossible to establish a set of rules that can be universally adopted. There are too many scenarios to consider, and what is important is being consistent with its own set of rules. But you need to define one, especially if your model will grow and will be modified and maintained by different people.

If you keep a consistent naming convention, your chances of a name conflict between column and measure names is minimal, and you can continue to omit the table name from a measure reference in your DAX expressions. This practice leads to more readable and maintainable code.

It is understandable that you might be tempted to always include a table name when you use a measure reference in future projects. However, the very reason I wrote this article is to highlight that, despite this potential issue, it is very important to use the table name always for column references and never for measures references. My suggestion is to be aware of the possible side effects of importing new columns and tables in a data model, checking whether the column names in the model are conflicting with some of the measure names. Writing this, I think that this health check would be a very useful feature to add in DAX Studio, too.

Suggested best practices

If you are an avid reader, you should understand the implications of different choices, and why we provide the following guidelines.

If you are lazy and you skipped the article just to see the conclusion, here is what you should do:

  • Never use a table name in a measure reference
  • Always use the explicit table name in a column reference
  • Check whether the names of new columns imported in a model might conflict with the name of existing measures. If yes, rename the columns or the measures, so that such a conflict does not exist anymore.

There are conditions when a measure name is not unique in the data model, and there could be columns in other tables using the same name of a measure. However, this situation should be avoided, because a model that keep uniqueness of measure names against column names is much safer to use and to analyze (it also improves user interface).

Examples included

You can download a ZIP file that includes five examples:

  • Duplicated Demo 01.pbix: initial example with Customers and Partners without measures and calculated columns
  • Duplicated Demo 02.pbix: example with Customers and Partners with a measure and a calculated column called Length
  • Duplicated Demo 03.pbix: initial example with Customers and Countries tables
  • Duplicated Demo 04.pbix: modified example with a column Margin in Countries tables, which breaks the report
  • Duplicated Demo 05.pbix: final example that fixed the Net Margin measure by including the table name Customer to the Margin measure reference

Viewing all articles
Browse latest Browse all 434

Trending Articles