Power BI and Regular Expressions

I read a quote somewhere along the lines of…

If you are trying to solve a problem with regular expressions...
you now have two problems

This seems a little harsh for a useful tool that has been with us for what seems a very long time.  Plenty of languages offer support for RegEx string searching and pattern matching but not so far in Power BI Desktop.

I thought I would share a simple and quick way to enable the use of RegEx in Power BI.

This could be useful in a number of ways.  RegEx patterns could help validate strings from source systems:

  • Valid email addresses
  • URLs
  • IP addresses
  • Product Codes
  • Stripping numbers or texts from strings

I’m going to look at the first scenario and demonstrate with a simple exercise how you might use RegEx to test if email addresses are valid.

DAX and M don’t have any dedicated functions for RegEx so the technique I’m sharing uses R Script in the Query Editor.

You will need to have an instance of R installed on your workstation and have your Power BI options configured for R.

Lets start with some data.  Here is a small sample of good and bad email addresses which I will use to add a column to display which email addresses are good or bad.

Email Address
not an email address
a b c@email.com

Once the data is loaded using the Enter Data function in Query Editor, click the Run R Script button on the Transform Tab.

Transform Tab

This opens the following R Script editor;

R Editor

Note that any line that starts with a # character will be ignored as a remark/comment line.

Next add the following code;

# 'dataset' holds the input data for this script
pattern <- "^[[:alnum:].-_]+@[[:alnum:].-]+$"

and that’s it!

Lets have a brief look at what each line is doing.

The first line simply adds the Regular Expression pattern to a variable. This is for readability and also makes updating the pattern less likely to break the code.

The second line defines a function which will test any string passed against the pattern and return either TRUE or FALSE as appropriate.

The final line, output will return the updated dataset as a table.

The within function adds a new column/vector to the dataset.  Note the very first line in the script editor advises the input data is held in a variable called dataset.

ValidEmail is the column/vector we are adding and calls the isValidEmail function passing data from the Email column as a parameter.

Once added we should see the following output:


We now have a new column called ValidEmail which shows TRUE/FALSE for each line depending on how the data in the Email column is matched with our regular expression pattern.

Using RegEx for validating email addresses is an interesting can of worms.  The characters allowed to be used in a valid RFC email address makes using RegEx for email validation complex.  Perhaps in this case ValidEmail could be used as a warning.

Hopefully this might be useful if you are considering using regular expressions in your Power BI report. Even if not, it shows a method of adding columns using R Script to your dataset.

Feel free to download the PBIX file from here



5 3 votes
Article Rating

Leave a Reply

Inline Feedbacks
View all comments
6 years ago

Reblogged this on MS Excel | Power Pivot | DAX | SSIS |SQL.

Sue Loh
Sue Loh
6 years ago

The script appears to have been incompletely pasted. Here is the script from the .pbix file, for other readers:

# ‘dataset’ holds the input data for this script
pattern <- "^[[:alnum:].-_]+@[[:alnum:].-]+$"
isValidEmail <- function(x) {grepl(pattern , as.character(x), ignore.case=TRUE)}
output <-within(dataset,{ValidEmail=isValidEmail(dataset$Email)})

5 years ago

Hi Phil,
Thank you for your post! I was looking to run a similar script on Power BI.
Unfortunately, my system failed to properly execute your R script.

I get the following error:
shared Emails = let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“i45WKkktLnEozs9NTcnPTczM00vOz1WK1YlWSszLL8lILXJIBYrmJKakFKUWF8MlgXIKiXkKYDkFqCREl0KSQjJED0RxLAA=”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Email = _t]),
#”Run R Script” = R.Execute(“# ‘dataset’ holds the input data for this script#(lf)pattern <- ""^[[:alnum:].-_]+@[[:alnum:].-]+$""#(lf)isValidEmail <- function(x) {grepl(pattern , as.character(x), ignore.case=TRUE)}#(lf)output <-within(dataset,{ValidEmail=isValidEmail(dataset$Email)})",[dataset=Source]),
output = #"Run R Script"{[Name="output"]}[Value]


4 years ago

The file is not anymore available for download and it seems there’s part of the code missing on the page (Output line for example).
I also have a question : in your example it’s working fine as dataset holds only 1 field but what if it’s a full table ? How do you pass the field in parameter ?

Phil Seamark (MVP)
4 years ago
Reply to  Jeff

I moved the article here