Finding Key Words Within Text in Power Automate

Introduction

Imagine you need to know if a particular document or block of text contains certain key words, and/or you want to count the number of times those key words appear in a particular text.

Also imagine Power Automate was your only tool to do that, and you can’t use any kind of AI or other premium feature like an HTTP request to a third-party service. You have to roll your own using standard actions.

Well, imagine no longer because I’m going to show you how.

I started this thinking it was going to be easy. It isn’t really as straightforward as I first thought.

It’s one thing finding a word in some text using the contains operator in a condition or filter action, or using the indexOf function, but it’s quite another to know that you’re looking at a distinct word.

In other words, when you look for laughter, you might find slaughter, and that’s not what you want. If you look for <space>laughter<space> instead, you won’t match slaughter, but you’ll miss laughter at the beginning or the end of a sentence, in quotes or brackets, or next to any other punctuation.

The Flow

So, the pseudocode-ish algorithm for this is:

  • Define a list of key words as an array.
  • Define your text to search
  • Filter the array of key words for whether the text to search contains the word
  • For each of the key words found in the text
    • Find each instance of that key word
    • Record the character before and after each instance
  • Filter the full list of all instances of each word, keeping only those where neither the character before nor the character after the word is an alpha character
  • Build a distinct list of the resulting words along with a count of how many times that word appears.
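If it helps to see the logic outside Power Automate, here’s a rough Python sketch of the steps above. It’s only an illustration, not part of the flow: the function name count_key_words and the sample text are made up, it searches from an offset rather than using flow variables, and it matches case-sensitively, which may not be how every Power Automate function behaves.

```python
from collections import Counter


def count_key_words(key_words, text):
    """Count whole-word occurrences of each key word in text."""
    # Keep only the key words that occur somewhere in the text.
    candidates = [word for word in key_words if word in text]

    instances = []  # one row per occurrence: word, char before, char after
    for word in candidates:
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            # Use a space when there is nothing before or after the word.
            before = text[start - 1] if start > 0 else " "
            after = text[end] if end < len(text) else " "
            instances.append((word, before, after))
            start = text.find(word, end)

    # Keep only occurrences with no letter on either side.
    whole = [word for word, before, after in instances
             if not before.isalpha() and not after.isalpha()]

    # Distinct words with the number of times each one appears.
    return dict(Counter(whole))


sample = "Laughter? No, slaughter. More laughter, (laughter) and laughter"
print(count_key_words(["laughter", "tears"], sample))  # prints {'laughter': 3}
```

The slaughter hit is found but thrown away because of the s in front of it, and the capitalised Laughter is skipped only because this sketch is case-sensitive.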

So here’s how to do it. If you want to skip the explanation, just scroll to the end, where you can download a .zip file to import into your own Power Automate account and use in your own solutions.

Firstly let’s define the key words and the text we’re going to look inside. I’ve hard coded these into Compose actions. Yours can come from an external data source.

The key words are simply a comma-separated list, which we turn into an array later by splitting on the comma. Yours could already be in an array (rows from a SharePoint list, for example). If that’s the case there’s no need to split anything; just use a Select action to make a single-column collection of words.

Now let’s initialise a bunch of variables we use later:

Now all the initialisation is out of the way, we get into the guts of it.

Firstly, split the string of key words into an array if it isn’t one already. If you have a SharePoint list full of key words, for example, put the “value” property of the Get items action into a Select, then use the expression item()?['Title'] to create a single-column collection.

Regardless of how you get there, the result you want is a single column table of key words like this:

Now filter that array, keeping only the items that the text to search contains:
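As a rough Python analogue of the split and the filter (not the flow itself), the two steps look something like this. The variable names and sample values are invented for the example, and the strip() call is an extra that tidies up any spaces after the commas.

```python
key_words_csv = "laughter,tears,joy"
text_to_search = "Tears of joy, then slaughter."

# Split the comma-separated string into an array of key words.
key_words = [word.strip() for word in key_words_csv.split(",")]

# Keep only the key words that appear somewhere in the text.
present = [word for word in key_words if word in text_to_search]

print(present)  # prints ['laughter', 'joy']
```

Notice that laughter survives this first filter even though the text only contains slaughter. That’s exactly why the rest of the flow is needed.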

Now we’re going to iterate through each key word that survived the filter. If no words were found this will pretty much be the end of the flow (there’s nothing to go into this loop), but if there were, one or more iterations of this loop will occur:

The first action is to set the sSubstring variable we defined earlier to the full text to search, then add a Do until loop. That’s a Do until loop inside an Apply to each loop, and because we set variables inside the Apply to each, set its concurrency to 1 so the iterations run one at a time and don’t overwrite each other’s values:

Now let’s look at the Do until loop:

The first two actions inside the loop return the indexOf (the number of characters from the start of sSubstring at which this word first occurs) and the lastIndexOf (the number of characters from the start of sSubstring at which this word last occurs). If there is only one instance of this word within the text, the two indices will be the same.

The loop ends when both these numbers are the same, so in that case we only get one iteration of the Do until loop. If there is more than one instance of a word, then the numbers will be different so the loop continues.

Therefore, each iteration of the Do until loop represents each instance of the found word, while the outer Apply to each loop represents each word.
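Here’s a small Python sketch of that loop condition, just to show why comparing the first and last index works as a stopping rule. The function name instance_offsets is made up, remaining stands in for sSubstring, and it assumes the word occurs at least once and that occurrences don’t overlap. Also bear in mind that Python’s find() is case-sensitive, while indexOf is documented as not case-sensitive, so results can differ on mixed-case text.

```python
def instance_offsets(word, text):
    """Return the index of each occurrence of word, measured from the
    start of the ever-shrinking remainder of the text."""
    remaining = text  # stands in for the sSubstring variable
    offsets = []
    while True:  # a Do until runs its actions before testing the condition
        first = remaining.find(word)
        last = remaining.rfind(word)
        offsets.append(first)
        # Drop everything up to the end of the instance just handled.
        remaining = remaining[first + len(word):]
        if first == last:  # only one instance was left, so stop
            break
    return offsets


print(instance_offsets("ab", "ab--ab-ab"))  # prints [0, 2, 1]
```

Each offset is relative to what’s left of the text at that point, not to the original text, which is all the flow needs to read the neighbouring characters.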

Now, for each instance of the word, we need to see what characters exist directly before and after it. For example, if the text contains slaughter and we want laughter, then the character before will be s and the character after will be a space, a full stop, etc.

charBefore

Substring() can be a bit of a finicky function, so you have to make sure you don’t feed it anything it can’t digest, hence the if(). Basically, if the index of this instance of the word is greater than 0 (indexOf counts from zero), then it’s not at the beginning of the text to search, so we use substring() to get the character immediately before it by subtracting 1 from the index. If it is the first word, then return a space character (any non-alpha char will do).
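In Python terms the guard looks roughly like this. It’s a sketch that assumes a zero-based index, as indexOf returns, so position 0 is the only case with nothing in front of the word; char_before is a made-up name.

```python
def char_before(text, index):
    """Return the character just before position index, or a space
    when the word sits at the very start of the text."""
    return text[index - 1] if index > 0 else " "


print(repr(char_before("slaughter", 1)))  # prints 's'
print(repr(char_before("laughter", 0)))   # prints ' '
```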

charAfter is a bit more complex:

charAfter

If the whole length of the text to search (sSubstring) is the same as the index of this instance of the word plus its length, then we know it’s the last word in the text, so we return a space, else take the character after the word by taking a single character substring of the text where the start index is the sum of the index of the word and its length.
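The same check, sketched in Python with a hypothetical char_after function. It assumes index is the zero-based position where the word starts.

```python
def char_after(text, index, word):
    """Return the character just after the word that starts at index,
    or a space when the word ends the text."""
    end = index + len(word)
    return " " if end == len(text) else text[end]


print(repr(char_after("laughter.", 0, "laughter")))       # prints '.'
print(repr(char_after("more laughter", 5, "laughter")))   # prints ' '
print(repr(char_after("slaughterhouse", 1, "laughter")))  # prints 'h'
```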

Now push the word and the results of those two Compose actions onto an array. The current item is from the outer Apply to each, found within Dynamic content, usually right down the bottom of the list:

Now we have to chop off the bit of the text to search that we just looked at (the first instance of the word) and set the rest as the sSubstring variable, so the text to search we feed into the next iteration of the loop has one instance fewer of the word in it.

We do this by taking a substring of sSubstring (defined in a Compose above) that starts at the point where the first instance of the word ends (its index plus its length).
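A quick Python sketch of that trimming step, with an invented function name and sample text, shows the instance count dropping by one each time:

```python
def trim_past_first(text, word):
    """Return the text that follows the first instance of word."""
    index = text.find(word)
    return text[index + len(word):]


sample = "a laughter b laughter"
rest = trim_past_first(sample, "laughter")

print(repr(rest))                                      # prints ' b laughter'
print(sample.count("laughter"), rest.count("laughter"))  # prints 2 1
```

One thing to be aware of with this approach: if two instances sit directly next to each other with nothing in between, the second one ends up at the very start of the trimmed text, so its character before would be reported as a space.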

That’s the end of the Until loop. For this iteration of the Apply to each, we’ll have one or more rows in the array called aFoundWords with columns for the word, the character before the word (or space if it’s the start) and the character after the word (or space if it’s the end).

The Do until loop runs within each iteration of the Apply to each, so by the end the array has at least as many rows as the number of words found by that original Filter array action (before the looping), and possibly more.

As you can see, some have a space (or dot etc.) before and after (good, it’s a word on its own) and some have a letter before or after (bad, it’s part of a longer word).

So, let’s filter out those bad ones:
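To make the rule concrete, here’s the same filter as a Python sketch. The rows and the property names word, before and after are invented for the example, and yours may be named differently.

```python
found_words = [
    {"word": "laughter", "before": "s", "after": "."},  # part of slaughter
    {"word": "laughter", "before": " ", "after": ","},
    {"word": "laughter", "before": "(", "after": ")"},
]

# Keep a row only when neither neighbouring character is a letter.
whole_words = [
    row for row in found_words
    if not row["before"].isalpha() and not row["after"].isalpha()
]

print(len(whole_words))  # prints 2
```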

The result:

We no longer need those before and after chars, so drop them in a Select:

The result:

Now we need another Apply to each. This one is used to transform the output you see above, where the same word appears multiple times, into one where each word only appears once, but we get an integer count of the instances of that word.

Start by filtering the same array we feed into the Apply to each for this word. The result will be an array of one or more instances of the same word.

Now take its length:

Next, we use a Select action to do something that doesn’t make sense just yet. We already defined the array called aDistinctWords but haven’t put anything on it yet. Of course, this is a loop so we do other stuff later that puts data into this array. The array contains objects but we only care about the word itself, so drop the other properties with a Select:

Now we want to know whether the result of that Select contains the word we’re looking at in this iteration:

If it does, then the work for this word is already done: it’s the second or third (etc.) time we’ve seen this word in this Apply to each loop, so it’s already on the array. If not (it’s the first iteration where this word has come up), push it onto aDistinctWords along with the number of instances we found, which we got by taking the length() of the output of the Filter array action earlier in the loop.
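Here’s the whole of this second loop as a Python sketch, with comments noting which action each line stands in for. The function name and the word and count property names are made up; the distinct list plays the part of aDistinctWords.

```python
def distinct_counts(words):
    """Collapse a list of repeated words into one row per word with
    a count of how many times it appeared."""
    distinct = []  # plays the part of aDistinctWords
    for word in words:                                # Apply to each
        matches = [w for w in words if w == word]     # Filter array
        count = len(matches)                          # length()
        seen = [entry["word"] for entry in distinct]  # Select
        if word not in seen:                          # Condition
            distinct.append({"word": word, "count": count})
    return distinct


print(distinct_counts(["laughter", "joy", "laughter"]))
# prints [{'word': 'laughter', 'count': 2}, {'word': 'joy', 'count': 1}]
```

In Python you’d normally reach for collections.Counter for this, but the longhand version maps more directly onto the actions in the flow.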

That’s the end of this second Apply to each loop.

I’ve just put this into an HTML table for demonstration purposes, but you may want to transform this data another way, such as joining as a comma separated string or whatever.

If you like my work and feel I deserve a beer for it, then feel free to make a small donation.

Here is a link to this as a Power Automate package

And here it is as a Logic Apps template

Any questions or comments below.
