Even if You Don't REGULARLY EXPRESS Yourself Well

What it is

A regular expression is a sequence of symbols and characters that describes a string or pattern to search for within a longer piece of text. The search is strict: text matches only if it satisfies the expression exactly. You can search for words, numbers, or symbols, either specific ones or any word or number of a given form. The concept is supported in most programming languages and has many uses.

LabVIEW has a great native function called "Match Regular Expression". The required inputs are the string you want to search and the regular expression you are searching for. The cool part in LabVIEW is the outputs. You get the expected outputs: everything before the match, the whole match, and everything after the match. You also get submatches, which are specified parts of the whole match. If you get good at regex, you can use one function to pull all of the information you need out of a string.
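LabVIEW code is graphical, so as a rough text stand-in here is a minimal Python sketch of the same four outputs using the standard `re` module. The function name `match_regular_expression` and the choice to return the whole input as "before" when nothing matches are assumptions made for this illustration, not a description of how the LabVIEW function is implemented.

```python
import re

def match_regular_expression(text, pattern):
    """Return (before, whole, after, submatches) for the first match.

    Hypothetical helper that loosely mirrors the outputs described above.
    """
    m = re.search(pattern, text)
    if m is None:
        # Assumption for this sketch: with no match, all text counts as "before".
        return text, "", "", []
    before = text[:m.start()]   # everything before the match
    after = text[m.end():]      # everything after the match
    return before, m.group(0), after, list(m.groups())

before, whole, after, subs = match_regular_expression("Temp: 23 C", r"(\d+)\s(\w)")
print(repr(before), repr(whole), repr(after), subs)
```

One call returns the surrounding text and both submatches (the reading and its unit), which is the "one function pulls everything" idea from the paragraph above.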

There are many times that I have found this useful. I have used it to parse a JSON file whose structure made Unflatten From JSON super difficult to use. I also helped a coworker use it to parse a web page and find the relevant information.

How It Works

A regular expression search finds exact matches, using syntax that lets you describe matches with specific characteristics. As an introduction, here are some basic examples.

Words: A regular expression can handle exact words. name will find the first place "name" appears in your search string. \w will match any single word character (a letter, digit, or underscore). Quantifiers can change how many characters you match.

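Numbers: Just like letters, you can search for 1234 and it will return "1234", but if you need to find any single digit, \d will do the trick. If you want a full number with (or without) a decimal point, [0-9\.]+ is your solution. Note that this set also accepts strings such as "1.2.3", so tighten it if your data can contain them.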
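The word and number patterns above can be tried out in a few lines. This is a small sketch in Python's `re` module, used here only as a convenient stand-in for the LabVIEW function; the sample string is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "name: widget, price 12.50"

literal = re.search(r"name", text).group(0)      # exact word
word_char = re.search(r"\w", text).group(0)      # first word character
digit = re.search(r"\d", text).group(0)          # first single digit
number = re.search(r"[0-9\.]+", text).group(0)   # run of digits and periods

print(literal, word_char, digit, number)
```

Each search returns only the first match it finds, which is why `\d` gives a single digit while the bracketed set with `+` gives the full price.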

White Space: \s will find any white space.

Brackets: [ ] are very useful and create a set of possible matches. [a-z] will find any single lowercase character from a to z. Parentheses ( ) create capturing groups. This is how you get submatches. If you want to pull the name out of the string "Name: Bob" without the "Name:" label, you would use Name:\s(\w+). The whole match would return "Name: Bob", but submatch 1 would be "Bob". You could also pull "name" and "ID" in one search by using two sets of parentheses in one regular expression.
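Here is a short sketch of the capturing-group example, again in Python's `re` module as a stand-in. The second string, with an `ID` field, is an assumed format invented to show two groups in one expression.

```python
import re

# One capturing group: the whole match versus submatch 1.
m = re.search(r"Name:\s(\w+)", "Name: Bob")
whole = m.group(0)   # the entire match
name = m.group(1)    # only what the parentheses captured

# Two capturing groups in one search (hypothetical record format).
record = "Name: Bob, ID: 42"
m2 = re.search(r"Name:\s(\w+),\sID:\s(\d+)", record)
name2, id2 = m2.groups()

print(whole, "|", name, "|", name2, id2)
```

Submatches come back as text, so a captured ID would still need converting if you want it as a number.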

Escape character: You may have noticed by now that \ has a special meaning. w finds "w", but \w is a code that matches any word character. If you need to find any of the characters with special uses, you will have to escape them with a \. To match a "[" you must write \[.

Anything: It is often useful to find anything between specific words. In such a case I use one of two options: . or [\s\S]. They are each a little different. The period matches anything except a line break; the other matches anything at all. \s finds white space and \S finds anything but white space, so within brackets the pair finds anything that is or isn't white space.
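The difference between the two "anything" options shows up as soon as the text spans more than one line. A minimal sketch, using Python's `re` module with its default settings and a made-up two-line string:

```python
import re

one_line = "start one end"
two_lines = "start one\ntwo end"

# The period stops at a line break, so it only works within one line.
same_line = re.search(r"start(.*)end", one_line).group(1)
dot_across = re.search(r"start(.*)end", two_lines)        # no match

# [\s\S] matches line breaks too, so it spans both lines.
anything = re.search(r"start([\s\S]*)end", two_lines).group(1)

print(repr(same_line), dot_across, repr(anything))
```

Many engines also offer an option that lets the period match line breaks, but `[\s\S]` behaves the same whether or not that option is set.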

Quantifiers: { } * + and ? are all quantifying characters. \w{3} returns exactly 3 word characters, \d{2,} returns two or more digits, [\s\S]* returns zero or more of anything, and \d+ returns one or more digits. On its own, ? means zero or one, but placed after another quantifier it makes that quantifier lazy, so it matches as few characters as possible while still letting the rest of the expression match.

This is one of the least intuitive things I have talked about, so here is a good example. If you have a list of "name: ..., lots of other irrelevant information, id: ...," repeated 3 times and you search name:\s(\w+)[\s\S]+id:\s(\d+), your two submatches will be the first name and the last ID number. That is because, by default, regular expressions find the largest match that they can, so the "find anything" plus will run all the way to the last ID number. To get the first name and the first ID you have to add a ? so your search looks like this: name:\s(\w+)[\s\S]+?id:\s(\d+)
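The greedy versus lazy behavior is easiest to see side by side. The sketch below uses Python's `re` module as a stand-in, and the three records (names, departments, and ID numbers) are invented for illustration.

```python
import re

# Made-up data: "name ... irrelevant information ... id" repeated 3 times.
text = (
    "name: Ann, dept: A, id: 101,\n"
    "name: Bob, dept: B, id: 202,\n"
    "name: Cy, dept: C, id: 303,\n"
)

# Greedy: [\s\S]+ runs to the last "id:", so you get first name, last ID.
greedy = re.search(r"name:\s(\w+)[\s\S]+id:\s(\d+)", text).groups()

# Lazy: [\s\S]+? stops at the first "id:", so name and ID belong together.
lazy = re.search(r"name:\s(\w+)[\s\S]+?id:\s(\d+)", text).groups()

# The lazy form can then be repeated to collect every record.
all_pairs = re.findall(r"name:\s(\w+)[\s\S]+?id:\s(\d+)", text)

# Counted quantifiers from the paragraph above.
three = re.search(r"\w{3}", "abcdef").group(0)
two_plus = re.search(r"\d{2,}", "7 and 12345").group(0)

print(greedy, lazy, all_pairs, three, two_plus)
```

In LabVIEW you would get the same "collect every record" effect by calling the match function in a loop and feeding the "after match" output back in as the next input string.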

Troubleshooting

Because regular expressions are very strict, your first attempt often doesn't work. The best way to resolve the problem is to break your expression down into small parts and add the next part only once the current one returns what you want.
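One way to apply that advice is to grow the expression piece by piece and report the first piece that breaks the match. This is only a sketch in Python's `re` module: the helper name `first_failing_step` is hypothetical, and it assumes every piece is split at a point where the expression so far is still valid on its own (for example, never in the middle of a pair of parentheses).

```python
import re

def first_failing_step(text, steps):
    """Return the index of the first piece that makes the match fail, or None."""
    pattern = ""
    for i, piece in enumerate(steps):
        pattern += piece                      # grow the expression one piece at a time
        if re.search(pattern, text) is None:
            return i                          # this piece is where it stopped matching
    return None

# Made-up example: the expression expects "id:" but the text says "ID:".
steps = [r"name:", r"\s", r"(\w+)", r",\s", r"id:", r"\s(\d+)"]
text = "name: Bob, ID: 42"
idx = first_failing_step(text, steps)
print(idx, steps[idx])
```

Here the search points at the `id:` piece, which reveals that the real problem is upper versus lower case rather than anything in the more complicated parts of the expression.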

Check out https://www.regexpal.com/ or https://www.rexegg.com/regex-quickstart.html for additional help.

Haden Heath

Associate Systems Engineer, Business Development Engineer

Haden developed an interest in problem solving and looking for opportunities to innovate while studying at Brigham Young University. He earned his BS in Mechanical Engineering and a Business Management minor. During his schooling he worked on projects including a LabVIEW VI built to determine thermal properties using fluorescence from green lasers and a bullet-proof barrier for law enforcement that can be folded down and stored in the trunk of a vehicle.

Haden joined the Endigit team in 2018 immediately following his graduation from BYU and hopes to help expand the business while developing LabVIEW software.

Haden enjoys spending time with his wife and daughter, especially when doing outdoors activities. Hiking, camping, boating, hunting, and sports are among his favorite activities.
