A regular expression (regex) is a pattern that is matched against a string, that is, any sequence of characters and not necessarily whole words. Regex is powerful and is supported by many command-line tools, scripting languages, email filters, and more: vi, tr, rename, grep, sed, awk, Perl, Python, etc.
This article goes into detail on the usage of regex.
Basics
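- Here are the commonly used expressions for grep. The -e flag marks the argument that follows as the pattern; grep reads it as a basic regular expression (BRE) by default, -E switches to extended syntax (ERE), and GNU grep's -P switches to Perl-compatible syntax. In basic syntax, GNU grep expects the ?, +, {} and | operators to be written with a backslash (\?, \+, \{\}, \|)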
- Special Characters
- . Matches any single character
- \ Escapes the character that follows so that a special character is matched literally (for example, \. matches a period)
- \s Used to match any single white-space character
- \S Used to match any single non-white-space character
- \w Used to match any alphabetic character, digit, or underscore
- \W Used to match anything other than an alphabetic character, digit, or underscore
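- \d Used to match any single digit between 0 and 9 (Perl-compatible syntax; with GNU grep this needs -P, otherwise use [0-9])
- \D Used to match any non-digit (Perl-compatible syntax; otherwise use [^0-9])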
- [] Used to match any of the enclosed characters
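As a quick illustration of the special characters above, the sketch below runs a few patterns against made-up sample lines. It assumes GNU grep (for \w); the sample text and variable names are hypothetical. The -x flag makes grep match whole lines only, and [0-9] and [[:space:]] are used as portable stand-ins for \d and \s.

```shell
# Hypothetical sample input, one item per line.
sample='cat
c.t
cut
c9t
a b'

# . matches any single character, so every c?t item matches.
dot_matches=$(printf '%s\n' "$sample" | grep -x -e 'c.t')

# \. matches only a literal period.
literal_matches=$(printf '%s\n' "$sample" | grep -x -e 'c\.t')

# [au] matches exactly one of the enclosed characters.
bracket_matches=$(printf '%s\n' "$sample" | grep -x -e 'c[au]t')

# \w matches a letter, digit, or underscore (GNU grep extension).
word_matches=$(printf '%s\n' "$sample" | grep -x -e '\w\w\w')

# [0-9] matches a single digit (portable alternative to \d).
digit_matches=$(printf '%s\n' "$sample" | grep -e 'c[0-9]t')

# [[:space:]] matches a single white-space character (portable alternative to \s).
space_matches=$(printf '%s\n' "$sample" | grep -e 'a[[:space:]]b')

printf 'dot:\n%s\nliteral:\n%s\n' "$dot_matches" "$literal_matches"
```

Comparing dot_matches with literal_matches shows why the backslash matters: the unescaped period also accepts cat, cut, and c9t.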
- Instance (Quantifiers)
- ? The preceding item is optional and will be matched, at most, once
- * The preceding item will be matched zero or more times
- + The preceding item will be matched one or more times
- {N} The preceding item is matched exactly N times
- {N,} The preceding item is matched N or more times
- {N,M} The preceding item is matched at least N times, but not more than M times
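The quantifiers above can be tried out with a short sketch like the following. The sample words and variable names are made up for illustration. It uses grep -E so the quantifiers can be written exactly as listed; in grep's default basic syntax, GNU grep would expect \?, \+ and \{N\} instead. The -x flag restricts matches to whole lines.

```shell
# Hypothetical sample input, one word per line.
sample='color
colour
ct
cot
coot
cooot'

# ? : the "u" is optional, so both spellings match.
optional=$(printf '%s\n' "$sample" | grep -E -x 'colou?r')

# * : zero or more "o" characters.
star=$(printf '%s\n' "$sample" | grep -E -x 'co*t')

# + : one or more "o" characters.
plus=$(printf '%s\n' "$sample" | grep -E -x 'co+t')

# {2} : exactly two "o" characters.
exact=$(printf '%s\n' "$sample" | grep -E -x 'co{2}t')

# {2,} : two or more "o" characters.
at_least=$(printf '%s\n' "$sample" | grep -E -x 'co{2,}t')

# {1,2} : at least one but no more than two "o" characters.
between=$(printf '%s\n' "$sample" | grep -E -x 'co{1,2}t')

printf 'star:\n%s\nplus:\n%s\n' "$star" "$plus"
```

The only difference between the star and plus results is ct, which has zero occurrences of the repeated character.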
- Pattern
- - Represents a range inside a bracket list (for example, [a-z]), provided it is not the first or last character in the list or the ending point of a range
- ^ Matches the empty string at the beginning of a line; as the first character inside a bracket list, it negates the list so that any character not listed is matched
- $ Matches the empty string at the end of a line
- \b Matches the empty string at the edge of a word
- \B Matches the empty string provided it’s not at the edge of a word
- \< Match the empty string at the beginning of the word
- \> Match the empty string at the end of the word
- \| OR operator (alternation); written as | in extended syntax
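A minimal sketch of the anchors, bracket ranges, word boundaries, and OR operator described above, run against invented sample lines. It assumes GNU grep, since \<, \> and \| are GNU extensions to basic syntax; all names and sample text are hypothetical.

```shell
# Hypothetical sample input.
sample='error: disk full
no error here
errors found
warning: 42
ok'

# ^ anchors the match to the beginning of the line.
starts=$(printf '%s\n' "$sample" | grep -e '^error')

# $ anchors the match to the end of the line.
ends=$(printf '%s\n' "$sample" | grep -e 'here$')

# \< and \> require "error" to stand as a whole word.
whole_word=$(printf '%s\n' "$sample" | grep -e '\<error\>')

# [0-9] uses - to express a range inside a bracket list.
range=$(printf '%s\n' "$sample" | grep -e '[0-9]')

# [^ ] negates the list; with -x, only lines without any space match.
negated=$(printf '%s\n' "$sample" | grep -x -e '[^ ]*')

# \| matches either alternative.
either=$(printf '%s\n' "$sample" | grep -e 'warning\|ok')

printf 'whole word:\n%s\n' "$whole_word"
```

Note that whole_word leaves out "errors found", because the trailing s means the match does not end at a word boundary.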
- Tools
- Man7 Regex Reference
- Test matching patterns with RegExr
Where Useful
- Email Filtering
- One particularly good use for regex is in email filtering since filtering rules to block or sort emails received can be made very intelligent with the use of regex
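EXAMPLE: Imagine you want to block any emails arriving from domains with the TLD .xyz. The rule can be built up in two steps:
.*@.*
anything @ anything
.*@.*\.xyz
anything @ anything .xyz
The \. matches a literal period rather than any character. Appending $ (.*@.*\.xyz$) anchors the rule to the end of the address, so that .xyz is only matched when it is the TLD.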
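To see how such a rule behaves, it can be tested with grep before it goes into a mail filter. This is only a sketch: the sender addresses and variable names are invented, and a real mail filter may use a different regex dialect than grep.

```shell
# Hypothetical sender addresses, one per line.
senders='alice@example.com
bob@promo.xyz
carol@mail.xyz.example.org
dave@shop.xyz'

# The rule as written matches ".xyz" anywhere after the @ sign.
unanchored=$(printf '%s\n' "$senders" | grep -e '.*@.*\.xyz')

# With a trailing $, ".xyz" must be the very end of the address.
anchored=$(printf '%s\n' "$senders" | grep -e '.*@.*\.xyz$')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored rule also catches carol@mail.xyz.example.org, whose TLD is .org, which is why the anchored form is usually the safer choice for a blocking rule.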
- Finding Strings With sed
- A common use of regex is in modifying strings in files, such as the serial number in a domain's SOA record
- For the following section we will update the serial in the SOA record of every zone file in /var/named/
sed -i 's/20[0-9]\{8\}/2016041901/g' *.db;/scripts/dnscluster syncall
NOTE: This will search for the pattern beginning with “20” followed by eight digits, which will match our SOA serial, and sed will replace the string with our string 2016041901.
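Before editing real zone files, the substitution can be previewed on sample text. The sketch below applies the same sed expression to a made-up zone fragment (the domain, name server, and timer values are hypothetical) and leaves out -i, so the result is written to standard output instead of back to a file.

```shell
# Hypothetical zone-file fragment with a serial in YYYYMMDDNN form.
zone='example.com. 86400 IN SOA ns1.example.com. admin.example.com. (
        2015123004 ; serial
        3600       ; refresh
        )'

# "20" followed by exactly eight more digits is replaced with the new serial.
# Shorter numbers such as 86400 and 3600 do not match and are left alone.
updated=$(printf '%s\n' "$zone" | sed 's/20[0-9]\{8\}/2016041901/g')

printf '%s\n' "$updated"
```

Running the expression without -i first is a cheap way to check that only the serial changes; any other ten-digit number beginning with 20 in the file would be rewritten as well.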
- Specific Filtering of Email Logs
- Let’s say there are long periods of time passing between when an email is sent and when the target address receives said email
- Commonly, on shared servers, you will need to tail through hundreds or even thousands of lines of Exim’s mainlog for mails sent just minutes prior
- Let’s see if we can parse results to something more manageable, filtering based on either queued time (QT) or delivery time (DT) of 10 seconds or more
sudo tail -100 /var/log/exim_mainlog|grep -e "\w\+@inmotionhosting\.com"|grep -e "QT=[0-9]\{2,\}[s]\|QT=[0-9]\+[m]\|DT=[0-9]\{2,\}[s]\|DT=[0-9]\+[m]"
- This will result in printing the lines that contain any user@inmotionhosting.com address and match our time criteria. To return any mail with a matching QT or DT regardless of address, omit the first grep.
EXAMPLE:
[userna5@domain.com@ecngx303: ~]$ sudo cat /var/log/exim_mainlog|grep -e "\w\+@inmotionhosting\.com"|grep -e "QT=[0-9]\{2,\}[s]\|QT=[0-9]\+[m]\|DT=[0-9]\{2,\}[s]\|DT=[0-9]\+[m]"
2022-10-16 03:31:06.611 [1911794] 1ojy6j-0081L1-UT => systems-notices@inmotionhosting.com F=<cpanel@ecngx303.inmotionhosting.com> P=<cpanel@ecngx303.inmotionhosting.com> R=smarthost_dkim T=remote_smtp_smart_dkim S=55459 H=smtp.servconfig.com [192.249.113.29]:587 I=[216.194.170.77]:37929 X=TLS1.2:AES128-GCM-SHA256:128 CV=yes DN="/CN=*.servconfig.com" C="250 OK id=1ojy6k-0009GF-Sc" QT=16s DT=16s
- Let’s break down how we specify our time requirements
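grep -e "QT=[0-9]\{2,\}[s]\|QT=[0-9]\+[m]"
- Our OR operator “\|” returns a line that matches either alternative: the string “QT=” followed by “[0-9]” (any digit) “\{2,\}” (two or more times) and “[s]” (s for seconds), OR the string “QT=” followed by “[0-9]” (any digit) “\+” (one or more times) and “[m]” (m for minutes). Two or more digits before the s means at least 10 seconds, and any value in minutes already exceeds that. The quantifier must sit outside the brackets, because inside a bracket list \+ is read as literal characters, and there must be no spaces around \|, because a space in the pattern is matched literally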
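The time filter can be checked against a few shortened log lines. This is a sketch under stated assumptions: the log lines are invented and far shorter than real Exim entries, the variable names are hypothetical, and the pattern is written in extended syntax (grep -E) so that the quantifiers and | need no backslashes. The second command shows what happens when \+ is placed inside the brackets.

```shell
# Hypothetical, shortened log lines; real Exim entries carry many more fields.
log='msg1 => a@example.com QT=16s DT=16s
msg2 => b@example.com QT=3s DT=1s
msg3 => c@example.com QT=2m5s DT=0s
msg4 => d@example.com QT=12m DT=4s
msg5 => e@example.com QT=9s DT=45s'

# Two or more digits before "s" (10 seconds or more), or any value in minutes,
# for either the queue time (QT) or the delivery time (DT).
slow=$(printf '%s\n' "$log" |
  grep -E 'QT=[0-9]{2,}s|QT=[0-9]+m|DT=[0-9]{2,}s|DT=[0-9]+m')

# With \+ inside the brackets, the list matches exactly one character
# (a digit, a backslash, or a plus sign), so QT=12m is missed.
single_char=$(printf '%s\n' "$log" | grep -e 'QT=[0-9\+]m')

printf 'slow:\n%s\nsingle character only:\n%s\n' "$slow" "$single_char"
```

Here msg2 is the only line filtered out of the first result, since both of its times are under 10 seconds.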