Sunday, March 26, 2017

Regex rules for Python...

Some things to keep in mind about regular expressions (regex):

? - matches zero or one of the preceding group
* - matches zero or more of the preceding group
+ - matches one or more of the preceding group
{n} - matches exactly n of the preceding group
{n,} - matches n or more of the preceding group
{,m} - matches 0 to m of the preceding group
{n,m} - matches at least n and at most m of the preceding group
{n,m}? or *? or +? - performs a nongreedy match of the preceding group (matches as little text as possible)
^spam - means the string must begin with spam
spam$ - means the string must end with spam
. - matches any character except a newline
\d, \w and \s - match a digit, a word character (letter, digit or underscore) or a whitespace character
\D, \W and \S - match anything except a digit, a word character or a whitespace character
[abc] - matches any single character between the brackets
[^abc] - matches any single character that isn't between the brackets
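The rules above can be tried out directly. Below is a minimal sketch run against made-up sample strings; `matches` is a hypothetical helper defined here for brevity, not part of the `re` module. It relies only on the standard `re.fullmatch()`, `re.search()` and `re.findall()` functions (Python 3.4+ for `fullmatch`).

```python
import re

def matches(pattern, s):
    # Hypothetical helper: re.fullmatch() succeeds only if the pattern covers the whole string
    return re.fullmatch(pattern, s) is not None

# Quantifiers: each one applies to the item right before it
optional_u = [matches(r'colou?r', w) for w in ('color', 'colour', 'colouur')]
# [True, True, False]: ? allows zero or one 'u'
star_vs_plus = (matches(r'ab*', 'a'), matches(r'ab+', 'a'))
# (True, False): * accepts zero 'b', + needs at least one
braces = (
    matches(r'\d{3}', '123'),      # True: exactly three digits
    matches(r'\d{2,}', '12345'),   # True: two or more digits
    matches(r'\d{,2}', ''),        # True: zero to two digits
    matches(r'\d{2,4}', '12345'),  # False: more than four digits
)

# Greedy vs. nongreedy: .* grabs as much as it can, .*? as little as it can
html = '<b>bold</b>'
greedy = re.search(r'<.*>', html).group()      # '<b>bold</b>'
nongreedy = re.search(r'<.*?>', html).group()  # '<b>'

# Anchors
begins = re.search(r'^spam', 'spam and eggs') is not None  # True
ends = re.search(r'spam$', 'spam and eggs') is not None    # False

# . does not match a newline by default
dots = re.findall(r'a.c', 'abc a\nc a c')  # ['abc', 'a c']

# Character classes
digits = re.findall(r'\d', 'a1b2')                # ['1', '2']
words = re.findall(r'\w+', 'hi there!')           # ['hi', 'there']
non_space = re.findall(r'\S+', 'one two\tthree')  # ['one', 'two', 'three']
vowels = re.findall(r'[aeiou]', 'regex')          # ['e', 'e']
consonants = re.findall(r'[^aeiou]', 'regex')     # ['r', 'g', 'x']
```

Using `fullmatch` for the quantifier checks keeps the results unambiguous, since `search` or `findall` could succeed on just part of a string.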


To use regular expressions, import the re module; after that, call re.compile() to create a pattern object:

import re
regex = re.compile(r'spam')
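A minimal sketch of why compiling is handy: the pattern object can be reused across many strings. The sample lines are made up for illustration; the module-level `re.search(pattern, string)` is shown alongside for comparison, since it gives the same result without an explicit compile step.

```python
import re

# Compile once, then reuse the pattern object as often as needed
regex = re.compile(r'spam')

lines = ['spam and eggs', 'just eggs', 'eggs and spam']
hits = [line for line in lines if regex.search(line)]
# ['spam and eggs', 'eggs and spam']

# The module-level function takes the pattern string directly
same = [line for line in lines if re.search(r'spam', line)]
```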

Some useful methods:

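match = regex.search('abc') - returns a Match object for the first occurrence of the pattern, or None if there is no match
match.group(1) - returns the text matched by the first group when the pattern has several groups, e.g. re.compile(r'(\d\d)-(\d\d\d)')
match.group() - returns the entire matched text
match.groups() - returns a tuple with the text matched by every group
regex.findall('abc') - returns a list of every match in the searched string, even if there are zero or one matches (if the pattern has groups, the list holds the groups' text instead, as tuples when there are several groups)
regex.sub('replacement', 'abc') - takes two arguments: first the replacement string, second the string to search; returns a copy of the string with every match replaced, e.g. re.compile(r'spam').sub('eggs', 'spam and spam') returns 'eggs and eggs'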
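A short sketch tying these methods together, using the two-group pattern from the group(1) line above on a made-up sample string. Note that group() and groups() are called on the Match object returned by search(), not on the compiled pattern itself.

```python
import re

# Two groups: two digits, a dash, three digits
regex = re.compile(r'(\d\d)-(\d\d\d)')
text = 'Call 12-345 or 67-890.'

match = regex.search(text)  # Match object for the first occurrence
whole = match.group()       # '12-345'
first = match.group(1)      # '12'
parts = match.groups()      # ('12', '345')

# With several groups in the pattern, findall() returns a list of tuples
pairs = regex.findall(text)  # [('12', '345'), ('67', '890')]
# Without groups it returns a list of strings
plain = re.compile(r'\d\d-\d\d\d').findall(text)  # ['12-345', '67-890']

# search() returns None when nothing matches, so check before calling group()
missing = regex.search('no numbers here')  # None

# sub(repl, string): every match is replaced; \1 and \2 refer to the groups
masked = regex.sub('XX-XXX', text)   # 'Call XX-XXX or XX-XXX.'
swapped = regex.sub(r'\2-\1', text)  # 'Call 345-12 or 890-67.'
```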

This list is not complete; I will try to add more examples later.

