Thursday, 14 October 2021

How Regular Expressions Work in Python

 






Introduction:


   In this blog post, we discuss how regular expressions work when dealing with text data.

    Regular expressions, also known as regex, are used to match or find strings of text such as particular characters, words, sets of strings, patterns of characters, or numbers. They are needed when dealing with raw data from the web, which often contains long text, repeated text, and HTML tags.
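
    As a small illustration of the web-data case mentioned above, here is a minimal sketch that strips HTML tags and collapses repeated whitespace. The sample string is made up for this example, and the tag pattern is only a rough heuristic, not a real HTML parser.

```python
import re

# Hypothetical sample of raw web text (not taken from a real page).
raw = "<p>Hello   <b>world</b>!</p>\n<p>Hello   again.</p>"

# Remove anything that looks like an HTML tag: "<", then non-">" characters, then ">".
no_tags = re.sub(r"<[^>]+>", "", raw)

# Collapse every run of whitespace (spaces, newlines) into one space and trim the ends.
clean = re.sub(r"\s+", " ", no_tags).strip()

print(clean)  # Hello world! Hello again.
```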


How It Works

    
    Python has a built-in module named re that is used to work with regular expressions.
import re


Regex Flags:

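    A regular expression can be used with optional flags that control various aspects of matching. The basic flags are I, L, M, S, U, and X.

  •    re.I:    ignores case when matching.
  •    re.L:   makes matching locale dependent (in Python 3 it can be used only with bytes patterns).
  •    re.M:  makes ^ and $ match at the start and end of every line, which is useful if you want to find patterns across multiple lines.
  •    re.S:   makes the dot (.) match any character, including a newline.
  •    re.U:   enables Unicode matching (already the default for str patterns in Python 3).
  •    re.X:   allows writing a regex in a more readable format, with whitespace and comments inside the pattern.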
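
    To see what the flags change in practice, here is a short sketch. The sample text and the variable names are assumptions made for this example only.

```python
import re

# Hypothetical two-line sample text.
text = "Python is fun.\npython is popular."

# re.I: case is ignored, so both spellings match.
print(re.findall(r"python", text, flags=re.I))   # ['Python', 'python']

# re.M: ^ matches at the start of every line, not only at the start of the string.
print(re.findall(r"^\w+", text, flags=re.M))     # ['Python', 'python']

# Without re.S the dot stops at the newline; with re.S the match can span both lines.
print(re.search(r"fun.+popular", text) is None)              # True
print(re.search(r"fun.+popular", text, flags=re.S) is None)  # False

# re.X: whitespace and comments inside the pattern are ignored.
word_before_is = re.compile(r"""
    (\w+)    # capture a word
    \s+is    # that is followed by whitespace and the word is
""", flags=re.X | re.I)
print(word_before_is.findall(text))              # ['Python', 'python']
```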

    

Regex Patterns:


    Patterns are made up of characters that the regular expression engine interprets in special ways.


    1. [ab]
        Matches a single character that is either a or b.

    2. [^ab]
        Matches any single character except a and b.

    3. [a-z]
        Matches any single character in the range a to z.

    4. [^a-z]
        Matches any single character outside the range a to z.

    5. [A-Z]
        Matches any single character in the range A to Z.

    6. [0-9]
        Matches any single digit in the range 0 to 9.

    7. [a-zA-Z]
        Matches any single character in the range a to z or A to Z.

    8. [ ]
        Matches a single space character. (To match any single character except a newline, use a dot: .)

    9. \s
        Matches any whitespace character.

    10. \S
        Matches any non-whitespace character.

    11. \d
        Matches any digit.

    12. \D
        Matches any non-digit character.

    13. \w
        Matches any word character (a letter, a digit, or an underscore).

    14. \W
        Matches any non-word character.

    15. \b
        Matches a word boundary.

    16. \B
        Matches any position that is not a word boundary.

    17. ^
        Matches the start of the string.

    18. $
        Matches the end of the string.

    19. (a|b)
        Matches either a or b.

    20. a?
        Matches zero or one occurrence of a, but not more.

    21. a*
        Matches zero or more occurrences of a.

    22. a+
        Matches one or more occurrences of a.

    23. a{n}
        Matches exactly n occurrences of a.

    24. a{2}
        Matches exactly two occurrences of a.

    25. a{2,}
        Matches two or more consecutive occurrences of a.

    26. a{2,6}
        Matches between 2 and 6 consecutive occurrences of a.
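
    The patterns above can be combined. Below is a minimal sketch on a made-up sample string; the string and the particular pattern combinations are chosen only for illustration.

```python
import re

# Hypothetical sample string.
sample = "Order A12 shipped 3 items on 2021-10-14; caaat!"

# [A-Z]\d+ : one uppercase letter followed by one or more digits.
print(re.findall(r"[A-Z]\d+", sample))            # ['A12']

# \d{4}-\d{2}-\d{2} : exact repetition counts describe a date-like shape.
print(re.findall(r"\d{4}-\d{2}-\d{2}", sample))   # ['2021-10-14']

# \b[a-z]+\b : whole words made of lowercase letters only.
print(re.findall(r"\b[a-z]+\b", sample))          # ['shipped', 'items', 'on', 'caaat']

# ^ and $ anchor the pattern to the start and the end of the string.
print(re.search(r"^Order", sample) is not None)   # True
print(re.search(r"Order$", sample) is not None)   # False

# u? makes the letter u optional.
print(re.findall(r"colou?r", "color colour"))     # ['color', 'colour']

# a{2,} needs at least two a's. Note that Python treats a{2, } (with a space)
# as literal text rather than as a repetition count.
print(re.search(r"ca{2,}t", sample).group())      # caaat
print(re.search(r"ca{2, }t", sample))             # None
```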

    

Regex Functions:

    The module defines several functions that find patterns in a string; the results can then be processed according to the requirements of the application.

    1. re.split() - Splits the string at every place where the pattern matches and returns the pieces as a list.

    2. re.match() - Checks for a match only at the start of the string. If it finds the pattern at the start of the input string, it returns a match object; if not, it returns None.

    3. re.search() - Checks for a match anywhere in the string. It scans the input string and returns a match object for the first place where the pattern matches, or None if there is no match.

    4. re.sub() - Replaces the matched occurrences with the given replacement string and returns the new string.

    5. re.findall() - Returns a list containing all non-overlapping matches of the pattern in the string.
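
    The difference between these functions is easiest to see side by side. Here is a minimal sketch that runs all five on one made-up sentence (the sentence and variable names are assumptions for this example).

```python
import re

# Hypothetical sample sentence.
text = "cats and dogs and cats"

# re.match only looks at the start of the string.
print(re.match(r"dogs", text))            # None
print(re.match(r"cats", text).group())    # cats

# re.search scans the string and returns the first match it finds.
found = re.search(r"dogs", text)
print(found.group(), found.start())       # dogs 9

# re.findall returns every non-overlapping match as a list.
print(re.findall(r"cats", text))          # ['cats', 'cats']

# re.split cuts the string wherever the pattern matches.
print(re.split(r"\s+and\s+", text))       # ['cats', 'dogs', 'cats']

# re.sub replaces every match and returns a new string.
print(re.sub(r"cats", "birds", text))     # birds and dogs and birds
```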


Examples:


1. To split a sentence into words on whitespace, use the re.split() function.
import re
# Use a raw string so that \s reaches the regex engine unchanged
print(re.split(r'\s+', 'I like this blog.'))
# Output: ['I', 'like', 'this', 'blog.']

2. To extract email addresses, use the re.findall() function.
import re
doc = "My email addresses are abc@gmail.com, xyz@yahoo.com"

# Match runs of word characters, dots, or hyphens on both sides of the @ sign
addresses = re.findall(r'[\w\.-]+@[\w\.-]+', doc)
for address in addresses:
    print(address)
# Output:
# abc@gmail.com
# xyz@yahoo.com

3. To replace one word with another, use the re.sub() function.
import re
doc = "Hi, My name is Adam"

# Replace every occurrence of Adam with John
new_doc = re.sub(r'Adam', r'John', doc)
print(new_doc)
# Output: Hi, My name is John



Conclusion:

    In summary, we discussed how regular expressions work when dealing with text data.

Thank you...




References:

    [1]: Natural Language Processing Recipes: Unlocking Text Data with Machine Learning and Deep Learning using Python
