
Extract Emails From a Text File Using Grep Command in Linux

source link: https://www.geeksforgeeks.org/extract-emails-from-a-text-file-using-grep-command-in-linux/


Suppose we have a text file that contains a lot of text, with some email IDs scattered through it, and we need to find all the email IDs in that file. How can we do that? One way is to search for them manually, but that is time-consuming and tedious. A better option is to use the grep command in Linux to find all the email IDs in the file.

Grep command on Linux

The grep command in Linux searches a file (or its standard input) for a pattern and prints every line that matches it; with the -o option it prints only the matching substrings instead. The pattern provided to the grep command is a regular expression. The general syntax of the grep command is as follows:

$ grep <pattern> filepath/filename
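As a minimal sketch of this syntax, the script below creates a small hypothetical file, notes.txt, in a temporary directory and searches it for the literal pattern email. By default grep prints each whole line that contains the pattern.

```shell
#!/bin/sh
# Work in a fresh temporary directory so no existing file is touched.
cd "$(mktemp -d)" || exit 1

# notes.txt is a hypothetical file created only for this example.
printf '%s\n' 'first line' 'this line mentions email' 'last line' > notes.txt

# grep <pattern> <file>: print every line of notes.txt that contains "email".
result=$(grep 'email' notes.txt)
echo "$result"
```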

The general format of Email IDs

Before writing the regular expression to pass to the grep command, we first need to understand the general format of email IDs.

The general form of email IDs is as follows:

<username>@<domain>.<address>
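To make the three fields concrete, the sketch below splits a hypothetical address, alice@example.com, into username, domain, and address using POSIX shell parameter expansion. It assumes the address contains exactly one @ and at least one dot after it; it only illustrates the structure and is not how grep matches emails.

```shell
#!/bin/sh
# Hypothetical address used only to illustrate the three fields.
email='alice@example.com'

username=${email%%@*}    # everything before the first '@'      -> alice
rest=${email#*@}         # everything after the first '@'       -> example.com
domain=${rest%.*}        # drop the last '.' and what follows   -> example
address=${rest##*.}      # keep only what follows the last '.'  -> com

echo "username=$username domain=$domain address=$address"
```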

An email ID has three main fields: username, domain, and address. Let’s write a regular expression for each field.

Regular expression for filtering Email ID

Now let’s write the regular expression for filtering email IDs, starting with the username. A username can contain capital (A-Z) and small (a-z) letters, digits (0-9), and special symbols such as the full stop (.), underscore (_), and hyphen (-). So the character class for the username will be [a-zA-Z0-9._-]. The hyphen is placed last in the class so that grep treats it as a literal character rather than as part of a range.

The domain and address generally contain only capital (A-Z) and small (a-z) letters, so the character class for both the domain and the address will be [a-zA-Z].
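To check these two character classes, the sketch below runs grep -o on two hypothetical strings. It assumes GNU grep, whose basic regular expressions accept \+ for "one or more of the preceding item".

```shell
#!/bin/sh
# Username class: letters, digits, '.', '_' and '-' (hyphen last, so it is literal).
# Every character of this hypothetical string is in the class, so it matches whole.
user=$(printf '%s\n' 'john.doe_99-x' | grep -o '[a-zA-Z0-9._-]\+')
echo "$user"

# Domain/address class: letters only, so the digit splits "mail2x" into two matches.
domain=$(printf '%s\n' 'mail2x' | grep -o '[a-zA-Z]\+')
echo "$domain"
```

The second result shows why this simple pattern would not match a domain that contains digits.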

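Now let’s combine the regular expressions for the individual fields into one regular expression for email IDs. In grep’s basic regular expression syntax, \+ (a GNU extension) means "one or more of the preceding item", so [a-zA-Z]\+ matches a run of one or more letters. The fields are joined by the literal @ and . characters; the dot must be escaped as \. because an unescaped . matches any single character. So the final regular expression will be:

[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+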
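To see why escaping the dot matters, the sketch below (assuming GNU grep) tests the pattern with an unescaped . and with an escaped \. against a hypothetical string that has no dot after the @, so it is not a complete email address.

```shell
#!/bin/sh
# Hypothetical input with no dot after the '@'.
input='user@examplecom'

# Unescaped '.' matches ANY character, so this still reports a match.
loose=$(printf '%s\n' "$input" | grep -o '[a-zA-Z0-9._-]\+@[a-zA-Z]\+.[a-zA-Z]\+')

# Escaped '\.' matches only a literal dot, so nothing is found here.
strict=$(printf '%s\n' "$input" | grep -o '[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+')

echo "loose:  $loose"
echo "strict: $strict"
```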

Filtering Email IDs using the grep command

Now that we have a regular expression, we can use it to print all the email IDs in a file. Let’s take the following text file as an example.

This is sample text file.
This file contains email IDs.
[email protected] this is email ID of person 1.
[email protected] this is email ID of person 2.
[email protected] is email ID with Gmail domain.
These are the email IDs.

Name of File: emails_file.txt.

Let’s use the grep command with the regular expression we created on this file and see the result.

$ grep -e '[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+' emails_file.txt

The -e option specifies the pattern to search for in the file.
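The sketch below reproduces this step end to end, assuming GNU grep. Because the addresses in the sample file above are not recoverable, it writes a hypothetical stand-in for emails_file.txt with made-up addresses at example domains, then runs the same grep -e command. Without -o, grep prints each whole line that contains a match.

```shell
#!/bin/sh
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for emails_file.txt; the addresses are made up.
cat > emails_file.txt <<'EOF'
This is sample text file.
This file contains email IDs.
alice.smith@example.com this is email ID of person 1.
bob_99@example.org this is email ID of person 2.
carol-x@example.net is email ID with another domain.
These are the email IDs.
EOF

# -e gives the pattern explicitly; each matching line is printed in full.
lines=$(grep -e '[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+' emails_file.txt)
echo "$lines"
```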

The following is the result of the above grep command:

[Screenshot: output of grep -e, showing each full line that contains an email ID]

The output shows that the email IDs are found, but the rest of the text on each matching line is printed along with them.

The grep command provides the -o option, which prints only the part of each line that matches the pattern, with each match on a separate line. Combining it with -e as -oe gives just the strings that match the given pattern:

$ grep -oe '[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+' emails_file.txt

The following is the result of the above command:

[Screenshot: output of grep -oe, showing only the email IDs]

Now we can see that only email IDs are printed. This is the result we wanted.
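The sketch below repeats the -o run on a hypothetical two-line file that has two addresses on one line, showing that -o prints each match on its own line rather than one result per input line. It also shows the same pattern written for grep -E (extended regular expressions), where + needs no backslash. It assumes GNU grep, and the file contents are made up.

```shell
#!/bin/sh
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: two addresses on one line, plus a line with none.
cat > emails_file.txt <<'EOF'
Contact alice.smith@example.com or bob_99@example.org for details.
No address on this line.
EOF

# -o prints only the matched text, one match per output line.
basic=$(grep -oe '[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+' emails_file.txt)

# Same pattern in extended regex syntax (-E): '+' is written without a backslash.
extended=$(grep -oE '[a-zA-Z0-9._-]+@[a-zA-Z]+\.[a-zA-Z]+' emails_file.txt)

echo "$basic"
```

Both forms find the same two addresses; -E simply avoids the backslashes that basic syntax requires.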

Last Updated : 27 Jun, 2022
