Thanks for all your error reports, I didn't forget it. I'll cleanup my guide soon. Thanks again!

Exercise 8. Bash: more on redirection and filtering: head, tail, awk, grep, sed

Now that you tasted Linux, I will introduce you to the Unix way. Behold.

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

What this means in practical terms is that to become proficient in Linux you need to know how to take an output from one program and feed it to the other, usually modifying it in the process. Normally you do this by gluing several programs together using pipes which allow you connect output of one program to another. Like this:

What happens here is really simple. Almost every Linux program opens at lest these 3 files when started:

  1. stdin — the standard input. This is from where program reads something.
  2. stdout — the standard output. This is where program writes something.
  3. stderr — the standard error. This is where program whines about wrong things that do happen.

This is how it reads:

Start Program 1
    Start reading data from keyboard
    Start writing errors to display
    Start Program 2
        Start reading input from Program 1
        Start writing errors to Display
        Start Program 3
            Start reading input from Program 2
            Start writing errors to Display
            Start writing data to Display

There is another way to picture what happens if you like South Park type of humor, but beware: what was seen cannot be unseen! Warning! You will not be able to unsee this.

Let us consider the following pipeline which takes ls -al output and prints out only file names and file modification times:

ls -al | tr -s ' ' | cut -d ' ' -f 8,9

Here is an outline of what happens:

Start ls -al
    Get list of files in current directory
    Write errors to Display
    Write output to Pipe
    Start tr -s ' '
        Read input from ls -al via Pipe
        Leave only 1 space between fields
        Write errors to Display
        Write output to Pipe
        Start cut -d ' ' -f 8,9
            Read input from tr -s ' ' via Pipe'
            Leave only fields 8 and 9, throw away anything else
            Write errors to Display
            Write output to Display

To further elaborate, this is what happens on each step:

Step 1: ls -al, we get a directory listing. Every column here is called a field.

user1@vm1:~$ ls -al
total 52
drwxr-xr-x 2 user1 user1 4096 Jun 18 14:16 .
drwxr-xr-x 3 root  root  4096 Jun  6 21:49 ..
-rw------- 1 user1 user1 4865 Jun 15 19:34 .bash_history
-rw-r--r-- 1 user1 user1  220 Jun  6 21:48 .bash_logout
-rw-r--r-- 1 user1 user1 3184 Jun 14 12:24 .bashrc
-rw-r--r-- 1 user1 user1   64 Jun 18 14:16 hello.txt
-rw------- 1 user1 user1   89 Jun 18 16:26 .lesshst
-rw-r--r-- 1 user1 user1  634 Jun 15 20:03 ls.out
-rw-r--r-- 1 user1 user1  697 Jun  7 12:25 .profile
-rw-r--r-- 1 user1 user1  741 Jun  7 12:19 .profile.bak
-rw-r--r-- 1 user1 user1  741 Jun  7 13:12 .profile.bak1
-rw------- 1 user1 user1  666 Jun 18 14:16 .viminfo

Step 2: ls -al | tr -s ' '. We leave only one space between fields because cut does not understand multiple spaces as a way to tell several fields apart.

user1@vm1:~$ ls -al | tr -s ' '
total 52
drwxr-xr-x 2 user1 user1 4096 Jun 18 14:16 .
drwxr-xr-x 3 root root 4096 Jun 6 21:49 ..
-rw------- 1 user1 user1 4865 Jun 15 19:34 .bash_history
-rw-r--r-- 1 user1 user1 220 Jun 6 21:48 .bash_logout
-rw-r--r-- 1 user1 user1 3184 Jun 14 12:24 .bashrc
-rw-r--r-- 1 user1 user1 64 Jun 18 14:16 hello.txt
-rw------- 1 user1 user1 89 Jun 18 16:26 .lesshst
-rw-r--r-- 1 user1 user1 634 Jun 15 20:03 ls.out
-rw-r--r-- 1 user1 user1 697 Jun 7 12:25 .profile
-rw-r--r-- 1 user1 user1 741 Jun 7 12:19 .profile.bak
-rw-r--r-- 1 user1 user1 741 Jun 7 13:12 .profile.bak1
-rw------- 1 user1 user1 666 Jun 18 14:16 .viminfo

Step 3: ls -al | tr -s ' ' | cut -d ' ' -f 8,9. We leave only fields eight and nine, which are what we want.

user1@vm1:~$ ls -al | tr -s ' ' | cut -d ' ' -f 8,9
 
14:16 .
21:49 ..
19:34 .bash_history
21:48 .bash_logout
12:24 .bashrc
14:16 hello.txt
16:26 .lesshst
20:03 ls.out
12:25 .profile
12:19 .profile.bak
13:12 .profile.bak1
14:16 .viminfo

Now you will learn how to take output (text stream) from one program and pass it another, and how transform it.

Do this

 1: ls -al | head -n 5
 2: ls -al | tail -n 5
 3: ls -al | awk '{print $8, $9}'
 4: ls -al | awk '{print $9, $8}'
 5: ls -al | awk '{printf "%-20.20s %s\n",$9, $8}'
 6: ls -al | grep bash
 7: ls -al > ls.out
 8: cat ls.out
 9: cat ls.out | sed  's/bash/I replace this!!!/g'

What you should see

user1@vm1:~$ ls -al | head -n 5
total 52
drwxr-xr-x 2 user1 user1 4096 Jun 18 14:16 .
drwxr-xr-x 3 root  root  4096 Jun  6 21:49 ..
-rw------- 1 user1 user1 4865 Jun 15 19:34 .bash_history
-rw-r--r-- 1 user1 user1  220 Jun  6 21:48 .bash_logout
user1@vm1:~$ ls -al | tail -n 5
-rw-r--r-- 1 user1 user1  636 Jun 18 17:52 ls.out
-rw-r--r-- 1 user1 user1  697 Jun  7 12:25 .profile
-rw-r--r-- 1 user1 user1  741 Jun  7 12:19 .profile.bak
-rw-r--r-- 1 user1 user1  741 Jun  7 13:12 .profile.bak1
-rw------- 1 user1 user1  666 Jun 18 14:16 .viminfo
user1@vm1:~$ ls -al | awk '{print $8, $9}'
 
14:16 .
21:49 ..
19:34 .bash_history
21:48 .bash_logout
12:24 .bashrc
14:16 hello.txt
16:26 .lesshst
17:52 ls.out
12:25 .profile
12:19 .profile.bak
13:12 .profile.bak1
14:16 .viminfo
user1@vm1:~$ ls -al | awk '{print $9, $8}'
 
. 14:16
.. 21:49
.bash_history 19:34
.bash_logout 21:48
.bashrc 12:24
hello.txt 14:16
.lesshst 16:26
ls.out 17:52
.profile 12:25
.profile.bak 12:19
.profile.bak1 13:12
.viminfo 14:16
 
user1@vm1:~$ ls -al | awk '{printf "%-20.20s %s\n",$9, $8}'
 
.                    14:16
..                   21:49
.bash_history        19:34
.bash_logout         21:48
.bashrc              12:24
hello.txt            14:16
.lesshst             16:26
ls.out               17:52
.profile             12:25
.profile.bak         12:19
.profile.bak1        13:12
.viminfo             14:16
user1@vm1:~$ ls -al | grep bash
-rw------- 1 user1 user1 4865 Jun 15 19:34 .bash_history
-rw-r--r-- 1 user1 user1  220 Jun  6 21:48 .bash_logout
-rw-r--r-- 1 user1 user1 3184 Jun 14 12:24 .bashrc
user1@vm1:~$ ls -al > ls.out
user1@vm1:~$ cat ls.out
total 48
drwxr-xr-x 2 user1 user1 4096 Jun 18 14:16 .
drwxr-xr-x 3 root  root  4096 Jun  6 21:49 ..
-rw------- 1 user1 user1 4865 Jun 15 19:34 .bash_history
-rw-r--r-- 1 user1 user1  220 Jun  6 21:48 .bash_logout
-rw-r--r-- 1 user1 user1 3184 Jun 14 12:24 .bashrc
-rw-r--r-- 1 user1 user1   64 Jun 18 14:16 hello.txt
-rw------- 1 user1 user1   89 Jun 18 16:26 .lesshst
-rw-r--r-- 1 user1 user1    0 Jun 18 17:53 ls.out
-rw-r--r-- 1 user1 user1  697 Jun  7 12:25 .profile
-rw-r--r-- 1 user1 user1  741 Jun  7 12:19 .profile.bak
-rw-r--r-- 1 user1 user1  741 Jun  7 13:12 .profile.bak1
-rw------- 1 user1 user1  666 Jun 18 14:16 .viminfo
user1@vm1:~$ cat ls.out | sed  's/bash/I replace this!!!/g'
total 48
drwxr-xr-x 2 user1 user1 4096 Jun 18 14:16 .
drwxr-xr-x 3 root  root  4096 Jun  6 21:49 ..
-rw------- 1 user1 user1 4865 Jun 15 19:34 .I replace this!!!_history
-rw-r--r-- 1 user1 user1  220 Jun  6 21:48 .I replace this!!!_logout
-rw-r--r-- 1 user1 user1 3184 Jun 14 12:24 .I replace this!!!rc
-rw-r--r-- 1 user1 user1   64 Jun 18 14:16 hello.txt
-rw------- 1 user1 user1   89 Jun 18 16:26 .lesshst
-rw-r--r-- 1 user1 user1    0 Jun 18 17:53 ls.out
-rw-r--r-- 1 user1 user1  697 Jun  7 12:25 .profile
-rw-r--r-- 1 user1 user1  741 Jun  7 12:19 .profile.bak
-rw-r--r-- 1 user1 user1  741 Jun  7 13:12 .profile.bak1
-rw------- 1 user1 user1  666 Jun 18 14:16 .viminfo

Explanation

  1. Prints out only 5 first entries in directory listing.
  2. Prints out only 5 last entries in directory listing.
  3. Prints out only modification time and file name. Notice how I used awk, which is smarter than cut. Difference here is that while cut understands only single symbol (space in our case) as a way to tell field apart (field separator), awk treats any number of spaces and tabs as filed separator, so there is no need to use tr which removes unnecessary spaces.
  4. Prints out file name and modification time in this order. This again is something cat is not able to do.
  5. Prints out file name and modification time nicely. Notice how output looks much clearer now.
  6. Prints out only those lines from directory listing which contain the word “bash”.
  7. Writes directory listing output to file ls.out.
  8. Prints out ls.out. cat is simplest program available that allows you to print out a file and nothing more. Despite being so simple it is very useful when constructing complicated pipelines.
  9. Prints out ls.out replacing all bash entries with I replace this!!! sed is one powerful stream editor, which is very very very useful.

Extra credit

  1. Open man pages for head, tail, awk, grep and sed. Do not be intimidated, just remember that man pages are always there for you. With some practice you will be able actually understand them.
  2. Find grep options which allow to print out one line before and one line after the lines it finds.
  3. Google about awk printf command, try to understand how this works.
  4. Read Useless Use of Cat Award. Try some examples from there.

Discussion

Navigation

Learn Linux The Hard Way