Awk Introduction
Konrad Kuśmierz
June 21, 2023

Step 1 - AWK one-liner collection

I love Perl and use it for most scripts, but nothing beats awk on the command line. AWK is a pattern-matching and string-processing language named after the surnames of its original authors: Alfred Aho, Peter Weinberger and Brian Kernighan.

Print selected fields (at a fixed position)

Split the lines of file.txt into fields separated by ":" (colon) and print the second field ($2) of each line:

awk -F":" '{print $2}' file.txt

Same as above, but print only when the second field ($2) exists and is not empty (a field equal to 0 is also skipped, because awk treats it as false):

awk -F":" '{if ($2)print $2}' file.txt
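A minimal sketch of how the truth test differs from an explicit string comparison. The sample records are made up for illustration; any colon-separated input should behave the same way.

```shell
# Hypothetical sample data: three colon-separated records.
sample='alice:42:admin
bob::user
carol:0:guest'

# Truth test: skips the empty field AND the field that is numerically zero.
truthy=$(printf '%s\n' "$sample" | awk -F: '{ if ($2) print $2 }')

# String comparison: skips only the empty field.
nonempty=$(printf '%s\n' "$sample" | awk -F: '$2 != "" { print $2 }')

printf 'truth test:\n%s\n' "$truthy"
printf 'string test:\n%s\n' "$nonempty"
```

The first command prints only 42, while the second prints 42 and 0. If zero is a legitimate value in your data, the string comparison is the safer choice.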

Print selected fields from each line separated by a dash:

awk -F: '{ print $1 "-" $4 "-" $6 }' file.txt

Print the last field in each line:

awk -F: '{ print $NF }' file.txt

Print every line with the second field blanked out (the field is emptied, but the separators around it remain):

awk '{ $2 = ""; print }' file.txt
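To see what assigning an empty string to a field leaves behind, here is a small sketch on a made-up line. Assigning to $2 makes awk rebuild the line with the output field separator (a single space by default), so the emptied field still occupies a slot. The second command is one possible way to drop the field entirely by re-joining the remaining ones; it is an illustration, not part of the original collection.

```shell
# Hypothetical one-line input with three space-separated fields.
line='a b c'

# Blank the field: the slot stays, so two separators end up side by side.
blanked=$(printf '%s\n' "$line" | awk '{ $2 = ""; print }')

# Drop the field: re-join every field except the second with OFS.
dropped=$(printf '%s\n' "$line" | awk '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) if (i != 2) { out = out sep $i; sep = OFS }
    print out
}')

printf '[%s]\n[%s]\n' "$blanked" "$dropped"
```

The brackets in the output make the difference visible: the first result contains two spaces between a and c, the second only one.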

Print matching lines

Print field number two ($2) only on lines matching “some regexp” (field separator is ":"):

awk -F":" '/some regexp/{print $2}' file.txt

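Print lines matching “regexp a” and lines matching “regexp b”, but print the latter without a trailing newline (note the printf with an explicit "%s" format, so that a % character in the line is not interpreted as a format specifier):

awk '/regexp a/{print};/regexp b/{printf "%s", $0}' file.txt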

Print field number two ($2) only on lines not matching “some regexp” (field separator is ":"):

awk -F":" '!/some regexp/{print $2}' file.txt

or

awk -F":" '/some regexp/{next;}{print $2}' file.txt

Print field number two ($2) on lines matching “some regexp”, otherwise print field number three ($3) (field separator is ":"):

awk -F":" '/some regexp/{print $2;next}{print $3}' file.txt

The “next” statement makes awk skip the remaining rules and continue with the next input line, so “{print $3}” runs only for non-matching lines. The construct works like an if/else: /regexp/{…if..regexp..matches…;next}{…else…}
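The if/else reading can be checked with a short sketch. The log lines and the ^error pattern are made up for illustration; both commands should produce the same output.

```shell
# Hypothetical log lines in the form level:component:message
sample='error:disk:full
info:boot:ok'

# Rule pair with next: matching lines print $2, all other lines print $3.
with_next=$(printf '%s\n' "$sample" | awk -F: '/^error/ { print $2; next } { print $3 }')

# The same logic written as an explicit if/else inside a single rule.
with_if=$(printf '%s\n' "$sample" | awk -F: '{ if (/^error/) print $2; else print $3 }')

printf '%s\n' "$with_next"
```

The rule-pair form tends to read better on the command line once there are more than two cases, since each case becomes its own pattern/action pair ending in next.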

Print lines where field number two matches regexp (apply regexp only to field 2, not the whole line):

awk '$2 ~ /regexp/{print;}' file.txt

Here is an example that parses the output of the Linux “ps aux” command, which shows the process state in the eighth column. To print all processes in a running or runnable state, look for the letter “R” in that column. The rule NR==1 also prints line 1 of the ps output, since it contains the column headers:

ps aux | awk '$8 ~ /R/{print;}NR==1{print}'

Print the next two (i=2) lines after the line matching regexp:

awk '/regexp/{i=2;next;}{if(i){i--; print;}}' file.txt

Print the matching line and the two lines after it (similar to grep -A 2 regexp, minus the “--” separators grep puts between groups):

awk '/regexp/{i=2+1;}{if(i){i--; print;}}' file.txt
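The two counter variants are easy to mix up, so here is a sketch that runs both on the same made-up input. The only differences are the "next" in the first command and the "+1" in the second.

```shell
# Hypothetical input: one matching line followed by three more lines.
sample='one
match
two
three
four'

# Counter set to 2 and the matching line skipped with next:
# only the two following lines are printed.
after_only=$(printf '%s\n' "$sample" | awk '/match/ { i = 2; next } { if (i) { i--; print } }')

# Counter set to 2+1 without next: the matching line itself
# consumes the first count and is printed too.
with_match=$(printf '%s\n' "$sample" | awk '/match/ { i = 2 + 1 } { if (i) { i--; print } }')

printf 'after only:\n%s\n' "$after_only"
printf 'with match:\n%s\n' "$with_match"
```

Note that a second match inside the window resets the counter, so overlapping windows simply extend the printed region.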

Print the line matching regexp and 12 following lines. Print also any line matching regexp2:

awk '/regexp/{i=12+1}{if(i){i--; print;}}/regexp2/{print}' file.txt

AWK ranges: Print the lines from a file starting at the line matching “start” until the line matching “stop”:

awk '/start/,/stop/' file.txt

Note: make sure that the stop pattern does not match the start line otherwise only that line will be printed.
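A short sketch of that pitfall, using made-up input in which the start line happens to contain the word "stop". Anchoring both patterns with ^ is one possible fix, assuming the markers sit at the beginning of their lines.

```shell
# Hypothetical input: the start line also contains the word "stop".
sample='header
start of block (stop at end)
body
stop
footer'

# Unanchored patterns: the range opens and closes on the same line.
loose=$(printf '%s\n' "$sample" | awk '/start/,/stop/')

# Anchored patterns: the range runs until the line that begins with "stop".
anchored=$(printf '%s\n' "$sample" | awk '/^start/,/^stop/')

printf 'loose:\n%s\n' "$loose"
printf 'anchored:\n%s\n' "$anchored"
```

The loose version prints just the start line; the anchored version prints the start line, "body" and "stop".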

AWK ranges: Print the lines starting at the line matching “start” until the end of the file:

awk '/start/,0' file.txt

Note: the end pattern 0 is always false, so once the range has started it stays open until the end of the file.

Sometimes you have a terminal log that contains “prompt# command” lines with the output of each command in between. You can't use awk '/command/,/prompt/' log.txt to print everything from the command to the next prompt, because the line with the command also contains a prompt. To solve this, use a state variable and the “next” statement to skip the remaining rules once “command” is found. This prints everything from “prompt.*ls” and stops printing at the next prompt:

awk '/prompt.*ls/{s=1;print;next};/prompt/{s=0};s==1{print}' log.txt

Change “prompt” to whatever string appears in your terminal prompt, e.g. the hostname. To also print the prompt line at which printing stops, try this:

awk '/prompt.*ls/{s=1;print;next};s==0{next};{print};/prompt/{s=0};' log.txt
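Both state-variable commands can be tried on a small made-up log. The prompt string "host>" and the log contents are assumptions for this sketch; substitute your own prompt text.

```shell
# Hypothetical terminal log with "host>" as the prompt string.
log='host> ls
file1
file2
host> pwd
/home'

# Stop before the next prompt: s is switched off before the print rule runs.
without_prompt=$(printf '%s\n' "$log" | awk '
    /host>.*ls/ { s = 1; print; next }
    /host>/     { s = 0 }
    s == 1      { print }
')

# Include the next prompt: the line is printed first, then s is switched off.
with_prompt=$(printf '%s\n' "$log" | awk '
    /host>.*ls/ { s = 1; print; next }
    s == 0      { next }
                { print }
    /host>/     { s = 0 }
')

printf 'without prompt:\n%s\n' "$without_prompt"
printf 'with prompt:\n%s\n' "$with_prompt"
```

The only difference between the two is rule order: awk evaluates rules top to bottom for each line, so whether the print rule comes before or after the reset decides whether the closing prompt line appears.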

Print fields 1 and 2 from all lines not matching regexp:

awk '!/regexp/{print $1 " " $2 }' file.txt

Print fields 1 and 2 from lines matching regexp1 and not matching regexp2:

awk '/regexp1/&&!/regexp2/{print $1 " " $2 }' file.txt

Regexp syntax:

c matches the non-metacharacter c.

\c matches the literal character c.

. matches any character including newline.

^ matches the beginning of a string (example: ^1 , only lines starting with a one)

$ matches the end of a string (example: end$ , only lines ending in "end")

[abc...] character list, matches any of the characters abc....

[0-9a-zA-Z] range of characters 0-9 and a-z,A-Z

[^abc...] negated character list, matches any character except abc....

r1|r2 alternation: matches either r1 or r2.

r1r2 concatenation: matches r1, and then r2.

r+ matches one or more r's.

r* matches zero or more r's.

r? matches zero or one r's.

(r) grouping: matches r.
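A few of these operators in action, as a sketch on a made-up word list. Each command prints the words that match the whole pattern, since ^ and $ anchor it to both ends of the line.

```shell
# Hypothetical word list, one word per line.
words='cat
cot
coat
ct
scatter'

# Character list plus anchors: exactly c, then a or o, then t.
list=$(printf '%s\n' "$words" | awk '/^c[ao]t$/')

# Grouping with alternation: c, then "oa" or "a", then t.
alt=$(printf '%s\n' "$words" | awk '/^c(oa|a)t$/')

# .+ needs at least one character between c and t, so "ct" is excluded.
plus=$(printf '%s\n' "$words" | awk '/^c.+t$/')

printf 'list: %s\n' "$(printf '%s' "$list" | tr '\n' ' ')"
printf 'alt:  %s\n' "$(printf '%s' "$alt" | tr '\n' ' ')"
printf 'plus: %s\n' "$(printf '%s' "$plus" | tr '\n' ' ')"
```

Without the anchors, "scatter" would match all three patterns, because an unanchored regexp only has to match somewhere inside the line.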

In languages like Perl you can use grouping to extract a substring from the matching string. Standard AWK cannot capture a string with a group. However, gawk's match function accepts an array as a third argument for exactly that purpose: the string matched by the first parenthesized group ends up in arr[1].

Print the part of the match that is enclosed by the parentheses:

gawk 'match($0, /length:([0-9]+) cm/,arr){ print arr[1]}' file.txt
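Where gawk is not available, a rough portable equivalent can be built from the two-argument match(), which sets RSTART and RLENGTH, followed by trimming the fixed text around the number. This is a sketch under the assumption that the surrounding text ("length:" and " cm") is constant; the sample line is made up.

```shell
# Hypothetical input line containing a length value.
line='item length:120 cm wide'

len=$(printf '%s\n' "$line" | awk 'match($0, /length:[0-9]+ cm/) {
    m = substr($0, RSTART, RLENGTH)   # the whole match: "length:120 cm"
    sub(/^length:/, "", m)            # strip the fixed prefix
    sub(/ cm$/, "", m)                # strip the fixed suffix
    print m
}')

printf '%s\n' "$len"
```

This prints 120. It is more verbose than the gawk version, but it relies only on functions and variables that standard awk provides.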
