Exercises on bash text-processing commands
- recall command-line redirection and pipelines (STDIN,
STDOUT, STDERR)
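A minimal warm-up sketch of the three streams, using a throwaway file name (`sample.txt`) chosen for illustration:

```shell
printf 'one\ntwo\nthree\n' > sample.txt    # STDOUT redirected into a file
wc -l < sample.txt                         # STDIN read from a file
ls no-such-file 2> errors.txt || true      # STDERR redirected separately
cat sample.txt | tr a-z A-Z                # pipeline: cat's STDOUT feeds tr's STDIN
```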
- using wget, download skakalpes-il2.txt
- view the file using cat and less
- using iconv, convert the file from iso-8859-2 to
utf-8 and store the result in skakalpes-utf8.txt
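Since the downloaded file is not available here, this sketch manufactures a small ISO-8859-2 stand-in from a line of Czech text and then performs the conversion the task asks for:

```shell
# Hypothetical stand-in for skakalpes-il2.txt: UTF-8 text with diacritics,
# first encoded down to ISO-8859-2.
printf 'Skákal pes přes oves\n' > sample-utf8.txt
iconv -f utf-8 -t iso-8859-2 sample-utf8.txt > sample-il2.txt

# The conversion asked for: ISO-8859-2 input, UTF-8 output.
iconv -f iso-8859-2 -t utf-8 sample-il2.txt > skakalpes-utf8.txt
```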
- view the new file
- count the number of lines in the file using wc
- using head and tail, view the first 15 lines,
the last 15 lines, and lines 10-20
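A sketch of the counting and slicing steps on a 30-line numbered stand-in file (`lines.txt`, an assumption for illustration):

```shell
seq 1 30 > lines.txt                 # stand-in file with 30 numbered lines
wc -l lines.txt                      # count lines
head -n 15 lines.txt                 # first 15 lines
tail -n 15 lines.txt                 # last 15 lines
head -n 20 lines.txt | tail -n 11    # lines 10-20: first 20, then the last 11 of those
```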
- using cut, print the first two words on each line
- using grep, print all lines containing a digit
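The cut and grep steps sketched on a small made-up sample (three lines, one containing a digit):

```shell
printf 'alpha beta gamma\none 2 three\nno digits here\n' > text.txt
cut -d' ' -f1,2 text.txt    # first two space-separated fields of each line
grep '[0-9]' text.txt       # lines containing at least one digit
```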
- using sed, substitute spaces and punctuation marks
with the newline character (\n), so that there is at most
one word per line
- using grep, filter out empty lines
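One way to sketch the tokenization and empty-line filtering together; note that `\n` in the sed replacement is a GNU sed extension (BSD sed would need a literal newline):

```shell
printf 'Skakal pes, pres oves.\n' > text.txt
# Replace each run of spaces/punctuation with a newline, then drop empty lines.
sed 's/[[:space:][:punct:]]\{1,\}/\n/g' text.txt | grep -v '^$'
```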
- using sort, sort the words alphabetically
- using wc, count the number of words in the
text
- using sort|uniq, count the number of distinct words
in the text
- using sort|uniq -c|sort -nr, create a frequency
list of words
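The sorting and counting steps above, sketched on a tiny hand-made word list (one word per line, as produced by the earlier sed step):

```shell
printf 'pes\noves\npes\npres\npes\n' > words.txt
sort words.txt                         # words in alphabetical order
wc -l words.txt                        # total number of words
sort words.txt | uniq | wc -l          # number of distinct words
sort words.txt | uniq -c | sort -nr    # frequency list, most frequent first
```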
- create a frequency list of letters
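For letters, `grep -o` is one convenient trick: it prints every match on its own line, and `.` matches a single character. A sketch on a one-word sample:

```shell
printf 'banana\n' > text.txt
grep -o . text.txt | sort | uniq -c | sort -nr   # letter frequency list
```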
- using paste, create the frequency list of word
bigrams (create another file with lines shifted upwards by one,
merge it by paste with the original file and make a frequency
list of the lines)
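A sketch of the paste trick on a four-word list; `tail -n +2` produces the upward-shifted copy, and `sed '$d'` drops the final padded line where paste has no second word:

```shell
printf 'a\nb\na\nb\n' > words.txt       # one word per line
tail -n +2 words.txt > shifted.txt      # same list shifted up by one line
paste words.txt shifted.txt | sed '$d' | sort | uniq -c | sort -nr
```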
- Longer exercise: write a shell script that downloads
the main web page of some news server and finds all word bigrams
in it in which both words are capitalized. Make a frequency list
of the HTML tags used in the document.
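A sketch of the two pipelines, using an inline HTML snippet in place of a downloaded page (for a real run, replace the first printf with something like `wget -q -O page.html <news URL>`):

```shell
printf '<html><body><p>John Smith met Mary Jones.</p></body></html>\n' > page.html

# Capitalized word bigrams: strip tags, tokenize to one word per line,
# pair adjacent words with paste, keep pairs where both start uppercase.
sed 's/<[^>]*>/ /g' page.html | grep -oE '[[:alnum:]]+' > words.txt
tail -n +2 words.txt > shifted.txt
paste words.txt shifted.txt | grep -E '^[A-Z][[:alnum:]]*[[:space:]][A-Z]' \
    | sort | uniq -c | sort -nr

# HTML tag frequency list: extract tag names (opening and closing), count them.
grep -oE '</?[a-zA-Z][a-zA-Z0-9]*' page.html | tr -d '/<' | sort | uniq -c | sort -nr
```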
- reorganize the steps above into a Makefile; name your targets t2-t18
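A hypothetical fragment showing the shape such a Makefile might take (the URL variable and the mapping of step numbers to targets are assumptions, not given in the exercise):

```make
# $(URL) must be supplied, e.g.  make t9 URL=...
t2:
	wget -O skakalpes-il2.txt $(URL)

t6: t2
	iconv -f iso-8859-2 -t utf-8 skakalpes-il2.txt > skakalpes-utf8.txt

t9: t6
	wc -l skakalpes-utf8.txt
```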
- suggest similar new exercises