Exercise on bash text-processing commands
- using wget, download skakalpes.txt (a plain-text file encoded in UTF-8)
- view the file using cat and less
- count the number of lines in the file using wc
- using head and tail, view the first 15 lines,
the last 15 lines, and lines 10-20
- using cut, print the first two words on each line
- using grep, print all lines containing a digit
- using sed, replace spaces and punctuation marks
with the newline character (\n), so that there is at most one word
per line
- using grep, remove the empty lines
- using sort, sort the words alphabetically
- using sort|uniq, count the number of distinct words in the text
- using uniq, print the words that appear only once in the text
- extract all words longer than 2 letters, count them (wc), and sort them
in "rhyming order" (use rev: reverse each word, sort the reversed words,
and reverse them back)
- using sort|uniq -c|sort -nr, create a frequency
list of words
- similarly, create a frequency list of letters
- using paste, create a frequency list of word
bigrams (create a second file with the lines shifted up by one,
merge it with the original file using paste, and make a frequency
list of the resulting lines)
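
One possible way through the steps above, as a minimal sketch rather than a
reference solution. It assumes GNU sed (for \n in the replacement text) and
the rev utility. It runs on a small made-up file, sample.txt, standing in for
skakalpes.txt; the other file names and the variable names are likewise only
illustrative. LC_ALL=C is set so that the sort order is reproducible; for the
real UTF-8 text a UTF-8 locale is probably the better choice.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C            # byte-wise sorting, so results are reproducible

workdir=$(mktemp -d)       # scratch directory for the illustrative files
cd "$workdir"

# Made-up sample text standing in for skakalpes.txt (assumption).
printf '%s\n' 'the cat sat on the mat, the end.' 'the cat in 2 hats' > sample.txt

n_lines=$(wc -l < sample.txt)                  # number of lines
first_two=$(cut -d' ' -f1,2 sample.txt)        # first two space-separated words per line
digit_lines=$(grep '[0-9]' sample.txt)         # lines containing a digit
# Lines 10-20 of a longer file FILE: head -n 20 FILE | tail -n 11

# One word per line: each run of spaces/punctuation becomes a newline (GNU sed),
# then grep drops the empty lines left behind.
sed -E 's/[[:space:][:punct:]]+/\n/g' sample.txt | grep -v '^$' > words.txt

n_words=$(wc -l < words.txt)
n_distinct=$(sort words.txt | uniq | wc -l)    # distinct words
sort words.txt | uniq -u > once.txt            # words that occur exactly once

# Words longer than 2 characters in "rhyming order": reverse, sort, reverse back.
grep -E '^.{3,}$' words.txt | rev | sort | rev > rhymes.txt

# Frequency list of words, most frequent first.
sort words.txt | uniq -c | sort -nr > word_freq.txt

# Frequency list of letters: grep -o prints every matching letter on its own line.
grep -o '[[:alpha:]]' words.txt | sort | uniq -c | sort -nr > letter_freq.txt

# Word bigrams: pair each word with the next one; sed '$d' drops the last line,
# whose second column is empty.
tail -n +2 words.txt > shifted.txt
paste words.txt shifted.txt | sed '$d' | sort | uniq -c | sort -nr > bigram_freq.txt

head -n 3 word_freq.txt
head -n 3 bigram_freq.txt
```

Note that the bigram list also pairs the last word of one line with the first
word of the next, because the line structure is gone once the text has been
split into one word per line.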