Exercise on bash text-processing commands

  1. using wget, download skakalpes.txt (a plain-text file encoded in UTF-8)
  2. view the file using cat and less
  3. count the number of lines in the file using wc
  4. using head and tail, view the first 15 lines, the last 15 lines, and lines 10-20
  5. using cut, print the first two words on each line
  6. using grep, print all lines containing a digit
  7. using sed, replace spaces and punctuation marks with the newline character (\n), so that there is at most one word per line
  8. using grep, filter out the empty lines
  9. using sort, sort the words alphabetically
  10. using sort|uniq (and wc), count the number of distinct words in the text
  11. print words that appear only once in the text (uniq)
  12. extract all words longer than 2 letters, count them (wc), and sort them in "rhyming order" (use rev: reverse each word, sort the reversed words, then reverse them back)
  13. using sort|uniq -c|sort -nr, create a frequency list of words
  14. similarly, create a frequency list of letters
  15. using paste, create the frequency list of word bigrams (create another file with lines shifted upwards by one, merge it by paste with the original file and make a frequency list of the lines)
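
A minimal sketch of how steps 7-15 can fit together, run on a two-line stand-in text rather than on skakalpes.txt itself. The sample sentence and all file names (sample.txt, words.txt, once.txt and so on) are made up for illustration. It assumes GNU sed (for \n in the replacement), a grep that supports -o, and rev from util-linux. It is one possible approach, not the reference solution.

```shell
#!/usr/bin/env bash
set -e

# Scratch directory and stand-in text (hypothetical; not the real skakalpes.txt).
tmp=$(mktemp -d)
printf 'the cat sat. the dog sat, too!\nthe cat ran 2 miles.\n' > "$tmp/sample.txt"

# Steps 7-8: turn every run of non-letters into one newline, then drop empty lines.
# "\n" in the replacement is a GNU sed feature; other sed versions may differ.
sed 's/[^[:alpha:]]\{1,\}/\n/g' "$tmp/sample.txt" | grep -v '^$' > "$tmp/words.txt"

# Steps 9-10: sort, collapse duplicates, count the distinct words.
echo "distinct words:"
sort "$tmp/words.txt" | uniq | wc -l

# Step 11: words that occur exactly once (uniq -u keeps only unrepeated lines).
sort "$tmp/words.txt" | uniq -u > "$tmp/once.txt"

# Step 12: words of 3+ characters, sorted by their endings ("rhyming order").
grep -E '.{3,}' "$tmp/words.txt" | rev | sort | rev > "$tmp/rhyme.txt"

# Step 13: word frequency list, most frequent first.
sort "$tmp/words.txt" | uniq -c | sort -nr > "$tmp/freq.txt"

# Step 14: letter frequency list; grep -o prints each matching letter on its own line.
grep -o '[[:alpha:]]' "$tmp/sample.txt" | sort | uniq -c | sort -nr > "$tmp/letters.txt"

# Step 15: bigrams. Shift the word list up by one line and paste it next to the original.
tail -n +2 "$tmp/words.txt" > "$tmp/shifted.txt"
paste "$tmp/words.txt" "$tmp/shifted.txt" | sort | uniq -c | sort -nr > "$tmp/bigrams.txt"

echo "top words:";   head -n 3 "$tmp/freq.txt"
echo "top bigrams:"; head -n 3 "$tmp/bigrams.txt"
```

On the real file, the same pipelines apply with skakalpes.txt in place of sample.txt. Since that file is UTF-8, character classes such as [[:alpha:]] and the sort order depend on the active locale, so results may differ between a UTF-8 locale and LC_ALL=C.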