Exercises on bash text-processing commands
(example command sketches for the steps appear after the list)

  1. recall command-line redirection and pipelining (STDIN, STDOUT, STDERR)
  2. using wget, download skakalpes-il2.txt
  3. view the file using cat and less
  4. using iconv, convert the file from iso-8859-2 to utf-8 and store it in skakalpes-utf8.txt
  5. view the new file
  6. count the number of lines in the file using wc
  7. using head and tail, view the first 15 lines, the last 15 lines, and lines 10-20
  8. using cut, print the first two words on each line
  9. using grep, print all lines containing a digit
  10. using sed, replace spaces and punctuation marks with newline characters (\n), so that there is at most one word per line
  11. using grep, remove empty lines
  12. using sort, sort the words alphabetically
  13. using wc, count the number of words in the text
  14. using sort|uniq, count the number of distinct words in the text
  15. using sort|uniq -c|sort -nr, create a frequency list of words
  16. create a frequency list of letters
  17. using paste, create a frequency list of word bigrams (create a second file with the lines shifted up by one, merge it with the original file using paste, and make a frequency list of the resulting lines)
  18. Longer exercise: write a shell script that downloads the main web page of some news server and finds all word bigrams in which both words are capitalized. Also make a frequency list of the HTML tags used in the document.
  19. reorganize the steps above into a Makefile; name your targets t2-t18
  20. suggest similar new exercises
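
Hints: the sketches below show one possible solution per task, assuming GNU versions of the tools; every file name other than skakalpes-il2.txt and skakalpes-utf8.txt is made up for illustration.

Tasks 2-5 (download, convert, view): the course URL is not given above, so $URL is a placeholder you have to fill in.

    wget "$URL/skakalpes-il2.txt"    # $URL is hypothetical
    cat skakalpes-il2.txt            # dump the whole file to the terminal
    less skakalpes-il2.txt           # scrollable pager; quit with q
    iconv -f iso-8859-2 -t utf-8 skakalpes-il2.txt > skakalpes-utf8.txt
    less skakalpes-utf8.txt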
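
Tasks 6-9 (counting, slicing, columns, matching):

    wc -l skakalpes-utf8.txt                    # number of lines
    head -n 15 skakalpes-utf8.txt               # first 15 lines
    tail -n 15 skakalpes-utf8.txt               # last 15 lines
    head -n 20 skakalpes-utf8.txt | tail -n 11  # lines 10-20 (11 lines)
    cut -d' ' -f1-2 skakalpes-utf8.txt          # first two space-separated fields
    grep '[0-9]' skakalpes-utf8.txt             # lines containing a digit

Note that cut treats every single space as a field separator, so lines with leading or doubled spaces may need squeezing first (e.g. tr -s ' ').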
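
Tasks 10-14 (one word per line, sorting, counting): GNU sed accepts \n in the replacement text.

    sed 's/[[:space:][:punct:]]\+/\n/g' skakalpes-utf8.txt |
      grep -v '^$' > words.txt       # at most one word per line, empty lines removed
    sort words.txt | less            # the words in alphabetical order
    wc -l words.txt                  # number of word tokens
    sort words.txt | uniq | wc -l    # number of distinct words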
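
Tasks 15-16 (frequency lists): grep -o prints each match on its own line, so grep -o . turns the text into one character per line (assuming a UTF-8 locale for multibyte letters).

    sort words.txt | uniq -c | sort -nr | head       # most frequent words first
    grep -o . words.txt | sort | uniq -c | sort -nr  # letter frequencies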
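
Task 17 (word bigrams): tail -n +2 starts printing at the second line, i.e. it produces the file shifted up by one.

    tail -n +2 words.txt > words-shifted.txt     # the word list shifted up by one line
    paste words.txt words-shifted.txt | sort | uniq -c | sort -nr | head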
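
Task 18 (news-server script): a sketch; the URL is only an example and any news front page works. The tag extraction is a rough regular expression, not a full HTML parser.

    #!/bin/bash
    URL=https://www.bbc.com/            # example site, substitute your own
    wget -q -O page.html "$URL"

    # capitalized word bigrams: drop tags, one word per line, shift and paste
    sed 's/<[^>]*>/ /g' page.html |
      sed 's/[[:space:][:punct:]]\+/\n/g' |
      grep -v '^$' > page-words.txt
    tail -n +2 page-words.txt |
      paste page-words.txt - |
      grep -E '^[[:upper:]][[:alpha:]]*[[:blank:]][[:upper:]][[:alpha:]]*$' |
      sort | uniq -c | sort -nr

    # frequency list of HTML tags (opening and closing counted together)
    grep -oE '</?[[:alpha:]][[:alnum:]]*' page.html |
      sed 's|[</]||g' | tr '[:upper:]' '[:lower:]' |
      sort | uniq -c | sort -nr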
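
Task 19 (Makefile): a partial sketch; targets t7-t18 follow the same pattern, each depending on the target that produces its input. Real make recipes must start with a tab character, shown here as plain indentation.

    URL = http://example.org    # hypothetical, replace with the real course URL

    t2:
        wget "$(URL)/skakalpes-il2.txt"

    t4: t2
        iconv -f iso-8859-2 -t utf-8 skakalpes-il2.txt > skakalpes-utf8.txt

    t6: t4
        wc -l skakalpes-utf8.txt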