Language Technologies for Research in Humanities


NPFL131 / ATKL00349

Pavel Straňák

stranak@ufal.mff.cuni.cz

Friday 12:30–14:00
Palachovo nám. 2, room C131

10. 3. 2023

Bash shortcuts (and Windows specifics)

sort, uniq, frequencies (repeated from last week)

grep

Mostly we use Perl, and everything shown here can also be done in Perl. For the very simplest tasks, however, especially searching for fixed strings (no regexes), grep can be simpler to use. Whenever you need a regex, use Perl.

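Example: counting running Chrome processes. The second command reports one more process than the first, because grep Chrome also finds its own process in the ps listing (its command line contains "Chrome"). grep -v grep removes that extra line.

[6] alfred:~% ps ax | grep Chrome | grep -v grep | wc -l
       5
[7] alfred:~% ps ax | grep Chrome | wc -l
       6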

cut & paste

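cut -d: -f5 /etc/passwd        # print field 5 (the user's full name); fields are separated by ":"
cut -d: -f5 /etc/passwd >c1    # save the names to file c1
cut -d: -f3 /etc/passwd >c2    # save field 3 (the numeric user ID) to file c2
paste c1 c2                    # join c1 and c2 line by line, separated by a tab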
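The same tools are handy for tab-separated data such as a tagged corpus. A sketch assuming a hypothetical file tagged.tsv with three columns (word form, lemma, part of speech). The tab is the default delimiter of both cut and paste; paste -d chooses a different one.

```shell
# Hypothetical tab-separated file: word form, lemma, POS tag
printf 'dogs\tdog\tNOUN\nran\trun\tVERB\n' > tagged.tsv

cut -f2 tagged.tsv            # lemmas: dog, run
cut -f1,3 tagged.tsv          # several fields at once, still tab-separated
cut -f1 tagged.tsv > words
cut -f3 tagged.tsv > tags
paste -d/ words tags          # join line by line with "/": dogs/NOUN, ran/VERB
```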

Perl on the command line

Perl is a programming language, but it can also be used on the command line like grep, sed, wc, etc.

Big advantages of Perl

  1. It is a programming language, so anything can be done in a script (= program). For example, this one-liner reverses every line, character by character:
     perl -C -ple '$_=reverse()'
  2. Excellent regular expressions and Unicode support.

Perl Regular Expressions

Regular expressions (regexes) exist in many programming languages and Unix tools, in many variants (“standard”, “extended”, etc.). We will only use Perl regular expressions.

Fast, convenient, great Unicode support, many extensions …

Basic character classes

Even More Regular Expressions