\documentclass[a4paper,11pt]{article}

\begin{filecontents}{colacl.sty}
%
% colacl.sty
%
\typeout{}
\typeout{ACL-COLING 1998 Proceedings style -- March 31st 1998}
\typeout{}
%
% ----------------------------------------------------------------------
%
% This is the LaTeX style file for ACL-COLING 1998.  It is based on
% a series of similar files prepared for previous conferences by
% Fernando Pereira, Paul Jacobs, Stuart Shieber, Peter F. 
% Patel-Schneider and others.  Various changes have been made, chiefly
% to save space in the final output or remove redundant definitions. 
%
% colacl.sty is designed for use as a package or option with the
% standard LaTeX article class, and the BibTeX style acl.bst.
%
% Author/title and citation formatting differs slightly from standard
% LaTeX; see AUTHOR FORMATS and CITATION FORMATS below for more
% information.
% 
% This file is supplied as a hopefully convenient implementation of 
% some of the "instructions for authors" repeated below.  It is not
% guaranteed to work in any given LaTeX installation or in conjunction
% with any given class, package or style, and it is not intended as
% a LaTeX tutorial.
%
% ----------------------------------------------------------------------
% Instructions for authors
% 
%   (i) Maximum length of full papers: 7 pages.  An eighth page may be
%       used for one or more abstracts in languages which differ from
%       that used for the body of the paper.  For project notes, 5
%       pages, with an optional sixth for abstract(s).
%
%  (ii) Paper size: A4 or US Letter.
%
% (iii) Margins: set so that text lies within a rectangle 9in (23cm)
%       high and 6.5in (16.5cm) wide.
%
%  (iv) Body of text to be set in two columns.  Full-width figures
%       (i.e. using \begin{figure*}) and tables may be used if
%       necessary. 
%
%   (v) Use standard fonts, e.g. Computer Modern Roman, Times Roman, no
%       smaller than 10pt.
%
%  (vi) No page numbering (pages should be numbered in pencil, on the
%       reverse side).
% 
% Items (iii), (iv) and (vi) are handled by this file, and should
% therefore not be overridden by resetting \textwidth, \textheight,
% \pagestyle etc. in your document, or by using styles or packages
% which have the same effect.
%
% ----------------------------------------------------------------------
% To convert papers prepared with colaclsub.sty to the final format
% for use with colacl.sty:  
%
% (1) Remove commands specific to the original submission format 
%     (\type, \subject, \contact, \conference, \makeidpage).
% 
% (2) Replace \summary{...} with an abstract, using the normal
%     abstract environment, placed after \maketitle.
%
% A simple document template:
%
%   \documentclass[11pt]{article}
%   \usepackage{colacl}
%   \title{...}
%   \author{...}             % see below for possible formats
%   \begin{document}
%   \maketitle
%   \begin{abstract}
%   ...                      % contents of abstract
%   \end{abstract}
%   ...                      % contents of article
%   \bibliographystyle{acl}  % use acl.bst
%   \bibliography{...}
%   \end{document}
%
% Users of obsolete LaTeX versions can try:
%
%   \documentstyle[colacl]{article}	% or [11pt,colacl]
%   \title{...}
%   ...
%
%
% ----------------------------------------------------------------------
% AUTHOR FORMATS
%
% Author information can be set in various styles.
%
% For several authors from the same institution:
% \author{Author 1 \and ... \and Author n \\
%	  Address line \\ ... \\ Address line}
% If the names do not fit well on one line, use:
%         Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
%
% For authors from different institutions:
% \author{Author 1 \\ Address line \\  ... \\ Address line
%	  \And  ... \And
%	  Author n \\ Address line \\ ... \\ Address line}
%
% To start a separate "row" of authors use \AND, as in
% \author{Author 1 \\ Address line \\  ... \\ Address line
%	  \AND
%	  Author 2 \\ Address line \\ ... \\ Address line \And
%	  Author 3 \\ Address line \\ ... \\ Address line}
%
% If the title and author information does not fit in the area allocated,
% place \setlength\titlebox{<new height>} after \usepackage{colacl},
% where <new height> is a length larger than the default 2.0in.
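%
% A sketch of a complete author block for three authors in two rows
% (the names, institutes and addresses are placeholders, and 2.5in is
% an assumed value), with the title box enlarged to make room for the
% second row:
%
%   \usepackage{colacl}
%   \setlength\titlebox{2.5in}
%   \title{A Sample Title}
%   \author{First Author \\ First Institute \\ first@example.org
%           \And
%           Second Author \\ Second Institute \\ second@example.org
%           \AND
%           Third Author \\ Third Institute \\ third@example.org}
%
% Here \And places the second author beside the first, and \AND
% starts a new row for the third.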
%
% ----------------------------------------------------------------------
% CITATION FORMATS
% 
% Three possible citation formats:
% "\cite{...}"      produces a citation like "(Author, 1999)"
% "\shortcite{...}" produces a citation like "(1999)"
% "\newcite{...}"   produces a citation like "Author (1999)"
%
% All three take an optional argument which can be used to add page
% references, etc.:
% "\newcite[1--6]{...}" produces a citation like "Author (1999, 1--6)"
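%
% As a sketch of how the three commands relate, assume the .bbl file
% contains an entry whose label has the form the macros below expect
% (the key "author99" and the name are illustrative only):
%
%   \bibitem[\protect\citename{Author}1999]{author99}
%
% Then, in running text:
%
%   \cite{author99}            gives  (Author, 1999)
%   \shortcite{author99}       gives  (1999)
%   \newcite{author99}         gives  Author (1999)
%   \cite[p.~3]{author99}      gives  (Author, 1999, p. 3)
%   \newcite[p.~3]{author99}   gives  Author (1999, p. 3)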
%
% ----------------------------------------------------------------------
% IF IT DOESN'T WORK
%
% The error message "File `colacl.sty' not found." indicates that this
% file has not been installed in a location which is visible to your 
% LaTeX.  Try putting it in the same directory as your paper, and
% running LaTeX there.  Consult your `Local Guide' documentation or
% your system administrator to find out how LaTeX searches for input
% files.
%
% "\documentclass..." is a LaTeX2e declaration.  An error message
% "Undefined control sequence." followed by a line ending in
% "\documentclass" indicates that you have used this with an obsolete
% LaTeX installation.  Use the "\documentstyle" variant shown above.
%
% As a last resort, forget about colacl.sty and simply copy the
% following lines (uncommented, obviously) into your document before
% the "\begin{document}":
%
% \setlength\topmargin{0.0in}
% \setlength\oddsidemargin{-0.0in}
% \setlength\textheight{9.0in}
% \setlength\textwidth{6.5in}
% \setlength\columnsep{0.25in}
% \setlength\headheight{0pt}
% \setlength\headsep{0pt}
% \thispagestyle{empty}
% \pagestyle{empty}
% \flushbottom
% \twocolumn
% \sloppy
%
% Some interactions with other packages may still occur.  In order to
% remove the page number from the first page, you may have to place the
% "\thispagestyle{empty}" command immediately after "\maketitle".
%
% ----------------------------------------------------------------------
% NOTE:  Some laser printers have a serious problem printing TeX output.
% These printing devices, commonly known as "write-white" laser
% printers, tend to make characters too light.  To get around this
% problem, a darker set of fonts must be created for these devices.
%
% ----------------------------------------------------------------------
% Physical page layout - slightly modified from IJCAI by pj

\setlength\topmargin{0.0in}
\setlength\oddsidemargin{-0.0in}
\setlength\textheight{9.0in}
\setlength\textwidth{6.5in}
\setlength\columnsep{0.25in}
\newlength\titlebox
\setlength\titlebox{2.0in}		% was 2.25in
\setlength\headheight{0pt}
\setlength\headsep{0pt}
\setlength\footskip{0pt}                % irrelevant when no footers.
\pagestyle{empty}			% no page numbers
\thispagestyle{empty}			% no page numbers
\flushbottom
\twocolumn
\sloppy

% We're never going to need a table of contents, so just flush it to
% save space --- suggested by drstrip@sandia-2
\def\addcontentsline#1#2#3{}

% ----------------------------------------------------------------------
% Title stuff, taken from deproc.

\def\maketitle{%
  \par%
  \begingroup%
     \def\thefootnote{\fnsymbol{footnote}}%
     \def\@makefnmark{\rlap{$^{\@thefnmark}$\hss}}%
     % no paragraph breaks in \thanks
     \long\def\@makefntext##1{%
                  \parindent 1em\noindent%
                  \hbox to 1em{$^{\@thefnmark}$}##1}
     \twocolumn[\@maketitle] \@thanks%
  \endgroup%
  \setcounter{footnote}{0}%
  \let\maketitle\relax\let\@maketitle\relax%
  \gdef\@thanks{}\gdef\@author{}\gdef\@title{}%
  \let\thanks\relax}

% some vertical space removed here: skip above and below title
%
\def\@maketitle{%
  \vbox to \titlebox{%
    \hsize\textwidth\linewidth\hsize%
    \vskip 0.125in minus 0.05in%
    \centering{\Large\bf \@title \par}%
    \vskip 0.2in plus 0.1fil minus 0.1in
    {\def\and{\unskip\enspace{\rm and}\enspace}%
     \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil 
              \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
     \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
	      \vskip 0.25in plus 1fil minus 0.125in
	      \hbox to \linewidth\bgroup\large \hfil\hfil
   	      \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
    \hbox to \linewidth \bgroup\large \hfil\hfil
    \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf\@author 
			    \end{tabular}\hss\egroup
    \hfil\hfil\egroup}
  \vskip 0.3in plus 2fil minus 0.1in
}}

% ----------------------------------------------------------------------
% abstract

% quote env a bit narrow for 2-column
%\renewenvironment{abstract}{\centerline{\large\bf
% Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}

\renewenvironment{abstract}{\section*{\centerline{Abstract}}}{}

% ----------------------------------------------------------------------
% bibliography and citations

% most of cite format is from aclsub.sty by SMS

% Don't box citations; separate multiple citations with a semicolon
% and a space.  Replaced for multiple citations (pj).
%
\def\@citex[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi
  \def\@citea{}\@cite{\@for\@citeb:=#2\do
     {\@citea\def\@citea{; }\@ifundefined
       {b@\@citeb}{{\bf ?}\@warning
        {Citation `\@citeb' on page \thepage \space undefined}}%
 {\csname b@\@citeb\endcsname}}}{#1}}

% Allow short (name-less) citations, when used in
% conjunction with a bibliography style that creates labels like
%	\citename{<names>, }<year>
% 
\let\@internalcite\cite
\def\cite{\def\citename##1{##1, }\@internalcite}
\def\shortcite{\def\citename##1{}\@internalcite}
\def\newcite{\leavevmode\def\citename##1{{##1} (}\@internalciteb}

% Macros for \newcite, which leaves name in running text, and is
% otherwise like \shortcite.
%
\def\@citexb[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi
  \def\@citea{}\@newcite{\@for\@citeb:=#2\do
    {\@citea\def\@citea{;\penalty\@m\ }\@ifundefined
       {b@\@citeb}{{\bf ?}\@warning
       {Citation `\@citeb' on page \thepage \space undefined}}%
% gjr: hbox causes too many bad linebreaks
%\hbox{\csname b@\@citeb\endcsname}}}{#1}}
     {\csname b@\@citeb\endcsname}}}{#1}}

\def\@internalciteb{%
  \@ifnextchar [{\@tempswatrue\@citexb}{\@tempswafalse\@citexb[]}}

\def\@newcite#1#2{{#1\if@tempswa, #2\fi)}}

% gjr: no labels in this bibliography style
%\def\@biblabel#1{\def\citename##1{##1}[#1]\hfill}
\def\@biblabel#1{}

%%% More changes made by SMS (originals in latex.tex)
% Use parentheses instead of square brackets in the text.
\def\@cite#1#2{({#1\if@tempswa , #2\fi})}

% Don't put a label in the bibliography at all.  Just use the unlabeled format
% instead.
% gjr: removed \@mkboth -- no headers here.
% gjr: reduced vertical space between entries (plus was .33em)
%
\def\thebibliography#1{%
  \section*{References}
  \list{}{\setlength{\labelwidth}{0pt}
          \setlength{\leftmargin}{\parindent}
          \setlength{\itemsep}{0.11ex plus 0.11ex}
          \setlength{\parsep}{0ex}
          \setlength{\itemindent}{-\parindent}}
  \def\newblock{\hskip .11em plus .11em minus -.07em}
  \sloppy\clubpenalty4000\widowpenalty4000
  \sfcode`\.=1000\relax}
\let\endthebibliography=\endlist

% Allow for a bibliography of sources of attested examples
\def\thesourcebibliography#1{%
  \section*{Sources of Attested Examples}
  \list{}{\setlength{\labelwidth}{0pt}
          \setlength{\leftmargin}{\parindent}
          \setlength{\itemsep}{0.11ex plus 0.11ex}
          \setlength{\parsep}{0ex}
          \setlength{\itemindent}{-\parindent}}
  \def\newblock{\hskip .11em plus .11em minus -.07em}
  \sloppy\clubpenalty4000\widowpenalty4000
  \sfcode`\.=1000\relax}
\let\endthesourcebibliography=\endlist
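%
% A usage sketch (the key, label and entry text are placeholders).
% The argument of the environment is required but ignored, as in the
% thebibliography definition above:
%
%   \begin{thesourcebibliography}{}
%   \bibitem[\protect\citename{Corpus}1998]{corpus98}
%     Description of the source.
%   \end{thesourcebibliography}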

\def\@lbibitem[#1]#2{\item[]\if@filesw 
      { \def\protect##1{\string ##1\space}\immediate
        \write\@auxout{\string\bibcite{#2}{#1}}}\fi\ignorespaces}

\def\@bibitem#1{\item\if@filesw \immediate\write\@auxout
       {\string\bibcite{#1}{\the\c@enumi}}\fi\ignorespaces}

% ----------------------------------------------------------------------
% Section headings with less space
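%
% For reference, the arguments of \@startsection used below are, in
% order:
%
%   \@startsection{<name>}{<level>}{<indent>}%
%                 {<beforeskip>}{<afterskip>}{<style>}
%
% A negative <beforeskip> uses its absolute value as the space above
% the heading and suppresses the indentation of the paragraph that
% follows.  A negative <afterskip> (as in \paragraph and \subparagraph)
% makes a run-in heading, with the absolute value used as horizontal
% space after it.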

\def\section{%
    \@startsection{section}{1}{\z@}%
                  {-2.0ex plus -0.5ex minus -0.3ex}%
                  {0.8ex plus 0.3ex minus 0.2ex}%
                  {\large\bf\raggedright}}
\def\subsection{%
    \@startsection{subsection}{2}{\z@}%
                  {-1.4ex plus -0.4ex minus -0.2ex}%
                  {0.6ex plus 0.2ex minus 0.1ex}%
                  {\normalsize\bf\raggedright}}
\def\subsubsection{%
    \@startsection{subsubsection}{3}{\z@}%
                  {-0.8ex plus -0.3ex minus -0.1ex}%
                  {0.4ex plus 0.1ex minus 0.1ex}%
                  {\normalsize\bf\raggedright}}
\def\paragraph{%
    \@startsection{paragraph}{4}{\z@}%
                  {-0.8ex plus -0.3ex minus -0.1ex}%
                  {-1em}%
                  {\normalsize\bf}}
\def\subparagraph{%
    \@startsection{subparagraph}{5}{\parindent}%
                  {0.4ex plus 0.3ex minus 0.1ex}%
                  {-1em}%
                  {\normalsize\bf}}


% ----------------------------------------------------------------------
% Footnotes

%\footnotesep 6.65pt %
%\skip\footins 9pt plus 4pt minus 2pt
%\def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
%\setcounter{footnote}{0}

% ----------------------------------------------------------------------
% Lists and paragraphs

\setlength\parindent{1em}

\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt

% ----------------------------------------------------------------------
% Floats (figures, tables etc.)
%
% Allow a larger proportion of the column/page to be taken up with
% floats than the standard classes.  Also discourage the creation of
% columns/pages containing only floats.

% Maximum fraction of the page that can be occupied by floats:
%
\renewcommand\topfraction{.9}
\renewcommand\bottomfraction{.5}
\renewcommand\dbltopfraction{.9}	% 2-column floats

% Minimum fraction of page that can be occupied by text:
%
\renewcommand\textfraction{.1}

% Minimum fraction of a float page (a page or column holding only
% floats) that must be occupied by floats:
%
\renewcommand\floatpagefraction{0.9}
\renewcommand\dblfloatpagefraction{.9}	% 2-column floats
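%
% A worked example of what these fractions mean, using the 9.0in
% \textheight set above (the fractions are taken relative to the
% height of the text area):
%
%   top floats:     at most  0.9 x 9.0in = 8.1in
%   bottom floats:  at most  0.5 x 9.0in = 4.5in
%   text:           at least 0.1 x 9.0in = 0.9in
%   float page:     produced only if floats fill at least 8.1in of it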

% ----------------------------------------------------------------------
%
% Since we're using two columns, lines are short and we can get away
% with less vertical space between lines, within lists and around
% various kinds of display.
%
% Normally, these parameters are set in the size option to the class
% file (standard definitions are in classes.dtx).  Here we want to
% accommodate 10pt, 11pt and 12pt, so we wrap the definitions in
% \ifcase. 
%
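% As a reminder of the pattern (the standard classes store the size
% option in \@ptsize as 0, 1 or 2 for 10pt, 11pt or 12pt):
%
%   \ifcase\@ptsize
%       <definitions for 10pt>      % \@ptsize = 0
%   \or
%       <definitions for 11pt>      % \@ptsize = 1
%   \or
%       <definitions for 12pt>      % \@ptsize = 2
%   \fi
%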

%  \normalsize
%
\ifcase\@ptsize%
    \renewcommand{\normalsize}{% 				10pt
        \@setsize\normalsize{11.3pt}\xpt\@xpt%
        \abovedisplayskip 10\p@\@plus2\p@\@minus5\p@%
        \abovedisplayshortskip\z@\@plus3\p@%
        \belowdisplayshortskip 4\p@\@plus3\p@\@minus3\p@%
        \belowdisplayskip\abovedisplayskip%
        \let\@listi\@listI}%
 \or%
    \renewcommand{\normalsize}{% 				11pt
        \@setsize\normalsize{12.6pt}\xipt\@xipt%
        \abovedisplayskip11\p@\@plus2\p@\@minus4\p@%
        \abovedisplayshortskip\z@\@plus3\p@%
        \belowdisplayshortskip5\p@\@plus3\p@\@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \let\@listi\@listI}%
 \or%
    \renewcommand{\normalsize}{%				12pt
        \@setsize\normalsize{13pt}\xiipt\@xiipt%
        \abovedisplayskip 11\p@ \@plus3\p@ \@minus5\p@%
        \abovedisplayshortskip \z@ \@plus3\p@%
        \belowdisplayshortskip 5\p@ \@plus3\p@ \@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \let\@listi\@listI}%
 \fi    

% \small
%
\ifcase\@ptsize%
    \renewcommand{\small}{%					10pt
        \@setsize\small{10.5pt}\ixpt\@ixpt%
        \abovedisplayskip 8\p@ \@plus3\p@ \@minus3\p@%
        \abovedisplayshortskip \z@ \@plus2\p@%
        \belowdisplayshortskip 3\p@ \@plus2\p@ \@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 3.5\p@ \@plus1.5\p@ \@minus1.5\p@%
                    \parsep 1.5\p@ \@plus\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \or%
    \renewcommand{\small}{%					11pt
        \@setsize\small{11.3pt}\xpt\@xpt%
        \abovedisplayskip 9\p@ \@plus2\p@ \@minus4\p@%
        \abovedisplayshortskip \z@ \@plus3\p@%
        \belowdisplayshortskip 5\p@ \@plus2.5\p@ \@minus2.5\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 5\p@ \@plus2\p@ \@minus2\p@%
                    \parsep 2\p@ \@plus2\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \or%
    \renewcommand{\small}{%					12pt
        \@setsize\small{12pt}\xipt\@xipt%
        \abovedisplayskip 9\p@ \@plus3\p@ \@minus4\p@%
        \abovedisplayshortskip \z@ \@plus3\p@%
        \belowdisplayshortskip 5\p@ \@plus2.5\p@ \@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 5.5\p@ \@plus2.5\p@ \@minus2.5\p@%
                    \parsep 4\p@ \@plus1.5\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \fi


% \footnotesize
%
\ifcase\@ptsize
    \renewcommand{\footnotesize}{%				10pt
        \@setsize\footnotesize{9.3pt}\viiipt\@viiipt%
        \abovedisplayskip 5\p@ \@plus2\p@ \@minus3\p@%
        \abovedisplayshortskip \z@ \@plus\p@%
        \belowdisplayshortskip 2.5\p@\@plus\p@\@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 2.5\p@ \@plus\p@ \@minus\p@%
                    \parsep 1.5\p@ \@plus\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \or%
    \renewcommand{\footnotesize}{%				11pt
        \@setsize\footnotesize{10.3pt}\ixpt\@ixpt%
        \abovedisplayskip 7\p@ \@plus2\p@ \@minus4\p@%
        \abovedisplayshortskip \z@ \@plus\p@%
        \belowdisplayshortskip 3\p@ \@plus2\p@ \@minus2\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 3\p@ \@plus2\p@ \@minus2\p@%
                    \parsep 2\p@ \@plus\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \or%
    \renewcommand{\footnotesize}{%				12pt
        \@setsize\footnotesize{11pt}\xpt\@xpt%
        \abovedisplayskip 9\p@ \@plus2\p@ \@minus4\p@%
        \abovedisplayshortskip \z@ \@plus3\p@%
        \belowdisplayshortskip 5\p@ \@plus3\p@ \@minus3\p@%
        \belowdisplayskip\abovedisplayskip%
        \def\@listi{\leftmargin\leftmargini%
                    \topsep 4.5\p@ \@plus2\p@ \@minus2\p@%
                    \parsep 3\p@ \@plus\p@ \@minus\p@%
                    \itemsep \parsep}}%
 \fi 


% \large
%
\ifcase\@ptsize%
    \renewcommand{\large}{\@setsize\large{13pt}\xiipt\@xiipt}%	10pt
 \or%
    \renewcommand{\large}{\@setsize\large{13pt}\xiipt\@xiipt}%	11pt
 \or%
    \renewcommand{\large}{\@setsize\large{16pt}\xivpt\@xivpt}%	12pt
 \fi

% \Large
%
\ifcase\@ptsize%
    \renewcommand{\Large}{\@setsize\Large{16pt}\xivpt\@xivpt}%	10pt
 \or%
    \renewcommand{\Large}{\@setsize\Large{16pt}\xivpt\@xivpt}%	11pt
 \or%
    \renewcommand{\Large}{\@setsize\Large{16pt}\xivpt\@xivpt}%	12pt
 \fi

% Leave \scriptsize, \tiny, \huge, \Huge unchanged?

%
% Float separations, single and double-column
%
\ifcase\@ptsize%
    \setlength\floatsep{10\p@ \@plus 2\p@ \@minus 2\p@}%	10pt
    \setlength\textfloatsep{16\p@ \@plus 2\p@ \@minus 4\p@}%
    \setlength\intextsep{10\p@ \@plus 2\p@ \@minus 2\p@}%
    \setlength\dblfloatsep{10\p@ \@plus 2\p@ \@minus 2\p@}%
    \setlength\dbltextfloatsep{16\p@ \@plus 2\p@ \@minus 4\p@}%
 \or%
    \setlength\floatsep{10\p@ \@plus 2\p@ \@minus 2\p@}%	11pt
    \setlength\textfloatsep{16\p@ \@plus 2\p@ \@minus 4\p@}%
    \setlength\intextsep{10\p@ \@plus 2\p@ \@minus 2\p@}%
    \setlength\dblfloatsep{10\p@ \@plus 2\p@ \@minus 2\p@}%
    \setlength\dbltextfloatsep{16\p@ \@plus 2\p@ \@minus 4\p@}%
 \or%
    \setlength\floatsep{12\p@ \@plus 3\p@ \@minus 3\p@}%	12pt
    \setlength\textfloatsep{18\p@ \@plus 2\p@ \@minus 4\p@}%
    \setlength\intextsep{12\p@ \@plus 3\p@ \@minus 3\p@}%
    \setlength\dblfloatsep{12\p@ \@plus 2\p@ \@minus 4\p@}%
    \setlength\dbltextfloatsep{18\p@ \@plus 2\p@ \@minus 4\p@}%
 \fi

%
% Top-level list in \normalsize text
%
\ifcase\@ptsize%
    \def\@listi{\leftmargin\leftmargini%			10pt
                \topsep  6\p@ \@plus2\p@ \@minus2\p@%
                \parsep  2\p@ \@plus0.5\p@ \@minus\p@%
                \itemsep 2.5\p@ \@plus\p@ \@minus0.5\p@}%
 \or%
    \def\@listi{\leftmargin\leftmargini%			11pt
                \topsep  8\p@ \@plus2\p@ \@minus2\p@%
                \parsep  3\p@ \@plus1.5\p@ \@minus\p@%
                \itemsep 3\p@ \@plus1.5\p@ \@minus\p@}%
 \or%
    \def\@listi{\leftmargin\leftmargini%			12pt
                \topsep  9\p@ \@plus3\p@   \@minus4\p@%
                \parsep  4\p@  \@plus2\p@ \@minus\p@%
                \itemsep 4\p@  \@plus2\p@ \@minus\p@}%
 \fi
\let\@listI\@listi

%
% Embedded lists unchanged.
%


% ----------------------------------------------------------------------
% End of colacl.sty
% ----------------------------------------------------------------------
\end{filecontents}

\begin{filecontents}{acl.bst}
% BibTeX `acl' style file for BibTeX version 0.99c, LaTeX version 2.09
% This version was made by modifying `aaai-named' format based on the master
% file by Oren Patashnik (PATASHNIK@SCORE.STANFORD.EDU)

% Copyright (C) 1985, all rights reserved.
% Modifications Copyright 1988, Peter F. Patel-Schneider
% Further modifications by Stuart Shieber, 1991, and Fernando Pereira, 1992.
% Copying of this file is authorized only if either
% (1) you make absolutely no changes to your copy, including name, or
% (2) if you do make changes, you name it something other than
% btxbst.doc, plain.bst, unsrt.bst, alpha.bst, and abbrv.bst.
% This restriction helps ensure that all standard styles are identical.

% There are undoubtedly bugs in this style.  If you make bug fixes,
% improvements, etc.  please let me know.  My e-mail address is:
%	pfps@spar.slb.com

%   Citation format: [author-last-name, year]
%		     [author-last-name and author-last-name, year]
%		     [author-last-name {\em et al.}, year]
%
%   Reference list ordering: alphabetical by author or whatever passes
%	for author in the absence of one.
%
% This BibTeX style has support for short (year only) citations.  This
% is done by having the citations actually look like
%         \citename{name-info, }year
% The LaTeX style has to have the following
%     \let\@internalcite\cite
%     \def\cite{\def\citename##1{##1}\@internalcite}
%     \def\shortcite{\def\citename##1{}\@internalcite}
%     \def\@biblabel#1{\def\citename##1{##1}[#1]\hfill}
% which makes \shortcite the macro for short citations.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Changes made by SMS for thesis style
%   no emphasis on "et al."
%   "Ph.D." includes periods (not "PhD")
%   moved year to immediately after author's name
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ENTRY
  { address
    author
    booktitle
    chapter
    edition
    editor
    howpublished
    institution
    journal
    key
    month
    note
    number
    organization
    pages
    publisher
    school
    series
    title
    type
    volume
    year
  }
  {}
  { label extra.label sort.label }

INTEGERS { output.state before.all mid.sentence after.sentence after.block }

FUNCTION {init.state.consts}
{ #0 'before.all :=
  #1 'mid.sentence :=
  #2 'after.sentence :=
  #3 'after.block :=
}

STRINGS { s t }

FUNCTION {output.nonnull}
{ 's :=
  output.state mid.sentence =
    { ", " * write$ }
    { output.state after.block =
	{ add.period$ write$
	  newline$
	  "\newblock " write$
	}
	{ output.state before.all =
	    'write$
	    { add.period$ " " * write$ }
	  if$
	}
      if$
      mid.sentence 'output.state :=
    }
  if$
  s
}
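
% A sketch of how the deferred-write scheme above plays out.  The most
% recently formatted field always stays on the stack; output.nonnull
% writes the previous one, with the separator chosen by output.state.
% For an article whose fields format to <authors>, <year>, <title>,
% <journal> and <vol/pages> (placeholders, not real data), the calls
% in FUNCTION {article} produce roughly:
%
%   \bibitem[<label>]{<key>}
%   <authors>.
%   \newblock <year>.
%   \newblock <title>.
%   \newblock {\em <journal>}, <vol/pages>.
%
% The periods and \newblock come from the after.block state set by
% new.block; the comma comes from the default mid.sentence state.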

FUNCTION {output}
{ duplicate$ empty$
    'pop$
    'output.nonnull
  if$
}

FUNCTION {output.check}
{ 't :=
  duplicate$ empty$
    { pop$ "empty " t * " in " * cite$ * warning$ }
    'output.nonnull
  if$
}

FUNCTION {output.bibitem}
{ newline$

  "\bibitem[" write$
  label write$
  "]{" write$

  cite$ write$
  "}" write$
  newline$
  ""
  before.all 'output.state :=
}

FUNCTION {fin.entry}
{ add.period$
  write$
  newline$
}

FUNCTION {new.block}
{ output.state before.all =
    'skip$
    { after.block 'output.state := }
  if$
}

FUNCTION {new.sentence}
{ output.state after.block =
    'skip$
    { output.state before.all =
	'skip$
	{ after.sentence 'output.state := }
      if$
    }
  if$
}

FUNCTION {not}
{   { #0 }
    { #1 }
  if$
}

FUNCTION {and}
{   'skip$
    { pop$ #0 }
  if$
}

FUNCTION {or}
{   { pop$ #1 }
    'skip$
  if$
}

FUNCTION {new.block.checka}
{ empty$
    'skip$
    'new.block
  if$
}

FUNCTION {new.block.checkb}
{ empty$
  swap$ empty$
  and
    'skip$
    'new.block
  if$
}

FUNCTION {new.sentence.checka}
{ empty$
    'skip$
    'new.sentence
  if$
}

FUNCTION {new.sentence.checkb}
{ empty$
  swap$ empty$
  and
    'skip$
    'new.sentence
  if$
}

FUNCTION {field.or.null}
{ duplicate$ empty$
    { pop$ "" }
    'skip$
  if$
}

FUNCTION {emphasize}
{ duplicate$ empty$
    { pop$ "" }
    { "{\em " swap$ * "}" * }
  if$
}

INTEGERS { nameptr namesleft numnames }

FUNCTION {format.names}
{ 's :=
  #1 'nameptr :=
  s num.names$ 'numnames :=
  numnames 'namesleft :=
    { namesleft #0 > }

    { s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$ 't :=

      nameptr #1 >
	{ namesleft #1 >
	    { ", " * t * }
	    { numnames #2 >
		{ "," * }
		'skip$
	      if$
	      t "others" =
		{ " et~al." * }
		{ " and " * t * }
	      if$
	    }
	  if$
	}
	't
      if$
      nameptr #1 + 'nameptr :=
      namesleft #1 - 'namesleft :=
    }
  while$
}
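
% For example, writing <n1>, <n2>, ... for the individual names as
% formatted by format.name$, the loop above yields:
%
%   one name:                      <n1>
%   two names:                     <n1> and <n2>
%   three names:                   <n1>, <n2>, and <n3>
%   "<n1> and others":             <n1> et~al.
%   "<n1> and <n2> and others":    <n1>, <n2>, et~al.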

FUNCTION {format.authors}
{ author empty$
    { "" }
    { author format.names }
  if$
}

FUNCTION {format.editors}
{ editor empty$
    { "" }
    { editor format.names
      editor num.names$ #1 >
	{ ", editors" * }
	{ ", editor" * }
      if$
    }
  if$
}

FUNCTION {format.title}
{ title empty$
    { "" }

    { title "t" change.case$ }

  if$
}

FUNCTION {n.dashify}
{ 't :=
  ""
    { t empty$ not }
    { t #1 #1 substring$ "-" =
	{ t #1 #2 substring$ "--" = not
	    { "--" *
	      t #2 global.max$ substring$ 't :=
	    }
	    {   { t #1 #1 substring$ "-" = }
		{ "-" *
		  t #2 global.max$ substring$ 't :=
		}
	      while$
	    }
	  if$
	}
	{ t #1 #1 substring$ *
	  t #2 global.max$ substring$ 't :=
	}
      if$
    }
  while$
}
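
% For example (page ranges chosen for illustration), n.dashify turns a
% single hyphen into an en-dash and leaves longer runs alone:
%
%   "45-67"    becomes  "45--67"
%   "45--67"   stays    "45--67"
%   "45---67"  stays    "45---67"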

FUNCTION {format.date}
{ year empty$
    { month empty$
	{ "" }
	{ "there's a month but no year in " cite$ * warning$
	  month
	}
      if$
    }
    { month empty$
	{ "" }
	{ month }
      if$
    }
  if$
}

FUNCTION {format.btitle}
{ title emphasize
}

FUNCTION {tie.or.space.connect}
{ duplicate$ text.length$ #3 <
    { "~" }
    { " " }
  if$
  swap$ * *
}

FUNCTION {either.or.check}
{ empty$
    'pop$
    { "can't use both " swap$ * " fields in " * cite$ * warning$ }
  if$
}

FUNCTION {format.bvolume}
{ volume empty$
    { "" }
    { "volume" volume tie.or.space.connect
      series empty$
	'skip$
	{ " of " * series emphasize * }
      if$
      "volume and number" number either.or.check
    }
  if$
}

FUNCTION {format.number.series}
{ volume empty$
    { number empty$
	{ series field.or.null }
	{ output.state mid.sentence =
	    { "number" }
	    { "Number" }
	  if$
	  number tie.or.space.connect
	  series empty$
	    { "there's a number but no series in " cite$ * warning$ }
	    { " in " * series * }
	  if$
	}
      if$
    }
    { "" }
  if$
}

FUNCTION {format.edition}
{ edition empty$
    { "" }
    { output.state mid.sentence =
	{ edition "l" change.case$ " edition" * }
	{ edition "t" change.case$ " edition" * }
      if$
    }
  if$
}

INTEGERS { multiresult }

FUNCTION {multi.page.check}
{ 't :=
  #0 'multiresult :=
    { multiresult not
      t empty$ not
      and
    }
    { t #1 #1 substring$
      duplicate$ "-" =
      swap$ duplicate$ "," =
      swap$ "+" =
      or or
	{ #1 'multiresult := }
	{ t #2 global.max$ substring$ 't := }
      if$
    }
  while$
  multiresult
}

FUNCTION {format.pages}
{ pages empty$
    { "" }
    { pages multi.page.check
	{ "pages" pages n.dashify tie.or.space.connect }
	{ "page" pages tie.or.space.connect }
      if$
    }
  if$
}
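
% For example, combining multi.page.check, n.dashify and
% tie.or.space.connect (which joins with a tie, ~, when the second
% string is shorter than three characters, and with a space otherwise);
% the page values are chosen for illustration:
%
%   pages = "7"       gives  "page~7"
%   pages = "123"     gives  "page 123"
%   pages = "45-67"   gives  "pages 45--67"
%   pages = "3,5"     gives  "pages 3,5"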

FUNCTION {format.year.label}
{ year extra.label *
}

FUNCTION {format.vol.num.pages}
{ volume field.or.null
  number empty$
    'skip$
    { "(" number * ")" * *
      volume empty$
	{ "there's a number but no volume in " cite$ * warning$ }
	'skip$
      if$
    }
  if$
  pages empty$
    'skip$
    { duplicate$ empty$
	{ pop$ format.pages }
	{ ":" * pages n.dashify * }
      if$
    }
  if$
}
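
% For example (field values chosen for illustration):
%
%   volume = "12", number = "3", pages = "45-67"  gives  "12(3):45--67"
%   volume = "12", pages = "45-67"                gives  "12:45--67"
%   volume = "12" only                            gives  "12"
%   pages = "45-67" only                          gives  "pages 45--67"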

FUNCTION {format.chapter.pages}
{ chapter empty$
    'format.pages
    { type empty$
	{ "chapter" }
	{ type "l" change.case$ }
      if$
      chapter tie.or.space.connect
      pages empty$
	'skip$
	{ ", " * format.pages * }
      if$
    }
  if$
}

FUNCTION {format.in.ed.booktitle}
{ booktitle empty$
    { "" }
    { editor empty$
	{ "In " booktitle emphasize * }
	{ "In " format.editors * ", " * booktitle emphasize * }
      if$
    }
  if$
}

FUNCTION {empty.misc.check}
{ author empty$ title empty$ howpublished empty$
  month empty$ year empty$ note empty$
  and and and and and

  key empty$ not and

    { "all relevant fields are empty in " cite$ * warning$ }
    'skip$
  if$
}

FUNCTION {format.thesis.type}
{ type empty$
    'skip$
    { pop$
      type "t" change.case$
    }
  if$
}

FUNCTION {format.tr.number}
{ type empty$
    { "Technical Report" }
    'type
  if$
  number empty$
    { "t" change.case$ }
    { number tie.or.space.connect }
  if$
}

FUNCTION {format.article.crossref}
{ key empty$
    { journal empty$
	{ "need key or journal for " cite$ * " to crossref " * crossref *
	  warning$
	  ""
	}
	{ "In {\em " journal * "\/}" * }
      if$
    }
    { "In " key * }
  if$
  " \cite{" * crossref * "}" *
}

FUNCTION {format.crossref.editor}
{ editor #1 "{vv~}{ll}" format.name$
  editor num.names$ duplicate$
  #2 >
    { pop$ " et~al." * }
    { #2 <
	'skip$
	{ editor #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
	    { " et~al." * }
	    { " and " * editor #2 "{vv~}{ll}" format.name$ * }
	  if$
	}
      if$
    }
  if$
}

FUNCTION {format.book.crossref}
{ volume empty$
    { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
      "In "
    }
    { "Volume" volume tie.or.space.connect
      " of " *
    }
  if$
  editor empty$
  editor field.or.null author field.or.null =
  or
    { key empty$
	{ series empty$
	    { "need editor, key, or series for " cite$ * " to crossref " *
	      crossref * warning$
	      "" *
	    }
	    { "{\em " * series * "\/}" * }
	  if$
	}
	{ key * }
      if$
    }
    { format.crossref.editor * }
  if$
  " \cite{" * crossref * "}" *
}

FUNCTION {format.incoll.inproc.crossref}
{ editor empty$
  editor field.or.null author field.or.null =
  or
    { key empty$
	{ booktitle empty$
	    { "need editor, key, or booktitle for " cite$ * " to crossref " *
	      crossref * warning$
	      ""
	    }
	    { "In {\em " booktitle * "\/}" * }
	  if$
	}
	{ "In " key * }
      if$
    }
    { "In " format.crossref.editor * }
  if$
  " \cite{" * crossref * "}" *
}

FUNCTION {article}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  crossref missing$
    { journal emphasize "journal" output.check
      format.vol.num.pages output
      format.date output
    }
    { format.article.crossref output.nonnull
      format.pages output
    }
  if$
  new.block
  note output
  fin.entry
}

FUNCTION {book}
{ output.bibitem
  author empty$
    { format.editors "author and editor" output.check }
    { format.authors output.nonnull
      crossref missing$
	{ "author and editor" editor either.or.check }
	'skip$
      if$
    }
  if$
  new.block
  format.year.label "year" output.check
  new.block
  format.btitle "title" output.check
  crossref missing$
    { format.bvolume output
      new.block
      format.number.series output
      new.sentence
      publisher "publisher" output.check
      address output
    }
    { new.block
      format.book.crossref output.nonnull
    }
  if$
  format.edition output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {booklet}
{ output.bibitem
  format.authors output
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  howpublished address new.block.checkb
  howpublished output
  address output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {inbook}
{ output.bibitem
  author empty$
    { format.editors "author and editor" output.check }
    { format.authors output.nonnull
      crossref missing$
	{ "author and editor" editor either.or.check }
	'skip$
      if$
    }
  if$
  new.block
  format.year.label "year" output.check
  new.block
  format.btitle "title" output.check
  crossref missing$
    { format.bvolume output
      format.chapter.pages "chapter and pages" output.check
      new.block
      format.number.series output
      new.sentence
      publisher "publisher" output.check
      address output
    }
    { format.chapter.pages "chapter and pages" output.check
      new.block
      format.book.crossref output.nonnull
    }
  if$
  format.edition output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {incollection}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  crossref missing$
    { format.in.ed.booktitle "booktitle" output.check
      format.bvolume output
      format.number.series output
      format.chapter.pages output
      new.sentence
      publisher "publisher" output.check
      address output
      format.edition output
      format.date output
    }
    { format.incoll.inproc.crossref output.nonnull
      format.chapter.pages output
    }
  if$
  new.block
  note output
  fin.entry
}

FUNCTION {inproceedings}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  crossref missing$
    { format.in.ed.booktitle "booktitle" output.check
      format.bvolume output
      format.number.series output
      format.pages output
      address empty$
	{ organization publisher new.sentence.checkb
	  organization output
	  publisher output
	  format.date output
	}
	{ address output.nonnull
	  format.date output
	  new.sentence
	  organization output
	  publisher output
	}
      if$
    }
    { format.incoll.inproc.crossref output.nonnull
      format.pages output
    }
  if$
  new.block
  note output
  fin.entry
}

FUNCTION {conference} { inproceedings }

FUNCTION {manual}
{ output.bibitem
  author empty$
    { organization empty$
	'skip$
	{ organization output.nonnull
	  address output
	}
      if$
    }
    { format.authors output.nonnull }
  if$
  new.block
  format.year.label "year" output.check
  new.block
  format.btitle "title" output.check
  author empty$
    { organization empty$
	{ address new.block.checka
	  address output
	}
	'skip$
      if$
    }
    { organization address new.block.checkb
      organization output
      address output
    }
  if$
  format.edition output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {mastersthesis}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  "Master's thesis" format.thesis.type output.nonnull
  school "school" output.check
  address output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {misc}
{ output.bibitem
  format.authors output 
  new.block
  format.year.label output
  new.block
  title howpublished new.block.checkb
  format.title output
  howpublished new.block.checka
  howpublished output
  format.date output
  new.block
  note output
  fin.entry
  empty.misc.check
}

FUNCTION {phdthesis}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.btitle "title" output.check
  new.block
  "{Ph.D.} thesis" format.thesis.type output.nonnull
  school "school" output.check
  address output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {proceedings}
{ output.bibitem
  editor empty$
    { organization output }
    { format.editors output.nonnull }
  if$
  new.block
  format.year.label "year" output.check
  new.block
  format.btitle "title" output.check
  format.bvolume output
  format.number.series output
  address empty$
    { editor empty$
	{ publisher new.sentence.checka }
	{ organization publisher new.sentence.checkb
	  organization output
	}
      if$
      publisher output
      format.date output
    }
    { address output.nonnull
      format.date output
      new.sentence
      editor empty$
	'skip$
	{ organization output }
      if$
      publisher output
    }
  if$
  new.block
  note output
  fin.entry
}

FUNCTION {techreport}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  format.tr.number output.nonnull
  institution "institution" output.check
  address output
  format.date output
  new.block
  note output
  fin.entry
}

FUNCTION {unpublished}
{ output.bibitem
  format.authors "author" output.check
  new.block
  format.year.label "year" output.check
  new.block
  format.title "title" output.check
  new.block
  note "note" output.check
  format.date output
  fin.entry
}

FUNCTION {default.type} { misc }

MACRO {jan} {"January"}

MACRO {feb} {"February"}

MACRO {mar} {"March"}

MACRO {apr} {"April"}

MACRO {may} {"May"}

MACRO {jun} {"June"}

MACRO {jul} {"July"}

MACRO {aug} {"August"}

MACRO {sep} {"September"}

MACRO {oct} {"October"}

MACRO {nov} {"November"}

MACRO {dec} {"December"}

MACRO {acmcs} {"ACM Computing Surveys"}

MACRO {acta} {"Acta Informatica"}

MACRO {cacm} {"Communications of the ACM"}

MACRO {ibmjrd} {"IBM Journal of Research and Development"}

MACRO {ibmsj} {"IBM Systems Journal"}

MACRO {ieeese} {"IEEE Transactions on Software Engineering"}

MACRO {ieeetc} {"IEEE Transactions on Computers"}

MACRO {ieeetcad}
 {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}

MACRO {ipl} {"Information Processing Letters"}

MACRO {jacm} {"Journal of the ACM"}

MACRO {jcss} {"Journal of Computer and System Sciences"}

MACRO {scp} {"Science of Computer Programming"}

MACRO {sicomp} {"SIAM Journal on Computing"}

MACRO {tocs} {"ACM Transactions on Computer Systems"}

MACRO {tods} {"ACM Transactions on Database Systems"}

MACRO {tog} {"ACM Transactions on Graphics"}

MACRO {toms} {"ACM Transactions on Mathematical Software"}

MACRO {toois} {"ACM Transactions on Office Information Systems"}

MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}

MACRO {tcs} {"Theoretical Computer Science"}

READ

FUNCTION {sortify}
{ purify$
  "l" change.case$
}

INTEGERS { len }

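% chop.word: called as  w len s chop.word  where len is the length of w.
% Returns s with the leading w removed if s starts with w, else s unchanged.
FUNCTION {chop.word}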
{ 's :=
  'len :=
  s #1 len substring$ =
    { s len #1 + global.max$ substring$ }
    's
  if$
}

INTEGERS { et.al.char.used }

FUNCTION {initialize.et.al.char.used}
{ #0 'et.al.char.used :=
}

EXECUTE {initialize.et.al.char.used}

FUNCTION {format.lab.names}
{ 's :=
  s num.names$ 'numnames :=

  numnames #1 =
    { s #1 "{vv }{ll}" format.name$ }
    { numnames #2 =
        { s #1 "{vv }{ll }and " format.name$ s #2 "{vv }{ll}" format.name$ *
        }
        { s #1 "{vv }{ll }\bgroup et al.\egroup " format.name$ }
      if$
    }
  if$

}

FUNCTION {author.key.label}
{ author empty$
    { key empty$

	{ cite$ #1 #3 substring$ }

	{ key #3 text.prefix$ }
      if$
    }
    { author format.lab.names }
  if$
}

FUNCTION {author.editor.key.label}
{ author empty$
    { editor empty$
	{ key empty$

	    { cite$ #1 #3 substring$ }

	    { key #3 text.prefix$ }
	  if$
	}
	{ editor format.lab.names }
      if$
    }
    { author format.lab.names }
  if$
}

FUNCTION {author.key.organization.label}
{ author empty$
    { key empty$
	{ organization empty$

	    { cite$ #1 #3 substring$ }

	    { "The " #4 organization chop.word #3 text.prefix$ }
	  if$
	}
	{ key #3 text.prefix$ }
      if$
    }
    { author format.lab.names }
  if$
}

FUNCTION {editor.key.organization.label}
{ editor empty$
    { key empty$
	{ organization empty$

	    { cite$ #1 #3 substring$ }

	    { "The " #4 organization chop.word #3 text.prefix$ }
	  if$
	}
	{ key #3 text.prefix$ }
      if$
    }
    { editor format.lab.names }
  if$
}

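% calc.label: pick the name string appropriate to the entry type, then set
%   label      to \protect\citename{<names>}<year>
%   sort.label to <names><year>, purified and lowercased.
FUNCTION {calc.label}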
{ type$ "book" =
  type$ "inbook" =
  or
    'author.editor.key.label
    { type$ "proceedings" =
	'editor.key.organization.label
	{ type$ "manual" =
	    'author.key.organization.label
	    'author.key.label
	  if$
	}
      if$
    }
  if$
  duplicate$

  "\protect\citename{" swap$ * "}" *
  year field.or.null purify$ *
  'label :=
  year field.or.null purify$ *

  sortify 'sort.label :=
}

FUNCTION {sort.format.names}
{ 's :=
  #1 'nameptr :=
  ""
  s num.names$ 'numnames :=
  numnames 'namesleft :=
    { namesleft #0 > }
    { nameptr #1 >
	{ "   " * }
	'skip$
      if$

      s nameptr "{vv{ } }{ll{ }}{  ff{ }}{  jj{ }}" format.name$ 't :=

      nameptr numnames = t "others" = and
	{ "et al" * }
	{ t sortify * }
      if$
      nameptr #1 + 'nameptr :=
      namesleft #1 - 'namesleft :=
    }
  while$
}

FUNCTION {sort.format.title}
{ 't :=
  "A " #2
    "An " #3
      "The " #4 t chop.word
    chop.word
  chop.word
  sortify
  #1 global.max$ substring$
}

FUNCTION {author.sort}
{ author empty$
    { key empty$
	{ "to sort, need author or key in " cite$ * warning$
	  ""
	}
	{ key sortify }
      if$
    }
    { author sort.format.names }
  if$
}

FUNCTION {author.editor.sort}
{ author empty$
    { editor empty$
	{ key empty$
	    { "to sort, need author, editor, or key in " cite$ * warning$
	      ""
	    }
	    { key sortify }
	  if$
	}
	{ editor sort.format.names }
      if$
    }
    { author sort.format.names }
  if$
}

FUNCTION {author.organization.sort}
{ author empty$
    { organization empty$
	{ key empty$
	    { "to sort, need author, organization, or key in " cite$ * warning$
	      ""
	    }
	    { key sortify }
	  if$
	}
	{ "The " #4 organization chop.word sortify }
      if$
    }
    { author sort.format.names }
  if$
}

FUNCTION {editor.organization.sort}
{ editor empty$
    { organization empty$
	{ key empty$
	    { "to sort, need editor, organization, or key in " cite$ * warning$
	      ""
	    }
	    { key sortify }
	  if$
	}
	{ "The " #4 organization chop.word sortify }
      if$
    }
    { editor sort.format.names }
  if$
}

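% presort: the sort key is sort.label, the type-appropriate name list, the
% year and the title, each separated by four spaces and cut to entry.max$.
FUNCTION {presort}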

{ calc.label
  sort.label
  "    "
  *
  type$ "book" =

  type$ "inbook" =
  or
    'author.editor.sort
    { type$ "proceedings" =
	'editor.organization.sort
	{ type$ "manual" =
	    'author.organization.sort
	    'author.sort
	  if$
	}
      if$
    }
  if$

  *

  "    "
  *
  year field.or.null sortify
  *
  "    "
  *
  title field.or.null
  sort.format.title
  *
  #1 entry.max$ substring$
  'sort.key$ :=
}

ITERATE {presort}

SORT

STRINGS { longest.label last.sort.label next.extra }

INTEGERS { longest.label.width last.extra.num }

FUNCTION {initialize.longest.label}
{ "" 'longest.label :=
  #0 int.to.chr$ 'last.sort.label :=
  "" 'next.extra :=
  #0 'longest.label.width :=
  #0 'last.extra.num :=
}

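% forward.pass: within a run of entries sharing a sort.label, the second and
% later entries get extra.label "b", "c", ...; the first gets "" for now.
FUNCTION {forward.pass}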
{ last.sort.label sort.label =
    { last.extra.num #1 + 'last.extra.num :=
      last.extra.num int.to.chr$ 'extra.label :=
    }
    { "a" chr.to.int$ 'last.extra.num :=
      "" 'extra.label :=
      sort.label 'last.sort.label :=
    }
  if$
}

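% reverse.pass: walking backwards, the first entry of a run gets "a" when its
% successor got "b"; extra.label is then appended to label and the widest
% label is recorded in longest.label.
FUNCTION {reverse.pass}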
{ next.extra "b" =
    { "a" 'extra.label := }
    'skip$
  if$
  label extra.label * 'label :=
  label width$ longest.label.width >
    { label 'longest.label :=
      label width$ 'longest.label.width :=
    }
    'skip$
  if$
  extra.label 'next.extra :=
}

EXECUTE {initialize.longest.label}

ITERATE {forward.pass}

REVERSE {reverse.pass}

FUNCTION {begin.bib}

{ et.al.char.used
    { "\newcommand{\etalchar}[1]{$^{#1}$}" write$ newline$ }
    'skip$
  if$
  preamble$ empty$

    'skip$
    { preamble$ write$ newline$ }
  if$

  "\begin{thebibliography}{" "}" * write$ newline$

}

EXECUTE {begin.bib}

EXECUTE {init.state.consts}

ITERATE {call.type$}

FUNCTION {end.bib}
{ newline$
  "\end{thebibliography}" write$ newline$
}

EXECUTE {end.bib}
\end{filecontents}

\begin{filecontents}{subcat.bib}

@InProceedings{sarkar00:_subcat_frames_czech,
  author = 	 {Anoop Sarkar and Daniel Zeman},
  title = 	 {Automatic Extraction of Subcategorization Frames for Czech},
  booktitle = 	 {Proceedings of COLING 2000},
  year =	 2000
}

@Article{dunning93:_statis,
  author = 	 {Ted Dunning},
  title = 	 {Accurate Methods for the Statistics of Surprise and Coincidence},
  journal = 	 {Computational Linguistics},
  year = 	 1993,
  volume =	 19,
  number =	 1,
  pages =	 {61--74},
  month =	 {March}
}

@Book{bickel77:_mathem_statis,
  author =	 {Peter Bickel and Kjell Doksum},
  title = 	 {Mathematical Statistics},
  publisher = 	 {Holden-Day Inc.},
  year = 	 1977
}

@InCollection{hajic98:_pdt,
  author = 	 {Jan Haji\v{c}},
  title = 	 {Building a Syntactically Annotated Corpus: The Prague Dependency Treebank},
  booktitle = 	 {Issues of Valency and Meaning},
  pages =	 {106--132},
  publisher =	 {Karolinum},
  year =	 1998,
  address =	 {Praha}
}

@InProceedings{hajic98:_tagger,
  author = 	 {Jan Haji\v{c} and Barbora Hladk\'a},
  title = 	 {Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset},
  booktitle = 	 {Proceedings of COLING-ACL 98},
  pages =	 {483--490},
  year =	 1998,
  series =	 {Universit\'e de Montr\'eal, Montr\'eal}
}

@InProceedings{manning93:_subcat,
  author = 	 {Christopher D. Manning},
  title = 	 {Automatic Acquisition of a Large Subcategorization Dictionary from Corpora},
  booktitle = 	 {Proceedings of the 31st Meeting of the ACL},
  pages =	 {235--242},
  year =	 1993,
  address =	 {Columbus, Ohio}
}

@InProceedings{briscoe97:_subcat,
  author = 	 {Ted Briscoe and John Carroll},
  title = 	 {Automatic Extraction of Subcategorization from Corpora},
  booktitle = 	 {Proceedings of the 5th ANLP Conference},
  pages =	 {356--363},
  year =	 1997,
  address =	 {Washington, D.C.},
  organization = {ACL}
}


@InProceedings{carroll98:_subcat_help_parser,
  author = 	 {John Carroll and Guido Minnen},
  title = 	 {Can Subcategorisation Probabilities Help a Statistical Parser?},
  booktitle = 	 {Proceedings of the 6th ACL/SIGDAT Workshop on Very Large Corpora (WVLC-6)},
  year =	 1998,
  address =	 {Montreal, Canada}
}

@InProceedings{brent91:_subcat,
  author = 	 {Michael Brent},
  title = 	 {Automatic acquisition of subcategorization frames from untagged text},
  booktitle = 	 {Proceedings of the 29th Meeting of the ACL},
  pages =	 {209--214},
  year =	 1991,
  address =	 {Berkeley, CA}
}

@Article{brent93:_unsup_learn,
  author = 	 {Michael Brent},
  title = 	 {From grammar to lexicon: unsupervised learning of lexical syntax},
  journal = 	 {Computational Linguistics},
  year = 	 1993,
  volume =	 19,
  number =	 3,
  pages =	 {243--262}
}

@Article{brent94:_acquis_subcat,
  author = 	 {Michael Brent},
  title = 	 {Acquisition of subcategorization frames using aggregated evidence from local syntactic cues},
  journal = 	 {Lingua},
  year = 	 1994,
  volume =	 92,
  pages =	 {433--470},
  note =	 {Reprinted in Acquisition of the Lexicon, L. Gleitman and B. Landau (Eds.). MIT Press, Cambridge, MA}
}


@InProceedings{siegel97:_class_verbs,
  author = 	 {Eric V. Siegel},
  title = 	 {Learning Methods for Combining Linguistic Indicators to Classify Verbs},
  booktitle = 	 {Proceedings of EMNLP-97},
  pages =	 {156--162},
  year =	 1997
}


@InProceedings{webster89:_lexical_frames,
  author = 	 {Mort Webster and Mitchell Marcus},
  title = 	 {Automatic acquisition of the lexical frames of verbs from sentence frames},
  booktitle = 	 {Proceedings of the 27th Meeting of the ACL},
  pages =	 {177--184},
  year =	 1989
}

@InProceedings{stevenson99:_verb_class,
  author = 	 {Suzanne Stevenson and Paola Merlo},
  title = 	 {Automatic Verb Classification using Distributions of Grammatical Features},
  booktitle = 	 {Proceedings of EACL '99},
  pages =	 {45--52},
  year =	 1999,
  address =	 {Bergen, Norway},
  month =	 {8--12 June}
}

@InProceedings{li96:_case_frames,
  author = 	 {Hang Li and Naoki Abe},
  title = 	 {Learning Dependencies between Case Frame Slots},
  booktitle = 	 {Proceedings of the 16th International Conference on Computational Linguistics (COLING '96)},
  pages =	 {10--15},
  year =	 1996
}

@InProceedings{carroll98:_valen_pcfg,
  author = 	 {Glenn Carroll and Mats Rooth},
  title = 	 "{Valence induction with a head-lexicalized PCFG}",
  booktitle = 	 {Proceedings of the 3rd Conference on Empirical Methods in Natural
Language Processing (EMNLP 3)},
  year =	 1998,
  address =	 {Granada, Spain}
}


@InProceedings{ushioda93:_verb_subcat,
  author = 	 {Akira Ushioda and David A. Evans and Ted Gibson and Alex Waibel},
  title = 	 {The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora},
  booktitle = 	 {Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text},
  pages =	 {95--106},
  year =	 1993,
  editor =	 {B. Boguraev and J. Pustejovsky},
  address =	 {Columbus, OH},
  month =	 {21 June}
}

@InCollection{ersan96:_case_frames,
  author = 	 {Murat Ersan and Eugene Charniak},
  title = 	 {A Statistical Syntactic Disambiguation Program and What It Learns},
  booktitle = 	 {Connectionist, Statistical and Symbolic Approaches in Learning for Natural Language Processing},
  pages =	 {146--159},
  publisher =	 {Springer-Verlag},
  year =	 1996,
  editor =	 {S. Wermter and E. Riloff and G. Scheler},
  volume =	 1040,
  series =	 {Lecture Notes in Artificial Intelligence},
  address =	 {Berlin}
}

@InProceedings{lapata99:_verb_class,
  author = 	 {Maria Lapata and Chris Brew},
  title = 	 {Using subcategorization to resolve verb class ambiguity},
  booktitle = 	 {Proceedings of WVLC/EMNLP},
  pages =	 {266--274},
  year =	 1999,
  editor =	 {Pascale Fung and Joe Zhou},
  month =	 {21--22 June}
}

@InProceedings{lapata99:_acquir_lexic_gener,
  author = 	 {Maria Lapata},
  title = 	 {Acquiring Lexical Generalizations from Corpora: A case study for diathesis alternations},
  booktitle = 	 {Proceedings of the 37th Meeting of the ACL},
  pages =	 {397--404},
  year =	 1999
}

@InProceedings{stevenson99:_lexical_sem,
  author = 	 {Suzanne Stevenson and Paola Merlo and Natalia Kariaeva and Kamin Whitehouse},
  title = 	 {Supervised learning of lexical semantic classes using frequency distributions},
  booktitle = 	 {SIGLEX-99},
  year =	 1999
}

@InProceedings{basili98:_subcat,
  author = 	 {Roberto Basili and Michele Vindigni},
  title = 	 {Adapting a Subcategorization Lexicon to a Domain},
  booktitle = 	 {Proceedings of the ECML'98 Workshop {\em TANLPS: Towards adaptive NLP-driven systems: linguistic information, learning methods and applications}},
  year =	 1998,
  address =	 {Chemnitz, Germany},
  month =	 {24 April}
}

\end{filecontents}

\usepackage{colacl}
\usepackage{times}

\begin{filecontents}{pdt_ex.eps}
%!PS-Adobe-2.0 EPSF-2.0
%%Title: pdt_ex.eps
%%Creator: fig2dev Version 3.2 Patchlevel 0-beta3
%%CreationDate: Fri Jan 14 15:17:18 2000
%%For: anoop@seringa.cis.upenn.edu (Anoop Sarkar)
%%Orientation: Portrait
%%BoundingBox: 0 0 590 390
%%Pages: 0
%%BeginSetup
%%EndSetup
%%Magnification: 1.0000
%%EndComments
/$F2psDict 200 dict def
$F2psDict begin
$F2psDict /mtrx matrix put
/col-1 {0 setgray} bind def
/col0 {0.000 0.000 0.000 srgb} bind def
/col1 {0.000 0.000 1.000 srgb} bind def
/col2 {0.000 1.000 0.000 srgb} bind def
/col3 {0.000 1.000 1.000 srgb} bind def
/col4 {1.000 0.000 0.000 srgb} bind def
/col5 {1.000 0.000 1.000 srgb} bind def
/col6 {1.000 1.000 0.000 srgb} bind def
/col7 {1.000 1.000 1.000 srgb} bind def
/col8 {0.000 0.000 0.560 srgb} bind def
/col9 {0.000 0.000 0.690 srgb} bind def
/col10 {0.000 0.000 0.820 srgb} bind def
/col11 {0.530 0.810 1.000 srgb} bind def
/col12 {0.000 0.560 0.000 srgb} bind def
/col13 {0.000 0.690 0.000 srgb} bind def
/col14 {0.000 0.820 0.000 srgb} bind def
/col15 {0.000 0.560 0.560 srgb} bind def
/col16 {0.000 0.690 0.690 srgb} bind def
/col17 {0.000 0.820 0.820 srgb} bind def
/col18 {0.560 0.000 0.000 srgb} bind def
/col19 {0.690 0.000 0.000 srgb} bind def
/col20 {0.820 0.000 0.000 srgb} bind def
/col21 {0.560 0.000 0.560 srgb} bind def
/col22 {0.690 0.000 0.690 srgb} bind def
/col23 {0.820 0.000 0.820 srgb} bind def
/col24 {0.500 0.190 0.000 srgb} bind def
/col25 {0.630 0.250 0.000 srgb} bind def
/col26 {0.750 0.380 0.000 srgb} bind def
/col27 {1.000 0.500 0.500 srgb} bind def
/col28 {1.000 0.630 0.630 srgb} bind def
/col29 {1.000 0.750 0.750 srgb} bind def
/col30 {1.000 0.880 0.880 srgb} bind def
/col31 {1.000 0.840 0.000 srgb} bind def

end
save
-18.0 405.0 translate
1 -1 scale

/cp {closepath} bind def
/ef {eofill} bind def
/gr {grestore} bind def
/gs {gsave} bind def
/sa {save} bind def
/rs {restore} bind def
/l {lineto} bind def
/m {moveto} bind def
/rm {rmoveto} bind def
/n {newpath} bind def
/s {stroke} bind def
/sh {show} bind def
/slc {setlinecap} bind def
/slj {setlinejoin} bind def
/slw {setlinewidth} bind def
/srgb {setrgbcolor} bind def
/rot {rotate} bind def
/sc {scale} bind def
/sd {setdash} bind def
/ff {findfont} bind def
/sf {setfont} bind def
/scf {scalefont} bind def
/sw {stringwidth} bind def
/tr {translate} bind def
/tnt {dup dup currentrgbcolor
  4 -2 roll dup 1 exch sub 3 -1 roll mul add
  4 -2 roll dup 1 exch sub 3 -1 roll mul add
  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
  bind def
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
  4 -2 roll mul srgb} bind def
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
/$F2psEnd {$F2psEnteredState restore end} def
%%EndProlog

$F2psBegin
10 setmiterlimit
n -1000 7747 m -1000 -1000 l 11123 -1000 l 11123 7747 l cp clip
 0.06000 0.06000 sc
% Polyline
7.500 slw
gs  clippath
2520 1856 m 2396 1841 l 2512 1796 l 2351 1817 l 2359 1877 l cp
clip
n 6480 1305 m 2370 1845 l gs col0 s gr gr

% arrowhead
n 2520 1856 m 2396 1841 l 2512 1796 l 2516 1826 l 2520 1856 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
5114 1809 m 4990 1821 l 5094 1753 l 4941 1807 l 4961 1863 l cp
clip
n 6495 1290 m 4965 1830 l gs col0 s gr gr

% arrowhead
n 5114 1809 m 4990 1821 l 5094 1753 l 5104 1781 l 5114 1809 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
7499 1726 m 7595 1803 l 7474 1780 l 7621 1849 l 7646 1794 l cp
clip
n 6525 1305 m 7620 1815 l gs col0 s gr gr

% arrowhead
n 7499 1726 m 7595 1803 l 7474 1780 l 7487 1753 l 7499 1726 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
1782 2944 m 1670 2997 l 1743 2898 l 1619 3002 l 1658 3048 l cp
clip
n 2295 2475 m 1650 3015 l gs col0 s gr gr

% arrowhead
n 1782 2944 m 1670 2997 l 1743 2898 l 1763 2921 l 1782 2944 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3594 2963 m 3694 3035 l 3572 3019 l 3723 3078 l 3745 3023 l cp
clip
n 2310 2490 m 3720 3045 l gs col0 s gr gr

% arrowhead
n 3594 2963 m 3694 3035 l 3572 3019 l 3583 2991 l 3594 2963 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
2875 4192 m 2754 4218 l 2849 4138 l 2703 4210 l 2730 4264 l cp
clip
n 3615 3795 m 2730 4230 l gs col0 s gr gr

% arrowhead
n 2875 4192 m 2754 4218 l 2849 4138 l 2862 4165 l 2875 4192 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3502 5305 m 3565 5410 l 3460 5347 l 3574 5462 l 3617 5419 l cp
clip
n 3150 4995 m 3585 5430 l gs col0 s gr gr

% arrowhead
n 3502 5305 m 3565 5410 l 3460 5347 l 3481 5326 l 3502 5305 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6479 3013 m 6355 3021 l 6460 2956 l 6306 3006 l 6325 3063 l cp
clip
n 7515 2640 m 6330 3030 l gs col0 s gr gr

% arrowhead
n 6479 3013 m 6355 3021 l 6460 2956 l 6470 2984 l 6479 3013 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
8626 2918 m 8733 2978 l 8610 2976 l 8767 3018 l 8782 2960 l cp
clip
n 7530 2655 m 8760 2985 l gs col0 s gr gr

% arrowhead
n 8626 2918 m 8733 2978 l 8610 2976 l 8618 2947 l 8626 2918 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
5766 783 m 5883 822 l 5761 843 l 5923 856 l 5927 796 l cp
clip
n 1470 465 m 5910 825 l gs col0 s gr gr

% arrowhead
n 5766 783 m 5883 822 l 5761 843 l 5763 813 l 5766 783 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
8749 656 m 8868 689 l 8747 716 l 8909 720 l 8911 660 l cp
clip
n 1470 480 m 8895 690 l gs col0 s gr gr

% arrowhead
n 8749 656 m 8868 689 l 8747 716 l 8748 686 l 8749 656 l  cp gs 0.00 setgray ef gr  col0 s
/Times-Roman ff 255.00 scf sf
4395 2115 m
gs 1 -1 sc ([\\,, ZIP, 6]) col0 sh gr
/Times-Roman ff 255.00 scf sf
9000 930 m
gs 1 -1 sc ([., ZIP, 11]) col0 sh gr
/Times-Roman ff 255.00 scf sf
705 3330 m
gs 1 -1 sc ([studenti, N1, 1]) col0 sh gr
/Times-Roman ff 255.00 scf sf
1515 2115 m
gs 1 -1 sc ([maji, VPP3A, 2]) col0 sh gr
/Times-Roman ff 255.00 scf sf
6000 915 m
gs 1 -1 sc ([vsak, JE, 8]) col0 sh gr
/Times-Roman ff 255.00 scf sf
6900 2130 m
gs 1 -1 sc ([chybi, VPP3A, 9]) col0 sh gr
/Times-Roman ff 255.00 scf sf
5430 3345 m
gs 1 -1 sc ([fakulte, N3, 7]) col0 sh gr
/Times-Roman ff 255.00 scf sf
7815 3315 m
gs 1 -1 sc ([anglictinari, N1, 10]) col0 sh gr
/Times-Roman ff 255.00 scf sf
3165 3315 m
gs 1 -1 sc ([zajem, N4, 5]) col0 sh gr
/Times-Roman ff 255.00 scf sf
2280 4515 m
gs 1 -1 sc ([o, R4, 3]) col0 sh gr
/Times-Roman ff 255.00 scf sf
2715 5715 m
gs 1 -1 sc ([jazyky, NIP4A, 4]) col0 sh gr
/Times-Roman ff 255.00 scf sf
3105 6105 m
gs 1 -1 sc (languages) col0 sh gr
/Times-Roman ff 255.00 scf sf
2640 4905 m
gs 1 -1 sc (in) col0 sh gr
/Times-Roman ff 255.00 scf sf
1080 3735 m
gs 1 -1 sc (students) col0 sh gr
/Times-Roman ff 255.00 scf sf
3465 3720 m
gs 1 -1 sc (interest) col0 sh gr
/Times-Roman ff 255.00 scf sf
5475 3735 m
gs 1 -1 sc (faculty\(dative\)) col0 sh gr
/Times-Roman ff 255.00 scf sf
7920 3720 m
gs 1 -1 sc (teachers of English) col0 sh gr
/Times-Roman ff 255.00 scf sf
7470 2490 m
gs 1 -1 sc (miss) col0 sh gr
/Times-Roman ff 255.00 scf sf
2085 2430 m
gs 1 -1 sc (have) col0 sh gr
/Times-Roman ff 255.00 scf sf
6405 1245 m
gs 1 -1 sc (but) col0 sh gr
/Times-Roman ff 255.00 scf sf
300 465 m
gs 1 -1 sc ([#, ZSB, 0]) col0 sh gr
/Times-Roman ff 255.00 scf sf
5250 6675 m
gs 1 -1 sc (The students are interested in languages but the faculty is missing teachers of English.) dup sw pop 2 div neg 0 rm  col0 sh gr
$F2psEnd
rs
\end{filecontents}

\begin{filecontents}{subsets.eps}
%!PS-Adobe-2.0 EPSF-2.0
%%Title: subsets.eps
%%Creator: fig2dev Version 3.2 Patchlevel 0-beta3
%%CreationDate: Fri Jan 14 11:17:04 2000
%%For: anoop@seringa.cis.upenn.edu (Anoop Sarkar)
%%Orientation: Portrait
%%BoundingBox: 0 0 559 208
%%Pages: 0
%%BeginSetup
%%EndSetup
%%Magnification: 1.0000
%%EndComments
/$F2psDict 200 dict def
$F2psDict begin
$F2psDict /mtrx matrix put
/col-1 {0 setgray} bind def
/col0 {0.000 0.000 0.000 srgb} bind def
/col1 {0.000 0.000 1.000 srgb} bind def
/col2 {0.000 1.000 0.000 srgb} bind def
/col3 {0.000 1.000 1.000 srgb} bind def
/col4 {1.000 0.000 0.000 srgb} bind def
/col5 {1.000 0.000 1.000 srgb} bind def
/col6 {1.000 1.000 0.000 srgb} bind def
/col7 {1.000 1.000 1.000 srgb} bind def
/col8 {0.000 0.000 0.560 srgb} bind def
/col9 {0.000 0.000 0.690 srgb} bind def
/col10 {0.000 0.000 0.820 srgb} bind def
/col11 {0.530 0.810 1.000 srgb} bind def
/col12 {0.000 0.560 0.000 srgb} bind def
/col13 {0.000 0.690 0.000 srgb} bind def
/col14 {0.000 0.820 0.000 srgb} bind def
/col15 {0.000 0.560 0.560 srgb} bind def
/col16 {0.000 0.690 0.690 srgb} bind def
/col17 {0.000 0.820 0.820 srgb} bind def
/col18 {0.560 0.000 0.000 srgb} bind def
/col19 {0.690 0.000 0.000 srgb} bind def
/col20 {0.820 0.000 0.000 srgb} bind def
/col21 {0.560 0.000 0.560 srgb} bind def
/col22 {0.690 0.000 0.690 srgb} bind def
/col23 {0.820 0.000 0.820 srgb} bind def
/col24 {0.500 0.190 0.000 srgb} bind def
/col25 {0.630 0.250 0.000 srgb} bind def
/col26 {0.750 0.380 0.000 srgb} bind def
/col27 {1.000 0.500 0.500 srgb} bind def
/col28 {1.000 0.630 0.630 srgb} bind def
/col29 {1.000 0.750 0.750 srgb} bind def
/col30 {1.000 0.880 0.880 srgb} bind def
/col31 {1.000 0.840 0.000 srgb} bind def

end
save
-18.0 223.0 translate
1 -1 scale

/cp {closepath} bind def
/ef {eofill} bind def
/gr {grestore} bind def
/gs {gsave} bind def
/sa {save} bind def
/rs {restore} bind def
/l {lineto} bind def
/m {moveto} bind def
/rm {rmoveto} bind def
/n {newpath} bind def
/s {stroke} bind def
/sh {show} bind def
/slc {setlinecap} bind def
/slj {setlinejoin} bind def
/slw {setlinewidth} bind def
/srgb {setrgbcolor} bind def
/rot {rotate} bind def
/sc {scale} bind def
/sd {setdash} bind def
/ff {findfont} bind def
/sf {setfont} bind def
/scf {scalefont} bind def
/sw {stringwidth} bind def
/tr {translate} bind def
/tnt {dup dup currentrgbcolor
  4 -2 roll dup 1 exch sub 3 -1 roll mul add
  4 -2 roll dup 1 exch sub 3 -1 roll mul add
  4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
  bind def
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
  4 -2 roll mul srgb} bind def
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
/$F2psEnd {$F2psEnteredState restore end} def
%%EndProlog

$F2psBegin
10 setmiterlimit
n -1000 4702 m -1000 -1000 l 10611 -1000 l 10611 4702 l cp clip
 0.06000 0.06000 sc
% Polyline
7.500 slw
gs  clippath
3460 548 m 3577 509 l 3492 599 l 3629 512 l 3597 462 l cp
clip
n 2700 1065 m 3600 495 l gs col0 s gr gr

% arrowhead
n 3460 548 m 3577 509 l 3492 599 l 3476 574 l 3460 548 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3391 991 m 3513 1007 l 3397 1050 l 3558 1033 l 3552 974 l cp
clip
n 2685 1095 m 3540 1005 l gs col0 s gr gr

% arrowhead
n 3391 991 m 3513 1007 l 3397 1050 l 3394 1020 l 3391 991 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3417 1415 m 3515 1489 l 3393 1470 l 3542 1533 l 3566 1478 l cp
clip
n 2700 1140 m 3540 1500 l gs col0 s gr gr

% arrowhead
n 3417 1415 m 3515 1489 l 3393 1470 l 3405 1442 l 3417 1415 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3379 2005 m 3498 2039 l 3377 2065 l 3539 2070 l 3541 2010 l cp
clip
n 2595 2010 m 3525 2040 l gs col0 s gr gr

% arrowhead
n 3379 2005 m 3498 2039 l 3377 2065 l 3378 2035 l 3379 2005 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3395 2469 m 3486 2552 l 3366 2521 l 3509 2599 l 3537 2546 l cp
clip
n 2595 2070 m 3510 2565 l gs col0 s gr gr

% arrowhead
n 3395 2469 m 3486 2552 l 3366 2521 l 3381 2495 l 3395 2469 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
3422 2864 m 3490 2966 l 3381 2908 l 3501 3017 l 3541 2973 l cp
clip
n 2610 2160 m 3510 2985 l gs col0 s gr gr

% arrowhead
n 3422 2864 m 3490 2966 l 3381 2908 l 3402 2886 l 3422 2864 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6556 385 m 6678 406 l 6560 445 l 6722 434 l 6718 374 l cp
clip
n 5385 495 m 6705 405 l gs col0 s gr gr

% arrowhead
n 6556 385 m 6678 406 l 6560 445 l 6558 415 l 6556 385 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6622 3005 m 6648 3125 l 6568 3032 l 6640 3177 l 6694 3150 l cp
clip
n 5370 540 m 6660 3150 l gs col0 s gr gr

% arrowhead
n 6622 3005 m 6648 3125 l 6568 3032 l 6595 3018 l 6622 3005 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6573 943 m 6693 974 l 6573 1003 l 6735 1005 l 6735 945 l cp
clip
n 5460 960 m 6720 975 l gs col0 s gr gr

% arrowhead
n 6573 943 m 6693 974 l 6573 1003 l 6573 973 l 6573 943 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6565 563 m 6670 499 l 6607 605 l 6722 491 l 6679 448 l cp
clip
n 5610 1560 m 6690 480 l gs col0 s gr gr

% arrowhead
n 6565 563 m 6670 499 l 6607 605 l 6586 584 l 6565 563 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6530 1119 m 6650 1092 l 6557 1173 l 6702 1100 l 6675 1046 l cp
clip
n 5625 1605 m 6675 1080 l gs col0 s gr gr

% arrowhead
n 6530 1119 m 6650 1092 l 6557 1173 l 6544 1146 l 6530 1119 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6570 1617 m 6694 1612 l 6587 1675 l 6743 1630 l 6726 1572 l cp
clip
n 5130 2070 m 6720 1605 l gs col0 s gr gr

% arrowhead
n 6570 1617 m 6694 1612 l 6587 1675 l 6579 1646 l 6570 1617 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6563 3156 m 6638 3253 l 6526 3203 l 6653 3303 l 6690 3256 l cp
clip
n 5145 2085 m 6660 3270 l gs col0 s gr gr

% arrowhead
n 6563 3156 m 6638 3253 l 6526 3203 l 6544 3179 l 6563 3156 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6651 2830 m 6690 2947 l 6600 2862 l 6688 2999 l 6738 2966 l cp
clip
n 5445 1005 m 6705 2970 l gs col0 s gr gr

% arrowhead
n 6651 2830 m 6690 2947 l 6600 2862 l 6626 2846 l 6651 2830 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6487 3217 m 6576 3301 l 6457 3269 l 6598 3348 l 6628 3296 l cp
clip
n 5220 2535 m 6600 3315 l gs col0 s gr gr

% arrowhead
n 6487 3217 m 6576 3301 l 6457 3269 l 6472 3243 l 6487 3217 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6495 2193 m 6618 2195 l 6508 2252 l 6666 2216 l 6653 2157 l cp
clip
n 5250 2505 m 6645 2190 l gs col0 s gr gr

% arrowhead
n 6495 2193 m 6618 2195 l 6508 2252 l 6502 2222 l 6495 2193 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6593 2719 m 6711 2683 l 6624 2771 l 6763 2688 l 6733 2637 l cp
clip
n 5265 3540 m 6735 2670 l gs col0 s gr gr

% arrowhead
n 6593 2719 m 6711 2683 l 6624 2771 l 6608 2745 l 6593 2719 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6568 1842 m 6671 1774 l 6611 1883 l 6722 1765 l 6678 1724 l cp
clip
n 5535 2985 m 6690 1755 l gs col0 s gr gr

% arrowhead
n 6568 1842 m 6671 1774 l 6611 1883 l 6589 1862 l 6568 1842 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6520 2363 m 6637 2324 l 6551 2414 l 6689 2327 l 6657 2277 l cp
clip
n 5490 3045 m 6660 2310 l gs col0 s gr gr

% arrowhead
n 6520 2363 m 6637 2324 l 6551 2414 l 6536 2388 l 6520 2363 l  cp gs 0.00 setgray ef gr  col0 s
% Polyline
gs  clippath
6435 3394 m 6558 3408 l 6443 3453 l 6604 3433 l 6596 3373 l cp
clip
n 5295 3570 m 6585 3405 l gs col0 s gr gr

% arrowhead
n 6435 3394 m 6558 3408 l 6443 3453 l 6439 3424 l 6435 3394 l  cp gs 0.00 setgray ef gr  col0 s
/Times-Roman ff 255.00 scf sf
3675 585 m
gs 1 -1 sc (N4 R2\(od\) {2}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3645 1095 m
gs 1 -1 sc (N4 R2\(do\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3630 1620 m
gs 1 -1 sc (R2\(od\) R2\(do\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3630 2145 m
gs 1 -1 sc (N4 R6\(v\) {1}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3645 2625 m
gs 1 -1 sc (N4 R6\(na\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3645 3090 m
gs 1 -1 sc (R6\(v\) R6\(na\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
3645 3630 m
gs 1 -1 sc (N4 R6\(po\) {1}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6810 480 m
gs 1 -1 sc (R2\(od\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6825 1050 m
gs 1 -1 sc (R2\(do\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6810 1665 m
gs 1 -1 sc (R6\(v\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6795 2235 m
gs 1 -1 sc (R6\(na\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6795 2805 m
gs 1 -1 sc (R6\(po\) {0}) col0 sh gr
/Times-Roman ff 255.00 scf sf
6780 3345 m
gs 1 -1 sc (N4 {2+1+1}) col0 sh gr
/Times-Roman ff 255.00 scf sf
345 1215 m
gs 1 -1 sc (N4 R2\(od\) R2\(do\) {2}) col0 sh gr
/Times-Roman ff 255.00 scf sf
315 2130 m
gs 1 -1 sc (N4 R6\(v\) R6\(na\) {1}) col0 sh gr
/Times-Roman ff 255.00 scf sf
8535 2010 m
gs 1 -1 sc (empty {0}) col0 sh gr
$F2psEnd
rs
\end{filecontents}

\begin{filecontents}{example-numbers.tex}
%%% Robert Rubinoff - Mar 3 1986
%%% sentences - an environment for producing numbered sentence example lists.
%%% you enter it with \beginsentences
%%% finish with \endsentences
%%%     (1) This kind is produced by \sitem
%%%     (2)a. This kind is produced by \smainitem
%%%        b. This kind (i.e. subsequent subitems) is produced by \ssubitem
%%%  The sentences are numbered using \sentencectr and \sentencesubctr
%%%
%%% some formatting control provided via:
%%%     \smainform - controls format of main number - default is \arabic
%%%     \ssubform  - controls format of sub number - default is \alph
%%%     \ssubpunc  - controls punctuation after sub number - default is "."
%%% these can be changed via \renewcommand
%%% 
%%% you can also generate cross-reference labels via \label; this gives
%%% you the main counter and (when appropriate) the subcounter.  To get
%%% a label of just the main item number when in a \smainitem or \ssubitem,
%%% use \smainlabel; to get just the sublabel, use \ssublabel

\newcounter{sentencectr}
\newcounter{sentencesubctr}

\renewcommand{\thesentencectr}{(\smainform{sentencectr})}
\renewcommand{\thesentencesubctr}{\thesentencectr\ssubform{sentencesubctr}}

\newcommand{\smainform}{\arabic}
\newcommand{\ssubform}{\alph}
\newcommand{\ssubpunc}{.{}}

\newcommand{\beginsentences}{ %
\pagebreak[3] %
\begin{list}{(\thesentencectr)}
   {\usecounter{sentencesubctr}
%The next line controls how much space to put around examples.  The
%value I have here is for tight spaces.  Take it out when things
%aren't so desperate.
%    \setlength{\topsep}{0ex}			
%This following line is 0.5ex by default.  The 1ex spacing looks good in single
%spaced examples.  How about in space and a half?
    \setlength{\topsep}{1ex}			
    \setlength{\itemsep}{0 in}
    \setlength{\labelwidth}{0.5 in}
%Previous line increases width so we don't get problem with indented
%subitems once we have 2 digit example numbers. -- bf
%    \addtolength{\leftmargin}{25 pt}
%    \setlength{\leftmargin}{.6in}
% This next line makes indentation of examples the same as in LI which
% looks pretty good, I think.
%    \addtolength{\leftmargin}{8ex}
    \addtolength{\leftmargin}{4ex}
    \setlength{\labelsep}{.05in}
    \setlength{\parsep}{0 in}}}
\def\endsentences{\end{list}}

\newcommand{\sitem}{\renewcommand{\thesentencesubctr}{(\smainform{sentencectr})}
                    \refstepcounter{sentencectr}
     \item[(\smainform{sentencectr})\hfill]}
\newcommand{\smainitem}{\renewcommand{\thesentencesubctr
                                    }{\thesentencectr\ssubform{sentencesubctr}}
                        \setcounter{sentencesubctr}{0}
                        \refstepcounter{sentencectr}
                        \refstepcounter{sentencesubctr}
     \item[\thesentencectr\hfill\ssubform{sentencesubctr}\ssubpunc]}
\newcommand{\ssubitem}{\refstepcounter{sentencesubctr}
     \item[\hfill\ssubform{sentencesubctr}\ssubpunc]}

\makeatletter            
\newcommand{\smainlabel}[1]{{%  the extra braces make the change local
\renewcommand{\@currentlabel}{\thesentencectr}\label{#1}}}

\newcommand{\ssublabel}[1]{{%  the extra braces make the change local
\renewcommand{\@currentlabel}{\ssubform{sentencesubctr}}\label{#1}}}
\makeatother



\makeatother

\end{filecontents}

% A4: 210mm * 297mm
\setlength\topmargin{-0.45in}
\setlength\textheight{9.8in} 

\title{Automatic Extraction of Subcategorization Frames for
Czech\thanks{ This work was done during the second author's visit to
the University of Pennsylvania. We would like to thank Prof. Aravind
Joshi, David Chiang, Mark Dras and the anonymous reviewers for their
comments. The first author's work is partially supported by NSF Grant
SBR 8920230.  Many tools used in this work are the results of project
No. VS96151 of the Ministry of Education of the Czech Republic. The
data (PDT) is thanks to grant No. 405/96/K214 of the Grant Agency of
the Czech Republic. Both grants were given to the Institute of Formal
and Applied Linguistics, Faculty of Mathematics and Physics, Charles
University, Prague. }}

%\author{Paper Number: 993}
\author{Anoop Sarkar \\[2pt]
Department of Computer and Info. Sci. \\
University of Pennsylvania \\
200 South 33rd Street, \\
Philadelphia, PA 19104 USA \\
{\tt anoop@linc.cis.upenn.edu}
\And
Daniel Zeman\\
\'Ustav form\'aln\'{\i} a aplikovan\'e lingvistiky \\
Univerzita Karlova \\
Malostransk\'e n\'am\v{e}st\'{\i} 25 \\
CZ-11800  Praha, Czechia \\
%UFAL MFF UK, Praha
%currently at IRCS, UPenn, Philadelphia
{\tt zeman@ufal.mff.cuni.cz}}

\usepackage{graphicx}
\usepackage{psfrag}

\input{example-numbers}

\renewcommand{\baselinestretch}{.99}

%---------------------------------------------------------
% Common definitions
%---------------------------------------------------------
\newcommand{\sep}{\,\mid\,}
\newcommand{\comment}[1]{}

%--------------------------------------------------------

\begin{document}
\maketitle

\begin{abstract}
We present some novel machine learning techniques for the
identification of subcategorization information for verbs in Czech. We
compare three different statistical techniques applied to this
problem.  We show how the learning algorithm can be used to discover
previously unknown subcategorization frames from the Czech Prague
Dependency Treebank. The algorithm can then be used to label
dependents of a verb in the Czech treebank as either arguments or
adjuncts. Using our techniques, we are able to achieve 88\% precision
on unseen parsed text.
\end{abstract}

\section{Introduction}

The subcategorization of verbs is an essential issue in parsing,
because it helps disambiguate the attachment of arguments and recover
the correct predicate-argument relations by a parser.
\cite{carroll98:_subcat_help_parser,carroll98:_valen_pcfg} give several
reasons why subcategorization information is important for a natural
language parser. Machine-readable dictionaries are not comprehensive
enough to provide this lexical
information~\cite{manning93:_subcat,briscoe97:_subcat}. Furthermore,
such dictionaries are available only for very few languages. We need
some general method for the automatic extraction of subcategorization
information from text corpora.

Several techniques and results have been reported on learning
subcategorization frames (SFs) from text corpora
\cite{webster89:_lexical_frames,brent91:_subcat,brent93:_unsup_learn,brent94:_acquis_subcat,ushioda93:_verb_subcat,manning93:_subcat,ersan96:_case_frames,briscoe97:_subcat,carroll98:_subcat_help_parser,carroll98:_valen_pcfg}.
All of this work deals with English. In this paper we report on
techniques that automatically extract SFs for Czech, which is a free
word-order language, where verb complements have visible case
marking.\footnote{ One of the anonymous reviewers pointed out
that~\cite{basili98:_subcat} presents a corpus-driven acquisition of
subcategorization frames for Italian. }

Apart from the choice of target language, this work also differs from previous
work in other ways. Unlike all other previous work in this area, we do
not assume that the set of SFs is known to us in advance. Also in
contrast, we work with syntactically annotated data (the Prague
Dependency Treebank, PDT~\cite{hajic98:_pdt}) where the
subcategorization information is {\em not} given; although this might
be considered a simpler problem as compared to using raw text, we have
discovered interesting problems that a user of a raw or tagged corpus
is unlikely to face.

We first give a detailed description of the task of uncovering SFs and
also point out those properties of Czech that have to be taken into
account when searching for SFs. Then we discuss some differences from
the other research efforts. We then present the three techniques that
we use to learn SFs from the input data.

In the input data, many observed dependents of the verb are
adjuncts. To treat this problem effectively, we describe a novel
addition to the hypothesis testing technique that uses subsets of
observed frames to permit the learning algorithm to better distinguish
arguments from adjuncts.

Using our techniques, we are able to achieve 88\% precision in
distinguishing arguments from adjuncts on unseen parsed text.

\section{Task Description}

In this section we describe precisely the proposed task. We also
describe the input training material and the output produced by our
algorithms.

\subsection{Identifying subcategorization frames}
\label{sec:expl}

In general, the problem of identifying subcategorization frames is to
distinguish between arguments and adjuncts among the constituents
modifying a verb. For example, in ``John saw Mary yesterday at the station'',
only ``John'' and ``Mary'' are required arguments while the other
constituents are optional (adjuncts). There is some controversy as to
the {\em correct} subcategorization of a given verb and linguists
often disagree as to what is the right set of SFs for a given verb. A
machine learning approach such as the one followed in this paper
sidesteps this issue altogether, since it is left to the algorithm to
learn what is an appropriate SF for a verb.

Figure~\ref{fig:pdt_ex} shows a sample input sentence from the PDT
annotated with dependencies which is used as training material for the
techniques described in this paper. Each node in the tree contains a
word, its part-of-speech tag (which includes morphological
information) and its location in the sentence. We also use the
functional tags which are part of the PDT annotation\footnote{ For
those readers familiar with the PDT functional tags, it is important
to note that the functional tag {\em Obj} does not always correspond
to an argument. Similarly, the functional tag {\em Adv} does not
always correspond to an adjunct. Approximately 50 verbs out of the
total 2993 verbs require an adverbial argument.}. To make future
discussion easier we define some terms here. Each daughter of a verb
in the tree shown is called a {\em dependent} and the set of all
dependents for that verb in that tree is called an {\em observed frame
(OF)}. A {\em subcategorization frame (SF)} is a subset of the OF. For
example the OF for the verb {\em maj\'{\i} (have)} in
Figure~\ref{fig:pdt_ex} is {\em \{~N1, N4~\}} and its SF is the same
as its OF. Note that which OF (or which part of it) is a true SF is not
marked in the training data. After training on such examples, the
algorithm takes as input parsed text and labels each daughter of each
verb as either an argument or an adjunct. It does this by selecting the
most likely SF for that verb given its OF.

% Show an example tree from PDT. Then show the same tree with
% annotations for each verb dependent being either an argument or an
% adjunct. Perhaps we should use an example with a couple of PPs. The
% task is to take as input the first tree and return as output the
% second tree.

\begin{figure*}[htbp]
  \begin{center}
    \leavevmode
    \psfrag{[#, ZSB, 0]}{\small [\# ZSB 0]}
    \psfrag{[maji, VPP3A, 2]}{\small [maj\'{\i} VPP3A 2]}
    \psfrag{[zajem, N4, 5]}{\small [z\'ajem N4 5]}
    \psfrag{[o, R4, 3]}{\small [o R4 3]}
    \psfrag{[jazyky, NIP4A, 4]}{\small [jazyky N4 4]}
    \psfrag{[fakulte, N3, 7]}{\small [fakult\v{e} N3 7]}
    \psfrag{[vsak, JE, 8]}{\small [v\v{s}ak JE 8]}
    \psfrag{[chybi, VPP3A, 9]}{\small [chyb\'{\i} VPP3A 9]}
    \psfrag{[anglictinari, N1, 10]}{\small [angli\v{c}tin\'a\v{r}i N1 10]}
    \psfrag{[studenti, N1, 1]}{\small [studenti N1 1]}
    \psfrag{[\\,, ZIP, 6]}{\small [, ZIP 6]}
    \psfrag{[., ZIP, 11]}{\small [. ZIP 11]}
    \psfrag{have}{\small have}
    \psfrag{but}{\small but}
    \psfrag{students}{\small students}
    \psfrag{faculty(dative)}{\small faculty(dative)}
    \psfrag{teachers of english}{\small teachers of English}
    \psfrag{miss}{\small miss}
    \psfrag{interest}{\small interest}
    \psfrag{in}{\small in}
    \psfrag{languages}{\small languages}
    \includegraphics[height=3in]{pdt_ex.eps}
    \caption{Example input to the algorithm from the Prague Dependency Treebank}
    \label{fig:pdt_ex}
  \end{center}
\end{figure*}

\subsection{Relevant properties of the Czech Data}

Czech is a ``free word-order'' language. This means that the
arguments of a verb do not have fixed positions and are not guaranteed
to be in a particular configuration with respect to the verb.

The examples in \ref{ex:wordorder} show that while Czech has a
relatively free word-order, some orders are still marked. The SVO, OVS,
and SOV orders in \ref{ex:svo}, \ref{ex:ovs} and \ref{ex:sov},
respectively, differ in emphasis but have the same predicate-argument
structure. The examples \ref{ex:ques1} and \ref{ex:ques2} can only be
interpreted as questions. Such word orders require proper intonation
in speech, or a question mark in text.

% The examples in
% \ref{ex:wordorder} show that while Czech has a relatively free
% word-order some orders are still marked (cf. \ref{ex:marked}) and
% furthermore morphology is very important in identifying the arguments
% of the verb (cf. \ref{ex:morphmarking}).

The example \ref{ex:morphmarking} demonstrates how important morphology
is in identifying the arguments of the verb (compare
\ref{ex:morphmarking} with \ref{ex:ovs}). The ending {\em -a}
of {\em Martin} is the only difference between the two sentences. It
nevertheless changes the morphological case of {\em Martin} and turns it
from subject into object.  Czech has 7 cases that can be distinguished
morphologically.

\beginsentences
\smainitem Martin otv\'{\i}r\'a soubor. (SVO: Martin opens the file) \label{ex:svo}
\ssubitem  Soubor otv\'{\i}r\'a Martin. (OVS: $\neq$ the file opens Martin) \label{ex:ovs}
\ssubitem  Martin soubor otv\'{\i}r\'a. \label{ex:sov}
\ssubitem \#Otv\'{\i}r\'a Martin soubor. \label{ex:ques1}
\ssubitem \#Otv\'{\i}r\'a soubor Martin. \label{ex:ques2}
\ssubitem Soubor otv\'{\i}r\'a Martina. ($=$ the file opens Martin) \label{ex:morphmarking}
\smainlabel{ex:wordorder}
\endsentences

Almost all the existing techniques for extracting SFs exploit the
relatively fixed word-order of English to collect features for their
learning algorithms using fixed patterns or rules (see
Table~\ref{tbl:previous_work} for more details). Such techniques are
not easily transferred to a language like Czech.
Fully parsed training data can help here by supplying all dependents
of a verb.
%, no matter where in the sentence these occur. 
The observed frames obtained this way have to be {\em normalized} with
respect to the word order, e.g. by using an alphabetic ordering.
%their members can be ordered alphabetically.
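
As a brief illustration of this normalization (the dependents below are
hypothetical and serve only as an example), two occurrences of a verb
whose dependents appear in different surface orders map to the same
observed frame once the labels are sorted alphabetically:

\[
\begin{array}{lcl}
\mbox{N4 \ R4(na) \ N1} & \longrightarrow & \{\,\mbox{N1, N4, R4(na)}\,\} \\
\mbox{N1 \ N4 \ R4(na)} & \longrightarrow & \{\,\mbox{N1, N4, R4(na)}\,\}
\end{array}
\]

Both occurrences therefore contribute to the count of a single observed
frame.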

For extracting SFs, prepositions in Czech have to be handled
carefully. In some SFs, a particular preposition is required by the
verb, while in other cases it is a class of prepositions such as
locative prepositions (e.g. {\em in, on, behind, $\ldots$}) that are
required by the verb. In contrast, adjuncts can use a wider variety of
prepositions. Prepositions specify the case of their noun phrase
complements, but a preposition can take complements with more than one
case marking, with a different meaning for each case (e.g. {\em na
most\v{e} $=$ on the bridge; na most $=$ onto the bridge}). In general,
verbs select not only for particular prepositions but also for the
case marking of their noun phrase complements.

% For extracting SFs, prepositions in Czech have to be handled
% carefully. In some SFs, a particular preposition is required by the
% verb, while in other cases it is a class of prepositions such as
% locative prepositions (e.g. {\em in, on, behind, $\ldots$}) that are
% required by the verb. In contrast, adjuncts can use a wider variety of
% prepositions. In general, verbs select for particular prepositions and
% also indicate the case marking for their noun phrase complements. The
% prepositions specify the case marking for their complement noun
% phrases (e.g. {\em na most\v{e} $=$ on the bridge; na most $=$
% onto the bridge}).

\subsection{Argument types}

We use the following set of labels as possible arguments for a verb in
our corpus.  They are derived from morphological tags and simplified
from the original PDT definition~\cite{hajic98:_tagger,hajic98:_pdt};
the numeric attributes are the case marking identifiers.  For
prepositions and clause complementizers, we also save the lemma in
parentheses.

\begin{itemize}

\item Noun phrases: N4, N3, N2, N7, N1

\item Prepositional phrases: R2(bez), R3(k), R4(na), R6(na), R7(s),
$\ldots$

\item Reflexive pronouns {\em se}, {\em si}: PR4, PR3

\item Clauses: S, JS(\v{z}e), JS(zda)

\item Infinitives: VINF

\item Passive participles: VPAS

\item Adverbs: DB

\end{itemize}

We do not specify which SFs are possible since we aim to discover
these (see Section~\ref{sec:expl}).

\section{Three methods for identifying subcategorization frames}
\label{sec:methods}

We describe three methods that take as input a list of verbs and
associated observed frames from the training data (see
Section~\ref{sec:expl}), and learn an association between verbs and
possible SFs. Each method arrives at a numerical
score for this association.

However, before we can apply any statistical methods to the training
data, there is one aspect of using a treebank as input that has to be
dealt with. A correct frame (verb + its arguments) is almost always
accompanied by one or more adjuncts in a real sentence. Thus the {\em
observed frame} will almost always contain noise. The approach offered
by Brent and others counts all observed frames and then decides which
of them do not associate strongly with a given verb. In our situation
this approach will fail for most of the observed frames because we
rarely see the correct frames isolated in the training data. For example,
of the ten occurrences of the transitive verb {\it absolvovat} (``go through
something'') in the corpus, none
consisted of the verb-object pair alone. In other words, the correct SF
constituted 0\% of the observed situations. Nevertheless, for each
observed frame, one of its subsets was the correct frame we
sought. Therefore, we considered all possible subsets of all observed
frames. We used a technique which steps through the subsets of each
observed frame from larger to smaller ones and records their frequency
in the data.  Large infrequent subsets are suspected to contain adjuncts,
so we replace them by more frequent smaller subsets. Small infrequent
subsets may have elided some arguments and are rejected. Further
details of this process are discussed in Section~\ref{sec:miscue}. 
% We
% update the definition of an {\em observed frame} to include these
% subsets.

\begin{figure*}[htbp]
  \begin{center}
    \leavevmode
    \includegraphics[height=1.6in]{subsets.eps}

    \caption{Computing the subsets of observed frames for the verb
    {\em absolvovat}. The counts for each frame are given within
    braces $\{\}$. In this example, the frames {\em N4 R2(od), N4
    R6(v)} and {\em N4 R6(po)} have been observed with other verbs in
    the corpus. Note that the counts in this figure do not correspond
    to the real counts for the verb {\em absolvovat} in the training
    corpus.}

    \label{fig:subsets}
  \end{center}
\end{figure*}

The methods we present here have a common structure. For each verb, we
need to associate a score with the hypothesis that a particular set of
dependents of the verb are arguments of that verb. In other words, we
need to assign a value to the hypothesis that the observed frame under
consideration is the verb's SF. Intuitively, we either want to test
for independence of the observed frame and verb distributions in the
data, or we want to test how likely a frame is to be observed with a
particular verb without being a valid SF. We develop these intuitions
with the following well-known statistical methods. For further
background on these methods the reader is referred to
\cite{bickel77:_mathem_statis,dunning93:_statis}.

\subsection{Likelihood ratio test}
\label{sec:lik}

Let us take the hypothesis that the distribution of an observed frame
$f$ in the training data is independent of the distribution of a verb
$v$. We can phrase this hypothesis as $p(f \sep v) = p(f \sep\ !v) =
p(f)$, that is, the distribution of a frame $f$ given that a verb $v$ is
present is the same as the distribution of $f$ given that $v$ is not
present (written as $!v$). We use the log likelihood test
statistic~\cite{bickel77:_mathem_statis}(p.209) as a measure to
discover particular frames and verbs that are highly associated in the
training data. 

\begin{eqnarray}
  \label{eqn:lik1}
	k_1 & = & c(f,v) \nonumber \\
	n_1 & = & c(v) = c(f,v) + c(!f,v) \nonumber \\
	k_2 & = & c(f,!v) \nonumber \\
	n_2 & = & c(!v) = c(f,!v) + c(!f,!v) \nonumber
\end{eqnarray}

where $c(\cdot)$ are counts in the training data. Using the values
computed above:

\begin{eqnarray}
  \label{eqn:lik2}
	p_1 & = & \frac{k_1}{n_1}  \nonumber \\
	p_2 & = & \frac{k_2}{n_2} \nonumber \\
	p & = & \frac{k_1 + k_2}{n_1 + n_2} \nonumber
\end{eqnarray}

Taking these probabilities to be binomially distributed, the log
likelihood statistic~\cite{dunning93:_statis} is given by:

%\newcommand{\log}{\mbox{{\em log}}}
\begin{eqnarray}
  \label{eqn:lik3}
\lefteqn{-2 \log \lambda = } \nonumber \\
&& 2 [ \log L(p_1, k_1, n_1) + \log L(p_2, k_2, n_2) - \nonumber \\
&&\ \ \ \log L(p, k_1, n_1) - \log L(p, k_2, n_2) ] \nonumber 
\end{eqnarray}

where,

\[ \log L(p,k,n) = k \log p + (n - k) \log(1 - p) \]

According to this statistic, the greater the value of $-2 \log
\lambda$ for a particular pair of observed frame and verb, the more
likely that frame is to be a valid SF of the verb.
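
To make the computation concrete, consider the following hypothetical
counts (they are invented for illustration and are not taken from the
PDT): $k_1 = 5$, $n_1 = 10$, $k_2 = 5$, $n_2 = 90$, so that $p_1 = 0.5$,
$p_2 = 5/90 \approx 0.056$ and $p = 10/100 = 0.1$. Reading $\log L$ as
the binomial log likelihood of $k$ successes in $n$ trials and using
natural logarithms, we get approximately:

\begin{eqnarray*}
\log L(p_1, k_1, n_1) & = & 5 \log 0.5 + 5 \log 0.5 \approx -6.93 \\
\log L(p_2, k_2, n_2) & = & 5 \log \frac{5}{90} + 85 \log \frac{85}{90} \approx -19.31 \\
\log L(p, k_1, n_1) & = & 5 \log 0.1 + 5 \log 0.9 \approx -12.04 \\
\log L(p, k_2, n_2) & = & 5 \log 0.1 + 85 \log 0.9 \approx -20.47 \\
-2 \log \lambda & \approx & 2 [ (-6.93 - 19.31) - (-12.04 - 20.47) ] \approx 12.5
\end{eqnarray*}

In this sketch the frame occurs with half of the tokens of $v$ but with
only about 6\% of the other verb tokens, and the statistic is
correspondingly large.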

\subsection{T-scores}

Another statistic that has been used for hypothesis testing is the
{\em t-score}. Using the definitions from Section~\ref{sec:lik} we can
compute t-scores using the equation below and use its value to measure
the association between a verb and a frame observed with it. 

\[ T = \frac{ p_1 - p_2 }
	{ \sqrt{ \sigma^2(n_1, p_1) + \sigma^2(n_2, p_2) } } \]

where,

\[ \sigma^2(n,p) = n p ( 1 - p ) \]

In particular, the hypothesis being tested using the t-score is
whether the distributions $p_1$ and $p_2$ are {\em not}
independent. If the value of $T$ is greater than some threshold then
the verb $v$ should take the frame $f$ as a SF.

\subsection{Binomial Models of Miscue Probabilities}
\label{sec:miscue}

Once again assuming that the data is binomially distributed, we can
look for frames that co-occur with a verb by exploiting the miscue
probability: the probability of a frame co-occurring with a verb when
it is not a valid SF. This is the method used by several earlier
papers on SF extraction starting
with~\cite{brent91:_subcat,brent93:_unsup_learn,brent94:_acquis_subcat}.

Let us consider the probability $p_{!f}$ that a
given verb is observed with a frame although this frame is not a valid SF
for this verb. $p_{!f}$ is the error probability in identifying a SF
for a verb. Let us consider a verb $v$ which does {\em not} have as
one of its valid SFs the frame $f$. How likely is it that $v$ will be
seen $m$ or more times in the training data with frame $f$? If $v$ has
been seen a total of $n$ times in the data, then $H^\ast(p_{!f}; m,
n)$ gives us this likelihood.

\[ H^\ast(p_{!f}; m, n) = 
	\sum_{i = m}^{n} p_{!f}^i ( 1 - p_{!f})^{n - i} 
	\left( \begin{array}{c} n \\ i \end{array} \right) 
\]

If $H^\ast(p_{!f}; m, n)$ is less than or equal to some small threshold
value then it is extremely unlikely that the hypothesis (that $f$ is not
a valid SF of $v$) is true, and
hence we take the frame $f$ to be a SF of the verb $v$. Setting the
threshold value to $0.05$ gives us a 95\% or better confidence value
that the verb $v$ has been observed often enough with a frame $f$ for
it to be a valid SF.
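
As a worked example under assumed values (the miscue probability
$p_{!f} = 0.05$ and the counts are hypothetical), take a verb seen
$n = 10$ times, $m = 3$ of them with frame $f$:

\begin{eqnarray*}
H^\ast(0.05; 3, 10) & = & 1 - \sum_{i=0}^{2}
	\left( \begin{array}{c} 10 \\ i \end{array} \right)
	0.05^i \, 0.95^{10-i} \\
 & \approx & 1 - (0.5987 + 0.3151 + 0.0746) \approx 0.012
\end{eqnarray*}

Since $0.012 \leq 0.05$, the frame would be accepted as a SF. With only
$m = 2$ co-occurrences, $H^\ast(0.05; 2, 10) \approx 1 - (0.5987 +
0.3151) \approx 0.086$, which exceeds the threshold, so the frame would
not be accepted.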

Initially, we consider only the observed frames (OFs) from the
treebank. Some of them may be subsets of others, but at this stage we
count only the cases in which an OF was itself observed. Suppose
the test statistic rejects a frame. Then it is not a real SF,
but one of its subsets probably is. So we select
exactly one of the subsets that has one member fewer: this is
the {\em successor} of the rejected frame, and it inherits the frequency
of the rejected frame. Of course, one frame may be the successor of
several longer frames, and it can also have its own count as an OF. This
is how frequencies accumulate and frames become more likely to survive.
The example shown in Figure~\ref{fig:subsets} illustrates how the subsets and
successors are selected.
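
Reading Figure~\ref{fig:subsets} from left to right, the inheritance of
counts can be written out as follows (these are the illustrative counts
of the figure, and we assume here that every frame on the left-hand side
of an arrow was rejected by the test):

\begin{eqnarray*}
\mbox{N4 R2(od) R2(do)} \ \{2\} & \rightarrow & \mbox{N4 R2(od)} \ \{2\} \ \rightarrow \ \mbox{N4} \\
\mbox{N4 R6(v) R6(na)} \ \{1\} & \rightarrow & \mbox{N4 R6(v)} \ \{1\} \ \rightarrow \ \mbox{N4} \\
\mbox{N4 R6(po)} \ \{1\} & \rightarrow & \mbox{N4}
\end{eqnarray*}

The frame {\em N4} thus accumulates a count of $2 + 1 + 1 = 4$, which
equals the total count of the three observed frames. The remaining
subsets, such as {\em N4 R2(do)} or {\em R6(v)}, are not chosen as
successors and keep a count of 0.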

An important point is the selection of the successor. We have to
select only one of the $n$ possible successors of a frame of length
$n$, otherwise we would break the total frequency of the verb. Suppose
there are $m$ rejected frames of length $n$. This yields $m \times n$
possible modifications to consider before selection of the
successor. We implemented two methods for choosing a single successor
frame:

\begin{enumerate}

\item Choose the one that results in the strongest preference for some
frame (that is, the successor frame results in the lowest entropy
across the corpus). This measure is sensitive to the frequency of this
frame in the rest of the corpus.

\item Random selection of the successor frame from the alternatives.

\end{enumerate}

Random selection resulted in better precision (88\% instead of
86\%). It is not clear why a method that is sensitive to the frequency
of each proposed successor frame does not perform better than random
selection.

% Initially, we consider only the observed frames (OFs) from the
% treebank. There is a chance that some of them are subsets of some
% others but at the beginning we count only the cases when the OFs were
% seen themselves. We take (one of the) longest frames and test it using
% the test statistic.

% Let's assume the test statistic rejected the frame. Then the OF is not
% a real SF but there probably is a subset of the OF that is a real
% SF. So we select one dependent, remove it from the OF and save the
% rest for a future test. This new OF inherits the frequency of the
% original OF. However, it is also possible that such shorter frame is
% known already, either directly from training data, or as a successor
% of another longer OF. In this case we add the frequency of just
% rejected OF to the count saved with the successor so far. This is how
% frequencies cumulate and frames become more likely to survive.

% After processing a frame we do the same with next frame having the
% same number of dependents. Once there is no such frame we descend to
% the level of frames having one dependent less. We go on this way until
% all frames have been processed.

% An important point is the selection of the successor (or the dependent
% we remove from a rejected frame). Although there are $n$ possible
% successors for a frame of length $n$, we have to select only one of
% them, otherwise we would break the total frequency of the verb. Since
% we aim to choose the most frequent subpart, we wait until the counts
% of the frames on lower level are definite, i.e.  until all the frames
% of the current length are processed. At that point we have $m$
% rejected frames of length $n$, each of which can be shortened in $n$
% ways.  This yields $m * n$ possible modifications of the lower
% level. From these combinations we choose the one that results in the
% strongest preference for some frame on the lower level (lowest entropy
% of the lower level).

The technique described here may sometimes result in a subset of a
correct SF, discarding one or more of its members. Such a frame can
still help parsers because they can at least look for the dependents
that have survived.

\section{Evaluation}

For the evaluation of the methods described above we used the Prague
Dependency Treebank (PDT). We used 19,126 sentences of training data
from the PDT (about 300K words). In this training set, there were
33,641 verb tokens with 2,993 verb types. There were a total of 28,765
{\em observed frames} (see Section~\ref{sec:expl} for explanation of
these terms).
% which reduced to 13,665 observed frames after preprocessing. 
There were 914 verb types seen 5 or more times.

Since there is no electronic valence dictionary for Czech, we
evaluated our filtering technique on a set of 500 test sentences which
were unseen and separate from the training data. We turned these test
sentences into a gold standard by manually distinguishing the
arguments from the adjuncts. We then compared our output, a set of
items marked as either arguments or adjuncts, against this gold
standard.

First we describe the baseline methods. Baseline method 1: consider
each dependent of a verb an adjunct.  Baseline method 2: use just the
longest known observed frame matching the test pattern. If no matching
OF is known, find the longest partial match in the OFs seen in the
training data. We exploit the functional and morphological tags while
matching. No statistical filtering is applied in either baseline
method.
% Baseline method 2: use the subset
% technique described in Section~\ref{sec:methods} without using any of
% the statistical tests, instead using a longest match heuristic using
% the functional and morphological tags of the PDT (see
% Section~\ref{sec:expl}).

% \begin{table}[htbp]
% \begin{center}
% \begin{tabular}{|l||c|c|} \hline 
%  & Baseline 1 & Baseline 2 \\
% \hline
% verb complements &     2190 &  2190 \\
% true arguments &        925 & 925 \\
% proposed arguments &      0 & 1126 \\
% proposed adjuncts &    2190 & 1064 \\
% correctly classified & 1264 & 1734 \\
% accuracy & 57.75\% & 79.2\% \\
% \hline
% \end{tabular}
% \caption{Results for the baseline methods}
% \label{tbl:baseline}
% \end{center}
% \end{table}

A comparison between the two baseline methods and the three methods
proposed in this paper is shown in Table~\ref{tbl:comparison}.

% \begin{table*}[htbp]
% \begin{center}
% \begin{tabular}{|l||c|c|c|} \hline 
% 				       & Likelihood Ratio & T-scores & Hyp. Testing \\
% \hline
% accuracy                               & 81.76\%	& 82.23\% &   88.11\% \\
% argument/adjunct decisions             & 2010		& 2010   &	1812	\\
% correct suggestions                    & 1643		& 1652   &	1596  \\
% suggested argument, correct adjunct    & 214		&  236   &	  27	\\
% suggested adjunct, correct argument    & 152		&  120   &	 188	\\
% suggested arguments (correct or wrong) & 973		& 1026   &	 674	\\
% true arguments                         & 910		&  910   &	 834	\\
% \hline
% \end{tabular}
% \caption{Comparison between the three methods}
% \label{tbl:comparison}
% \end{center}
% \end{table*}

\begin{table*}[htbp]
\begin{center}
\begin{tabular}{|l||c|c|c|c|c|} \hline 
  & Baseline 1 & Baseline 2 & Lik. Ratio & T-scores & Hyp. Testing \\
\hline
Precision & 55\% & 78\% & 82\% & 82\% & 88\% \\
Recall & 55\% & 73\% & 77\% & 77\% & 74\% \\
$F_{\beta=1}$ & 55\% & 75\% & 79\% & 79\% & 80\% \\
\% unknown & 0\% &  6\% &  6\% &  6\% & 16\% \\
\hline
Total verb nodes  & 1027 & 1027 & 1027 & 1027 & 1027 \\
Total complements & 2144 & 2144 & 2144 & 2144 & 2144 \\
Nodes with known verbs & 1027 & 981 & 981 & 981 & 907 \\
Complements of known verbs & 2144 & 2010 & 2010 & 2010 & 1812 \\
Correct Suggestions & 1187.5 & 1573.5 & 1642.5 & 1652.9 & 1596.5 \\
True Arguments & 956.5 & 910.5 & 910.5 & 910.5 & 834.5 \\
Suggested Arguments & 0 & 1122 & 974 & 1026 & 674 \\
Incorrect arg suggestions & 0 & 324 & 215.5 & 236.3 & 27.5 \\
Incorrect adj suggestions & 956.5 & 112.5 & 152 & 120.8 & 188 \\
\hline
\end{tabular}
\caption{Comparison between the baseline methods and the three methods
proposed in this paper. Some of the values are not integers since, for
some difficult cases in the test data, the value for each
argument/adjunct decision was set to a value in $[0,1]$. {\em
Recall} is computed as the number of correct suggestions divided by
the total number of complements. {\em Precision} is computed as the
number of correct suggestions divided by the number of known verb
complements. $F_{\beta=1} = (2 \times p \times r)/(p+r)$. {\em \%
unknown} represents the percentage of test data not considered by a
particular method.}
\label{tbl:comparison}
\end{center}
\end{table*}
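
As a worked check on the figures in Table~\ref{tbl:comparison}, and
assuming only the counts printed in its {\em Hyp. Testing} column
(with $p$ and $r$ standing for precision and recall as in the
caption), the reported percentages can be recovered as
\[
p = \frac{1596.5}{1812} \approx 0.881, \qquad
r = \frac{1596.5}{2144} \approx 0.745, \qquad
F_{\beta=1} = \frac{2 \times p \times r}{p + r} \approx 0.807 .
\]
The tabulated recall values are reproduced by dividing the correct
suggestions by the total number of complements. The share of
complements left undecided by this method is
$1 - 1812/2144 \approx 0.155$, which the table reports rounded
as 16\%.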

The experiments showed that the method improved the precision of this
distinction from 55\% to 88\%. We were able to classify as many as 914
verbs, a number exceeded only by Manning~\cite{manning93:_subcat}, who
used about ten times more data (note that our results are for a
different language).

Also, our method discovered 137 subcategorization frames from the
data. The known upper bound on the number of frames that the algorithm
could have found (the total number of {\em observed frame} types) was
450.

% We also
% present results of some additional experiments with data that were
% parsed automatically. Since more such data is available, it still
% improves accuracy.

\section{Comparison with related work}
\label{sec:relwork}

Preliminary work on SF extraction from corpora was done by
\cite{brent91:_subcat,brent93:_unsup_learn,brent94:_acquis_subcat} and
\cite{webster89:_lexical_frames,ushioda93:_verb_subcat}.
Brent~\cite{brent93:_unsup_learn,brent94:_acquis_subcat} uses the
standard method of testing miscue probabilities for filtering frames
observed with a verb. \cite{brent94:_acquis_subcat} presents a method
for estimating $p_{!f}$. Brent applied his method to a small number of
verbs and associated SF types.  \cite{manning93:_subcat} applies
Brent's method to parsed data and obtains a subcategorization
dictionary for a larger set of
verbs. \cite{briscoe97:_subcat,carroll98:_subcat_help_parser} differ
from earlier work in that a substantially larger set of SF types is
considered; \cite{carroll98:_valen_pcfg} use an EM algorithm to learn
subcategorization as a result of learning rule probabilities, and, in
turn, to improve parsing accuracy by applying the verb SFs obtained.
\cite{basili98:_subcat} use a conceptual clustering algorithm for
acquiring subcategorization frames for Italian. They establish a
partial order on partially overlapping OFs (similar to our OF subsets)
which is then used to suggest a potential SF. A complete comparison of
all the previous approaches with the current work is given in
Table~\ref{tbl:previous_work}.
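
For reference, the miscue test used in this line of work can be
written as a binomial tail probability. The following is only a sketch
in notation assumed here ($n$ for the number of occurrences of a verb,
$m$ for the number of those occurrences seen with frame $f$, and
$p_{!f}$ for the miscue rate), and is not a restatement of any one
paper's exact formulation:
\[
P(m^{+}; n, p_{!f}) = \sum_{i=m}^{n} {n \choose i} \,
p_{!f}^{\,i} \, (1 - p_{!f})^{n-i} .
\]
When this probability falls below a chosen significance threshold, $m$
co-occurrences are unlikely to be miscues alone, and the frame is
accepted for the verb.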

While these approaches differ in size and quality of training data,
number of SF types (e.g. intransitive verbs, transitive verbs) and
number of verbs processed, there are properties that all have in
common.  They all assume that they know the set of possible SF types
in advance. Their task can be viewed as assigning one or more of the
(known) SF types to a given verb. In addition, except for
\cite{briscoe97:_subcat,carroll98:_subcat_help_parser}, only a small
number of SF types is considered.

Using a dependency treebank as input to our learning algorithm has
both advantages and drawbacks. There are two main advantages of
using a treebank:

\begin{itemize}

\item Access to more accurate data. Data is less noisy when compared
with tagged or parsed input data. We can expect correct identification
of verbs and their dependents.

\item We can explore techniques (as we have done in this paper) that
try to learn the set of SFs from the data itself, unlike other
approaches where the set of SFs has to be fixed in advance.

\end{itemize}

Also, by using a treebank we can use verbs in different contexts which
are problematic for previous approaches, e.g. we can use verbs that
appear in relative clauses. However, there are two main drawbacks:

\begin{itemize}

\item Treebanks are expensive to build and so the techniques presented
here have to work with less data.

\item All the dependents of each verb are visible to the learning
algorithm. This contrasts with previous techniques that rely on
finite-state extraction rules which ignore many dependents of the
verb. Thus our technique has to deal with a different kind of data
than previous approaches.

\end{itemize}

We tackle the second problem by using the method of observed frame
subsets described in Section~\ref{sec:miscue}.

\begin{table*}[htbp]
\begin{center}
\begin{tabular}{|l||l|c|c|l|l|l|} \hline
 Previous                    & Data     & \#SFs & \#verbs & Method     & Miscue     & Corpus      \\ 
 work                        &          &       & tested  &            & rate       &             \\ 
\hline
\cite{ushioda93:_verb_subcat}& POS +    & 6     & 33      & heuristics & NA         & WSJ (300K)   \\
                             & FS rules &       &         &            &            &              \\
\hline
\cite{brent93:_unsup_learn}  & raw +    & 6     & 193     & Hypothesis & iterative  & Brown (1.1M) \\ 
                             & FS rules &       &         & testing    & estimation &              \\
\hline
\cite{manning93:_subcat}     & POS +    & 19    & 3104    & Hypothesis & hand       & NYT (4.1M)   \\
                             & FS rules &       &         & testing    &            &              \\
\hline
\cite{brent94:_acquis_subcat}& raw +    & 12    & 126     & Hypothesis & non-iter   & CHILDES (32K)\\
                             & heuristics &     &         & testing    & estimation &              \\
\hline
\cite{ersan96:_case_frames}  & Full     & 16    & 30      & Hypothesis & hand       & WSJ (36M)    \\
                             & parsing  &       &         & testing    &            &              \\
\hline
\cite{briscoe97:_subcat}     & Full     & 160   & 14      & Hypothesis & Dictionary & various (70K)\\
                             & parsing  &       &         & testing    & estimation &              \\
\hline
\cite{carroll98:_valen_pcfg} & Unlabeled & 9+   & 3       & Inside-    & NA         & BNC (5-30M)  \\
                             &          &       &         & outside    &            &              \\
\hline
Current Work                 & Fully    & Learned & 914   & Subsets+   & Estimate   & PDT (300K)   \\
                             & Parsed   & 137     &       & Hyp. testing &          &              \\
\hline
\end{tabular}
\caption{Comparison with previous work on automatic SF extraction from corpora}
\label{tbl:previous_work}
\end{center}
\end{table*}

\section{Conclusion}

We are currently incorporating the SF information produced by the
methods described in this paper into a parser for Czech. We hope to
duplicate the increase in performance shown by treebank-based parsers
for English when they use SF information. Our methods can also be
applied to improve the annotations in the original treebank that we
use as training data. The automatic addition of subcategorization to
the treebank can be exploited to add predicate-argument information to
the treebank.
% Subcategorization
% information can also be useful in discovering linguistic information
% about verbs~\cite{siegel97:_class_verbs}.

Also, techniques for extracting SF information from data can be used
along with other research which aims to discover relationships between
different SFs of a
verb~\cite{stevenson99:_verb_class,lapata99:_verb_class,lapata99:_acquir_lexic_gener,stevenson99:_lexical_sem}.

The statistical models in this paper were based on the assumption that,
given a verb, different SFs occur independently. This assumption is
used to justify the use of the binomial distribution. Future work
should perhaps look towards removing this assumption by modeling the
dependence between different SFs for the same verb using a multinomial
distribution.
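
One way to make this concrete, offered only as a sketch and not as a
model evaluated in this paper: suppose a verb occurs $n$ times and its
$k$ candidate SFs occur $c_1, \ldots, c_k$ times, with
$\sum_{j} c_j = n$. A multinomial model with hypothetical frame
probabilities $\theta_1, \ldots, \theta_k$ (summing to one) assigns
\[
P(c_1, \ldots, c_k) = \frac{n!}{c_1! \cdots c_k!}
\prod_{j=1}^{k} \theta_j^{\,c_j} ,
\]
so that the counts of the different frames constrain one another
instead of being tested one frame at a time.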

To summarize: we have presented techniques that can be used to learn
subcategorization information for verbs. We exploit a dependency
treebank to learn this information, and moreover we discover the final
set of valid subcategorization frames from the training data. We
achieve up to 88\% precision on unseen data.

We have also tried our methods on data which was automatically
morphologically tagged, which allowed us to use more data (82K
sentences instead of 19K). The performance went up to 89\% (an
improvement of one percentage point).

%\nocite{*}
\bibliographystyle{acl}
{\footnotesize \bibliography{subcat}}
\end{document}


