Jim's
Tutorials

Spring 2012
course
navigation

2012-04-18

Parsing

Classical Parsing:
Catalan numbers
The problem of choosing the correct parse is AI-hard. But words are good predictors of attachment.
Grammars are defined by their grammar, symbols and lexicon. ie G = (T,C,N,S,L,R)
For probabilistic CFGs:
Parsers are good. It can be done in cubic time if we get sentences in binary phrase-structure form. ie every tree is binary. (X̄ form probably comes in handy here, J/M don't say anything about it.)

Parsers

CKY: Cocke-Kasami-Younger
Pseudocode (from wikipedia: CYK_algorithm:
let the input be a string S consisting of n characters: a1 ... an. let the grammar contain r nonterminal symbols R1 ... Rr. # This grammar contains the subset Rs which is the set of start symbols. let P[n,n,r] be an array of booleans. Initialize all elements of P to false. for each i = 1 to n for each unit production Rj -> ai set P[i,1,j] = true for each i = 2 to n -- Length of span for each j = 1 to n-i+1 -- Start of span for each k = 1 to i-1 -- Partition of span for each production RA -> RB RC if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then S is member of language else S is not member of language
Again binarization is crucial for O(n^3) runtime.
Steps:
  1. Initialize empty w*w array [for w len(sentence_to_be_parsed)]
  2. Put a priori probabilities for all possible tags of word w and "non-terminal" tags (S, NP, VP, etc.)
  3. Calc. probs of tags.
http://cs.marlboro.edu/ courses/ spring2012/jims_tutorials/ elias/ 2012-04-18
last modified Wednesday April 18 2012 10:12 am EDT