LINGUISTICA GENERALE E COMPUTAZIONALE
description
Transcript of LINGUISTICA GENERALE E COMPUTAZIONALE
LINGUISTICA GENERALE E COMPUTAZIONALE
ANALISI SINTATTICA (PARSING)
RECAP: CFG
• Bird et al, ch. 8.3
PARSING
• Parsing is the process of recognizing and assigning STRUCTURE
• Parsing a string with a CFG: – Finding a derivation of the string consistent with
the grammar– The derivation gives us a PARSE TREE
EXAMPLE (CFR LAST WEEK)
PARSING AS SEARCH
• Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS
• There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered
TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES
• The search has to be guided by the INPUT and the GRAMMAR
• TOP-DOWN search: the parse tree has to be rooted in the start symbol S– EXPECTATION-DRIVEN parsing
• BOTTOM-UP search: the parse tree must be an analysis of the input– DATA-DRIVEN parsing
AN EXAMPLE OF TOP-DOWN SEARCH(IN PARALLEL)
RECURSIVE DESCENT IN NLTK
• P. 303: – nltk.RecursiveDescentParser(grammar)– nltk.app.rdparser()
NON-PARALLEL SEARCH
• If it’s not possible to examine all alternatives in parallel, it’s necessary to make further decisions:– Which node in the current search space to expand
first (breadth-first or depth-first)– Which of the applicable grammar rules to expand
first– Which leaf node in a parse tree to expand next
(e.g., leftmost)
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)
A T-D, D-F, L-R PARSER
LEFT-RECURSION
• A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return
• Examples of left-recursive rules:– NP NP PP– S S and S– But also:• NP Det Nom• Det NP’s
THE PROBLEM WITH LEFT-RECURSION
LEFT-RECURSION: POOR SOLUTIONS
• Rewrite the grammar to a weakly equivalent one– Problem: may not get correct parse tree
• Limit the depth during search– Problem: limit is arbitrary
AN EXAMPLE OF BOTTOM-UP SEARCH
SHIFT-REDUCE PARSING
• P. 305– nltk.app.srparser()– ShiftReduceParser(grammar)
TOP-DOWN vs BOTTOM-UP
• TOP-DOWN:– Only search among grammatical answers– BUT: suggests hypotheses that may not be
consistent with data– Problem: left-recursion
• BOTTOM-UP:– Only forms hypotheses consistent with data– BUT: may suggest hypotheses that make no sense
globally
LEFT-CORNER PARSING
• A hybrid of top-down and bottom-up parsing• Strategy: don’t consider any expansion unless
the current input can serve as the LEFT-CORNER of that expansion
FURTHER PROBLEMS IN PARSING
• Ambiguity – Church and Patel (1982): the number of
attachment ambiguities grows like the Catalan numbers • C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 132, C(6) = 469, C(7)
= 1430, C(8) = 4867
• Avoiding reparsing
COMMON STRUCTURAL AMBIGUITIES
• COORDINATION ambiguity– OLD (MEN AND WOMEN) vs
(OLD MEN) AND WOMEN• ATTACHMENT ambiguity:– Gerundive VP attachment ambiguity• I saw the Eiffel Tower flying to Paris
– PP attachment ambiguity• I shot an elephant in my pajamas
PP ATTACHMENT AMBIGUITY
AMBIGUITY: SOLUTIONS
• Use a PROBABILISTIC GRAMMAR (not covered in this module)
• Use semantics
AVOID RECOMPUTING INVARIANTS
• Consider parsing with a top-down parser the NP:– A flight from Indianapolis to Houston on TWA
• With the grammar rules:– NP Det Nominal– NP NP PP– NP ProperNoun
INVARIANTS AND TOP-DOWN PARSING
THE EARLEY ALGORITHM
DYNAMIC PROGRAMMING
• A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way
• A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work
• The Earley algorithm also– Does not suffer from the left-recursion problem– Solves an exponential problem in O(n3)
THE CHART• The Earley algorithm uses a table (the CHART) of size N+1,
where N is the length of the input– Table entries sit in the `gaps’ between words
• Each entry in the chart is a list of – Completed constituents– In-progress constituents– Predicted constituents
• All three types of objects are represented in the same way as STATES
THE CHART: GRAPHICAL REPRESENTATION
STATES
• A state encodes two types of information:– How much of a certain rule has been encountered
in the input– Which positions are covered– A , [X,Y]
• DOTTED RULES– VP V NP – NP Det Nominal– S VP
EXAMPLES
SUCCESS
• The parser has succeeded if entry N+1 of the chart contains the state– S , [0,N]
THE ALGORITHM
• The algorithm loops through the input without backtracking, at each step performing three operations:– PREDICTOR: add predictions to the chart– COMPLETER: Move the dot to the right when
looked-for constituent is found– SCANNER: read in the next input word
THE ALGORITHM: CENTRAL LOOP
EARLEY ALGORITHM: THE THREE OPERATORS
EXAMPLE, AGAIN
EXAMPLE: BOOK THAT FLIGHT
EXAMPLE: BOOK THAT FLIGHT (II)
EXAMPLE: BOOK THAT FLIGHT (III)
EXAMPLE: BOOK THAT FLIGHT (IV)
CHART PARSING IN NLTK
• 8.4, p. 307 ff
DEPENDENCY PARSING
• 8.5
READINGS
• Bird et al, chapter 8