A grid for deriving the most likely hidden values from a series of observations. For our purpose, the observations are Bopomofo readings, and the hidden values are the actual Mandarin words. This can also be used for segmentation: in that case, the observations are Mandarin words, and the hidden values are the most likely groupings.

While we use the terminology from hidden Markov model (HMM), the actual implementation is a much simpler Bayesian inference, since the underlying language model consists of only unigrams. Once we have put all plausible unigrams as nodes on the grid, a simple DAG shortest-path walk will give us the maximum likelihood estimation (MLE) for the hidden values.

Constructors

Properties

kDefaultSeparator: string = "-"
kMaximumSpanLength: number = 6

Accessors

Methods

  • Returns all candidate values at the location. If spans are not empty and loc is at the end of the spans, (loc - 1) is used, so that the caller does not have to care about this boundary condition.

    Parameters

    • loc: number

    Returns Candidate[]

  • Delete the reading after the cursor, like Del. Cursor is unmoved.

    Returns boolean

  • Delete the reading before the cursor, like Backspace. Cursor will decrement by one.

    Returns boolean

  • Adds weight to the node with the unigram that has the designated candidate value and applies the desired override type, essentially resulting in user override. An overridden node would influence the grid walk to favor walking through it.

    Parameters

    Returns boolean

  • Same as the method above, but since the string candidate value is used, if there are multiple nodes (of different spanning length) that have the same unigram value, it's not guaranteed which node will be selected.

    Parameters

    • loc: number
    • candidate: string
    • overrideType: OverrideType = OverrideType.kOverrideValueWithHighScore

    Returns boolean

  • Find the weightiest path in the grid graph. The path represents the most likely hidden chain of events from the observations. We use the DAG-SHORTEST-PATHS algorithm in Cormen et al. 2001 to compute such path. Instead of computing the path with the shortest distance, though, we compute the path with the longest distance (so the weightiest), since with log probability a larger value means a larger probability. The algorithm runs in O(|V| + |E|) time for G = (V, E) where G is a DAG. This means the walk is fairly economical even when the grid is large.

    Returns WalkResult