Systems and Methods for Optimizing Very Large N-Gram Collections for Speed and Memory (Expired)
- Author(s): Flor, Michael
- Patent Issued: Oct 29, 2013
- Patent Number: 8,572,126
- Source: ETS Patent
- Document Type: Patent
- Family ID: 45353531
- Subject/Key Words: Patent, Expired Patent, Automated Scoring and Natural Language Processing, Corpus Analysis, Context (Linguistics), Ranking and Selection (Statistics), Automated Response Evaluation, Processing Speed
Abstract
A computer memory stores a data structure for a ternary search tree (TST) that represents multiple word n-grams for a corpus of documents. The data structure includes plural records in a first memory, each record representing a node of the TST and comprising plural fields. At least some n-grams have a sequence of units. The fields include one identifying a given unit of the sequence for a given node, one reserved for storing payload information for the given node, and plural child fields reserved for storing information for the first, second, and third child nodes of the given node. Each child field stores either a null value indicating the absence of the child node or an identifier identifying a memory location of the child node. For at least one record, at least one of the child fields stores an identifier identifying a location in a memory different from the first memory.
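
The record layout described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each unit is a whitespace-separated word, that the payload is an integer count, and that a child identifier is a hypothetical `(memory index, slot)` pair. The names `Record`, `NGramTST`, `lo`, `eq`, `hi`, and `first_memory_capacity` are illustrative and do not come from the patent. Two Python lists stand in for the first memory and a second, different memory; once the first list is full, new records are placed in the second, so some child fields in the first memory end up holding identifiers that point into the other memory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical identifier encoding: (memory index, slot within that memory).
NodeId = Tuple[int, int]


@dataclass
class Record:
    """One TST node: a unit, a payload, and three child fields."""
    unit: str                      # the word (unit) this node stands for
    payload: Optional[int] = None  # assumed to be a count; None if no n-gram ends here
    lo: Optional[NodeId] = None    # child for units that sort before `unit`
    eq: Optional[NodeId] = None    # child for the next unit of the n-gram
    hi: Optional[NodeId] = None    # child for units that sort after `unit`


class NGramTST:
    def __init__(self, first_memory_capacity: int):
        # memories[0] plays the role of the "first memory"; memories[1] is a different memory.
        self.memories: List[List[Record]] = [[], []]
        self.capacity = first_memory_capacity
        self.root: Optional[NodeId] = None

    def _alloc(self, unit: str) -> NodeId:
        # Assumed policy: fill the first memory, then spill into the second.
        mem = 0 if len(self.memories[0]) < self.capacity else 1
        self.memories[mem].append(Record(unit))
        return (mem, len(self.memories[mem]) - 1)

    def _rec(self, node_id: NodeId) -> Record:
        mem, slot = node_id
        return self.memories[mem][slot]

    def insert(self, ngram: str, payload: int) -> None:
        units = ngram.split()
        if not units:
            raise ValueError("n-gram must contain at least one unit")
        if self.root is None:
            self.root = self._alloc(units[0])
        node, i = self.root, 0
        while True:
            rec = self._rec(node)
            if units[i] < rec.unit:
                field = "lo"
            elif units[i] > rec.unit:
                field = "hi"
            elif i == len(units) - 1:
                rec.payload = payload  # last unit matched: store the payload here
                return
            else:
                i += 1                 # unit matched: move on to the next unit
                field = "eq"
            child = getattr(rec, field)
            if child is None:          # null child field: create the missing node
                child = self._alloc(units[i])
                setattr(rec, field, child)
            node = child

    def lookup(self, ngram: str) -> Optional[int]:
        units = ngram.split()
        node, i = self.root, 0
        while node is not None and units:
            rec = self._rec(node)
            if units[i] < rec.unit:
                node = rec.lo
            elif units[i] > rec.unit:
                node = rec.hi
            elif i == len(units) - 1:
                return rec.payload
            else:
                i += 1
                node = rec.eq
        return None


# Usage: a first memory with room for three records forces later nodes elsewhere.
tst = NGramTST(first_memory_capacity=3)
tst.insert("the cat sat", 5)
tst.insert("the cat ran", 2)
tst.insert("the dog", 7)
print(tst.lookup("the cat ran"), tst.memories[0][2].lo)
```

In this sketch the record for "sat" lives in the first memory while its `lo` field refers to a record in the second memory, which mirrors the abstract's final sentence. A real system would likely use fixed-width binary records and a second memory such as disk or a memory-mapped file; that detail is an assumption here, not something stated in the abstract.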