Training and Domain Adaptation for Supervised Text Segmentation
- Author(s):
- Somasundaran, Swapna; Glavas, Goran
- Patent Issued:
- Jan 21, 2025
- Patent Number:
- 12,204,856
- Source:
- ETS Patent
- Document Type:
- Patent
- Subject/Key Words:
- Patent, Active Patent, Natural Language Processing
Abstract
Data such as unstructured text is received that includes a sequence of sentences. This received data is then tokenized into a plurality of tokens. The received data is segmented using a hierarchical transformer network model including a token transformer, a sentence transformer, and a segmentation classifier. The token transformer contextualizes tokens within sentences and yields sentence embeddings. The sentence transformer contextualizes sentence representations based on the sentence embeddings. The segmentation classifier predicts segments of the received data based on the contextualized sentence representations. Data can be provided which characterizes the segmentation of the received data. Related apparatus, systems, techniques and articles are also described.
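
The three-stage pipeline in the abstract can be illustrated with a toy, standard-library-only sketch. It is not the patented implementation: the embedding width `DIM`, the hash-seeded `embed` function, whitespace tokenization, mean pooling, the unparameterized single-head attention, and the untrained classifier weights are all assumptions made for illustration. A real model would use learned multi-head transformer layers and trained weights.

```python
import math
import random

DIM = 8  # hypothetical embedding width, chosen only for illustration


def embed(token):
    """Deterministic stand-in for a learned token embedding (assumption)."""
    rng = random.Random(token)  # seeding with the token string keeps runs repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def self_attention(vectors):
    """Unparameterized scaled dot-product self-attention (queries = keys = values)."""
    scale = math.sqrt(len(vectors[0]))
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[d] for w, v in zip(weights, vectors))
            for d in range(len(q))
        ])
    return out


def token_transformer(sentence):
    """Contextualize tokens within one sentence, then mean-pool to a sentence embedding."""
    tokens = sentence.lower().split()  # whitespace tokenization is an assumption
    contextual = self_attention([embed(t) for t in tokens])
    return [sum(col) / len(contextual) for col in zip(*contextual)]


def sentence_transformer(sentence_embeddings):
    """Contextualize each sentence embedding against the other sentences."""
    return self_attention(sentence_embeddings)


def segmentation_classifier(sentence_reprs, weights, bias=0.0):
    """Label each sentence 1 if it is predicted to start a new segment, else 0."""
    labels = []
    for rep in sentence_reprs:
        logit = sum(w * x for w, x in zip(weights, rep)) + bias
        labels.append(1 if 1.0 / (1.0 + math.exp(-logit)) > 0.5 else 0)
    return labels


def segment(sentences, weights):
    """Run the full pipeline and group sentences into predicted segments."""
    embeddings = [token_transformer(s) for s in sentences]
    labels = segmentation_classifier(sentence_transformer(embeddings), weights)
    segments = []
    for sentence, starts_segment in zip(sentences, labels):
        if starts_segment or not segments:
            segments.append([])  # the first sentence always opens a segment
        segments[-1].append(sentence)
    return segments


sentences = [
    "Cats purr when content .",
    "Dogs bark at strangers .",
    "Stocks fell sharply today .",
]
untrained_weights = embed("untrained-classifier")  # hypothetical, not learned
predicted_segments = segment(sentences, untrained_weights)
```

Because the classifier weights here are untrained, the boundaries in `predicted_segments` are arbitrary. The sketch only shows how data flows from tokens to sentence embeddings to contextualized sentence representations to per-sentence boundary labels.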