TalkBank Batchalign CHATUtterance
CHATUtterance is a series of BERT-derivative models released by the TalkBank project for the task of utterance segmentation. This is the Mandarin model, trained on the utterance diarization samples from the CHILDES Mandarin corpora: ZhouAssessment, Zhang Personal Narrative, and Li Shared Reading.
Usage
The model can be used directly as a BERT-class token classification model following the standard Hugging Face instructions; a minimal usage sketch is given after the label list below. Feel free to inspect this file for a sense of what the classes mean. Alternatively, to get the full analysis possible with the model, it is best combined with the TalkBank Batchalign suite of analysis software, available here, using transcribe mode.
Target labels:
- 0: regular form
- 1: start of utterance / capitalized word
- 2: end of declarative utterance (end this utterance with a .)
- 3: end of interrogative utterance (end this utterance with a ?)
- 4: end of exclamatory utterance (end this utterance with a !)
- 5: break in the utterance; depending on orthography, one can insert a ,
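
The sketch below shows one way to run the model as a plain token classification model and read off the per-token labels above. The repository id and example sentence are assumptions, not taken from this card; substitute the actual id of this model.

```python
# Minimal sketch: run the model as a BERT-class token classifier and
# print one predicted label (0-5, per the list above) per wordpiece token.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "talkbank/CHATUtterance-zh_CN"  # assumed repository id; replace with this model's id

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL)

# Unsegmented, unpunctuated Mandarin input
text = "你 好 吗 我 很 好"

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label in zip(tokens, predictions):
    print(token, label)
```

From these labels, downstream code can insert utterance boundaries and punctuation; Batchalign's transcribe mode performs this post-processing automatically.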