metadata

language:
  - en
pipeline_tag: text-classification
widget:
  - text: >-
      And it was great to see how our Chinese team very much aware of that and
      of shifting all the resourcing to really tap into these opportunities.
    example_title: Exemplary Transformation Sentence
  - text: >-
      But we will continue to recruit even after that because we expect that the
      volumes are going to continue to grow.
    example_title: Exemplary Non-Transformation Sentence
  - text: >-
      So and again, we'll be disclosing the current taxes that are there in
      Guyana, along with that revenue adjustment.
    example_title: Exemplary Non-Transformation Sentence

TransformationTransformer

TransformationTransformer is a fine-tuned DistilRoBERTa model. It was trained and evaluated on 10,000 manually annotated sentences taken from the Q&A sections of quarterly earnings conference calls. Specifically, it was trained on sentences spoken by firm executives to distinguish sentences that allude to business transformation from those that discuss other topics. More details about the training procedure can be found below.

Background

Context on the project.

Usage

The model is intended for sentence classification: it creates a contextual text representation of the input sentence and outputs a probability value. LABEL_1 indicates a sentence predicted to contain transformation-related content; LABEL_0 indicates a sentence predicted not to. The query should consist of a single sentence.

Usage (API)

import json
import requests

API_TOKEN = "<TOKEN>"  # replace with your Hugging Face API token

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/simonschoe/TransformationTransformer"

def query(payload):
    # Serialize the payload, POST it to the Inference API, and decode the JSON reply
    data = json.dumps(payload)
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

query({"inputs": "<insert-sentence-here>"})

Usage (transformers)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("simonschoe/TransformationTransformer")
model = AutoModelForSequenceClassification.from_pretrained("simonschoe/TransformationTransformer")

classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
classifier('<insert-sentence-here>')
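
The label convention described under Usage can be turned into a single "probability of transformation" value. The following is a minimal sketch, not part of the original model card. It assumes that each prediction is a dict with a `label` and a `score` key (the shape returned by the transformers text-classification pipeline for its top label) and that the score is a softmax probability over the two classes, so the score of LABEL_0 can be converted by taking its complement. The helper name `transformation_probability` and the example scores are hypothetical.

```python
def transformation_probability(prediction):
    """Return the probability of LABEL_1 from one prediction dict.

    Assumes a binary classifier whose reported score is the softmax
    probability of the predicted label (an assumption, see lead-in).
    """
    label, score = prediction["label"], prediction["score"]
    if label == "LABEL_1":
        # Score already refers to the transformation class
        return score
    if label == "LABEL_0":
        # Score refers to the non-transformation class; take the complement
        return 1.0 - score
    raise ValueError(f"unexpected label: {label!r}")


# Hypothetical predictions, made up for illustration only
examples = [
    {"label": "LABEL_1", "score": 0.93},
    {"label": "LABEL_0", "score": 0.80},
]
for pred in examples:
    print(pred["label"], round(transformation_probability(pred), 2))
```

With the `classifier` object defined above, the helper would be applied to the first element of the returned list, for example `transformation_probability(classifier('<insert-sentence-here>')[0])`.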

Model Training

The model was trained on text data from earnings call transcripts. The data is restricted to each call's question-and-answer (Q&A) section and, within it, to the remarks made by firm executives. The text was segmented into individual sentences using spaCy.

Statistics of Training Data:

Labeled sentences: 10,000
Data distribution: xxx
Inter-coder agreement: xxx

The following code snippet presents the training pipeline: