Study Overview

In this study, we fine-tune the Microsoft DeBERTa v3 model in place of the model used in Qiu et al. (2022). DeBERTa represents each token's position with an embedding that is separate from its content embedding. To our knowledge, the impact of this positional embedding on regression or classification tasks has not yet been empirically validated. For training and fine-tuning, we developed a custom trainer that evaluates Mean-Squared Error (MSE) as outlined in Qiu et al. (2022). However, that paper does not clearly specify how its sigmoid activation function is applied to the output range [-1, 1], where -1 indicates a personal stance not aligned with the value in question, 1 indicates alignment, and 0 denotes neutrality (irrelevance).

To model this range, we opted for the tanh activation function, whose output lies in (-1, 1) and therefore matches the label range directly. Consequently, we implemented an MSE loss function on the tanh-activated output and, for evaluation purposes, rounded each prediction to the nearest integer.
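The loss and rounding steps described above can be illustrated without a deep-learning framework. The following is a minimal sketch, not the custom trainer used in this study: the function names `tanh_mse` and `to_stance` and the sample logits are hypothetical, and the 0.5 cut-offs are assumed to mirror the `custom_round` helper in the Usage section.

```python
import math

def tanh_mse(logits, targets):
    """Mean-squared error between tanh-squashed logits and targets in [-1, 1]."""
    preds = [math.tanh(z) for z in logits]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def to_stance(score, threshold=0.5):
    """Map a continuous score in (-1, 1) to a discrete stance label."""
    if score >= threshold:
        return 1
    if score < -threshold:
        return -1
    return 0

# Illustrative raw outputs and gold labels (assumed values, not real data).
logits = [2.0, -0.1, -3.0]
targets = [1, 0, -1]

loss = tanh_mse(logits, targets)
stances = [to_stance(math.tanh(z)) for z in logits]
print(loss, stances)
```

The loss is computed on the continuous tanh output, while the discrete labels are derived only at evaluation time, so the two can move independently of each other.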

Using this approach, we improved on the regression task of evaluating the stance expressed in each test scenario. Although the overall MSE did not improve significantly, accuracy, recall, and precision on the regression task were higher. It is important to note that the classification task specified in Qiu et al. (2022) solely determines the presence or absence of the value in question, without considering the specific stance presented in the text. Therefore, our regression task, which assesses the particular stance, should not be directly compared with the classification task from Qiu et al. (2022).
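Once predictions are rounded to -1, 0, or 1, accuracy, precision, and recall can be computed as for any three-class problem. The sketch below shows one way to do this. The helper name `stance_metrics` and the toy labels are hypothetical, and macro averaging is an assumption, since the averaging scheme behind the reported figures is not stated here.

```python
def stance_metrics(y_true, y_pred, labels=(-1, 0, 1)):
    """Return accuracy plus macro-averaged precision and recall."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for label in labels:
        # True positives, predicted count, and gold count for this label.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
    }

# Toy gold and predicted stance labels (assumed values, not real results).
y_true = [1, 0, -1, 1, 0, -1]
y_pred = [1, 0, -1, 0, 0, 1]
metrics = stance_metrics(y_true, y_pred)
print(metrics)
```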

Usage


import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
model_path = 'nharrel/Valuesnet_DeBERTa_v3'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()

# Define maximum length for padding and truncation
max_length = 128

def custom_round(x):
    if x >= 0.50:
        return 1
    elif x < -0.50:
        return -1
    else:
        return 0

def predict(text):
    inputs = tokenizer(text, padding='max_length', truncation=True, max_length=max_length, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)

    # Assumes the model emits a single regression logit: squash it to (-1, 1)
    # and extract a plain Python float before thresholding.
    prediction = torch.tanh(outputs.logits).squeeze().item()
    return custom_round(prediction)

def test_sentence(sentence):
    prediction = predict(sentence)
    label_map = {-1: 'Against', 0: 'Not Present', 1: 'Supports'}
    predicted_label = label_map.get(prediction, 'unknown')
    print(f"Sentence: {sentence}")
    print(f"Predicted Label: {predicted_label}")

# Define Schwartz's 10 values
schwartz_values = [
    "BENEVOLENCE", "UNIVERSALISM", "SELF-DIRECTION", "STIMULATION", "HEDONISM",
    "ACHIEVEMENT", "POWER", "SECURITY", "CONFORMITY", "TRADITION"
]

for value in schwartz_values:
    print("Values stance is: " + value)
    test_sentence(f"[{value}] You are a very pleasant person to be around.")
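The loop above prints one label per value. To collect the results instead, the same `[VALUE] text` prompt format can be wrapped in a small helper. This is a sketch under stated assumptions: `value_profile` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in that returns fixed labels so the example runs without downloading the model. Its outputs are not model predictions.

```python
def value_profile(text, values, predict_fn):
    """Score one sentence against every value; return {value: stance}."""
    return {value: predict_fn(f"[{value}] {text}") for value in values}

# Stand-in predictor (hypothetical) so the sketch runs without the model.
def fake_predict(prompt):
    return 1 if prompt.startswith("[BENEVOLENCE]") else 0

profile = value_profile(
    "You are a very pleasant person to be around.",
    ["BENEVOLENCE", "POWER"],
    fake_predict,
)
print(profile)
```

In practice, the `predict` function defined above would be passed in place of `fake_predict`, together with the full `schwartz_values` list.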

Results from Qiu et al. (2022)

Our Results with the Same Dataset Using DeBERTa v3

main

augmented

balanced

Classification Tasks

Interpretation

Our classification task can only be compared with the BART model, which achieved the highest classification accuracy in Qiu et al. (2022). This task only classifies whether the value is present (1) or not (0). Qiu et al. (2022) obtained their highest accuracy, 67%, with BART on the main dataset. Using DeBERTa v3, we reached an accuracy of 73% (0.7283). DeBERTa's disentangled attention likely accounts for this improvement in classifying human values.

We can also see a noticeable improvement on the regression task. This task is more difficult because the model must first determine whether the value in question is present and then determine whether the agent's perspective supports or opposes it. Even so, DeBERTa v3 outperforms BERT by 4 percentage points (65% vs. 61%). We simply replicated the design of Qiu et al. (2022) and have not tried to improve on it.

Future Work

We are currently developing an ensemble model that will leverage text generation to create multiple stance positions for each value. We hypothesize that if the model can differentiate between different stance positions on the same topic associated with the target value, it can more accurately predict an agent's value stance.

Acknowledgements

We would like to acknowledge the authors of the ValueNet dataset for their valuable contribution to this work.

Please give them credit if you use this model, because this model would not be possible without their work.

@article{Qiu_Zhao_Li_Lu_Peng_Gao_Zhu_2022, 
    title={ValueNet: A New Dataset for Human Value Driven Dialogue System}, 
    volume={36}, 
    url={https://ojs.aaai.org/index.php/AAAI/article/view/21368}, 
    DOI={10.1609/aaai.v36i10.21368}, 
    abstractNote={Building a socially intelligent agent involves many challenges, one of which is to teach the agent to speak guided by its value like a human. However, value-driven chatbots are still understudied in the area of dialogue systems. Most existing datasets focus on commonsense reasoning or social norm modeling. In this work, we present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios. The dataset is organized in ten dimensions that conform to the basic human value theory in intercultural research. We further develop a Transformer-based value regression model on ValueNet to learn the utility distribution. Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks. For example, by teaching a generative agent with reinforcement learning and the rewards from the value model, our method attains state-of-the-art performance on the personalized dialog generation dataset: Persona-Chat. With values as additional features, existing emotion recognition models enable capturing rich human emotions in the context, which further improves the empathetic response generation performance in the EmpatheticDialogues dataset. To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first one trying to incorporate a value model into emotionally intelligent dialogue systems. The dataset is available at https://liang-qiu.github.io/ValueNet/.}, 
    number={10}, 
    journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
    author={Qiu, Liang and Zhao, Yizhou and Li, Jinchao and Lu, Pan and Peng, Baolin and Gao, Jianfeng and Zhu, Song-Chun}, 
    year={2022}, 
    month={Jun.}, 
    pages={11183-11191}
}


If you like my model, please give me credit:

@misc {nick_h_2024,
    author       = { {Nicholas Harrell} },
    title        = { Valuesnet_DeBERTa_v3 (Revision 7214723) },
    year         = 2024,
    url          = { https://huggingface.co/nharrel/Valuesnet_DeBERTa_v3 },
    doi          = { 10.57967/hf/2873 },
    publisher    = { Hugging Face }
}