Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun
A Hierarchy Transformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships.
Model Description
HiT-MiniLM-L6-WordNetNoun is a HiT model trained on WordNet's subsumption (hypernym) hierarchy of noun entities.
- Developed by: Yuan He, Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks
- Model type: Hierarchy Transformer Encoder (HiT)
- License: Apache license 2.0
- Hierarchy: WordNet's subsumption (hypernym) hierarchy of noun entities.
- Training Dataset: Download wordnet-mixed.zip from Datasets for HiTs on Zenodo
- Pre-trained model: sentence-transformers/all-MiniLM-L6-v2
- Training Objectives: Jointly optimised on Hyperbolic Clustering and Hyperbolic Centripetal losses (see definitions in the paper)
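As a rough orientation only (the paper remains the authoritative source), the two training objectives can be sketched as triplet-style margin losses. The notation here is an assumption made for illustration: $e$ is an entity embedding, $e^+$ its parent, $e^-$ a negative parent, $d_c$ the hyperbolic distance, $\lVert\cdot\rVert_c$ the hyperbolic norm (distance to the origin), and $\alpha$, $\beta$ margin hyperparameters.

```latex
\begin{align*}
% Clustering: pull an entity towards its parent and away from a negative.
\mathcal{L}_{\text{cluster}} &= \sum_{(e,\, e^+,\, e^-)} \max\bigl(d_c(e, e^+) - d_c(e, e^-) + \alpha,\ 0\bigr) \\
% Centripetal: keep the parent closer to the origin than its child.
\mathcal{L}_{\text{centri}} &= \sum_{(e,\, e^+,\, e^-)} \max\bigl(\lVert e^+ \rVert_c - \lVert e \rVert_c + \beta,\ 0\bigr) \\
% Joint objective.
\mathcal{L}_{\text{HiT}} &= \mathcal{L}_{\text{cluster}} + \mathcal{L}_{\text{centri}}
\end{align*}
```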
Model Versions
Version | Model Revision | Note |
---|---|---|
v1.0 (Random Negatives) | main or v1-random-negatives | The variant trained on random negatives, as detailed in the paper. |
v1.0 (Hard Negatives) | v1-hard-negatives | The variant trained on hard negatives, as detailed in the paper. |
Model Sources
- Repository: https://github.com/KRR-Oxford/HierarchyTransformers
- Paper: Language Models as Hierarchy Encoders
Usage
HiT models are used to encode entities (presented as texts) and predict their hierarchical relationships in hyperbolic space.
Get Started
Install hierarchy_transformers (check our repository) through pip or GitHub.
Use the code below to get started with the model.
from hierarchy_transformers import HierarchyTransformer
# load the model
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun')
# entity names to be encoded.
entity_names = ["computer", "personal computer", "fruit", "berry"]
# get the entity embeddings
entity_embeddings = model.encode(entity_names)
Default Probing for Subsumption Prediction
Use the entity embeddings to predict the subsumption relationships between them.
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)
# compute the hyperbolic distances and norms of entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)
# use the empirical function for subsumption prediction proposed in the paper
# `centri_score_weight` and the overall threshold are determined on the validation set
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
Training and evaluation scripts are available on GitHub. See scripts/evaluate.py for how we determine the hyperparameters on the validation set for subsumption prediction.
Technical details are presented in the paper.
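To make the scoring function above more concrete without loading the model, here is a minimal, standard-library sketch of the same formula on hand-picked 2-D points. It assumes a unit Poincaré ball (curvature -1); the curvature used by `model.manifold` may differ, so the absolute values will not match the model's outputs. The helper names (`poincare_dist`, `poincare_norm`, `subsumption_score`), the toy points, and the weight value are all hypothetical and chosen for illustration only.

```python
import math


def poincare_dist(x, y):
    """Geodesic distance between two points of the unit Poincare ball (assumed curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


def poincare_norm(x):
    """Hyperbolic norm, i.e. the geodesic distance from the origin to x."""
    return 2.0 * math.atanh(math.sqrt(sum(a * a for a in x)))


def subsumption_score(child, parent, centri_score_weight):
    """Same form as the empirical scoring function: higher means 'child is subsumed by parent' is more plausible."""
    dist = poincare_dist(child, parent)
    norm_gap = poincare_norm(parent) - poincare_norm(child)
    return -(dist + centri_score_weight * norm_gap)


# Toy points: the "parent" sits nearer the origin than the "child".
parent_point = [0.1, 0.0]
child_point = [0.5, 0.1]
weight = 1.0  # hypothetical value; the real weight is tuned on the validation set

forward = subsumption_score(child_point, parent_point, weight)   # child -> parent
backward = subsumption_score(parent_point, child_point, weight)  # parent -> child
print(forward, backward)
```

Because the distance term is symmetric, the direction of the relationship is decided entirely by the norm term: the pair whose candidate parent lies closer to the origin receives the higher score. A prediction is then obtained by comparing the score against a threshold selected on the validation set.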
Full Model Architecture
HierarchyTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
Citation
Preprint on arXiv: https://arxiv.org/abs/2401.11374.
Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks. Language Models as Hierarchy Encoders. To Appear at NeurIPS 2024.
@article{he2024language,
title={Language Models as Hierarchy Encoders},
author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
journal={arXiv preprint arXiv:2401.11374},
year={2024}
}
Model Card Contact
For any queries or feedback, please contact Yuan He (yuan.he(at)cs.ox.ac.uk).