
Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun

A Hierarchy Transformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships.

Model Description

HiT-MiniLM-L6-WordNetNoun is a HiT model trained on WordNet's subsumption (hypernym) hierarchy of noun entities.

  • Developed by: Yuan He, Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks
  • Model type: Hierarchy Transformer Encoder (HiT)
  • License: Apache license 2.0
  • Hierarchy: WordNet's subsumption (hypernym) hierarchy of noun entities.
  • Training Dataset: wordnet-mixed.zip, downloadable from Datasets for HiTs on Zenodo
  • Pre-trained model: sentence-transformers/all-MiniLM-L6-v2
  • Training Objectives: Jointly optimised on Hyperbolic Clustering and Hyperbolic Centripetal losses (see definitions in the paper)
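As a rough guide to the two training objectives, here is a minimal plain-Python sketch based on our reading of the paper's definitions. The hyperbolic clustering loss is a triplet-style margin loss that pulls a child entity towards its parent and away from a negative entity, and the hyperbolic centripetal loss asks the parent to sit closer to the origin of the Poincaré ball than the child. The margins alpha and beta, the curvature c, the equal weighting of the two terms, the batch averaging and all function names are assumptions for illustration; see the paper for the exact formulation.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def poincare_dist(u: Vector, v: Vector, c: float = 1.0) -> float:
    """Geodesic distance in a Poincaré ball of curvature -c (assumed setting)."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = 1.0 - c * sum(a * a for a in u)
    nv = 1.0 - c * sum(b * b for b in v)
    if nu <= 0.0 or nv <= 0.0:
        raise ValueError("points must lie strictly inside the ball")
    return math.acosh(1.0 + 2.0 * c * diff / (nu * nv)) / math.sqrt(c)


def hyperbolic_norm(x: Vector, c: float = 1.0) -> float:
    """Distance of x from the origin of the ball."""
    return poincare_dist(x, [0.0] * len(x), c)


def clustering_loss(child: Vector, parent: Vector, negative: Vector,
                    alpha: float, c: float = 1.0) -> float:
    """Triplet margin loss: child should be closer to its parent than to the negative by alpha."""
    return max(poincare_dist(child, parent, c) - poincare_dist(child, negative, c) + alpha, 0.0)


def centripetal_loss(child: Vector, parent: Vector, beta: float, c: float = 1.0) -> float:
    """Hinge loss: the parent should be closer to the origin than the child by beta."""
    return max(hyperbolic_norm(parent, c) - hyperbolic_norm(child, c) + beta, 0.0)


def combined_loss(triples: Sequence[Tuple[Vector, Vector, Vector]],
                  alpha: float, beta: float, c: float = 1.0) -> float:
    """Sum of both losses, averaged over (child, parent, negative) triples (assumed reduction)."""
    total = 0.0
    for child, parent, negative in triples:
        total += clustering_loss(child, parent, negative, alpha, c)
        total += centripetal_loss(child, parent, beta, c)
    return total / len(triples)
```

With a child near the boundary, its parent near the origin and a negative on the opposite side, both hinge terms are already satisfied and the loss is zero; swapping child and parent makes the centripetal term positive.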

Model Versions

| Version | Model Revision | Note |
|---|---|---|
| v1.0 (Random Negatives) | main or v1-random-negatives | The variant trained on random negatives, as detailed in the paper. |
| v1.0 (Hard Negatives) | v1-hard-negatives | The variant trained on hard negatives, as detailed in the paper. |

Model Sources

Usage

HiT models encode entities (presented as text) and predict their hierarchical relationships in hyperbolic space.
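The hyperbolic distance and the hyperbolic "norm" (distance from the origin) that the probing code relies on can be illustrated with a minimal plain-Python sketch. It assumes a Poincaré ball of curvature -c with c left as a free parameter, since the curvature HiT actually uses is not stated in this card; poincare_dist and poincare_dist0 are hypothetical stand-ins for model.manifold.dist and model.manifold.dist0, not the library's implementation.

```python
import math
from typing import Sequence

# Assumption: Poincaré ball of curvature -c (radius 1/sqrt(c)); c is a free parameter here.


def _sq_norm(x: Sequence[float]) -> float:
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


def poincare_dist(u: Sequence[float], v: Sequence[float], c: float = 1.0) -> float:
    """Hyperbolic (geodesic) distance between two points strictly inside the ball."""
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    nu = 1.0 - c * _sq_norm(u)
    nv = 1.0 - c * _sq_norm(v)
    if nu <= 0.0 or nv <= 0.0:
        raise ValueError("points must lie strictly inside the ball")
    return math.acosh(1.0 + 2.0 * c * diff / (nu * nv)) / math.sqrt(c)


def poincare_dist0(x: Sequence[float], c: float = 1.0) -> float:
    """Hyperbolic norm of x, i.e. its distance from the origin."""
    return poincare_dist(x, [0.0] * len(x), c)
```

For example, poincare_dist0([0.5, 0.0]) equals ln 3 (about 1.0986), and the value grows without bound as a point approaches the boundary of the ball, which is what lets norms encode how general or specific an entity is.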

Get Started

Install hierarchy_transformers through pip or from GitHub (see our repository).

Use the code below to get started with the model.

from hierarchy_transformers import HierarchyTransformer

# load the model
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun')

# entity names to be encoded.
entity_names = ["computer", "personal computer", "fruit", "berry"]

# get the entity embeddings
entity_embeddings = model.encode(entity_names)

Default Probing for Subsumption Prediction

Use the entity embeddings to predict the subsumption relationships between them.

# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)

# compute the hyperbolic distances and norms of entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)

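# use the empirical function for subsumption prediction proposed in the paper
# `centri_score_weight` and the overall threshold are determined on the validation set;
# the two values below are placeholders only, NOT the tuned values for this model
centri_score_weight = 1.0
threshold = 0.0
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
predictions = subsumption_scores > threshold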
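To make the scoring rule above concrete without loading the model, here is a plain-Python sketch that applies the same formula, -(dist + centri_score_weight * (parent_norm - child_norm)), to toy two-dimensional points in a Poincaré ball of curvature -1. The toy vectors, the curvature, the helper names and the weight and threshold in the usage line are hypothetical; real HiT embeddings are 384-dimensional, and the weight and threshold should come from the validation set.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def poincare_dist(u: Vector, v: Vector, c: float = 1.0) -> float:
    """Geodesic distance in a Poincaré ball of curvature -c."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = 1.0 - c * sum(a * a for a in u)
    nv = 1.0 - c * sum(b * b for b in v)
    if nu <= 0.0 or nv <= 0.0:
        raise ValueError("points must lie strictly inside the ball")
    return math.acosh(1.0 + 2.0 * c * diff / (nu * nv)) / math.sqrt(c)


def subsumption_score(child: Vector, parent: Vector,
                      centri_score_weight: float, c: float = 1.0) -> float:
    """Higher scores make 'child is subsumed by parent' more plausible."""
    origin = [0.0] * len(child)
    dist = poincare_dist(child, parent, c)
    child_norm = poincare_dist(child, origin, c)
    parent_norm = poincare_dist(parent, origin, c)
    # penalise the distance, and penalise parents that sit farther from the origin than children
    return -(dist + centri_score_weight * (parent_norm - child_norm))


def predict(pairs: Sequence[Tuple[Vector, Vector]], centri_score_weight: float,
            threshold: float, c: float = 1.0) -> List[bool]:
    """Predict subsumption for each (child, parent) pair by thresholding its score."""
    return [subsumption_score(ch, pa, centri_score_weight, c) > threshold for ch, pa in pairs]


# toy example: a "specific" point near the boundary, a "general" point near the origin
specific, general = [0.6, 0.0], [0.2, 0.0]
print(predict([(specific, general), (general, specific)], centri_score_weight=2.0, threshold=0.0))
# [True, False]
```

Both pairs have the same distance, so it is the norm term that separates the correct direction (specific entity under general entity) from the reversed one.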

Training and evaluation scripts are available on GitHub. See scripts/evaluate.py for how we determine the hyperparameters for subsumption prediction on the validation set.
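One simple way to pick centri_score_weight and the threshold on a validation set is a grid search that maximises F1. The plain-Python sketch below works on precomputed (distance, child norm, parent norm, label) tuples; the grids, the F1 criterion and the function names are assumptions for illustration and are not taken from scripts/evaluate.py.

```python
from typing import Optional, Sequence, Tuple

# (hyperbolic distance, child norm, parent norm, is the pair a true subsumption?)
Example = Tuple[float, float, float, bool]


def f1_score(preds: Sequence[bool], labels: Sequence[bool]) -> float:
    """Binary F1 for the positive (subsumption) class."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(examples: Sequence[Example], weights: Sequence[float],
                thresholds: Sequence[float]) -> Tuple[float, Optional[float], Optional[float]]:
    """Return (best F1, best centri_score_weight, best threshold); ties keep the first pair tried."""
    labels = [y for _, _, _, y in examples]
    best: Tuple[float, Optional[float], Optional[float]] = (-1.0, None, None)
    for weight in weights:
        # same empirical score as in the probing code, for this candidate weight
        scores = [-(dist + weight * (p_norm - c_norm)) for dist, c_norm, p_norm, _ in examples]
        for threshold in thresholds:
            f1 = f1_score([s > threshold for s in scores], labels)
            if f1 > best[0]:
                best = (f1, weight, threshold)
    return best
```

In practice the distances and norms would come from model.manifold.dist and model.manifold.dist0 on the validation pairs, and the threshold grid could be built from the observed score range.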

Technical details are presented in the paper.

Full Model Architecture

HierarchyTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
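The Pooling layer above has only 'pooling_mode_mean_tokens' switched on, so each entity embedding is the average of the 384-dimensional token embeddings produced by the BertModel, with padding positions excluded via the attention mask (inputs longer than max_seq_length = 128 tokens are truncated). A minimal plain-Python sketch of that mean pooling step, using a hypothetical mean_pool helper on toy vectors, is:

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]


# two real tokens and one padding token (toy 2-d vectors instead of 384-d ones)
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```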

Citation

Preprint on arXiv: https://arxiv.org/abs/2401.11374.

Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks. Language Models as Hierarchy Encoders. To appear at NeurIPS 2024.

@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}

Model Card Contact

For any queries or feedback, please contact Yuan He (yuan.he(at)cs.ox.ac.uk).

Model size: 22.7M params (Safetensors, tensor type F32)