CLFE(ConMath)
This is a formula embedding model trained on the LaTeX, Presentation MathML, and Content MathML representations of formulas. It maps each formula to a 768-dimensional dense vector space. It was introduced in https://link.springer.com/chapter/10.1007/978-981-99-7254-8_8
Usage
pip install -U sentence-transformers
Put 'MarkuplmTransformerForConMATH.py' into 'sentence_transformers/models', and add 'from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH' to 'sentence_transformers/models/__init__.py'.
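The manual step above can also be scripted. The following is a minimal sketch and not part of the original instructions: the helper names `install_custom_module` and `find_models_dir` are hypothetical, and the sketch assumes that 'MarkuplmTransformerForConMATH.py' has already been downloaded locally and that the installed package exposes its modules through 'sentence_transformers/models/__init__.py'.

```python
import importlib.util
import shutil
from pathlib import Path

# The import line the instructions ask you to add by hand.
IMPORT_LINE = "from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH"


def install_custom_module(src_file, models_dir):
    """Copy the custom module into models_dir and register it in __init__.py.

    Hypothetical helper: it automates the two manual steps and can be re-run
    without adding the import line twice.
    """
    src_file, models_dir = Path(src_file), Path(models_dir)
    shutil.copy(src_file, models_dir / src_file.name)

    init_file = models_dir / "__init__.py"
    text = init_file.read_text(encoding="utf-8") if init_file.exists() else ""
    if IMPORT_LINE not in text:
        # Make sure the new import starts on its own line.
        if text and not text.endswith("\n"):
            text += "\n"
        init_file.write_text(text + IMPORT_LINE + "\n", encoding="utf-8")
    return init_file


def find_models_dir():
    """Locate the 'models' directory of the installed sentence_transformers package."""
    spec = importlib.util.find_spec("sentence_transformers")
    if spec is None or spec.origin is None:
        raise RuntimeError("sentence-transformers is not installed in this environment")
    return Path(spec.origin).parent / "models"


# Usage (run once in the environment where sentence-transformers is installed):
# install_custom_module("MarkuplmTransformerForConMATH.py", find_models_dir())
```

Because the script edits files inside the installed package, it has to be repeated after every reinstall or upgrade of sentence-transformers.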
Then you can use the model like this:
from sentence_transformers import SentenceTransformer
latex = r"13\times x"
pmml = r"<math><semantics><mrow><mn>13</mn><mo>×</mo><mi>x</mi></mrow></semantics></math>"
cmml = r"<math><apply><times></times><cn>13</cn><ci>x</ci></apply></math>"
model = SentenceTransformer('Jyiyiyiyi/CLFE_ConMath')
embedding_latex = model.encode([{'latex': latex}])
embedding_pmml = model.encode([{'mathml': pmml}])
embedding_cmml = model.encode([{'mathml': cmml}])
print('latex embedding:')
print(embedding_latex)
print('Presentation MathML embedding:')
print(embedding_pmml)
print('Content MathML embedding:')
print(embedding_cmml)
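The three inputs above describe the same formula, so a natural follow-up is to compare their embeddings. The sketch below assumes cosine similarity as the measure, a common choice for dense embeddings that this card does not itself prescribe. The helper `cosine_similarity` is illustrative, and the short vectors are stand-ins for the 768-dimensional rows returned by `model.encode`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors (assumption). With the model loaded you would pass, for
# example, embedding_latex[0] and embedding_cmml[0] instead.
vec_a = [0.2, 0.1, 0.7]
vec_b = [0.4, 0.2, 1.4]    # same direction as vec_a
vec_c = [0.7, -0.2, -0.1]  # points elsewhere

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Values close to 1 indicate vectors pointing in nearly the same direction. Whether a given score means two formulas match depends on the retrieval setup and should be checked against your own data.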
Full Model Architecture
SentenceTransformer(
  (0): Asym(
    (latex-0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
    (mathml-0): MarkuplmTransformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MarkupLMModel
  )
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
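In this architecture, the Asym module sends inputs keyed 'latex' to the MPNet encoder and inputs keyed 'mathml' to the MarkupLM encoder. The Pooling module then averages the token vectors ('pooling_mode_mean_tokens': True) into a single 768-dimensional embedding. The following is a plain-Python sketch of that averaging step on toy 2-dimensional vectors. The function `mean_pool` is illustrative and is not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    if not token_embeddings:
        raise ValueError("no tokens to pool")

    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("no unmasked tokens to pool")
    return [value / count for value in total]


# Toy input: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding token is excluded from both the sum and the count, so it does not influence the pooled vector.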
Citing & Authors
@inproceedings{wang2023math,
  title={Math Information Retrieval with Contrastive Learning of Formula Embeddings},
  author={Wang, Jingyi and Tian, Xuedong},
  booktitle={International Conference on Web Information Systems Engineering},
  pages={97--107},
  year={2023},
  organization={Springer}
}