CLFE(ConMath)

This is a formula embedding model trained on the LaTeX, Presentation MathML, and Content MathML representations of formulas. It maps formulas to a 768-dimensional dense vector space. It was introduced in https://link.springer.com/chapter/10.1007/978-981-99-7254-8_8

Usage

pip install -U sentence-transformers

Put 'MarkuplmTransformerForConMATH.py' into 'sentence_transformers/models', and add 'from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH' to 'sentence_transformers/models/__init__.py'.
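The manual step above can also be scripted. The following is a minimal sketch, not part of the original release: the helper name `install_conmath_module` is hypothetical, and it assumes you pass it the path to your downloaded copy of the module file and the 'models' directory of your installed sentence-transformers package.

```python
import shutil
from pathlib import Path

# The import line quoted in the instructions above.
IMPORT_LINE = "from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH"


def install_conmath_module(src_file, models_dir):
    """Copy the module file into models_dir and register it in __init__.py.

    Hypothetical helper for illustration; both paths are supplied by the caller.
    """
    src_file, models_dir = Path(src_file), Path(models_dir)
    shutil.copy(src_file, models_dir / src_file.name)

    init_file = models_dir / "__init__.py"
    text = init_file.read_text(encoding="utf-8") if init_file.exists() else ""
    # Append the import only once, so the helper is safe to re-run.
    if IMPORT_LINE not in text:
        if text and not text.endswith("\n"):
            text += "\n"
        init_file.write_text(text + IMPORT_LINE + "\n", encoding="utf-8")
    return init_file
```

One way to locate the target directory is `Path(sentence_transformers.__file__).parent / "models"` after importing the package. Because this edits an installed package in place, the step has to be repeated after reinstalling or upgrading sentence-transformers.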

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
latex = r"13\times x"
pmml = r"<math><semantics><mrow><mn>13</mn><mo>×</mo><mi>x</mi></mrow></semantics></math>"
cmml = r"<math><apply><times></times><cn>13</cn><ci>x</ci></apply></math>"

model = SentenceTransformer('Jyiyiyiyi/CLFE_ConMath')

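# Each input is a dict whose key selects the encoder branch:
# 'latex' for LaTeX, 'mathml' for both Presentation and Content MathML.
embedding_latex = model.encode([{'latex': latex}])
embedding_pmml = model.encode([{'mathml': pmml}])
embedding_cmml = model.encode([{'mathml': cmml}])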

print('latex embedding:')
print(embedding_latex)
print('Presentation MathML embedding:')
print(embedding_pmml)
print('Content MathML embedding:')
print(embedding_cmml)
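
Since all three inputs describe the same formula, a natural follow-up is to compare their embeddings. The sketch below shows one common way to do that, cosine similarity, in plain Python. The function name and the toy vectors are illustrative assumptions, not output of the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional stand-ins for real 768-dimensional embeddings.
toy_latex = [0.2, 0.1, 0.7]
toy_pmml = [0.25, 0.05, 0.65]
print(cosine_similarity(toy_latex, toy_pmml))
```

With the real model, the same function can be applied to rows of the arrays returned above, for example `cosine_similarity(embedding_latex[0], embedding_pmml[0])`. The sentence-transformers package also ships `util.cos_sim`, which does the same on tensors.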

Full Model Architecture

SentenceTransformer(
  (0): Asym(
    (latex-0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
    (mathml-0): MarkuplmTransformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MarkupLMModel 
  )
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
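
The Pooling layer above has only 'pooling_mode_mean_tokens' enabled, so the formula embedding is the average of the token vectors produced by the selected encoder. The following is a simplified sketch of that idea in plain Python for a single sequence. It is not the library's implementation, which works on batched tensors, and the function name `mean_pool` is illustrative.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


# Two real tokens and one padding token, with 2-dimensional toy vectors.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In the actual model each token vector has 768 dimensions ('word_embedding_dimension'), so the pooled output is a single 768-dimensional vector per formula.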

Citing & Authors

@inproceedings{wang2023math,
  title={Math Information Retrieval with Contrastive Learning of Formula Embeddings},
  author={Wang, Jingyi and Tian, Xuedong},
  booktitle={International Conference on Web Information Systems Engineering},
  pages={97--107},
  year={2023},
  organization={Springer}
}