---
base_model: pawan2411/address_net
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:4008
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: Orchard Road 313, Singapore 238895
    sentences:
      - Orchard Rd 313, Singapore 238895
      - 15 Rue de la Paix/75002/France
      - NY, 5th Avenue and 57th Street
  - source_sentence: 1 Raffles Place, One Raffles Place, Singapore 048616
    sentences:
      - 1 Raffles Place, Singapore 048616
      - Madrid 28001 Spain Calle Serrano 30
      - Kurfürstendamm 185/10707 Berlin/Germany
  - source_sentence: Kurfürstendamm 207-208, 10719 Berlin, Germany
    sentences:
      - Argentina CABA C1073ABA 1925 Avenida 9 de Julio
      - Kurfürstendamm ๒๐๗-๒๐๘, ๑๐๗๑๙ Berlin, Germany
      - 123 Main St, Anytown, AB T1A 1A1
  - source_sentence: Via Tornabuoni, 50123 Firenze FI, Italy
    sentences:
      - Hamngatan 18-20, Stockholm, Sweden
      - 1 Florida, Argentina
      - Tornabuoni St, 50123 Italy
  - source_sentence: Nanjing Road Pedestrian Street, Huangpu, Shanghai 200001, China
    sentences:
      - Nanjing Rd Ped St, Huangpu Dist, Shanghai, China
      - 5 Rue du Faubourg Saint-Honoré, Paris, France
      - 6 Place d'Italie, Paris
---

Address Embedding Model

This model generates address embeddings designed to facilitate address matching, deduplication, and standardization tasks.

Model description

The Address Matching Embedding Model is designed to create vector representations of addresses that capture semantic similarities, making it easier to match and deduplicate addresses across different formats and styles.

  • Model Type: Sentence Transformer
  • Base model: pawan2411/address_net
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
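
As a concrete illustration of the matching and standardization use cases described above, the sketch below looks up free-form addresses against a small canonical list and keeps the closest canonical form. The canonical list, the query strings, and the use of util.semantic_search here are illustrative assumptions rather than part of the original card (see the Usage section below for installation).

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("pawan2411/address_emnet")

# Hypothetical canonical addresses and messy inputs (illustrative data only)
canonical = [
    "313 Orchard Road, Singapore 238895",
    "1 Raffles Place, Singapore 048616",
]
queries = [
    "Orchard Rd 313, Singapore 238895",
    "One Raffles Place, Singapore",
]

canonical_emb = model.encode(canonical, convert_to_tensor=True)
query_emb = model.encode(queries, convert_to_tensor=True)

# For each query, retrieve the single closest canonical address
hits = util.semantic_search(query_emb, canonical_emb, top_k=1)
for query, query_hits in zip(queries, hits):
    best = query_hits[0]
    print(f"{query} -> {canonical[best['corpus_id']]} (score: {best['score']:.2f})")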

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
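
As a quick sanity check, the maximum sequence length and output dimensionality listed above can be read directly off the loaded model; this snippet is illustrative and not part of the original card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pawan2411/address_emnet")

# Values should match those stated in the card above
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768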

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pawan2411/address_emnet")
# Run inference
sentences = [
    '60 Ratchadaphisek Rd, Khwaeng Khlong Toei, Khet Khlong Toei, Krung Thep Maha Nakhon 10110',
    '60 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand',
    '61 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
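
For deduplication, the pairwise similarity scores can be thresholded to flag likely duplicates. The sketch below is a minimal illustration; the 0.9 cutoff and the sample addresses are assumptions, not calibrated recommendations.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pawan2411/address_emnet")

# Hypothetical batch containing one near-duplicate pair (illustrative data only)
addresses = [
    "Kurfürstendamm 207-208, 10719 Berlin, Germany",
    "Kurfuerstendamm 207/208, Berlin 10719",
    "Hamngatan 18-20, Stockholm, Sweden",
]

embeddings = model.encode(addresses)
scores = model.similarity(embeddings, embeddings)

# Report pairs whose cosine similarity exceeds the illustrative threshold
THRESHOLD = 0.9
for i in range(len(addresses)):
    for j in range(i + 1, len(addresses)):
        score = float(scores[i][j])
        if score > THRESHOLD:
            print(f"possible duplicate ({score:.2f}): {addresses[i]} <-> {addresses[j]}")

In practice, the threshold should be tuned on labeled duplicate pairs from the target data.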

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}