---
base_model: pawan2411/address_net
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4008
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Orchard Road 313, Singapore 238895
  sentences:
  - Orchard Rd 313, Singapore 238895
  - 15 Rue de la Paix/75002/France
  - NY, 5th Avenue and 57th Street
- source_sentence: 1 Raffles Place, One Raffles Place, Singapore 048616
  sentences:
  - 1 Raffles Place, Singapore 048616
  - Madrid 28001 Spain Calle Serrano 30
  - Kurfürstendamm 185/10707 Berlin/Germany
- source_sentence: Kurfürstendamm 207-208, 10719 Berlin, Germany
  sentences:
  - Argentina CABA C1073ABA 1925 Avenida 9 de Julio
  - Kurfürstendamm ๒๐๗-๒๐๘, ๑๐๗๑๙ Berlin, Germany
  - 123 Main St, Anytown, AB T1A 1A1
- source_sentence: Via Tornabuoni, 50123 Firenze FI, Italy
  sentences:
  - Hamngatan 18-20, Stockholm, Sweden
  - 1 Florida, Argentina
  - Tornabuoni St, 50123 Italy
- source_sentence: Nanjing Road Pedestrian Street, Huangpu, Shanghai 200001, China
  sentences:
  - Nanjing Rd Ped St, Huangpu Dist, Shanghai, China
  - 5 Rue du Faubourg Saint-Honoré, Paris, France
  - 6 Place d'Italie, Paris
---

## Address Embedding Model

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b697a4ce95149b1205f652/X7wZsnDYXkZilbCaa2U8g.png)

This model generates embeddings for addresses to support address matching, deduplication, and standardization tasks.

## Model description

The Address Matching Embedding Model creates vector representations of addresses that capture semantic similarity, making it easier to match and deduplicate addresses written in different formats and styles.

- **Model Type:** Sentence Transformer
- **Base model:** [pawan2411/address_net](https://huggingface.co/pawan2411/address_net)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pawan2411/address_emnet")

# Run inference
sentences = [
    '60 Ratchadaphisek Rd, Khwaeng Khlong Toei, Khet Khlong Toei, Krung Thep Maha Nakhon 10110',
    '60 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand',
    '61 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
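
## Example: Flagging Likely Duplicates

The similarity scores from the usage example can feed a simple duplicate check. The sketch below is one possible approach, not part of this model or of the Sentence Transformers API: it computes pairwise cosine similarity with NumPy and reports the pairs that meet a cutoff. The helper names (`cosine_similarity_matrix`, `find_duplicate_pairs`), the `0.9` threshold, and the 3-dimensional toy vectors are all assumptions made for illustration; in practice the input would be the array returned by `model.encode(...)`.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def find_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.9):
    """List (i, j, score) for every pair i < j whose similarity >= threshold.

    The default threshold is a hypothetical value, not a recommendation.
    """
    sims = cosine_similarity_matrix(embeddings)
    n = sims.shape[0]
    return [
        (i, j, float(sims[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if sims[i, j] >= threshold
    ]


# Toy vectors standing in for real 768-dimensional embeddings.
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
])

pairs = find_duplicate_pairs(toy_embeddings, threshold=0.9)
for i, j, score in pairs:
    print(f"rows {i} and {j} look like duplicates (cosine similarity {score:.3f})")
```

A fixed cutoff is only a starting point. Addresses that differ by a single house number can be distinct locations, so any threshold should be tuned on labeled pairs from your own data.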