
Tom-Neverwinter

AI & ML interests

Making improvements to help the world.

Recent Activity

Reacted to tomaarsen's post with ā¤ļø about 1 month ago
📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x, AND Static Embeddings for 500x speedups at a 10-20% accuracy cost.
1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to a 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.
Usage is as simple as `SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")` (or `backend="openvino"`). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be auto-exported for you. Thank me later 😉
🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:
1️⃣ Via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. Either via a pre-distilled model with `from_model2vec`, or with `from_distillation`, where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
Reacted to tomaarsen's post with 🚀 about 1 month ago
Reacted to tomaarsen's post with 🔥 about 1 month ago
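The "bags of token embeddings that are summed together" idea behind Static Embeddings, from the Sentence Transformers post above, can be illustrated with a toy sketch. This is plain NumPy with a hypothetical 4-token vocabulary, not the library's implementation:

```python
import numpy as np

# Toy static-embedding table: one fixed vector per token id (hypothetical
# 4-token vocabulary, 3 dimensions). A real static model would use the
# tokenizer's vocabulary and distilled or finetuned vectors.
embedding_table = np.array([
    [1.0, 0.0, 0.0],  # token 0
    [0.0, 1.0, 0.0],  # token 1
    [0.0, 0.0, 1.0],  # token 2
    [1.0, 1.0, 0.0],  # token 3
])

def embed(token_ids):
    # A text embedding is just the pooled bag of its token vectors:
    # no attention, no neural network, hence the large speedups.
    return embedding_table[token_ids].mean(axis=0)

text_a = embed([0, 1])  # mean of token 0 and token 1 -> [0.5, 0.5, 0.0]
text_b = embed([1, 0])  # same bag of tokens
print(np.allclose(text_a, text_b))  # token order is ignored, unlike a transformer
```

Because the lookup-and-pool step involves no forward pass, encoding is extremely fast on CPU; the trade-off, visible in the toy example, is that word order carries no signal, which is part of the 10-20% accuracy cost the post mentions.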
View all activity

Organizations

None yet

Tom-Neverwinter's activity

New activity in multimodalart/flux-lora-the-explorer 3 months ago

how to make a lora

3
#2 opened 3 months ago by guardiancc
New activity in Xenova/whisper-speaker-diarization 4 months ago

how do we run this?

2
#2 opened 4 months ago by Tom-Neverwinter
New activity in open-llm-leaderboard/open_llm_leaderboard 5 months ago

Evil.

4
#1 opened 5 months ago by Reithan
New activity in TheDrummer/Llama-3SOME-8B-v2-GGUF 5 months ago
New activity in LoneStriker/DeepSeek-Coder-V2-Instruct-GGUF 5 months ago

How good is the gguf?

3
#3 opened 5 months ago by Tom-Neverwinter
New activity in ddh0/Phi-3-mini-4k-instruct-bf16-GGUF 7 months ago

censored

1
#1 opened 7 months ago by Tom-Neverwinter
New activity in LiteLLMs/Mixtral-8x22B-v0.1-GGUF 7 months ago

ram usage

1
#1 opened 7 months ago by Tom-Neverwinter
New activity in bartowski/Mixtral-8x22B-v0.1-GGUF 8 months ago

resources to run

#3 opened 8 months ago by Tom-Neverwinter
New activity in bartowski/Beyonder-4x7B-v3-exl2 8 months ago

3.0 bpw?

16
#1 opened 8 months ago by CulturedMan
New activity in ai21labs/Jamba-v0.1 8 months ago

multiple gpu?

3
#3 opened 8 months ago by bdambrosio

Missing config.json

8
#2 opened 10 months ago by Cayleb
New activity in vikhyatk/moondream2 8 months ago
New activity in hpcai-tech/Open-Sora 8 months ago

safetensors?

1
#5 opened 8 months ago by Tom-Neverwinter
New activity in google/gemma-7b 8 months ago
New activity in ShinojiResearch/Senku-70B-Full 9 months ago

Hardware

8
#5 opened 9 months ago by Tom-Neverwinter