Bui Van Hop

hllj

AI & ML interests

Computer Vision, Deep Learning, NLP

Organizations

Vietnamese VLM, Hugging Face Discord Community

hllj's activity

updated a collection 3 months ago
upvoted an article 4 months ago
Reacted to kenshinn's post with ❤️ 4 months ago
Sparse MoE (SMoE) has an unavoidable drawback: the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the number of experts.

Identifying optimal hyper-parameters without a sufficient number of ablation studies is also challenging. As model sizes continue to grow, this limitation can waste significant computational resources and, in turn, hinder the efficiency of training MoE-based models in practice.

Now, our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate;

(2) an adaptive process that automatically adjusts the number of experts during training.

Extensive numerical results across vision, language, and vision-language tasks show that DynMoE achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while remaining efficient by activating fewer parameters.
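To make the gating idea concrete, here is a minimal, illustrative sketch of token-adaptive ("top-any") gating, where each token activates every expert whose gate score exceeds a learnable per-expert threshold, so the number of active experts varies per token. The class name, threshold parameterization, and fallback rule below are assumptions for illustration only, not the official DynMoE implementation; see the linked repository for the actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopAnyGate(nn.Module):
    """Illustrative token-adaptive gate: experts are activated per token
    whenever their gate score exceeds a learnable per-expert threshold."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Learnable per-expert activation thresholds (hypothetical parameterization).
        self.thresholds = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_dim)
        scores = torch.sigmoid(self.router(x))              # (num_tokens, num_experts)
        mask = scores > torch.sigmoid(self.thresholds)      # experts each token activates
        # Guarantee at least one expert per token by falling back to the top-scoring expert.
        fallback = F.one_hot(scores.argmax(dim=-1), scores.size(-1)).bool()
        mask = mask | (~mask.any(dim=-1, keepdim=True) & fallback)
        weights = scores * mask                              # zero out inactive experts
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return weights, mask                                 # routing weights, active-expert mask

# Example: different tokens may activate different numbers of experts.
gate = TopAnyGate(hidden_dim=16, num_experts=4)
tokens = torch.randn(8, 16)
weights, mask = gate(tokens)
print(mask.sum(dim=-1))  # experts activated per token (varies across tokens)
```

Because the thresholds are learnable, the effective top-k emerges during training rather than being fixed as a hyper-parameter, which is the behavior the post describes.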

Our code is available at https://github.com/LINs-lab/DynMoE; see also the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a.