[FEEDBACK] Daily Papers
![](https://cdn-uploads.huggingface.co/production/uploads/62d6c40ba71903fb4c1e3261/ZtRaJoI4rnz7y-1Iz81am.png)
Note that this is not a post about adding new papers, it's about feedback on the Daily Papers community update feature.
How to submit a paper to the Daily Papers, like @akhaliq (AK)?
- Submitting is available to paper authors
- Only recent papers (less than 7 days old) can be featured on the Daily
- Then drop the arXiv id in the form at https://huggingface.co/papers/submit
- Add media to the paper (images, videos) when relevant
- You can start a discussion to engage with the community
We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".
Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset
@Yiwen-ntu For now, we support only videos as paper covers in the Daily.
We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644
Consistency-diversity realism Pareto fronts of conditional image generative models -- http://arxiv.org/abs/2406.10429
"Data Contamination Can Cross Language Barriers". -- https://arxiv.org/pdf/2406.13236
How do I add papers that are on Nature rather than arXiv?
Sharing our latest paper: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation (https://arxiv.org/abs/2406.05365)
Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas https://arxiv.org/abs/2404.13944
We are thrilled to announce the publication of our first research paper on model merging, Della-Merging. Della employs a magnitude-based sampling approach to eliminate redundant delta parameters, reducing interference when merging homologous models (those fine-tuned from the same backbone).
Paper: https://arxiv.org/abs/2406.11617
Github: https://github.com/declare-lab/della
Della outperforms existing homologous model merging techniques such as DARE and TIES. Across three expert models (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), Della achieves an improvement of 3.6 points over TIES and 1.2 points over DARE.
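The core idea described above can be illustrated with a minimal sketch: sample each delta parameter (fine-tuned weight minus base weight) with probability proportional to its magnitude, rescale the survivors to keep the merge unbiased, and average the sampled deltas back onto the base. This is an illustrative simplification, not the official Della implementation (see the GitHub repository for that); function names and the `keep_frac` parameter here are assumptions.

```python
import numpy as np

def magnitude_sample_deltas(delta, keep_frac=0.4, rng=None):
    """Keep each delta parameter with probability proportional to its
    magnitude; rescale kept entries so the merge is unbiased in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    mag = np.abs(delta)
    # Keep-probabilities scaled so their mean equals keep_frac.
    p = np.clip(keep_frac * mag / mag.mean(), 0.0, 1.0)
    mask = rng.random(delta.shape) < p
    # Inverse-probability rescaling of the surviving deltas.
    return np.where(mask, delta / np.maximum(p, 1e-12), 0.0)

def merge_models(base, experts, keep_frac=0.4, seed=0):
    """Merge homologous experts: sample each expert's delta, then average."""
    rng = np.random.default_rng(seed)
    deltas = [magnitude_sample_deltas(e - base, keep_frac, rng) for e in experts]
    return base + np.mean(deltas, axis=0)
```

Dropping most delta parameters this way reduces the overlap (and hence interference) between the experts' deltas before they are summed.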
LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks.
Paper: https://arxiv.org/abs/2406.08035
Github: https://github.com/THUDM/LVBench
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Paper: https://arxiv.org/abs/2406.05872
Code: https://github.com/IBM/starling-agent
SIT: Fine-tuning Large Language Models with Sequential Instructions
Paper: https://arxiv.org/pdf/2403.07794
Data and model: https://seqit.github.io
Code: https://github.com/hanxuhu/SeqIns
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
paper: https://arxiv.org/pdf/2406.17419
code: https://github.com/MozerWang/Loong
TRAIT: Task Oriented In-Domain Data Augmentation (for Continual Pre-training of LLMs), https://arxiv.org/abs/2406.16694
Slot State Space Models
We are excited to share our recent work: "Adam-mini: Use Fewer Learning Rates To Gain More": https://arxiv.org/abs/2406.16793
We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Code available at: https://github.com/zyushun/Adam-mini
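To make the "fewer learning rates" idea concrete, here is a minimal sketch of the mechanism: keep Adam's per-parameter first moment, but maintain only one second-moment scalar per parameter block, so every parameter in a block shares one effective learning rate. The blocking scheme and hyperparameters below are illustrative; they are not the paper's exact partition of Transformer parameters, for which see the linked code.

```python
import numpy as np

class AdamMiniLike:
    """Sketch of a block-wise Adam variant: one second-moment scalar per
    parameter block instead of one per parameter, cutting the optimizer's
    second-moment state from O(num_params) to O(num_blocks)."""
    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params                           # list of arrays, one block each
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]    # first moment: per parameter
        self.v = [0.0 for _ in params]                 # second moment: ONE scalar per block
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            # Mean squared gradient over the whole block -> single scalar.
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * float(np.mean(g * g))
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.params[i] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

The memory saving comes from `self.v`: AdamW stores a full second-moment tensor per parameter, while a block-wise scheme stores one scalar per block.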
We have developed a new text-to-video generation benchmark for metamorphic evaluation. We specifically design four major categories of time-lapse videos (biological, human-created, meteorological, and physical) and extend these to 75 subcategories.
paper: https://arxiv.org/abs/2406.18522
leaderboard: https://huggingface.co/spaces/BestWishYsh/ChronoMagic-Bench
code: https://github.com/PKU-YuanGroup/ChronoMagic-Bench
KV cache optimization for LLMs and MLLMs:
- LLMs: D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models, arxiv: https://arxiv.org/abs/2406.13035
- MLLMs: LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference, arxiv: https://arxiv.org/html/2406.18139v1
We developed a generic schematic for the optimization loop to reduce the memory footprint of second-order, full-matrix adaptive optimizers.
Our target optimizers are the ones that store a window of past gradients, such as M-FAC and GGT, which usually require storing around 500 to 1000 gradients (equivalent to this many model copies in the GPU memory).
Our technique uses sparse/low-rank gradients with error feedback, and shows that the memory footprint of the optimizer state can be reduced by 30x for GGT and 45x to 60x for M-FAC.
Why is this important?
In the case of M-FAC, which approximates Natural Gradient (NG) (the best-known optimizer in this family is K-FAC), our work enables NG approximations at larger scales, such as ResNet-18 on ImageNet and BERT-Base fine-tuning.
Please experiment with this NG approximation and let us know about your findings!
Our arXiv paper: https://arxiv.org/pdf/2306.06098
Our code on GitHub: https://github.com/IST-DASLab/EFCP
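The compression step behind the post above can be sketched briefly: before storing a gradient in the optimizer's window, add the residual left over from the previous step, keep only the top-k entries by magnitude, and carry the dropped mass forward as the next step's error. This is a generic top-k error-feedback sketch, not the EFCP codebase itself; the function name and `k` are illustrative.

```python
import numpy as np

def topk_with_error_feedback(grad, error, k):
    """One error-feedback compression step: accumulate the carried-over
    residual, keep the k largest-magnitude entries, and return the new
    residual so no gradient mass is permanently lost."""
    acc = grad + error                              # fold in previous residual
    idx = np.argpartition(np.abs(acc), -k)[-k:]     # indices of k largest |entries|
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                          # compressed gradient: k nonzeros
    new_error = acc - sparse                        # residual carried to next step
    return sparse, new_error
```

Storing only the k nonzeros (plus one dense residual vector) instead of 500 to 1000 dense gradients is where the 30x to 60x state reduction comes from.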
I was waiting for my papers to get verified on my account so I could submit, but then the 7-day window closed :( Any chance we can still submit ours?
Hi AK and HF team,
I would appreciate your considering my recent arXiv paper "Model Callers for Transforming Predictive and Generative AI Applications" for inclusion in the HF Daily Papers. I could not submit directly to your site, since I don't already have a paper in the HF Daily Papers.
Paper: https://arxiv.org/abs/2406.15377
Github code: https://github.com/mukdal/modelcaller
Python library: pip install modelcaller
Abstract: We introduce a novel software abstraction termed "model caller," acting as an intermediary for AI and ML model calling, advocating its transformative utility beyond existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable for both creators and users of models within both predictive and generative AI applications. Additionally, we have developed and released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.
Thanks,
Mukesh Dalal
Hello AK and HF Team,
We would like to add our recent paper "Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification" to the HF Daily Papers. I am putting this request here, since I don't already have a paper in the HF Daily Papers.
Paper: https://arxiv.org/pdf/2407.02352
Authors: Pritish Sahu, Karan Sikka, Ajay Divakaran
Thanks,
Pritish Sahu
Hello AK and HF Team,
We would like to share our 2022 paper, recently published in Automation in Construction (Elsevier): "Vitruvio: Conditional variational autoencoder to generate building meshes via single perspective sketches"
Paper: https://www.sciencedirect.com/science/article/pii/S0926580524002346?dgcid=author (50 days of free access).
Our arXiv paper: https://arxiv.org/abs/2210.13634
We demonstrated the critical importance of considering building orientation in reconstruction tasks. Additionally, we have provided a comprehensive baseline and dataset specifically for building reconstruction. Help us spread the word within the AEC industry to raise awareness about these advancements. Watch our video presenting the problem and our findings: VIDEO.
Code: https://github.com/CDInstitute/Vitruvio
Feel free to use this message on your social media, blog, or any platform where you wish to share your research and video.
Alberto Tono , Heyaojing Huang , Ashwin Agrawal, and Martin Fischer