Hugging face presents FineVideo 🎥! Unlocking the next generation of Video understanding 🚀
🤯3400 hours of annotated Creative Common videos with rich character descriptions, scene splits, mood, and content descriptions per scene as well as QA pairs. 🔥 @mfarre processed over 2M videos of Youtube-CC to make this incredibly powerful selection.
The cleaning process consists of: - Joining the separate splits together / add split column - Converting string messages into list of structs - Removing empty system prompts
I wanted to introduce myself and my company @Overlaiapp. We are a collective of filmmakers, photographers, and AI engineers working on high resolution (8K+) training data.
We plan to share a lot of our datasets with the community and are kicking things off with two curated datasets:
🎥 Oversampled: Every clip is captured in stunning 8K resolution, delivering rich detail ideal for fine tuning scenic landscapes and ocean dynamics.
📸 Variance: Includes close-up details, slow-motion footage of crashing waves, sweeping landscapes, and wildlife shots.
📋 Detailed Metadata: Every clip is paired with structured metadata, including creative descriptions, precise camera movements, lens information, field of view calculations, and shot settings, ensuring AI models can fully understand and replicate real-world cinematography with accuracy.
⚙️ Consistency: Re-thinking training data at the point of capture by "overshooting" a subject, enabling models to learn more nuanced relationships and views across scenes.
🌅 Light: Shot during early morning and sunset light for optimal color contrast and dynamic range, maximizing visual quality for color and lighting-sensitive tasks.
🔍 Curation: Curated specifically for machine learning, providing clean, high-quality data for next generation model training.
Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications.
💡 Quick reminder on how floats are coded on 8 bits (FP8): In the e4m3 FP8 standard, you encode a number as: Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits) Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625) Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3):
➡️ You get (1 + 0.625) × 2^(8-7) = 3.25
Now back to the paper. 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
⚡️ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!
🧮 Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) · 2^xe · (1 + ym) · 2^ye = (1 + xm + ym + xm · ym) · 2^(xe+ye)
💡 L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) · 2^(xe+ye), eliminating the costly xm · ym term
🔧 l(m) term is adaptively set based on mantissa size for optimal accuracy
📊 Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision
💬 Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."
This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but it’s a really exciting path for more sustainable AI! 🌱
🔗 Evaluating Long Context #1: Long Range Arena (LRA)
Accurately evaluating how well language models handle long contexts is crucial, but it's also quite challenging to do well. In this series of posts, we're going to examine the various benchmarks that were proposed to assess long context understanding, starting with Long Range Arens (LRA)
Introduced in 2020, Long Range Arens (LRA) is one of the earliest benchmarks designed to tackle the challenge of long context evaluation.
📌 Key Features of LRA
1️⃣ Diverse Tasks: The LRA benchmark consists of a suite of tasks designed to evaluate model performance on long sequences ranging from 1,000 to 16,000 tokens. These tasks encompass different data types and modalities: Text, Natural and Synthetic Images, and Mathematical Expressions.
2️⃣ Synthetic and Real-world Tasks: LRA is comprised of both synthetic probing tasks and real-world tasks.
3️⃣ Open-Source and Extensible: Implemented in Python using Jax and Flax, the LRA benchmark code is publicly available, making it easy to extend.
📌 Tasks
1️⃣ Long ListOps
2️⃣ Byte-level Text Classification and Document Retrieval
3️⃣ Image Classification
4️⃣ Pathfinder and Pathfinder-X (Long-range spatial dependency)
We're thrilled to announce the release of Argilla 2.2.0, packed with powerful new features to enhance your data annotation and LLM workflow:
🗨️ ChatField: Work with text conversations natively in Argilla. Perfect for building datasets for conversational LLMs! ⚙️ Adjustable Task Distribution: Modify settings on the fly and automatically recalculate completed and pending records. 📊 Progress Tracking: Monitor annotation progress directly from the SDK, including user-specific metrics. 🧠 Automatic Settings Inference: Importing datasets from Hugging Face Hub just got easier with automatic settings detection. 📋 Task Templates: Jump-start your projects with pre-built templates for common dataset types. 🔧 Background Jobs Support: Improved performance for long-running tasks (requires Redis).