Marcus PRO

createtheimaginable

AI & ML interests

Machine Learning, Deep Learning, PyTorch

createtheimaginable's activity

Reacted to qq8933's post with 👍🔥 20 days ago
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual-policy paradigm, and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/

What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry!🍓
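
To make the recipe concrete, here is a minimal sketch of MCTS driving step-by-step LLM reasoning. It is illustrative only: `expand_fn` and `reward_fn` are hypothetical stand-ins for the LLM sampler and the learned reward/value model, not the actual LLaMA-O1 API.

```python
import math
import random

class Node:
    """One node in the search tree; `state` is the partial reasoning chain."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct_select(node, c=1.4):
    """Pick the child maximizing the UCT score (exploitation + exploration)."""
    return max(
        node.children,
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def mcts_search(root, expand_fn, reward_fn, n_sims=32):
    """Select -> expand -> evaluate -> backpropagate, repeated n_sims times."""
    for _ in range(n_sims):
        node = root
        # 1. Selection: walk down the tree by UCT until a leaf.
        while node.children:
            node = uct_select(node)
        # 2. Expansion: ask the LLM for candidate next reasoning steps.
        for step in expand_fn(node.state):  # e.g. sampled continuations
            node.children.append(Node(node.state + [step], parent=node))
        # 3. Evaluation: score one new child with the reward/value model.
        child = random.choice(node.children) if node.children else node
        reward = reward_fn(child.state)
        # 4. Backpropagation: push the reward up to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return uct_select(root, c=0.0)  # greedy pick after search
```

The self-play/RLHF ingredients would then train the policy and reward model on the trajectories this search produces.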

Past related works:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)
Reacted to vikhyatk's post with 🔥👍 24 days ago
Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi
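
If you want to poke at it, the standard `datasets` streaming API should work; a minimal sketch (the column names are an assumption, inspect them first):

```python
from datasets import load_dataset

# Stream so the 7,000+ hours of audio are not downloaded up front.
ds = load_dataset("vikhyatk/lofi", split="train", streaming=True)

for example in ds.take(3):
    # Column names vary by dataset; check the keys before assuming "audio".
    print(example.keys())
```
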
Reacted to not-lain's post with 🤗 24 days ago
I am now a huggingface fellow 🥳
upvoted an article 2 months ago
Reacted to singhsidhukuldeep's post with 👀 2 months ago
1 hour with OpenAI o1, here are my thoughts...

Here are a few observations:

- Slower response times: o1 can take 10+ seconds to answer some questions, as it spends more time "thinking" through problems. In my case, it took over 50 seconds.

- Less likely to admit ignorance: The models are reported to be less likely to admit when they don't know the answer to a question.

- Higher pricing: o1-preview is significantly more expensive than GPT-4o, costing 3x more for input tokens and 4x more for output tokens in the API. With more thinking and more tokens, this could require houses to be mortgaged! (See the quick cost sketch after this list.)

- Do we need this?: While it's better than GPT-4o for complex reasoning, on many common business tasks, its performance is just equivalent.

- Not a big deal: No comparisons to Anthropic's Claude or Google DeepMind's Gemini are mentioned or included.

- This model tries to think and iterate over the response on its own! Think of it as an inbuilt CoT on steroids! Would love a technical review paper on the training process.
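
To put the pricing bullet in numbers, here is a back-of-the-envelope sketch using the per-million-token API prices at the time of the post (worth re-checking, since prices change; the hidden reasoning-token count is an assumed figure for illustration):

```python
# USD per 1M tokens at the time of the post; re-check current pricing.
PRICES = {
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
    "o1-preview": {"input": 15.00, "output": 60.00},  # 3x input, 4x output vs GPT-4o
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

req_in, req_out = 2_000, 1_000
# o1 also bills its hidden "thinking" tokens as output; 4,000 here is an assumption.
print(f"GPT-4o:     ${cost('gpt-4o', req_in, req_out):.4f}")
print(f"o1-preview: ${cost('o1-preview', req_in, req_out + 4_000):.4f}")
```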

A must-read paper: https://cdn.openai.com/o1-system-card.pdf
Reacted to singhsidhukuldeep's post with 👀 3 months ago
This is an absolutely mind-boggling experiment!

@GuangyuRobert (Twitter Handle) from MIT has created Project Sid, which simulates over 1,000 autonomous AI agents collaborating in a Minecraft environment, operating for extended periods without human intervention. This simulation demonstrates unprecedented levels of agent interaction, decision-making, and societal development.

Agents operate independently for hours or days, showcasing advanced decision-making algorithms and goal-oriented behavior.

The simulation produced complex, emergent phenomena, including:
- Economic systems with currency (gems) and trading
- Cultural development and religious practices
- Agents even understood bribing. Priests were moving the most gems to bribe people into following them!
- Governmental structures and democratic processes

Project Sid addresses fundamental challenges in AI research:
- Coherence: Maintaining consistent agent behavior over extended periods.
- Multi-agent Collaboration: Enabling effective communication and coordination among numerous AI entities.
- Long-term Progression: Developing agents capable of learning and evolving over time.

While Minecraft serves as the initial testbed, the underlying AI architecture is designed to be game-agnostic, suggesting potential applications in various digital environments and real-world simulations.

Imagine a policy being debated by the government and how it might affect society; Sid can simulate its impact!

Even if this remains just a game experiment, the project successfully manages 1,000+ agents simultaneously, a feat that requires robust distributed computing and efficient agent architecture.
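
The post does not reveal Project Sid's internals, but the three challenges above map onto a familiar shape: each agent keeps its own memory (coherence), acts concurrently with the others (collaboration), and accumulates history over ticks (progression). A toy sketch, purely illustrative and not the project's architecture:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    goal: str
    memory: list = field(default_factory=list)  # coherence: persistent history

    async def tick(self, world: dict) -> str:
        # In a Sid-style system this would be an LLM call conditioned on
        # the agent's goal and memory; here it is a stub action.
        action = f"{self.name} works toward '{self.goal}'"
        self.memory.append(action)  # long-term progression
        return action

async def simulate(agents: list, steps: int = 3) -> None:
    world: dict = {}
    for t in range(steps):
        # Multi-agent collaboration: every agent acts concurrently each tick.
        world[t] = await asyncio.gather(*(a.tick(world) for a in agents))

agents = [Agent(f"agent{i}", goal="trade gems") for i in range(1_000)]
asyncio.run(simulate(agents))
```
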
Reacted to singhsidhukuldeep's post with 👀🔥 3 months ago
Chain-of-Thought (CoT) prompting, pioneered by Google researchers, is one of the most effective ways to improve LLMs' reasoning.

Researchers have now developed a novel approach called Strategic Chain-of-Thought (SCoT) to enhance the reasoning capabilities of large language models even further.

🧠 SCoT uses a two-stage process within a single prompt:
- Strategy Elicitation: The model first identifies and determines an effective problem-solving strategy for the given task. This becomes the strategic knowledge that guides the reasoning process.
- Strategy Application: The model then applies the identified strategic knowledge to solve the problem and generate the final answer.

Essentially, SCoT integrates strategic knowledge to guide reasoning without relying on external knowledge sources or multiple queries.

According to the research, SCoT showed significant improvements over standard CoT across various datasets, including a 21.05% increase on the GSM8K math dataset and a 24.13% increase on the Tracking_Objects spatial reasoning task.

Changes in the Prompt Structure:
The SCoT prompt typically consists of five components (sketched in code after this list):
- Role: Defines the expert role the model should assume.
- Workflow: Outlines the steps for strategy identification and application.
- Rules: Specifies guidelines for generating answers.
- Initialization: Sets up the task.
- Task Input: Provides the specific problem to solve.
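
Assembling those five components into a single query might look like this (the wording is an illustrative reconstruction, not the paper's exact template):

```python
SCOT_TEMPLATE = """\
# Role
You are an expert {domain} problem solver.

# Workflow
1. Identify an effective strategy for solving this kind of problem.
2. Apply that strategy step by step to solve the problem.

# Rules
- State the strategy before any working.
- End with the final answer on its own line, prefixed "Answer:".

# Initialization
Follow the workflow and rules above for the task below.

# Task Input
{problem}
"""

prompt = SCOT_TEMPLATE.format(
    domain="mathematics",
    problem="What is the sum of the integers from 1 to 100?",
)
# Send `prompt` as one query to any chat/completions endpoint:
# SCoT needs no external knowledge source and no extra calls.
print(prompt)
```

On the arithmetic-series example above, the elicited strategy should favor n(n+1)/2 over brute-force addition, which is exactly the behavior described under Strategy Generation below.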

Strategy Generation:
The model is prompted to generate strategic knowledge relevant to the problem domain. For example, in mathematics, it might favor elegant solutions like using arithmetic series formulas over brute-force calculations.

Guided Reasoning:
Using the elicited strategy, the model then generates a chain-of-thought reasoning path. This approach aims to produce more stable and higher-quality outputs compared to standard chain-of-thought methods.

Read the full paper: https://arxiv.org/abs/2409.03271