---
title: Automated Problem Solver (Final Assignment)
emoji: 🤖
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
# 🤖 Automated Problem Solver (Final Assignment)
Hello fellow agent builders! This repository contains the final assignment for an automated problem-solving system. It utilizes a multi-agent architecture built with `smolagents`, leveraging various specialized tools and large language models (LLMs) accessed via OpenRouter to tackle a diverse range of questions.
The system is designed to:
- **Understand & Clarify:** Analyze the input question and associated files.
- **Delegate:** Route the task to the most suitable specialized agent (Web Search, YouTube Interaction, Multimedia Analysis, Code Interpretation).
- **Utilize Tools:** Employ custom tools for specific actions like YouTube video downloading, Wikipedia searching, speech-to-text transcription, and video audio extraction.
- **Reason & Synthesize:** Process information gathered by agents and tools to formulate a final answer.
## ✨ Core Concepts & Architecture
This project employs a hierarchical multi-agent system:

- **Chief Problem Solver Agent (Manager):** The main orchestrator (`chief_problem_solver_agent`). It receives the initial problem, potentially clarifies it using a dedicated agent, and delegates the task to the appropriate specialized worker agent. It uses `meta-llama/llama-4-maverick:free` by default.
- **Specialized Agents:**
  - **Clarification Agent:** Refines the user's question if needed. Uses a strong reasoning model (`qwen/qwen3-235b-a22b` by default).
  - **YouTube Interaction Agent:** Handles questions involving YouTube videos, utilizing relevant tools. Uses `meta-llama/llama-4-maverick:free` by default.
  - **Web Search Manager Agent:** Manages web searches using Serper and delegates specific page retrieval/analysis to its sub-agent. Uses `meta-llama/llama-4-scout:free` (high context) by default.
    - **Website Retrieval Agent:** Fetches and processes content from specific web pages. Uses a strong reasoning model (`qwen/qwen3-235b-a22b` by default).
  - **Multimedia Analysis Agent:** Processes images and audio files (using STT tools internally). Uses a multimodal model capable of vision (`meta-llama/llama-4-scout:free` by default).
  - **Code Interpreter Agent:** Executes and analyzes provided code snippets. Uses a coding-specialized model (`open-r1/olympiccoder-32b:free` by default).
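For orientation, here is a minimal, hypothetical sketch of how such a hierarchy is wired up with `smolagents`. The agent names and the `DuckDuckGoSearchTool` stand-in are illustrative assumptions, not the exact objects used in this repo (which uses custom Serper and multimedia tools):

```python
import os

from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel, ToolCallingAgent

# OpenRouter exposes an OpenAI-compatible endpoint, so OpenAIServerModel works.
model = OpenAIServerModel(
    model_id="meta-llama/llama-4-maverick:free",
    api_base="https://openrouter.ai/api/v1",
    api_key=os.environ["LLM_API_KEY"],
)

# A worker agent (stand-in for the Serper-based Web Search Manager Agent).
web_search_agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    name="web_search_agent",
    description="Searches the web and returns relevant findings.",
)

# The manager delegates to its managed agents instead of using tools directly.
manager = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[web_search_agent],  # plus the other specialists
)

manager.run("What is the capital of France?")
```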
### Why OpenRouter?
Using OpenRouter provides significant advantages:
- **Model Flexibility:** Easily swap different LLMs for different agents to optimize for cost, performance, or specific capabilities (reasoning, coding, vision).
- **Access to Diverse Models:** Test and use a wide variety of models, including powerful free-tier options like `qwerky-72b:free`, `olympiccoder-32b:free`, or various Llama models.
- **Simplified API:** Access multiple LLM providers through a single API endpoint and key.
You'll need an OpenRouter API key to run this project.
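As a quick sanity check that your key works, you can hit the endpoint directly with the standard `openai` client. This is just a sketch; the model id is one of the free options mentioned above:

```python
import os

from openai import OpenAI

# OpenRouter is OpenAI-compatible: only the base URL and key differ.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["LLM_API_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick:free",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```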
## 🛠️ Custom Tools
The system relies on several custom tools to interact with external resources:
### YouTubeVideoDownloaderTool

Downloads YouTube videos.

- Test best quality (default):
  ```bash
  python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ"
  ```
- Test standard quality:
  ```bash
  python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ" --test-quality standard
  ```
- Test low quality:
  ```bash
  python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ" --test-quality low
  ```
### CustomWikipediaSearchTool

Searches current or historical Wikipedia articles. Requires a User-Agent.

- Test Current Summary (Wikitext - default):
  ```bash
  python cli.py --test-tool CustomWikipediaSearchTool \
    --test-input "Python (programming language)" \
    --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
    --content-type summary
  ```
- Test Current Full Text (HTML):
  ```bash
  python cli.py --test-tool CustomWikipediaSearchTool \
    --test-input "Artificial Intelligence" \
    --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
    --content-type text \
    --extract-format HTML
  ```
- Test Historical Version (Dec 31, 2022, Wikitext):
  ```bash
  python cli.py --test-tool CustomWikipediaSearchTool \
    --test-input "Web browser" \
    --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
    --revision-date "2022-12-31"
  ```
- Test Historical Version (June 1, 2021, HTML):
  ```bash
  python cli.py --test-tool CustomWikipediaSearchTool \
    --test-input "Quantum computing" \
    --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
    --revision-date "2021-06-01" \
    --extract-format HTML
  ```
### CustomSpeechToTextTool

Transcribes audio files using Hugging Face Transformers (Whisper).

- Example (Default Checkpoint `openai/whisper-base.en`):
  ```bash
  python cli.py --test-tool CustomSpeechToTextTool --test-input /path/to/your/audio.wav
  ```
- Example (Tiny English Model):
  ```bash
  python cli.py --test-tool CustomSpeechToTextTool --test-input /path/to/your/audio.mp3 --checkpoint openai/whisper-tiny.en
  ```
- Example (Audio URL; requires AgentAudio to support URL loading):
  ```bash
  python cli.py --test-tool CustomSpeechToTextTool --test-input https://example.com/audio.ogg
  ```
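Under the hood this kind of tool wraps a Transformers ASR pipeline. A minimal sketch of that underlying step, assuming the `transformers` package and a local audio file (the tool's actual implementation may differ):

```python
from transformers import pipeline

# Same default checkpoint as the tool; swap in openai/whisper-tiny.en for speed.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
result = asr("/path/to/your/audio.wav")
print(result["text"])
```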
### VideoAudioExtractorTool

Extracts audio tracks from video files.

- Basic Test (MP3 to same directory):
  ```bash
  python cli.py --test-tool VideoAudioExtractorTool --test-input my_test_video.mp4
  ```
- Specify Output Directory, Format (WAV):
  ```bash
  python cli.py --test-tool VideoAudioExtractorTool --test-input path/to/another_video.mov --output-dir ./extracted_audio --output-format wav
  ```
- Specify AAC Format and Bitrate:
  ```bash
  python cli.py --test-tool VideoAudioExtractorTool --test-input my_video.mp4 --output-format aac --audio-quality 192k
  ```
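If you want to add your own tool, the `smolagents` pattern looks roughly like this. `SimpleAudioExtractorTool` is a hypothetical, stripped-down cousin of `VideoAudioExtractorTool` (assumes `ffmpeg` is on your PATH):

```python
import subprocess
from pathlib import Path

from smolagents import Tool

class SimpleAudioExtractorTool(Tool):
    name = "simple_audio_extractor"
    description = "Extracts the audio track of a video file to MP3 using ffmpeg."
    inputs = {
        "video_path": {"type": "string", "description": "Path to the input video file."}
    }
    output_type = "string"

    def forward(self, video_path: str) -> str:
        # Write the MP3 next to the input file and return its path.
        out_path = str(Path(video_path).with_suffix(".mp3"))
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "libmp3lame", out_path],
            check=True,
        )
        return out_path
```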
## 🚀 Getting Started (Local Setup)
**Prerequisites:**

- Python 3.12+
- `git`
- `git-lfs` (install from https://git-lfs.com)
**Clone the Repository:**

- Initialize Git LFS:
  ```bash
  git lfs install
  ```
- Clone the space:
  ```bash
  # Use an access token with write permissions as the password when prompted
  # Generate one: https://huggingface.co/settings/tokens
  git clone https://huggingface.co/spaces/DataDiva88/AutomatedProblemSolver_Final_Assignment
  ```
- (Optional) To clone without downloading large LFS files immediately:
  ```bash
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/DataDiva88/AutomatedProblemSolver_Final_Assignment
  ```
  Run `git lfs pull` later to fetch the actual file contents if needed.
**Install Dependencies:**

```bash
cd AutomatedProblemSolver_Final_Assignment
pip install -r requirements.txt
```

⚠️ Note: This might download large model files (e.g., for Transformers/Whisper), which can take time and disk space.
**Configure Environment Variables:** Create a `.env` file in the root directory or set the following environment variables:

```bash
# --- Hugging Face (Optional, needed for private spaces/LFS upload) ---
# HF_TOKEN=hf_YOUR_HUGGINGFACE_TOKEN
# SPACE_ID=DataDiva88/AutomatedProblemSolver_Final_Assignment

# --- Application Settings ---
DEBUG=true
GRADIO_DEBUG=true  # For Gradio interface debugging
LOG_LEVEL=debug    # Set log level (debug, info, warning, error)

# --- API Keys (REQUIRED) ---
# Get from https://openrouter.ai/
LLM_API_KEY=sk-or-v1-YOUR_OPENROUTER_API_KEY
LLM_BASE_URL=https://openrouter.ai/api/v1

# Get from https://serper.dev/
SERPER_API_KEY=YOUR_SERPER_DEV_API_KEY
```
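If you go the `.env` route, here is a minimal sketch of loading it at startup, assuming the `python-dotenv` package (`app.py`/`cli.py` may already handle this):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

# Fail fast if the required key is missing.
assert os.getenv("LLM_API_KEY"), "LLM_API_KEY is required (see the .env template above)"
```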
## ▶️ How to Use
There are a few ways to interact with the project:
**Gradio Web Interface:**

- Run the Gradio app locally:
  ```bash
  python app.py
  ```
- Or, visit the hosted Hugging Face Space: https://huggingface.co/spaces/DataDiva88/AutomatedProblemSolver_Final_Assignment
**Command Line Interface (CLI) for Custom Questions & Model Experimentation:**

Use `cli.py` to ask your own questions and easily experiment with different large language models (LLMs) for the various agent roles, thanks to the integration with OpenRouter.

- Basic Question (Uses Default Models):
  ```bash
  # Runs with the default LLMs specified in the code
  python cli.py --question "What is the capital of France?"
  ```
- Question with a File (Uses Default Models):
  ```bash
  python cli.py --question "Summarize this audio file." --file-name path/to/your/audio.mp3
  ```
- Overriding the Manager Agent's Model: Want the main orchestrator to use a different LLM? Use the `--manager-agent-llm-id` flag.
  ```bash
  # Use Qwen 2 72B Instruct for the main manager agent
  python cli.py --question "Plan the steps to analyze the attached chess diagram." \
    --file-name "diagram.png" \
    --manager-agent-llm-id qwen/qwen2-72b-instruct:free
  ```
- Overriding a Specialized Agent's Model (e.g., Coding Agent): Need a different model specifically for code interpretation? Use the corresponding flag.
  ```bash
  # Use DeepSeek Coder for the Code Interpreter agent, keeping others default
  python cli.py --question "Explain the attached Python script's output." \
    --file-name "script.py" \
    --coding-llm-id tngtech/deepseek-coder:free
  ```
- Overriding Multiple Models: You can combine flags to customize several agents in a single run.
  ```bash
  # Use Llama 4 Maverick for the Manager and Qwen 3 235B for Reasoning tasks
  python cli.py --question "Analyze the arguments in the provided text." \
    --file-name "arguments.txt" \
    --manager-agent-llm-id meta-llama/llama-4-maverick:free \
    --reasoning-agent-llm-id qwen/qwen3-235b-a22b
  ```
**How it Works:**

- The `cli.py` script accepts arguments like `--<agent_role>-llm-id` (e.g., `--manager-agent-llm-id`, `--worker-agent-llm-id`, `--reasoning-agent-llm-id`, `--multimodal-llm-id`, `--coding-llm-id`, etc.).
- These arguments directly override the default models defined in the `DefaultAgentLLMs` class within the `AutoPS` core code (`AutoPS/core.py` or similar); see the sketch after this list.
- Specify the model using its OpenRouter identifier (e.g., `meta-llama/llama-4-maverick:free`). You can find available models on the OpenRouter Models page.
- This makes it simple to test how different models perform for specific roles (manager, coding, reasoning, multimodal) without changing the core agent code.
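A minimal sketch of that override pattern (the flag and class names match this README, but the actual `cli.py` wiring may differ):

```python
import argparse

class DefaultAgentLLMs:
    # Defaults as listed in the architecture section above.
    MANAGER = "meta-llama/llama-4-maverick:free"
    REASONING = "qwen/qwen3-235b-a22b"
    CODING = "open-r1/olympiccoder-32b:free"

parser = argparse.ArgumentParser()
parser.add_argument("--question", required=True)
parser.add_argument("--manager-agent-llm-id", default=DefaultAgentLLMs.MANAGER)
parser.add_argument("--reasoning-agent-llm-id", default=DefaultAgentLLMs.REASONING)
parser.add_argument("--coding-llm-id", default=DefaultAgentLLMs.CODING)
args = parser.parse_args()

# Each agent is then constructed with the (possibly overridden) model id.
print(f"Manager model: {args.manager_agent_llm_id}")
```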
**Run Specific Assignment Tasks (`tasks.py`):**

The `tasks.py` script allows you to run the predefined assignment questions.

- Run ALL predefined tasks:
  ```bash
  python tasks.py
  ```
- Run a SINGLE task by its ID:
  ```bash
  # Example: Run the first task
  python tasks.py 8e867cd7-cff9-4e6c-867a-ff5ddc2550be

  # Example: Run the task involving the chess image
  python tasks.py cca530fc-4052-43b2-b130-b30968d8aa44
  ```
## 📊 Telemetry & Debugging
This project uses OpenInference and Phoenix for observability and tracing agent runs.
- Start the Phoenix UI:
  ```bash
  python -m phoenix.server.main serve
  ```
- Access the UI: Open your browser to http://localhost:6006/projects
- Now, when you run tasks via `cli.py` or `tasks.py`, the agent interactions, tool usage, and LLM calls will be traced and viewable in the Phoenix UI.
- Set the `LOG_LEVEL=debug` environment variable for more verbose console output.
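For reference, the usual way to hook `smolagents` runs into Phoenix looks like this — a sketch assuming the `arize-phoenix-otel` and `openinference-instrumentation-smolagents` packages; the repo's actual setup may differ:

```python
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from phoenix.otel import register

# Registers an OTLP tracer pointed at the local Phoenix server by default.
tracer_provider = register(project_name="AutoPS")
SmolagentsInstrumentor().instrument(tracer_provider=tracer_provider)

# From here on, agent steps, tool calls, and LLM calls are traced.
```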
## 📝 Development Notes & Future Work
Based on initial development and testing, here are some areas for improvement:
- **Agent Naming:** Rename `clarification_agent` to something more descriptive if its role evolves.
- **Model Experimentation:** Continue trying different models for various agents via OpenRouter (e.g., test `featherless/qwerky-72b:free` and `open-r1/olympiccoder-32b:free` more extensively).
- **Prompt Engineering:** Refine the prompts (`TASK_PROMPT_TEMPLATE`, `RESOURCE_CHECK_TEMPLATE`, and internal agent prompts) for better clarity, task decomposition, and result quality.
- **Planning Capabilities:** Add explicit planning steps to agents like the `code_interpreter_agent` and `multimedia_analysis_agent` to break down complex tasks more robustly.
- **Manager Capabilities:** Consider giving the `chief_problem_solver_agent` access to all tools/capabilities (similar to a reasoning agent) for more flexibility in handling complex, multi-step problems directly if needed.
- **PDF Support:** PDF support for the agents could be improved, perhaps with a dedicated tool.
## Hugging Face Space Configuration

This project is configured to run as a Hugging Face Space using the following settings (`./.huggingface/README.md` metadata):
- SDK: Gradio (`sdk: gradio`)
- SDK Version: 5.25.2 (`sdk_version: 5.25.2`)
- Application File: `app.py` (`app_file: app.py`)
- OAuth: Enabled for potential HF features (`hf_oauth: true`)
- [Config Reference](https://huggingface.co/docs/hub/spaces-config-reference)
Happy agent building! Let me know if you have questions.