---
title: Automated Problem Solver (Final Assignment)
emoji: 🤖
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---

# 🤖 Automated Problem Solver (Final Assignment)

Hello fellow agent builders! This repository contains the final assignment for an automated problem-solving system. It utilizes a multi-agent architecture built with smolagents, leveraging various specialized tools and large language models (LLMs) accessed via OpenRouter to tackle a diverse range of questions.

The system is designed to:

  1. Understand & Clarify: Analyze the input question and associated files.
  2. Delegate: Route the task to the most suitable specialized agent (Web Search, YouTube Interaction, Multimedia Analysis, Code Interpretation).
  3. Utilize Tools: Employ custom tools for specific actions like YouTube video downloading, Wikipedia searching, speech-to-text transcription, and video audio extraction.
  4. Reason & Synthesize: Process information gathered by agents and tools to formulate a final answer.

## ✨ Core Concepts & Architecture

This project employs a hierarchical multi-agent system:

  • Chief Problem Solver Agent (Manager): The main orchestrator (chief_problem_solver_agent). It receives the initial problem, potentially clarifies it using a dedicated agent, and delegates the task to the appropriate specialized worker agent. It uses meta-llama/llama-4-maverick:free by default.
  • Specialized Agents:
    • Clarification Agent: Refines the user's question if needed. Uses a strong reasoning model (qwen/qwen3-235b-a22b by default).
    • YouTube Interaction Agent: Handles questions involving YouTube videos, utilizing relevant tools. Uses meta-llama/llama-4-maverick:free by default.
    • Web Search Manager Agent: Manages web searches using Serper and delegates specific page retrieval/analysis to its sub-agent. Uses meta-llama/llama-4-scout:free (high context) by default.
      • Website Retrieval Agent: Fetches and processes content from specific web pages. Uses a strong reasoning model (qwen/qwen3-235b-a22b by default).
    • Multimedia Analysis Agent: Processes images and audio files (using STT tools internally). Uses a multimodal model capable of vision (meta-llama/llama-4-scout:free by default).
    • Code Interpreter Agent: Executes and analyzes provided code snippets. Uses a coding-specialized model (open-r1/olympiccoder-32b:free by default).
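
For orientation, below is a minimal sketch of how such a hierarchy can be wired up with smolagents and OpenRouter. The agent names and model IDs mirror the defaults above, but the tool lists and exact constructor calls are illustrative, not the repository's actual code.

```python
import os
from smolagents import CodeAgent, OpenAIServerModel

def openrouter_model(model_id: str) -> OpenAIServerModel:
    """Every agent talks to the same OpenRouter endpoint; only the model ID differs."""
    return OpenAIServerModel(
        model_id=model_id,
        api_base=os.getenv("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        api_key=os.environ["LLM_API_KEY"],
    )

# Worker agents (tools omitted here; the real project attaches its custom tools).
web_search_manager_agent = CodeAgent(
    tools=[],
    model=openrouter_model("meta-llama/llama-4-scout:free"),
    name="web_search_manager_agent",
    description="Runs web searches and delegates page retrieval.",
)
code_interpreter_agent = CodeAgent(
    tools=[],
    model=openrouter_model("open-r1/olympiccoder-32b:free"),
    name="code_interpreter_agent",
    description="Executes and analyzes code snippets.",
)

# Manager agent: orchestrates and delegates to the workers above.
chief_problem_solver_agent = CodeAgent(
    tools=[],
    model=openrouter_model("meta-llama/llama-4-maverick:free"),
    managed_agents=[web_search_manager_agent, code_interpreter_agent],
    name="chief_problem_solver_agent",
    description="Top-level orchestrator for the final assignment tasks.",
)

chief_problem_solver_agent.run("What is the capital of France?")
```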

## Why OpenRouter?

Using OpenRouter provides significant advantages:

  1. Model Flexibility: Easily swap different LLMs for different agents to optimize for cost, performance, or specific capabilities (reasoning, coding, vision).
  2. Access to Diverse Models: Test and use a wide variety of models, including powerful free-tier options like qwerky-72b:free, olympiccoder-32b:free, or various Llama models.
  3. Simplified API: Access multiple LLM providers through a single API endpoint and key.

You'll need an OpenRouter API key to run this project.
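
Because OpenRouter exposes an OpenAI-compatible endpoint, switching models is just a matter of changing one string. Here is a minimal sketch with the `openai` client (for illustration only; the project itself wires models up through smolagents):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["LLM_API_KEY"],  # your OpenRouter key
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick:free",  # swap this ID to try another model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```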

๐Ÿ› ๏ธ Custom Tools

The system relies on several custom tools to interact with external resources:

### YouTubeVideoDownloaderTool

Downloads YouTube videos.
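
As an illustration of how such a tool can be declared (a sketch, not the repository's actual implementation, and it omits the quality setting exercised below), a smolagents tool wrapping yt-dlp might look like:

```python
from smolagents import Tool
from yt_dlp import YoutubeDL

class YouTubeVideoDownloaderTool(Tool):
    name = "youtube_video_downloader"
    description = "Downloads a YouTube video and returns the local file path."
    inputs = {"video_url": {"type": "string", "description": "Full YouTube URL."}}
    output_type = "string"

    def forward(self, video_url: str) -> str:
        # Download the best available stream into the working directory.
        opts = {"format": "best", "outtmpl": "%(id)s.%(ext)s"}
        with YoutubeDL(opts) as ydl:
            info = ydl.extract_info(video_url, download=True)
            return ydl.prepare_filename(info)
```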

  • Test best quality (default):
    python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ"
    
  • Test standard quality:
    python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ" --test-quality standard
    
  • Test low quality:
    python cli.py --test-tool YouTubeVideoDownloaderTool --test-input "https://www.youtube.com/watch?v=aqz-KE-bpKQ" --test-quality low
    

### CustomWikipediaSearchTool

Searches current or historical Wikipedia articles. Requires a User-Agent.
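
Historical lookups work because MediaWiki exposes page revisions by timestamp. A rough sketch of that mechanism with `requests` (illustrative only; the actual tool's parameters and output differ):

```python
import requests

def wikitext_as_of(title: str, timestamp: str, user_agent: str) -> str:
    """Return the wikitext of `title` from the latest revision at or before `timestamp`."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvlimit": 1,
            "rvdir": "older",       # newest revision at or before rvstart
            "rvstart": timestamp,   # e.g. "2022-12-31T23:59:59Z"
            "rvprop": "content",
            "rvslots": "main",
            "format": "json",
            "formatversion": 2,
        },
        headers={"User-Agent": user_agent},
        timeout=30,
    )
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

print(wikitext_as_of("Web browser", "2022-12-31T23:59:59Z",
                     "MyTestAgent/1.0 (myemail@example.com)")[:500])
```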

  • Test Current Summary (Wikitext - default):
    python cli.py --test-tool CustomWikipediaSearchTool \
                  --test-input "Python (programming language)" \
                  --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
                  --content-type summary
    
  • Test Current Full Text (HTML):
    python cli.py --test-tool CustomWikipediaSearchTool \
                  --test-input "Artificial Intelligence" \
                  --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
                  --content-type text \
                  --extract-format HTML
    
  • Test Historical Version (Dec 31, 2022, Wikitext):
    python cli.py --test-tool CustomWikipediaSearchTool \
                  --test-input "Web browser" \
                  --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
                  --revision-date "2022-12-31"
    
  • Test Historical Version (June 1, 2021, HTML):
    python cli.py --test-tool CustomWikipediaSearchTool \
                  --test-input "Quantum computing" \
                  --user-agent "MyTestAgent/1.0 (myemail@example.com)" \
                  --revision-date "2021-06-01" \
                  --extract-format HTML
    

### CustomSpeechToTextTool

Transcribes audio files using Hugging Face Transformers (Whisper).
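
Under the hood this maps onto the standard Transformers ASR pipeline; a minimal equivalent (assuming `transformers` and `torch` are installed) looks like:

```python
from transformers import pipeline

# Mirrors the tool's default checkpoint (openai/whisper-base.en).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

result = asr("/path/to/your/audio.wav")
print(result["text"])
```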

  • Example (Default Checkpoint openai/whisper-base.en):
    python cli.py --test-tool CustomSpeechToTextTool --test-input /path/to/your/audio.wav
    
  • Example (Tiny English Model):
    python cli.py --test-tool CustomSpeechToTextTool --test-input /path/to/your/audio.mp3 --checkpoint openai/whisper-tiny.en
    
  • Example (Audio URL): (Requires AgentAudio to support URL loading)
    python cli.py --test-tool CustomSpeechToTextTool --test-input https://example.com/audio.ogg
    

### VideoAudioExtractorTool

Extracts audio tracks from video files.
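
Audio extraction of this kind is typically a thin wrapper around ffmpeg. An equivalent call, sketched in Python and assuming `ffmpeg` is on the PATH (not the tool's actual code):

```python
import subprocess
from pathlib import Path

def extract_audio(video_path: str, output_format: str = "mp3", bitrate: str = "192k") -> Path:
    """Drop the video stream and re-encode the audio track next to the input file."""
    out = Path(video_path).with_suffix(f".{output_format}")
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-b:a", bitrate, str(out)],
        check=True,
    )
    return out

print(extract_audio("my_test_video.mp4", output_format="wav"))
```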

  • Basic Test (MP3 to same directory):
    python cli.py --test-tool VideoAudioExtractorTool --test-input my_test_video.mp4
    
  • Specify Output Directory, Format (WAV):
    python cli.py --test-tool VideoAudioExtractorTool --test-input path/to/another_video.mov --output-dir ./extracted_audio --output-format wav
    
  • Specify AAC Format and Bitrate:
    python cli.py --test-tool VideoAudioExtractorTool --test-input my_video.mp4 --output-format aac --audio-quality 192k
    

## 🚀 Getting Started (Local Setup)

  1. Prerequisites: Python 3 with pip, Git, and Git LFS installed.

  2. Clone the Repository:

    • Initialize Git LFS: git lfs install
    • Clone the space:
      # Use an access token with write permissions as the password when prompted
      # Generate one: https://huggingface.co/settings/tokens
      git clone https://huggingface.co/spaces/DataDiva88/AutomatedProblemSolver_Final_Assignment
      
    • (Optional) To clone without downloading large LFS files immediately:
      GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/DataDiva88/AutomatedProblemSolver_Final_Assignment
      
      You might need to run git lfs pull later to fetch the actual file contents if needed.
  3. Install Dependencies:

    cd AutomatedProblemSolver_Final_Assignment
    pip install -r requirements.txt
    

    โš ๏ธ Note: This might download large model files (e.g., for Transformers/Whisper), which can take time and disk space.

  4. Configure Environment Variables: Create a .env file in the root directory or set the following environment variables:

    # --- Hugging Face (Optional, needed for private spaces/LFS upload) ---
    # HF_TOKEN=hf_YOUR_HUGGINGFACE_TOKEN
    # SPACE_ID=DataDiva88/AutomatedProblemSolver_Final_Assignment
    
    # --- Application Settings ---
    DEBUG=true
    GRADIO_DEBUG=true # For Gradio interface debugging
    LOG_LEVEL=debug   # Set log level (debug, info, warning, error)
    
    # --- API Keys (REQUIRED) ---
    # Get from https://openrouter.ai/
    LLM_API_KEY=sk-or-v1-YOUR_OPENROUTER_API_KEY
    LLM_BASE_URL=https://openrouter.ai/api/v1
    
    # Get from https://serper.dev/
    SERPER_API_KEY=YOUR_SERPER_DEV_API_KEY
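
    A minimal sketch of how these values might be consumed at startup (assuming `python-dotenv`; the variable names match the block above):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # picks up the .env file in the project root

LLM_API_KEY = os.environ["LLM_API_KEY"]        # required
LLM_BASE_URL = os.getenv("LLM_BASE_URL", "https://openrouter.ai/api/v1")
SERPER_API_KEY = os.environ["SERPER_API_KEY"]  # required for web search
DEBUG = os.getenv("DEBUG", "false").lower() == "true"
```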
    

โ–ถ๏ธ How to Use

There are a few ways to interact with the project:

  1. Gradio Web Interface: Launch the app locally (typically python app.py, the configured app_file) and open the local URL Gradio prints, or use the hosted Hugging Face Space.

  2. Command Line Interface (CLI) for Custom Questions & Model Experimentation:

    Use cli.py to ask your own questions and easily experiment with different Large Language Models (LLMs) for various agent roles, thanks to the integration with OpenRouter.

    • Basic Question (Uses Default Models):

      # Runs with the default LLMs specified in the code
      python cli.py --question "What is the capital of France?"
      
    • Question with a File (Uses Default Models):

      python cli.py --question "Summarize this audio file." --file-name path/to/your/audio.mp3
      
    • Overriding the Manager Agent's Model: Want the main orchestrator to use a different LLM? Use the --manager-agent-llm-id flag.

      # Use Qwen 2 72B Instruct for the main manager agent
      python cli.py --question "Plan the steps to analyze the attached chess diagram." \
                    --file-name "diagram.png" \
                    --manager-agent-llm-id qwen/qwen2-72b-instruct:free
      
    • Overriding a Specialized Agent's Model (e.g., Coding Agent): Need a different model specifically for code interpretation? Use the corresponding flag.

      # Use DeepSeek Coder for the Code Interpreter agent, keeping others default
      python cli.py --question "Explain the attached Python script's output." \
                    --file-name "script.py" \
                    --coding-llm-id tngtech/deepseek-coder:free
      
    • Overriding Multiple Models: You can combine flags to customize several agents in a single run.

      # Use Llama 4 Maverick for the Manager and Qwen 3 235B for Reasoning tasks
      python cli.py --question "Analyze the arguments in the provided text." \
                    --file-name "arguments.txt" \
                    --manager-agent-llm-id meta-llama/llama-4-maverick:free \
                    --reasoning-agent-llm-id qwen/qwen3-235b-a22b
      

    How it Works:

    • The cli.py script accepts arguments like --<agent_role>-llm-id (e.g., --manager-agent-llm-id, --worker-agent-llm-id, --reasoning-agent-llm-id, --multimodal-llm-id, --coding-llm-id, etc.).
    • These arguments directly override the default models defined in the DefaultAgentLLMs class within the AutoPS core code (AutoPS/core.py or similar).
    • Specify the model using its OpenRouter identifier (e.g., meta-llama/llama-4-maverick:free). You can find available models on the OpenRouter Models page.
    • This makes it incredibly simple to test how different models perform for specific roles (manager, coding, reasoning, multimodal) without changing the core agent code.

  3. Run Specific Assignment Tasks (tasks.py): The tasks.py script allows you to run the predefined assignment questions.

    • Run ALL predefined tasks:
      python tasks.py
      
    • Run a SINGLE task by its ID:
      # Example: Run the first task
      python tasks.py 8e867cd7-cff9-4e6c-867a-ff5ddc2550be
      
      # Example: Run the task involving the chess image
      python tasks.py cca530fc-4052-43b2-b130-b30968d8aa44
      

## 📊 Telemetry & Debugging

This project uses OpenInference and Phoenix for observability and tracing agent runs.

  1. Start the Phoenix UI:
    python -m phoenix.server.main serve
    
  2. Access the UI: Open your browser to http://localhost:6006/projects
  3. Now, when you run tasks via cli.py or tasks.py, the agent interactions, tool usage, and LLM calls will be traced and viewable in the Phoenix UI.
  4. Set the LOG_LEVEL=debug environment variable for more verbose console output.
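
If traces do not show up, the usual OpenInference wiring for smolagents looks like the following sketch (assuming the `arize-phoenix` and `openinference-instrumentation-smolagents` packages are installed; the project may already register this internally):

```python
from phoenix.otel import register
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Send OpenTelemetry traces to the local Phoenix server started above.
tracer_provider = register()
SmolagentsInstrumentor().instrument(tracer_provider=tracer_provider)

# Any agent runs after this point appear at http://localhost:6006/projects.
```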

๐Ÿ“ Development Notes & Future Work

Based on initial development and testing, here are some areas for improvement:

  • Agent Naming: Rename clarification_agent to something more descriptive if its role evolves.
  • Model Experimentation: Continue trying different models for various agents via OpenRouter (e.g., test featherless/qwerky-72b:free, open-r1/olympiccoder-32b:free more extensively).
  • Prompt Engineering: Refine the prompts (TASK_PROMPT_TEMPLATE, RESOURCE_CHECK_TEMPLATE, and internal agent prompts) for better clarity, task decomposition, and result quality.
  • Planning Capabilities: Add explicit planning steps to agents like the code_interpreter_agent and multimedia_analysis_agent to break down complex tasks more robustly.
  • Manager Capabilities: Consider giving the chief_problem_solver_agent access to all tools/capabilities (similar to a reasoning agent) for more flexibility in handling complex, multi-step problems directly if needed.
  • PDF Support: Improve PDF handling for the agents, possibly with a dedicated extraction tool.

## Hugging Face Space Configuration

This project is configured to run as a Hugging Face Space using the following settings (see the YAML metadata at the top of this README.md):

  • SDK: Gradio (sdk: gradio)
  • SDK Version: 5.25.2 (sdk_version: 5.25.2)
  • Application File: app.py (app_file: app.py)
  • OAuth: Enabled for potential HF features (hf_oauth: true)
  • Config Reference: https://huggingface.co/docs/hub/spaces-config-reference

Happy agent building! Let me know if you have questions.

