VT.ai Features¶
This page provides detailed information about VT.ai's key features and how to use them effectively.
Chat Modes¶
VT.ai offers different chat modes to suit your specific needs:
Standard Chat¶
The standard chat mode provides access to all configured LLM providers with dynamic conversation routing:
- Automatic classification and routing of queries to appropriate models
- Support for text, image, and audio inputs
- Full access to all VT.ai features
To use standard chat:
- Simply type your message in the input field
- Press Enter to send
- The system will automatically route your query to the most appropriate model
Assistant Mode (Beta)¶
Assistant mode provides specialized capabilities for more complex tasks:
- Code interpreter for executing Python code
- File attachment support (PDF, CSV, images, etc.)
- Persistent conversation threads
- Function calling for external integrations
- Web search with intelligent summarization
To use assistant mode:
- Switch to the Assistant profile in the dropdown menu
- Upload files if needed
- Type your queries as normal
- View the step-by-step execution in the interface
Specialized Features¶
Web Search with Smart Summarization¶
VT.ai can search the web for information and intelligently summarize the results:
- Accumulate information from multiple search results
- Generate coherent, comprehensive summaries
- Cite sources with proper attribution
- Toggle between raw results and AI-synthesized summaries
To use web search with summarization:
- Ask a question that might require current information
- VT.ai will automatically route to the web search tool
- Results will be summarized into a concise, readable answer
- Sources will be listed with clickable links
You can control summarization behavior:
- Enable/disable summarization in the settings menu
- When enabled, multiple search results are synthesized into a unified response
- When disabled, search results are presented in a more raw format
Example queries:
- "What are the latest developments in quantum computing?"
- "Search for information about sustainable energy solutions"
- "Find recent news about Mars exploration"
Thinking Mode¶
Thinking mode gives you access to step-by-step reasoning from the models, providing transparency into the AI's thought process:
- See the model's internal reasoning process
- Understand how the model arrived at its conclusion
- Great for learning, debugging, and complex problem-solving
- Helps with verification of facts and logical reasoning
How Thinking Mode Works¶
In the background, VT.ai uses special reasoning-enhanced models and prompt engineering to make the model's thought process explicit:
- The query is sent to a reasoning-capable model with instructions to show its work
- The model breaks down the problem into steps
- Each step of reasoning is displayed in the interface
- The final conclusion is presented after the reasoning steps
Using Thinking Mode¶
There are two ways to activate thinking mode:
- Manual Activation: Add the
<think>
tag at the beginning of your message
<think>What are the key factors that contributed to the Industrial Revolution?
- Automatic Activation: Enable "Use Thinking Mode For Reasoning Models" in settings
- When enabled, VT.ai will automatically use thinking mode for models in the reasoning-enhanced list
- This includes models like DeepSeek Reasoner, DeepSeek R1 series, Qwen 2.5, Claude 3, and GPT-4o
Best Uses for Thinking Mode¶
Thinking mode is especially useful for:
- Complex problem solving: Mathematics, logic puzzles, step-by-step analysis
- Fact verification: See how the model reaches factual conclusions
- Learning: Understand reasoning processes for educational topics
- Debugging: Identify where reasoning might go wrong
- Decision making: Follow the model's decision process
Example queries that work well with thinking mode:
<think>Is it more environmentally friendly to use paper bags or plastic bags?
<think>Solve the quadratic equation: 2x² + 7x - 15 = 0
<think>What would happen to Earth's climate if the sun suddenly became 10% brighter?
<think>Analyze the following code and explain what it does:
def mystery_function(arr):
result = []
for i in range(len(arr)):
if i % 2 == 0:
result.append(arr[i] * 2)
else:
result.append(arr[i] + 3)
return result
Models That Excel with Thinking Mode¶
While thinking mode works with all models, these models are specifically optimized for step-by-step reasoning:
- DeepSeek Reasoner: Specifically designed for transparent reasoning
- DeepSeek R1 Series: Enhanced reasoning capabilities across different model sizes
- Qwen 2.5 Models: Excellent structured reasoning abilities
- Claude 3 Opus/Sonnet: Strong logical reasoning with clear explanations
- GPT-4o: Advanced reasoning with multimodal capabilities
Image Analysis¶
VT.ai can analyze and interpret images:
- Upload images directly from your device
- Provide URLs to online images
- Get detailed descriptions and analysis
- Extract text with optical character recognition (OCR)
- Identify objects, scenes, and visual elements
To analyze an image:
- Click the upload button or paste an image URL
- Ask a question about the image
- The system will analyze the image and respond to your query
Example queries:
- "What's in this image?"
- "Can you describe this diagram?"
- "What text appears in this screenshot?"
- "Identify the objects in this picture"
- "What emotion does the person in this image appear to be feeling?"
Image Generation¶
Generate images based on text descriptions:
- DALL-E 3: Create custom images from detailed prompts
- GPT-Image-1: Advanced image generation with extensive customization options
- Transparent backgrounds for logos and graphics
- Multiple output formats (PNG, JPEG, WEBP)
- Customizable dimensions (square, landscape, portrait)
- Variable quality and compression settings
- HD options for higher quality outputs
- Moderation controls
To generate an image:
- Type a prompt like "Generate an image of a futuristic city with flying cars"
- The system will recognize the image generation intent
- The appropriate image generation model will create and display the image based on your description
Advanced Image Generation Settings¶
For advanced GPT-Image-1 options, you can configure:
- Image size: Set with
VT_SETTINGS_IMAGE_GEN_IMAGE_SIZE
-
Options: "1024x1024" (square), "1792x1024" (landscape), "1024x1792" (portrait), "1536x1536" (large square)
-
Quality: Set with
VT_SETTINGS_IMAGE_GEN_IMAGE_QUALITY
-
Options: "standard" (faster), "hd" (higher quality)
-
Background: Set with
VT_SETTINGS_IMAGE_GEN_BACKGROUND
-
Options: "auto" (context-dependent), "transparent" (for PNG format)
-
Output format: Set with
VT_SETTINGS_IMAGE_GEN_OUTPUT_FORMAT
-
Options: "png" (lossless, supports transparency), "jpeg" (smaller file size), "webp" (best compression)
-
Compression: Set with
VT_SETTINGS_IMAGE_GEN_OUTPUT_COMPRESSION
- Range: 0-100, where 100 is maximum quality (webp/jpeg only)
GPT-Image-1 Prompt Guide¶
To get the best results with GPT-Image-1, consider these prompting strategies:
- Be specific and detailed: Describe subjects, setting, lighting, style, mood
- Specify artistic style: Photorealistic, cartoon, oil painting, watercolor, etc.
- Include perspective information: Close-up, aerial view, isometric, etc.
- Mention lighting conditions: Natural light, studio lighting, dramatic shadows
- Reference time period or era: Victorian, futuristic, 1980s, etc.
Example of a detailed prompt:
Generate an image of a serene Japanese garden at sunset, with a small wooden bridge crossing a koi pond. Cherry blossom trees frame the scene, with soft pink petals falling onto the water's surface. The lighting is warm and golden, creating long shadows. Style: watercolor painting with fine details.
Voice Interaction¶
VT.ai supports comprehensive voice-based interaction:
- Speech-to-Text: Real-time voice transcription using OpenAI's Whisper model
- Smart silence detection for natural conversation flow
- High-accuracy transcription across multiple languages
-
Seamless integration with conversation routing
-
Text-to-Speech: Listen to AI responses with natural-sounding voices
- Multiple voice options (alloy, echo, fable, onyx, nova, shimmer)
- Toggle TTS on/off in settings
- High-quality voice synthesis using OpenAI's Audio API
-
Speak response action appears on all messages
-
Audio Understanding: Analyze and understand audio content
- Upload audio files for detailed analysis
- Get both transcription and contextual understanding
- Support for various audio formats (MP3, WAV, M4A, etc.)
To use voice features:
- Enable TTS in the settings menu
- Select your preferred voice model
- Each response will include a speech button to listen to the content
- For voice input, click the microphone icon and speak your query
For the best voice interaction experience:
- Use a good quality microphone in a quiet environment
- Speak clearly and at a moderate pace
- Allow a brief pause after speaking to trigger automatic detection
- Choose a voice model that matches your preference for response playback
Model Selection¶
VT.ai supports a wide range of models:
- OpenAI: GPT-o1, GPT-o1 Mini, GPT-o1 Pro, GPT-o3 Mini, GPT-4.5 Preview, GPT-4o
- Anthropic: Claude 3.5/3.7 (Sonnet, Haiku, Opus)
- Google: Gemini 1.5 Pro/Flash, Gemini 2.0, Gemini 2.5 Pro/Flash
- Vision Models: GPT-4o, Gemini 1.5 Pro/Flash, Gemini 2.5 Pro/Flash, Claude 3 models, Llama 3.2 Vision
- TTS Models: GPT-4o mini TTS, TTS-1, TTS-1-HD
- Local Models: Llama3, Mistral, DeepSeek R1, Qwen2.5-coder (via Ollama)
You can select models in several ways:
- Use the model selector in the settings menu
- Specify a model at startup with
vtai --model model-name
- Let the semantic router automatically select the best model for your query
Configuration Options¶
VT.ai offers various configuration options accessible through the settings menu:
- Temperature: Control randomness in responses (0.0-2.0)
- Top P: Adjust response diversity (0.0-1.0)
- Image Generation Settings: Style, quality, format, and dimension options
- TTS Settings: Voice models and quality options
- Routing Options: Enable/disable dynamic conversation routing
- Thinking Mode: Enable/disable automatic thinking mode for reasoning models
- Web Search: Configure search result display and summarization
For more details on configuration, see the Configuration page.