The Model Standard for Production Teams

TrueFusion Max

TrueFusion Pano

TrueFusion Video Pro

text-to-video, image-to-video

TrueFusion Video

text-to-video, image-to-video

TrueFusion Video Standard

Lumo

image-to-video

Lumo is an image-to-video model built specifically for motion, animation, and general use cases.

TrueFusion Variant

TrueFusion Edge

text-to-image, image-to-image, fast

TrueFusion Edge is an ultra-fast, lightweight AI model that delivers stunningly realistic image-to-video results with minimal resource usage. It is optimized for mobile and real-time applications.

TrueFusion Standard

TrueFusion Ultra

Our flagship and most advanced text-to-image model yet. TrueFusion Ultra delivers stunning photorealism, artistic creativity, and unmatched consistency across styles. From intricate details to vivid storytelling, it redefines what's possible with generative visuals.

LipFusion

video-to-video, audio-to-video

LipFusion is a cutting-edge AI model engineered by Skytells to deliver ultra-realistic lip-syncing across a wide range of content, from videos and animations to avatars and live streams. Using advanced deep learning architectures and real-time inference optimization, LipFusion seamlessly aligns speech with visual output, creating an immersive, human-like experience that brings characters, avatars, and digital personas to life.

Mera

image-to-video, text-to-video, audio

Our latest video generation model is more physically accurate, super realistic, and more controllable than prior systems.

TrueFusion X

Ultra-fast, ultra-high-resolution: more pixels in every image.

TrueFusion 2.0

image-to-image, text-to-image, reference

TrueFusion 2.0 Image lets you attach up to three images as ground truth and reference them by tags in your prompt. It preserves identity, style, and materials while giving you control over angle, composition, lighting, and fine details, so the final image matches exactly what you envisioned.

Flux.1 Edge

image-to-image, text-to-image, quality

A super-fast version of the Flux model, optimized by Skytells for instant image generation.

TrueFusion Optima

image-to-image, text-to-image, quality

Expert-coordinated realism at production scale. TrueFusion 2.0 Optima is a next-generation mixture-of-experts (MoE) architecture delivering unmatched realism, lifelike lighting, and film-grade image precision.

BeatFusion 2.0

text-to-audio, music, quality, audio

Skytells' flagship music generation model. Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt.

BeatFusion 1.0

text-to-audio, music, quality, audio

Skytells' first music generation model. Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt.

DeepBrain Router

text-to-text, coding, writing

DeepBrain Router is Skytells’ advanced model orchestration layer, built to intelligently choose the right model for the right task. Optimized for coding, writing, reasoning, and complex multi-domain workloads, it dynamically routes requests across a curated set of flagship models from leading providers. The result is stronger output quality, improved cost-performance balance, and a more reliable AI experience at scale.

BeatFusion 2.1

text-to-audio, music, quality, audio

Skytells' flagship music generation model. Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt.

Mera 1.1 Fast

image-to-video, text-to-video, audio-to-video

Mera 1.1 Fast is our latest flagship video generation model. It combines Mera's precision with advances from major video generation models to deliver fast inference, rapid draft previews, robust lip sync, and built-in audio generation. It supports highly controllable text-to-video, image-to-video, and audio-to-video creation at up to 1080p and 48 FPS, with multi-aspect-ratio support and prompt upsampling for more realistic, physically accurate results.

Mera Avatar

image-to-video, text-to-video, audio-to-video

Mera Avatar is our latest flagship video generation model, engineered to deliver exceptional speed, realism, and creative control. Built on Mera's advanced architecture and enhanced with innovations inspired by the latest breakthroughs in video generation, Mera Avatar supports text-to-video, image-to-video, and audio-to-video workflows with remarkably fast inference and rapid draft previews. It features robust lip synchronization, native audio generation, prompt upsampling for richer scene understanding, and highly controllable generation, producing physically accurate, cinematic results in up to 1080p resolution at 48 FPS, with full multi-aspect ratio support for every creative workflow.

Alibaba

1 model

Covered vendor

Wan 2.5-i2v

image-to-video, reference, quality

Alibaba Wan 2.5 image-to-video generation with background audio.

Covered by Skytells (Alibaba)

Black Forest Labs

4 models

Covered vendor

FLUX-1.1 Pro

Ultra-fast, ultra-high-resolution: more pixels in every image.

Flux 2 Pro Legacy

text-to-image, image-to-image, editing

High-quality image generation and editing with support for eight reference images

Flux 2 Flex

text-to-image, image-to-image, editing

Max-quality image generation and editing with support for ten reference images

FLUX.2 Pro

Ultra-fast, ultra-high-resolution: more pixels in every image.

Google

6 models

Covered vendor

Imagen 3

Google's highest-quality text-to-image model, capable of generating detailed, beautiful images with rich lighting.

Imagen 4

Google's flagship text-to-image model, capable of generating detailed, beautiful images with rich lighting.

Veo 3.1

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

Veo 3.1 Fast

New and improved version of Veo 3 Fast, with higher-fidelity video, context-aware audio and last frame support

Nano Banana

Google's latest image editing model in Gemini 2.5

Veo 3.1 (Preview)

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

Moonshot AI

1 model

Covered vendor

Kimi K2.6

text-to-text, language, writing

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and can convert prompts and visual inputs into production-ready interfaces. Its agent swarm architecture scales to hundreds of parallel sub-agents for autonomous task decomposition, delivering documents, websites, and spreadsheets in a single run without human oversight.

Covered by Skytells (Moonshot AI)

Nvidia

1 model

Covered vendor

Sana

text-to-image

A fast image model with wide artistic range and resolutions up to 4096x4096.

Covered by Skytells (Nvidia)

OpenAI

8 models

Covered vendor

GPT-Image-1

A multimodal image generation model that creates high-quality images.

Sora 2

OpenAI's most advanced video generation model with synchronized audio.

Sora 2 Pro

OpenAI's most advanced video generation model with synchronized audio.

GPT-5

text-to-text, coding, partner

OpenAI's new model excelling at coding, writing, and reasoning.

GPT-5.3 Codex

text-to-text, coding, writing

GPT-5.3-Codex achieves state-of-the-art performance on SWE-Bench Pro, a rigorous evaluation of real-world software engineering. Where SWE-bench Verified only tests Python, SWE-Bench Pro spans four languages and is more contamination-resistant, challenging, diverse, and industry-relevant. It also far exceeds the previous state-of-the-art performance on Terminal-Bench 2.0, which measures the terminal skills a coding agent like Codex needs. Notably, GPT-5.3-Codex does so with fewer tokens than any prior model, letting users build more.

GPT-5.4

text-to-text, language, writing

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.

GPT-5.4 Mini

text-to-text, language, writing

GPT-5.4 mini brings the strengths of GPT-5.4 to a faster, more efficient model designed for high-volume workloads.

GPT Image 2

GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.