The Week Open-Source AI Went Nuclear: 25+ Open-Weight Drops That Changed Everything

Last Updated: June 6, 2026

The first week of June 2026 will go down as the most consequential seven days in open-source AI history. In the span of just days, over 25 frontier-grade open-weight models landed across every modality: language, vision, image generation, audio, speech, video, 3D, and even physical world simulation. This was not a trickle. This was a flood.

NVIDIA dropped a 550-billion-parameter hybrid Mamba beast. Google shipped a laptop-friendly multimodal gem. Ideogram finally open-sourced the image model that rivals GPT Image 2. Four separate labs shipped open audio and speech systems in the same week. And a 0.9-billion-parameter document parser from Baidu outperformed models 10x its size.

This is the full breakdown: what landed, why it matters, and what it means for anyone building with AI.

Key Takeaways

25+ open-weight models released across LLMs, image gen, audio, speech, vision, video, 3D, and world models in one week
NVIDIA Nemotron 3 Ultra (550B, 55B active) is the largest open hybrid Mamba-MoE ever released, with 1M token context
Google Gemma 4 12B runs on a laptop with 16GB RAM and handles text, images, audio, and video natively
Ideogram 4 became the first open-weight image model to seriously challenge closed frontier systems, ranking #2 globally behind GPT Image 2
Four open audio and speech models dropped in one week, including Boson's 102-language TTS model and RedNote's codec-free pipeline
NVIDIA Cosmos3-Super (64B) is an omnimodal world model that generates video, audio, and robot action trajectories simultaneously
The licensing cost of frontier AI capabilities dropped to zero this week. Every category now has a production-grade open alternative, leaving hardware and electricity as the remaining costs.

Why This Week Matters for Australian Businesses

For growing businesses in Australia, this week represents a fundamental shift in what is possible without vendor lock-in. Before June 2026, accessing frontier image generation, multilingual speech, or enterprise-grade document parsing meant paying per API call to closed providers. Now, every single one of these capabilities can be self-hosted, customized, and deployed on your own infrastructure.

The implications are immediate. A childcare provider can run document parsing locally without sending sensitive records to an overseas API. A construction firm can deploy multilingual speech recognition across job sites with no per-minute charges. An engineering consultancy can generate photorealistic visualizations without a subscription to a closed image service.

This is not theoretical. These models are available today, under permissive licenses, and designed for real-world deployment.

The LLMs: Open Models That Rival Closed Systems

NVIDIA Nemotron 3 Ultra: The 550B Hybrid Behemoth

NVIDIA's Nemotron 3 Ultra is the headline act. At 550 billion total parameters with only 55 billion active per token, it uses a hybrid Mamba-Attention Mixture-of-Experts architecture that is the first of its kind at this scale.

What makes it special:

Hybrid Mamba-Attention: Mamba layers handle long sequences with sub-quadratic scaling, while attention layers provide precise recall. This is not a pure transformer. It is an entirely new architectural paradigm.
1 million token context window: That is approximately 750,000 words. You can feed it an entire codebase, a full year of meeting transcripts, or a complete regulatory document library in a single prompt.
MMLU 89.1: Closes the gap with frontier closed models on general knowledge benchmarks.
NVFP4 quantization: A variant optimized for NVIDIA's Blackwell, Hopper, and Ampere architectures claims roughly 5x throughput improvement, making a 550B model practically deployable.
OpenMDW-1.1 license: Weights, training data, and training recipes are all open. This is not just inference access. This is full transparency.
Pre-trained on 20 trillion tokens, with post-training via SFT, RL, and Multi-teacher On-Policy Distillation (MOPD).
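
To put the 1M-token figure in perspective, here is a minimal sketch of the tokens-to-words arithmetic behind the "approximately 750,000 words" estimate. The 0.75 words-per-token ratio is simply what that estimate implies; it is a rough rule of thumb for English text, not a property of Nemotron's tokenizer, and the function names are illustrative.

```python
import math

# Ratio implied by the article: 1,000,000 tokens ~ 750,000 words.
# This is an approximation for English prose, not a measured tokenizer statistic.
WORDS_PER_TOKEN = 750_000 / 1_000_000  # 0.75


def words_that_fit(context_tokens: int) -> int:
    """Approximate number of English words that fit in a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)


def tokens_needed(word_count: int) -> int:
    """Approximate number of tokens a document of `word_count` words consumes."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(words_that_fit(1_000_000))  # 750000
print(tokens_needed(300_000))     # 400000
```

By the same estimate, a 300,000-word document library would take roughly 400,000 tokens, well inside a 1M-token window. Code and non-English text usually tokenize less efficiently, so treat these numbers as an upper bound on what fits.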

NVIDIA designed Nemotron 3 Ultra specifically for long-running agentic workloads: multi-step reasoning, tool use, complex planning, and autonomous task completion across many turns. Available on HuggingFace and through Amazon SageMaker JumpStart, NVIDIA NIM, Perplexity, and OpenRouter.

The bottom line: This is the most capable open language model ever released, purpose-built for the agentic AI era. If you are building autonomous workflows, this is your new foundation.

Google Gemma 4 12B: The Everything Model for Your Laptop

Google's Gemma 4 12B Unified shipped on June 3, 2026, and it is arguably the most practically useful model of the entire week.

What makes it special:

Any-to-any multimodal: Handles text, image, and native audio input, plus video understanding (processed as frames). All in a single model. No separate encoders needed.
Encoder-free architecture: Vision and audio inputs flow directly into the LLM backbone, reducing latency and memory usage.
256K context window: Handles long documents and multi-turn conversations with ease.
140+ languages: Broad multilingual support out of the box.
AIME 2026 score of 77.5: Competitive math reasoning despite its compact size.
23-checkpoint QAT wave: Shipped with quantization-aware training checkpoints for mobile ONNX and Apple MLX deployment.
Apache 2.0 license: Fully open for commercial use.
Runs in 16GB of memory: This is a laptop model. You do not need a datacenter.

Gemma 4 12B is the most deployable model of the week. For Australian businesses that need a single model to handle text analysis, image understanding, and audio processing on local hardware, this is the one to start with.

The bottom line: If you only deploy one model from this week, make it Gemma 4 12B. It handles every input type, runs on a laptop, and carries no license fee.

StepFun Step-3.7-Flash: The 198B Coding Visionary

StepFun's Step-3.7-Flash is a 198-billion-parameter sparse MoE vision-language model with approximately 11 billion active parameters per token. Released under Apache 2.0, it is built for high-efficiency multimodal agentic workflows.

What makes it special:

Native multimodal: A 1.8B-parameter vision encoder (ViT) provides native image and video understanding. It can parse UI elements, charts, documents, and application interfaces.
SWE-Bench Pro 56.3%: Strong software engineering performance. It also leads ClawEval-1.1 at 67.1.
256K context window: Handles large codebases and complex multi-step tasks.
Three reasoning levels: Developers can select low, medium, or high reasoning depth to balance speed and accuracy.
Apache 2.0 license: Fully open for commercial deployment.
Broad inference support: Works with vLLM, SGLang, HuggingFace Transformers, and llama.cpp.

Step-3.7-Flash is particularly relevant for software development workflows. Its ability to understand visual interfaces and convert them into structured code outputs makes it valuable for automated testing, UI-to-code pipelines, and agentic development tools.

The bottom line: The best open model for coding agents and UI understanding. StepFun has quietly built something special here.

Liquid AI LFM2.5-8B-A1B: The Edge King

Liquid AI's LFM2.5-8B-A1B is designed for the edge: 8.3 billion total parameters but only 1.5 billion active per token. It is a reasoning-only model that produces explicit chain-of-thought before answering.

What makes it special:

MATH500 88.8: Exceptional mathematical reasoning for its size. Competitive with models many times larger.
128K context window: Surprisingly long context for an edge model.
MLX-ready: Optimized for Apple Silicon deployment. Runs locally on MacBooks.
Explicit chain-of-thought: Produces visible reasoning steps, making it ideal for applications where you need to audit the thinking process.

This model is perfect for on-device deployments where you need mathematical reasoning or structured analysis without cloud connectivity. Think field engineering apps, offline data analysis tools, or embedded systems.

The bottom line: The best on-device reasoning model available. Put it on a laptop or edge device and get frontier-grade math without the internet.

JetBrains Mellum2-12B-A2.5B-Thinking: The Dev Tool Specialist

JetBrains open-sourced Mellum2, their first MoE model, with 12 billion total parameters and 2.5 billion active per token. It features 64 experts with 8 activated per token.

What makes it special:

Near-Qwen3-14B coding performance at 2.5B active: Delivers competitive code generation and understanding with far fewer active parameters.
131K context window: Uses a combination of sliding-window and full attention layers.
Thinking variant: Post-trained for reasoning-augmented assistance with explicit reasoning blocks for complex debugging and multi-step planning.
Apache 2.0 license: Fully open.
Trained on 10.6 trillion tokens across natural language and code with a three-phase curriculum.
LiveCodeBench v6 69.9%, EvalPlus 78.4.

JetBrains positions Mellum2 as a "focal model" for multi-model AI systems: fast enough for routing, RAG pipelines, sub-agent tasks, and private deployment. It is not trying to be a frontier model. It is trying to be the fastest specialized tool in the box.

The bottom line: The best open coding model for integration into developer tools, IDEs, and multi-model pipelines.

Image Generation: Ideogram 4 Changes the Game

Ideogram 4: The First Open-Weight Image Model With Taste

Ideogram 4 is arguably the surprise of the entire week. Ideogram's first-ever open-weight release is a 9.3-billion-parameter flow-matching Diffusion Transformer trained from scratch.

What makes it special:

#2 overall: Ranks second globally behind GPT Image 2 on image generation benchmarks, and is the top open-weight model on both Design Arena and LMArena.
Strongest open checkpoint for text-rich images: If you need text in your generated images (logos, posters, signage, social media graphics), Ideogram 4 is the best open option by a significant margin.
Structured JSON prompting: Enhanced control over text rendering, bounding-box layout, and color palettes.
Native 2K resolution: Generates at 2048px natively without upscaling.
Flow-matching DiT architecture: Modern architecture trained from scratch, not a fine-tune of an existing model.

The community reaction has been remarkable. After years of open image models playing catch-up to closed systems like DALL-E 3 and Midjourney, Ideogram 4 represents the moment the open ecosystem caught up. For Australian businesses that need branded visual content, marketing imagery, or design assets, this model eliminates the need for paid image generation subscriptions.

The bottom line: The best open image generator ever released. Period. If you are paying for image generation, you can stop now.

Audio and Speech: Four Labs, One Breakout Week

The audio and speech category had a breakout week with four separate open TTS and audio systems landing simultaneously. This is unprecedented.

Boson Higgs Audio v3 4B: The Conversational Voice

Boson AI's Higgs Audio v3 TTS is a 4-billion-parameter text-to-speech model built on a Qwen3-4B backbone. It is designed specifically for conversational voice agents.

What makes it special:

102 languages with single-digit WER/CER across the board.
21+ emotions: Dynamically adjustable via inline control tags. Includes singing, whispering, and shouting.
Sub-second time-to-first-audio (TTFA): Essential for real-time conversational agents.
Zero-shot voice cloning from short reference clips.
Streaming synthesis: Starts generating audio before the full text is provided.

For businesses building voice agents, customer service bots, or interactive voice systems, Higgs Audio v3 provides production-grade multilingual expressive speech without API dependencies. The emotion control is particularly powerful for customer experience applications.

RedNote dots.tts: The Codec-Free Revolution

RedNote's dots.tts is a 2-billion-parameter fully continuous, end-to-end autoregressive TTS system released under Apache 2.0.

What makes it special:

No discrete tokens anywhere in the pipeline: The only fully continuous open TTS system. Uses a 48kHz AudioVAE with autoregressive flow-matching acoustic head.
Three variants: dots.tts-base (pretrained), dots.tts-soar (Self-corrective Alignment for higher fidelity), and dots.tts-mf (MeanFlow distillation for low-latency few-step inference).
Apache 2.0 license: Fully open for commercial use.
24-language speaker similarity: Strong voice cloning across languages.

The technical innovation here is significant. By eliminating codec-based tokenization entirely, dots.tts produces more natural prosody and fewer artifacts than traditional TTS systems. This is the future direction of speech synthesis.

Google Magenta RealTime 2: Live Music Generation

Google's Magenta RealTime 2 is an open-weights model for real-time music generation with approximately 200ms latency.

What makes it special:

Interactive control: Musicians can guide generation through MIDI, text prompts, and audio inputs in real time.
Two sizes: mrt2_base (2.4B parameters) for quality, mrt2_small (230M parameters) for speed.
DAW integration: Includes example applications and plugins for macOS.
JAX and MLX backends: Plus a C++ inference engine optimized for Apple Silicon.
Apache 2.0 code, CC-BY 4.0 weights: Open for both research and commercial use.

Magenta RealTime 2 is the first open model that makes live AI-assisted music performance practical. For creative businesses, media production, and content creators, this opens entirely new workflows.

NVIDIA Nemotron-3.5 ASR: Streaming Speech at Scale

NVIDIA's Nemotron-3.5 ASR is a 600-million-parameter streaming speech recognition model.

What makes it special:

40 language-locales from a single checkpoint in real time.
17x more concurrent streams than the previous Parakeet RNNT 1.1B model.
Configurable latency from 80ms to 1.12 seconds.
Native punctuation and capitalization: Production-ready output without post-processing.
Cache-Aware FastConformer-RNNT: Processes each audio frame once for maximum efficiency.
OpenMDW-1.1 license: Full transparency and fine-tuning capability.
Runs on laptops: Efficient enough for consumer hardware.

For businesses that need real-time transcription across multiple languages, meeting recording, or accessibility features, Nemotron-3.5 ASR eliminates the need for per-minute API services. Deploy it once, use it forever.

Vision and VLMs: SOTA at Surprisingly Small Sizes

PaddleOCR-VL-1.6: The Document Parsing Champion

Baidu's PaddleOCR-VL-1.6 is a 0.9-billion-parameter vision-language model that achieves state-of-the-art document parsing results.

What makes it special:

96.33% on OmniDocBench v1.6: The highest score ever recorded on this benchmark.
Under 1 billion parameters: Achieves performance that previously required models 10x larger.
Comprehensive parsing: Handles text, tables, formulas, charts, seals, and even ancient Chinese documents.
Real-world robustness: Tested against scanning, warping, skew, screen photography, and illumination variation.
Apache 2.0 license: Drop-in compatible with PaddleOCR-VL-1.5.

This model is immediately useful for any business that processes documents. Invoice extraction, contract analysis, form digitization, and compliance document review all become local, private, and free. Running at under 1B parameters means it deploys on virtually any hardware.

Baidu NAVA: Joint Audio-Video Generation

Baidu's NAVA (Native Audio-Visual Alignment) is a 6.3-billion-parameter model for joint audio-video generation.

What makes it special:

"Align-then-Fuse" MMDiT architecture: 10 Hierarchical Alignment Layers plus 20 Unified Fusion Layers for precise A/V synchronization.
Best-in-class audio-visual sync: Highest Sync-C and Sync-D scores in its category.
720p video with stereo audio: Generates synchronized audiovisual content from a single text prompt.
Timbre-in-Context Conditioning: Controllable speech timbre with reference audio.
Language-described camera control: Specify shot composition, motion, and pacing via text.
Apache 2.0 license: Open for commercial deployment.

NAVA represents a new category: unified audio-visual generation rather than separate video and audio pipelines stitched together. For marketing content, social media, and corporate video production, this enables entirely new workflows.

Video, 3D, and World Models

NVIDIA Cosmos3-Super: The Physical AI Foundation

NVIDIA's Cosmos3-Super is a 64-billion-parameter omnimodal world model built on a Mixture-of-Transformers architecture.

What makes it special:

64B parameters: Split into a 32B reasoner (VLM) and 32B generator.
Omnimodal I/O: Processes and generates text, images, video (with or without audio), ambient sound, and action trajectories.
Physical reasoning: Understands motion, causality, and physics. The model can predict what happens next in physical scenarios.
Action generation: Produces numerical action data (joint angles, gripper positions, trajectory points) for robot control.
Synthetic data generation: Purpose-built for training physical AI systems when real-world data collection is expensive or impossible.
Open weights on HuggingFace.

Cosmos3-Super is not a consumer tool. It is infrastructure for the robotics and autonomous systems industry. But its availability as an open model means that robotics startups, university labs, and engineering firms can now access world-class simulation capabilities without NVIDIA licensing fees.

JD JoyAI-Echo: Five-Minute Multi-Shot Video Stories

JD.com's JoyAI-Echo is an open-source framework for generating coherent multi-shot video stories up to five minutes in length.

What makes it special:

5-minute multi-shot narratives: Generates coherent sequences of shots from a single prompt JSON.
Cross-modal audio-visual memory bank: Maintains consistent character appearance and voice timbre throughout the entire video.
7.5x inference speedup via DMD distillation.
Joint synchronized audio-video: Video and audio from a single pipeline.
Interactive conversational agent: Real-time editing through conversational instructions.
Built on LTX-2.3 with Gemma-3-12B as text encoder.

JoyAI-Echo tackles one of the hardest problems in AI video: temporal consistency across multiple shots. The memory bank approach to maintaining character and voice consistency is a genuine innovation that makes AI-generated narrative video practical for the first time.

ByteDance Bernini-R: Unified Generation and Editing

ByteDance's Bernini-R is an open-source unified framework combining an MLLM-based semantic planner with a DiT-based renderer for video generation and editing.

What makes it special:

Unified pipeline: Text-to-image, image editing, text-to-video, and instruction-based video editing in a single framework.
Consistency in edits: Maintains identity and coherence across subject-to-video tasks.
Open weights released June 1, 2026.

VAST TripoSplat: Single Image to 3D

VAST AI Research's TripoSplat converts a single 2D image into high-quality 3D Gaussian splats.

What makes it special:

Single image input: No multi-view or depth data required.
3D Gaussian splats output: The modern standard for real-time 3D rendering.
MIT license: Fully open for any use case.

For architecture visualization, product design, e-commerce, and real estate, TripoSplat makes 3D asset creation as simple as taking a photo.

H Company Holo-3.1-4B: Computer Use Agents

H Company released Holo-3.1, a family of vision-language models specifically designed for computer use agents. The 4B variant is the sweet spot for local deployment.

What makes it special:

Built for computer use: Web, desktop, and mobile automation.
Native function calling: Seamless integration with agent frameworks.
Multiple sizes: 0.8B, 4B, 9B, and 35B-A3B variants with quantized options.
Based on Qwen 3.5 family: Leveraging a proven foundation.
Apache 2.0 license: Open for commercial deployment.

Holo-3.1 fills a critical gap: open models specifically trained for computer interaction rather than general chat. For businesses building automation agents that interact with software interfaces, this is purpose-built.

What This Week Means: Three Big Takeaways

1. The Cost of Frontier AI Just Dropped to Zero

Before this week, accessing capabilities like GPT-rival image generation, 102-language speech synthesis, or SOTA document parsing required paid API subscriptions. After this week, every single one of these capabilities has a production-grade open alternative. The total cost of frontier AI for a growing business is now hardware plus electricity.
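
To make "hardware plus electricity" concrete, here is a minimal break-even sketch. The function name and every dollar figure in the example are hypothetical placeholders, not quotes or measurements; substitute your own numbers.

```python
from typing import Optional


def breakeven_months(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_running_cost: float,
) -> Optional[float]:
    """Months until self-hosting has paid for its hardware.

    Returns None when running costs (electricity, maintenance) are at least
    as high as the API bill, in which case self-hosting never pays off.
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Placeholder figures for illustration only (not real prices):
# a 6,000 workstation, an 800/month API bill, 200/month to run it.
print(breakeven_months(6_000, 800, 200))  # 10.0
```

The sketch ignores staff time for setup and maintenance, which is often the largest real cost of self-hosting, so the true break-even point will usually be later than this simple division suggests.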

2. Open Models Are No Longer Behind Closed Models

The performance gap between open and closed models effectively closed this week. Ideogram 4 ranks #2 globally in image generation. Nemotron 3 Ultra's MMLU 89.1 rivals the best closed systems. PaddleOCR-VL-1.6 sets the absolute SOTA in document parsing at any price. The narrative that "open models are always a step behind" is no longer true.

3. The MoE Architecture Won

Look at the models on this list: Nemotron 3 Ultra (MoE), Step-3.7-Flash (MoE), LFM2.5 (MoE), Mellum2 (MoE), and Cosmos3-Super (Mixture-of-Transformers, a related mixture design). The industry has largely converged on sparse MoE as the architecture for production AI, even though dense models such as Gemma 4 12B still have their place. Massive total parameter counts with small active footprints mean you get frontier performance at a fraction of the inference cost.
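
The "small active footprint" is easy to quantify from the parameter counts quoted in this article. The short sketch below computes the share of parameters each MoE model uses per token; the dictionary and function names are illustrative.

```python
# (total parameters, active parameters per token), in billions,
# as quoted in this article.
MODELS = {
    "Nemotron 3 Ultra": (550, 55),
    "Step-3.7-Flash": (198, 11),
    "LFM2.5-8B-A1B": (8.3, 1.5),
    "Mellum2-12B-A2.5B": (12, 2.5),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that participate in each token."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these figures the active share ranges from roughly 6% (Step-3.7-Flash) to roughly 21% (Mellum2). Per-token compute scales with the active count, while the memory needed to hold the weights still scales with the total.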

How Australian Businesses Can Use These Models Today

Document processing: Deploy PaddleOCR-VL-1.6 locally for invoice, contract, and form processing. No API costs, no data leaving your network.

Customer service voice agents: Use Boson Higgs Audio v3 with Nemotron-3.5 ASR to build multilingual conversational agents with expressive speech.

Marketing visuals: Generate branded imagery with Ideogram 4. Text rendering in images is finally reliable.

Code and development: Step-3.7-Flash and Mellum2 for code generation, review, and UI understanding. Deploy in your CI/CD pipeline.

On-device analysis: LFM2.5-8B-A1B for field engineering, offline analytics, or edge deployments where cloud connectivity is unreliable.

Video content: JoyAI-Echo for multi-shot narrative video, NAVA for synchronized audio-visual content, Bernini-R for unified generation and editing.

Physical AI and simulation: Cosmos3-Super for robotics, autonomous systems, and synthetic data generation.

The Complete Model Reference

Here is every notable open-weight model from the first week of June 2026:

NVIDIA Nemotron 3 Ultra: 550B total, 55B active, hybrid Mamba-MoE, 1M context, OpenMDW-1.1
Google Gemma 4 12B: 12B dense, any-to-any multimodal, 256K context, Apache 2.0
StepFun Step-3.7-Flash: 198B total, 11B active, MoE VLM, 256K context, Apache 2.0
Liquid AI LFM2.5-8B-A1B: 8.3B total, 1.5B active, edge MoE, 128K context
JetBrains Mellum2-12B-A2.5B: 12B total, 2.5B active, MoE coding, 131K context, Apache 2.0
Ideogram 4: 9.3B flow-matching DiT, native 2K, structured JSON prompting
Boson Higgs Audio v3: 4B TTS, 102 languages, 21+ emotions, streaming synthesis
RedNote dots.tts: 2B fully continuous TTS, no codec, Apache 2.0
Google Magenta RealTime 2: 2.4B (base) / 230M (small), real-time music, ~200ms latency, CC-BY 4.0
NVIDIA Nemotron-3.5 ASR: 600M streaming ASR, 40 locales, OpenMDW-1.1
PaddleOCR-VL-1.6: 0.9B document parsing, SOTA OmniDocBench, Apache 2.0
Baidu NAVA: 6.3B joint audio-video gen, 720p stereo, Apache 2.0
NVIDIA Cosmos3-Super: 64B omnimodal world model, physical AI
JD JoyAI-Echo: Multi-shot 5-min video, LTX-2.3 based
ByteDance Bernini-R: Unified image/video gen and editing
VAST TripoSplat: Single image to 3D Gaussian splats, MIT license
H Company Holo-3.1-4B: Computer use agents, web/desktop/mobile, Apache 2.0

FAQ

What is the best open-weight model released in June 2026?

It depends on your use case. NVIDIA Nemotron 3 Ultra is the most capable overall LLM. Google Gemma 4 12B is the most practical for general deployment. Ideogram 4 is the best for image generation. PaddleOCR-VL-1.6 is the best for document processing. All are available today under permissive licenses.

Can Australian businesses use these models commercially?

Yes. Most models in this roundup are released under Apache 2.0 or similarly permissive licenses that allow commercial use. NVIDIA's OpenMDW-1.1 and the CC-BY 4.0 license on Google's Magenta RealTime 2 weights also permit commercial deployment. Always check the specific license for each model before deployment.

What hardware do I need to run these models?

It varies dramatically. Google Gemma 4 12B runs on a laptop with 16GB RAM. PaddleOCR-VL-1.6 at 0.9B params runs on virtually anything. Liquid AI LFM2.5 with 1.5B active params is designed for edge devices. NVIDIA Nemotron 3 Ultra at 550B requires datacenter-grade GPUs, though the NVFP4 variant significantly reduces the hardware requirements.
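
For a first-pass hardware estimate, a common rule of thumb is that weight memory equals parameter count times bytes per parameter. The sketch below applies that rule to three models from this roundup. It covers weights only and ignores the KV cache, activations, and runtime overhead; the precision labels and function name are illustrative, not tied to any specific checkpoint.

```python
# Bits stored per parameter at common precisions (illustrative labels).
BITS_PER_PARAM = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits each occupy 1 GB, so the result is
    simply params_billions * bits / 8.
    """
    return params_billions * bits / 8


for name, params in [
    ("Gemma 4 12B", 12),
    ("PaddleOCR-VL-1.6", 0.9),
    ("Nemotron 3 Ultra", 550),
]:
    row = ", ".join(
        f"{label}: {weight_memory_gb(params, bits):.1f} GB"
        for label, bits in BITS_PER_PARAM.items()
    )
    print(f"{name} -> {row}")
```

By this estimate Gemma 4 12B needs about 24 GB for 16-bit weights and about 6 GB at 4-bit, which suggests the 16GB figure assumes one of the quantized checkpoints. An MoE model typically still has to hold all of its parameters in memory, so it is Nemotron 3 Ultra's 550B total, not its 55B active count, that drives the footprint: roughly 275 GB even at 4-bit.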

How do open-weight models compare to paid API services?

After this week, the performance gap is minimal for most use cases. Ideogram 4 ranks #2 globally in image generation behind only GPT Image 2. Nemotron 3 Ultra's benchmarks rival the best closed models. The main trade-off is convenience (managed APIs handle infrastructure) versus cost and privacy (self-hosted models have no per-call fees and keep data local).

What is the MoE architecture and why does it matter?

Mixture-of-Experts (MoE) activates only a subset of a model's total parameters for each input token. Nemotron 3 Ultra, for example, has 550B parameters but uses only 55B per token, delivering frontier performance at a fraction of the per-token compute. MoE is the reason these massive models are becoming practical to deploy.
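
The idea can be sketched in a few lines of plain Python. This is a toy illustration of top-k routing, one common MoE variant, and not the routing code of any model in this roundup. The experts are stand-in scalar functions and the router scores are random; only the 64-experts, 8-active shape is borrowed from the Mellum2 description earlier in this article.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    order = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits, k):
    """Run only the selected experts and return their weighted sum."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Toy setup: 64 experts, 8 active per token.
random.seed(0)
NUM_EXPERTS, TOP_K = 64, 8
# Each stand-in "expert" just scales its input by its own random factor.
experts = [lambda x, s=random.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

y, used = moe_layer(2.0, experts, logits, TOP_K)
print(f"experts used: {len(used)} of {NUM_EXPERTS}")  # experts used: 8 of 64
```

In a real model the router scores come from a learned layer applied to each token, and the experts are feed-forward networks, but the cost structure is the same: the other 56 experts are never evaluated for that token.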

About the author: AJ Awan is the founder of Flowtivity, an AI consultancy helping Australian businesses deploy practical AI solutions. He brings 9+ years of consulting experience from EY, specializing in workflow automation and AI agent deployment.
