AI & LLMs

30 articles · latest first

Moonshot Kimi K3: reasoning LLM optimized for long-context and code workflows

Moonshot released Kimi K3, a reasoning-focused LLM with improved long-context and code abilities. Platform teams should prioritize inference stack updates.

Jul 18, 2026 · 3m · moonshot-kimi-k3 · llm-deployment

Anthropic Claude Sonnet: 1M-Token Code Context, Introductory Pricing, and Platform Impact

Anthropic's Claude Sonnet is now a mid-tier default with a native 1M-token code context and intro $2/$10 per-million-token pricing; platform teams must act.

Jul 16, 2026 · 3m · anthropic · claude-sonnet

NVIDIA nvDock & CWIP-1.0: Containerized LLM Inference for Multi-GPU Clusters

NVIDIA's nvDock and CWIP-1.0 package containerized LLM images and inference workflows for multi-GPU clusters, simplifying sharding, registries, and hooks.

Jul 13, 2026 · 3m · nvidia-nvdock · llm-inference

OpenAI rolls out o3-pro and makes gpt-4o-mini the small-model default

OpenAI rolled out o3-pro to Pro/Team users and made gpt-4o-mini the small-model default, shifting ChatGPT and API workflows to the o-series reasoning stack.

Jul 9, 2026 · 3m · openai · gpt-4o-mini

DeepSeek V4‑Pro 1.6T: Open‑weight LLM with 1M‑Token Context and Platform Implications

DeepSeek V4‑Pro claims a 1M‑token context. Platform teams must treat context management, memory sharding, and inference cost as infrastructure problems.

Jul 7, 2026 · 3m · open-weight-llm · long-context

Hugging Face: open-weight LLM uploads and vLLM/TGI/Ollama/llama.cpp inference tooling updates

Multiple open-weight LLM uploads plus inference/runtime SDK updates on Hugging Face lower the bar for platform teams to self-host competitive 7–20B models.

Jul 6, 2026 · 3m · open-weights · inference-tooling

OpenAI ChatGPT memory upgrade: reviewable memories and Enterprise workspace agents — a platform checklist

OpenAI added cross-session ChatGPT memory and reviewable summaries plus Enterprise workspace agents that run background workflows. Platform ops must adapt.

Jul 5, 2026 · 3m · openai · chatgpt-memory

Undated OpenAI model release notes break week-long LLM release roundups

OpenAI's aggregated, undated model release notes make week‑bounded LLM roundups unreliable and leave platform teams blind to exact change timestamps.

Jul 3, 2026 · 3m · llm-releases · model-release-notes

OpenAI GPT-4o mini tier, Realtime API expansion, and agent/inference primitives that matter

OpenAI rolled out a smaller GPT-4o 'mini' tier, widened Realtime availability, and vendors shipped agent and inference primitives that shift platform engineering.

Jul 1, 2026 · 3m · openai · agents

Hugging Face open-weight releases: self-hostable reasoning & code checkpoints (week ending 2026-06-27)

This week, new open-weight Hugging Face checkpoints report MMLU and HumanEval gains, making self-hosted reasoning- and code-specialized models more viable.

Jun 30, 2026 · 3m · hugging-face · llm-inference

Anthropic Claude Opus 4.x: Minor Rollout and API Tuning — LLM Ops Implications

Anthropic rolled out a minor Claude Opus 4.x update with API tuning and code-gen gains. Vendors pushed small model and runtime tweaks; ops teams must adapt.

Jun 28, 2026 · 3m · model-releases · agent-frameworks

OpenAI exposes GPT-4o reasoning variants in Assistants & Realtime APIs — platform implications

OpenAI added reasoning-focused GPT-4o configs to Assistants and Realtime APIs; platform teams should invest in orchestration, tool reliability, and inference

Jun 26, 2026 · 3m · openai · gpt-4o
