Google is making moves. Its Gemini 3 Pro model now powers AI Overviews for complex search queries. API calls more than doubled to 85 billion by August. Andrew Ng called it at Davos: “Google is clearly having a moment.”

This week I shipped six new tools for building AI-powered developer workflows. Here’s what they do and why they matter.

New Tools This Week

sdlc-quality automates quality gates across your software lifecycle. It runs standards checks, enforces documentation requirements, and catches issues before they reach review.
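To make the gate idea concrete, here's a minimal Rust sketch of the pattern, not sdlc-quality's actual code; the Gate struct and the two example checks are hypothetical:

```rust
// Not sdlc-quality's actual code: a minimal sketch of the quality-gate
// pattern, where every gate must pass before an artifact reaches review.

struct Gate {
    name: &'static str,
    check: fn(&str) -> bool,
}

/// Run every gate, report each result, and return whether all passed.
fn run_gates(gates: &[Gate], artifact: &str) -> bool {
    let mut all_passed = true;
    for gate in gates {
        let passed = (gate.check)(artifact);
        println!("{}: {}", gate.name, if passed { "pass" } else { "FAIL" });
        all_passed &= passed;
    }
    all_passed
}

fn main() {
    // Hypothetical gates: require a doc header, forbid TODO markers.
    let gates = [
        Gate { name: "has-docs", check: |a| a.contains("# ") },
        Gate { name: "no-todos", check: |a| !a.contains("TODO") },
    ];
    let artifact = "# My module\nfn run() {} // TODO: add tests";
    if !run_gates(&gates, artifact) {
        eprintln!("blocked before review");
    }
}
```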

rlm-rs and rlm-rs-plugin implement the approach from the Recursive Language Models paper in Rust. The paper shows how LLMs can handle inputs two orders of magnitude beyond context windows by recursively processing snippets. These tools bring that approach to production with fast retrieval and minimal overhead.
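The core recursion is simple enough to sketch. Here's a toy Rust version of the idea from the paper, not rlm-rs's implementation: split an oversized input into snippets, reduce each one, and recurse until the result fits a context budget. The llm_summarize stub stands in for a real model call:

```rust
// A toy illustration of recursive reduction, not rlm-rs's actual code.

/// Stand-in for a language-model call that compresses one snippet.
fn llm_summarize(snippet: &str) -> String {
    // Placeholder: keep the first 64 characters as a fake "summary".
    snippet.chars().take(64).collect()
}

/// Recursively reduce `text` until it fits within `budget` characters,
/// which stands in for a model's context window.
fn recursive_reduce(text: &str, budget: usize, chunk: usize) -> String {
    if text.len() <= budget {
        return text.to_string();
    }
    // Summarize each fixed-size snippet, then recurse on the concatenation.
    // Byte chunking assumes ASCII input; real code would split on chars.
    let reduced: String = text
        .as_bytes()
        .chunks(chunk)
        .map(|c| llm_summarize(std::str::from_utf8(c).unwrap_or("")))
        .collect::<Vec<_>>()
        .join("\n");
    recursive_reduce(&reduced, budget, chunk)
}

fn main() {
    // ~135k characters: two orders of magnitude past a 1k-char "window".
    let huge_input = "lorem ipsum dolor sit amet ".repeat(5_000);
    let digest = recursive_reduce(&huge_input, 1_024, 4_096);
    println!("reduced {} chars to {}", huge_input.len(), digest.len());
}
```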

human-voice detects AI writing patterns in your documentation. It catches em dashes, buzzwords like “delve” and “realm”, and hedging phrases that make content sound robotic. Run it before publishing anything.
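Under the hood, this kind of check can start as a plain surface-pattern scan. The Rust sketch below shows the general technique with a made-up rule list; human-voice's actual rules are broader:

```rust
// A toy surface-pattern scan, not human-voice's actual rule set.

/// (needle, label) pairs; the phrases here are made-up examples.
const FLAGGED: &[(&str, &str)] = &[
    ("\u{2014}", "em dash"),
    ("delve", "buzzword"),
    ("realm", "buzzword"),
    ("it's worth noting", "hedging phrase"),
];

/// Report every flagged pattern with its line number.
fn scan(text: &str) {
    for (line_no, line) in text.lines().enumerate() {
        let lower = line.to_lowercase();
        for (needle, label) in FLAGGED {
            if lower.contains(needle) {
                println!("line {}: {} ({:?})", line_no + 1, label, needle);
            }
        }
    }
}

fn main() {
    scan("We delve into the realm of AI \u{2014} it's worth noting this reads as robotic.");
}
```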

aesth focuses on aesthetic consistency in documentation and code. It enforces visual standards and catches formatting drift that creeps in over time.

documentation-review manages your technical docs end-to-end. It reviews for accuracy, generates docs from code analysis, updates outdated content, and enforces changelog formats.

These tools solve real friction in AI-assisted development: keeping quality high while moving fast.

AI Agent Research

Three papers from arXiv caught my attention this week:

AI Agent Systems: Architectures, Applications, and Evaluation (arXiv:2601.01743v1) surveys the current state of AI agents that combine foundation models with reasoning, planning, memory, and tool use. The taxonomy covers agent architectures from policy cores to tool routers. Essential reading if you’re building tool-using agents. The benchmarking section alone is worth your time.
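The tool-router idea reduces to something like the sketch below: score each registered tool against the request and dispatch to the best match. The keyword heuristic here is my own illustration, not the survey's method:

```rust
// A minimal tool router; my illustration of the survey's taxonomy.

struct Tool {
    name: &'static str,
    keywords: &'static [&'static str],
}

/// Pick the tool whose keywords best match the request, if any match.
fn route<'a>(tools: &'a [Tool], request: &str) -> Option<&'a Tool> {
    let req = request.to_lowercase();
    tools
        .iter()
        .map(|t| {
            let score = t.keywords.iter().filter(|k| req.contains(**k)).count();
            (t, score)
        })
        .filter(|(_, score)| *score > 0)
        .max_by_key(|&(_, score)| score)
        .map(|(t, _)| t)
}

fn main() {
    let tools = [
        Tool { name: "web_search", keywords: &["search", "find", "latest"] },
        Tool { name: "calculator", keywords: &["sum", "multiply", "compute"] },
    ];
    match route(&tools, "Find the latest Gemini benchmark results") {
        Some(tool) => println!("routing to {}", tool.name),
        None => println!("no tool matched; answer directly"),
    }
}
```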

Memory in the Age of AI Agents (arXiv:2512.13564) breaks down what “agent memory” actually means. It distinguishes token-level context from RAG systems and static caches. The paper categorizes memory types (parametric, latent, experiential) and proposes frameworks for benchmarking. If your agents forget things they shouldn’t, this paper explains why.
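As a rough mental model, the paper's three categories map onto something like this Rust enum; the fields are illustrative choices of mine, not the paper's formal definitions:

```rust
// A rough sketch of the paper's memory categories; field names are
// illustrative, not the paper's formal definitions.

enum AgentMemory {
    /// Knowledge baked into model weights at training time.
    Parametric { model_id: String },
    /// Hidden-state or KV-cache style memory that lives for one session.
    Latent { session_id: String, tokens: usize },
    /// Stored records of past interactions, retrieved on demand (RAG-like).
    Experiential { store: Vec<String> },
}

fn describe(m: &AgentMemory) -> String {
    match m {
        AgentMemory::Parametric { model_id } => format!("in-weights ({model_id})"),
        AgentMemory::Latent { session_id, tokens } => {
            format!("session {session_id}: {tokens} tokens of context")
        }
        AgentMemory::Experiential { store } => {
            format!("{} retrievable episodes", store.len())
        }
    }
}

fn main() {
    let mems = [
        AgentMemory::Parametric { model_id: "base-llm".into() },
        AgentMemory::Latent { session_id: "s1".into(), tokens: 8_192 },
        AgentMemory::Experiential { store: vec!["fixed build on retry".into()] },
    ];
    for m in &mems {
        println!("{}", describe(m));
    }
}
```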

AstroReason-Bench: Evaluating Unified Agentic Planning (arXiv:2601.11354) presents a benchmark for evaluating agentic planning in complex space mission scenarios. Agents coordinate tools, memory, and reasoning over extended problem sequences. The benchmark helps compare planning approaches across different agent architectures.

Google’s Gemini Push

Google routes complex queries to Gemini 3 Pro while using faster models for simple tasks. Smart routing means you get frontier-level reasoning only when you need it. Subscribers see this in AI Overviews at the top of search results.
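The routing pattern itself is easy to sketch: a cheap heuristic first, the frontier model only when a query looks complex. The cues and threshold below are assumptions for illustration, nothing like Google's actual router:

```rust
// A hedged sketch of complexity-based model routing; the heuristic and
// model names are assumptions, not Google's logic.

#[derive(Debug)]
enum Model {
    Fast,     // low-latency model for simple lookups
    Frontier, // slower, stronger reasoning model
}

/// Crude complexity heuristic: long, multi-question, or reasoning-heavy
/// queries get the frontier model; everything else stays fast.
fn route(query: &str) -> Model {
    let lower = query.to_lowercase();
    let words = query.split_whitespace().count();
    let questions = query.matches('?').count();
    let reasoning_cue = ["why", "compare", "trade-off", "step by step"]
        .iter()
        .any(|cue| lower.contains(*cue));
    if words > 25 || questions > 1 || reasoning_cue {
        Model::Frontier
    } else {
        Model::Fast
    }
}

fn main() {
    println!("{:?}", route("weather in Paris"));
    println!("{:?}", route("Compare these two retry strategies and explain why one fails under load"));
}
```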

The enterprise side tells the real story. Gemini Enterprise grew to 8 million users. Google Cloud revenue jumped 34% year-over-year to $15.2 billion in Q3 2025. Google signed more billion-dollar deals in three quarters than in the previous two years combined.

Andrew Ng’s take: the AI landscape is “white hot” with opportunities for Anthropic, OpenAI, and others. Google’s distribution advantage and the continued influence of Page and Brin are driving its current momentum. He sees Gemini and ChatGPT as the two leaders in horizontal information discovery.

What This Means

The tools I shipped this week address the same challenge Google is tackling: maintaining quality at scale. When AI generates code, documentation, and content faster than humans can review it, you need automated checks that catch problems early.

The research papers map the territory we’re exploring. Agent memory, tool coordination, and planning remain hard problems. The benchmarks give us ways to measure progress instead of just claiming it.

Google’s success with Gemini 3 shows the market wants AI that knows when to work hard and when to stay fast. That same principle applies to development tools: run expensive checks when they matter, skip them when they don’t.

Try the tools: All six are open source on GitHub. Start with human-voice if you publish technical content. It catches patterns you’ve stopped noticing.

Read the papers: The AI Agent Systems survey covers everything from architecture to evaluation. Good starting point for agent development.

Watch Google: Its routing approach (fast models for simple queries, Gemini 3 for complex ones) is the pattern we’ll all end up using. Pay attention to how it balances cost and quality.