Travis Muhlestein (TravisMuhlestein)
AI & ML interests: all AI & ML interests
Recent Activity
posted an update 2 days ago
From AI demos to production systems: what breaks when agents become autonomous?
A recurring lesson from production AI deployments is that most failures are system failures, not model failures.
As organizations move beyond pilots, challenges increasingly shift toward:
• Agent identity and permissioning (a minimal sketch follows this list)
• Trust boundaries between agents and human operators
• Governance and auditability for autonomous actions
• Security treated as a first-class architectural constraint
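To make the identity, permissioning, and auditability bullets concrete, here is a minimal, illustrative sketch of a deny-by-default permission check with an audit trail on every agent action. The agent, scope, and action names are hypothetical and not tied to any specific framework:

```python
from dataclasses import dataclass, field

# Hypothetical identity and audit objects; a real system would back these
# with verifiable credentials and a central policy store.
@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    scopes: frozenset[str]  # e.g. {"crm:read", "email:send"}

@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, agent: AgentIdentity, action: str, allowed: bool) -> None:
        self.entries.append({"agent": agent.agent_id, "action": action, "allowed": allowed})

def authorize(agent: AgentIdentity, action: str, log: AuditLog) -> bool:
    """Deny-by-default check, with every decision written to the audit trail."""
    allowed = action in agent.scopes
    log.record(agent, action, allowed)
    return allowed

# Usage: this agent may read CRM data but may not send email on its own.
log = AuditLog()
support_bot = AgentIdentity("support-bot-01", frozenset({"crm:read"}))
assert authorize(support_bot, "crm:read", log)
assert not authorize(support_bot, "email:send", log)  # escalate to a human instead
```

The point is less the code than the shape: identity is explicit, authorization is deny-by-default, and every decision lands in an auditable log.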
This recent Fortune article highlights how enterprises are navigating that transition, including work with AWS’s AI Innovation Lab.
Open question for the community:
What architectural patterns or tooling are proving effective for managing identity, permissions, and safety in autonomous or semi-autonomous agent systems in production?
Context: https://fortune.com/2025/12/19/amazon-aws-innovation-lab-aiq/
posted an update about 1 month ago
Calibrating LLM-as-a-Judge: Why Evaluation Needs to Evolve
As AI systems become more agentic and interconnected, evaluation is turning into one of the most important layers of the stack. At GoDaddy, we’ve been studying how LLMs behave when used as evaluators—not generators—and what it takes to trust their judgments.
A few highlights from our latest engineering write-up:
🔹 Raw LLM scores drift and disagree, even on identical inputs
🔹 Calibration curves help stabilize model scoring behavior
🔹 Multi-model consensus reduces single-model bias and variance (a toy sketch of calibration + consensus follows this list)
🔹 These techniques support safer agent-to-agent decision making and strengthen our broader trust infrastructure (ANS, agentic systems, etc.)
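As a toy illustration of the calibration and consensus bullets above (not the code from the write-up), calibration can be as simple as fitting a monotone map from each judge's raw scores onto a shared human-label scale, then averaging the calibrated scores across judges. The judge names and data here are made up:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data: raw 0-10 scores from two hypothetical judge models on the same
# items, plus human quality labels in [0, 1] used to fit the calibrators.
raw_scores = {
    "judge_a": np.array([2.0, 4.5, 6.0, 7.5, 9.0, 9.5]),
    "judge_b": np.array([1.0, 3.0, 5.5, 8.0, 8.5, 9.8]),
}
human_labels = np.array([0.1, 0.3, 0.5, 0.7, 0.85, 0.95])

# 1) Per-judge calibration curve: monotone map from raw score to the human-label scale.
calibrators = {
    name: IsotonicRegression(out_of_bounds="clip").fit(scores, human_labels)
    for name, scores in raw_scores.items()
}

def consensus_score(new_scores: dict[str, float]) -> float:
    """2) Multi-model consensus: average the *calibrated* scores, not the raw ones."""
    calibrated = [calibrators[name].predict([s])[0] for name, s in new_scores.items()]
    return float(np.mean(calibrated))

# Usage: the judges disagree on the raw scale (7.0 vs 8.2) but land close after calibration.
print(consensus_score({"judge_a": 7.0, "judge_b": 8.2}))
```

Isotonic regression is just one way to build a calibration curve; the key property is the monotone mapping onto a shared scale before any cross-model aggregation.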
If you're building agents, autonomous systems, or any pipeline that relies on “AI judging AI,” calibration isn’t optional — it's foundational.
👉 Full write-up: Calibrating Scores of LLM-as-a-Judge
https://www.godaddy.com/resources/news/calibrating-scores-of-llm-as-a-judge
Would love feedback from the HF community:
How are you calibrating or benchmarking model evaluators in your own workflows?
posted an update about 1 month ago
🚀 GoDaddy ANS API Now Live — Bringing Verifiable Identity to the Agent Ecosystem
We just launched the Agent Name Service (ANS) API publicly, along with the new ANS Standards site, extending GoDaddy's decades of internet-scale trust into the emerging world of autonomous agents. ANS provides cryptographically verifiable identity, human-readable names, and policy metadata for agents — designed to work across frameworks like A2A, MCP, and future agent protocols.
What’s new:
🔹 ANS API is open to all developers — generate a GoDaddy API key and start testing registration, discovery, and lifecycle ops (an illustrative registration sketch follows this list).
🔹 ANS Standards Site is live — includes the latest spec, architecture, and implementation guidance.
🔹 Protocol-agnostic adapter layer — supports interoperability without vendor lock-in.
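To give a feel for the developer flow, here is a hedged sketch of what an agent registration call could look like. The endpoint path and payload fields are placeholders assumed for illustration, not the published ANS schema; check the Standards site and API docs for the actual spec and auth details:

```python
import os
import requests

# Illustrative only: the endpoint path and payload fields below are placeholders,
# not the actual ANS API schema; see https://www.agentnameregistry.org/ for the spec.
# GoDaddy's public APIs typically authenticate with an "sso-key key:secret" header;
# confirm the exact scheme for ANS in the developer docs.
API_KEY = os.environ["GODADDY_API_KEY"]
API_SECRET = os.environ["GODADDY_API_SECRET"]

resp = requests.post(
    "https://api.godaddy.com/v1/ans/agents",          # hypothetical endpoint
    headers={"Authorization": f"sso-key {API_KEY}:{API_SECRET}"},
    json={
        "name": "support-bot.example",                # human-readable agent name
        "publicKey": "<base64-encoded signing key>",  # basis for verification
        "protocols": ["a2a", "mcp"],                  # declared protocol support
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```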
Why it matters:
As autonomous agents continue to proliferate, we need neutral, verifiable identity to prevent spoofing, trust rot, and fragmented ecosystems. ANS brings DNS-like discovery and PKI-based validation to the agent economy.
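For readers newer to the PKI side, here is the general idea of key-based validation in a dozen lines, using the Python cryptography package. This illustrates the concept only; it is not the ANS verification flow or message format:

```python
# General idea of PKI-based validation (not the ANS spec): an agent signs a
# challenge with its private key; anyone holding the registered public key can
# verify the signature and reject impersonators.
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

# Registration time: the agent generates a keypair and registers the public half.
agent_key = ed25519.Ed25519PrivateKey.generate()
registered_public_key = agent_key.public_key()

# Validation time: a counterparty challenges the agent to prove key possession.
challenge = b"prove-you-are:support-bot.example:nonce-42"
signature = agent_key.sign(challenge)

try:
    registered_public_key.verify(signature, challenge)  # raises if forged
    print("identity verified")
except InvalidSignature:
    print("spoofing attempt rejected")
```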
🔗 Links
Standards & docs: https://www.agentnameregistry.org/
API keys: https://developer.godaddy.com/keys
Repo: https://github.com/godaddy/ans-registry
Press release: https://aboutus.godaddy.net/newsroom/news-releases/press-release-details/2025/GoDaddy-advances-trusted-AI-agent-identity-with-ANS-API-and-Standards-site/default.aspx
Would love to hear thoughts from the community:
What should a universal agent identity layer guarantee — and what should it avoid?