AI Product Manager
Agent & Data Strategy
Building AI-Native products at Baidu — Agent platforms, multi-scenario evaluation systems, and training-data strategy across 15+ business lines.
Currently leading Agent platforms, multi-scenario evaluation systems, and training-data strategy at Baidu — covering medical agents, virtual-companion agents, and 15+ business lines.
I practice an AI-Native methodology: Agent as a first-class citizen, Specification-Driven, Workflow + Agent hybrid, Evaluation-First, Vibe Coding.
A compact case index: each row exposes the outcome first, then expands into the system logic behind it.
Agent-Native platform with claim-based task dispatching, pre-qualification gating, and heterogeneous role collaboration across annotator and QA agents.
Claim-based task dispatch + qualification gating + heterogeneous role collaboration; Plan-Act + Reflection reasoning paradigm; Workflow + Agent hybrid architecture.
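Claim-based dispatch with qualification gating can be pictured as a shared task board that agents pull from, rather than a scheduler that pushes work to them. The sketch below is illustrative only: `TaskBoard`, `Task`, `Agent`, and the `qualified` flag are hypothetical names, not the platform's actual API, and the in-memory lock stands in for whatever coordination the real system uses.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: str
    required_role: str                 # e.g. "annotator" or "qa"
    claimed_by: Optional[str] = None   # name of the agent holding the task


@dataclass
class Agent:
    name: str
    role: str
    qualified: bool = False            # assumed to be set by a pre-qualification test


class TaskBoard:
    """Hypothetical in-memory board: agents claim tasks instead of being assigned."""

    def __init__(self, tasks):
        self._tasks = {t.task_id: t for t in tasks}
        self._lock = threading.Lock()

    def claim(self, agent: Agent, task_id: str) -> bool:
        """Return True only if the agent passes the gate and the task is still free."""
        with self._lock:
            task = self._tasks[task_id]
            if not agent.qualified:                # pre-qualification gate
                return False
            if agent.role != task.required_role:   # role must match the task
                return False
            if task.claimed_by is not None:        # first valid claim wins
                return False
            task.claimed_by = agent.name
            return True


board = TaskBoard([Task("t1", "annotator"), Task("t2", "qa")])
alice = Agent("alice", "annotator", qualified=True)
bob = Agent("bob", "annotator", qualified=True)
carol = Agent("carol", "qa", qualified=False)

results = [
    board.claim(alice, "t1"),   # qualified, right role, task free
    board.claim(bob, "t1"),     # task already claimed by alice
    board.claim(carol, "t2"),   # right role but not yet qualified
]
```

In this sketch the gate is checked at claim time, so an unqualified agent can see a task but cannot take it; the lock makes the check-and-claim step atomic so two agents cannot hold the same task.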
Five-stage evaluation pipeline: wiki, generation, refine, eval, analyse. Wiki-RAG enhanced Rubric generation plus Propose-Evaluate-Revise self-iteration.
Five-stage decoupled evaluation pipeline for medical consultation agents, covering 50+ clinical disease types, Cohen’s κ = 0.78.
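For readers unfamiliar with the metric, Cohen's κ measures agreement between two raters after correcting for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below shows the standard calculation on made-up labels. Which two raters the reported figure compares is not stated here, so the rater names and the data are assumptions, not project results.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items where the two labels match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_e == 1.0:   # both raters used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Illustrative labels only (1 = pass, 0 = fail); not data from the project.
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(rater_a, rater_b)
```

With these toy labels the raters agree on 8 of 10 items (p_o = 0.8) while chance agreement is 0.5, giving κ of about 0.6.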
Specification-Driven evaluation for virtual companion dialogue agents across WenXiaoYan, Shoubai, and in-car products.
-1/0/1 Likert scoring + 11-dimension Analytic Rubric; 21+ batch runs; label consistency 97%.
End-to-end automated pipeline for SFT / DPO / preference data across 15+ business lines, with Bad Case feedback loops as a standard correction mechanism.
Online Bad Case feedback → attribution → data supplementation, establishing a standard correction mechanism for probabilistic Agent products.
A multi-agent system modeling real companies — AI agents communicate peer-to-peer through Feishu/Lark with LLM-planned discussion phases.
A knowledge base for LLM evaluation: methodologies, benchmarks, model comparisons, industry trends, and hands-on experience.
Daily AI intelligence digest — automatically fetches trending AI projects from GitHub, performs deep analysis via LLMs, and delivers daily digest emails.
The original card grid is reframed as an operating model: specs define good work, agents execute open-ended work, workflows keep deterministic control, and evaluation closes the loop.
For Data Strategy / Data Production / Data Evaluation roles. Focus on SFT/DPO data pipelines, multi-scenario evaluation systems, and 15+ business-line delivery.
View Resume →
Track B
For Agent / Agent Infra / AI-Native PM roles. Focus on Multi-Agent orchestration, Tool / Skill protocols, and AI-Native product methodology.
View Resume →