SenWei | AI-native PM
Product · Agents · Data | Building systems where AI does the work

SenWei.

AI Product Manager
Agent & Data Strategy

Building AI-Native products at Baidu — Agent platforms, multi-scenario evaluation systems, and training-data strategy across 15+ business lines.

About

4 years building
AI-Native products.

Currently leading Agent platforms, multi-scenario evaluation systems, and training-data strategy at Baidu — covering medical agents, virtual-companion agents, and 15+ business lines.

I practice an AI Native methodology: Agent as a first-class citizen, Specification-Driven, Workflow + Agent hybrid, Evaluation-First, Vibe Coding.

Methodology

AI Native · Agent-Native · Specification-Driven · Evaluation-First · Vibe Coding

Capabilities

Multi-Agent Orchestration · Tool / Skill Protocol · LLM-as-Judge · Rubric Design · RAG · Prompt Engineering

Domain

Agent Platform · Data Strategy · Evaluation System · SFT / DPO · MCP · Function Calling
Selected Work

What I’m building.

A compact case index: each row exposes the outcome first, then expands into the system logic behind it.

Agent-Native platform with claim-based task dispatching, pre-qualification gating, and heterogeneous role collaboration across annotator and QA agents.

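Claim-based task dispatching + qualification gating + heterogeneous role collaboration, with a Plan-Act + Reflection reasoning paradigm and a Workflow + Agent hybrid architecture.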

System proof | Role: 0→1 PM / architect | Loop: Plan · Act · Reflect | Scale: 100k+ tasks/day
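The claim-based dispatching and qualification gating described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `Task`, `Agent`, `TaskPool`, `required_skill`, and `qualified_skills` are hypothetical and do not come from the actual platform, which is not described in enough detail to reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    task_id: str
    required_skill: str               # hypothetical gate: skill needed to claim
    claimed_by: Optional[str] = None  # name of the agent holding the claim


@dataclass
class Agent:
    name: str
    role: str                         # e.g. "annotator" or "qa"
    qualified_skills: Set[str] = field(default_factory=set)


class TaskPool:
    """Agents pull (claim) tasks instead of having tasks pushed to them."""

    def __init__(self, tasks: List[Task]) -> None:
        self._tasks: Dict[str, Task] = {t.task_id: t for t in tasks}

    def claim(self, agent: Agent, task_id: str) -> bool:
        task = self._tasks.get(task_id)
        # Reject unknown tasks and tasks that another agent already claimed.
        if task is None or task.claimed_by is not None:
            return False
        # Pre-qualification gate: the agent must hold the required skill.
        if task.required_skill not in agent.qualified_skills:
            return False
        task.claimed_by = agent.name
        return True


pool = TaskPool([Task("t1", "medical")])
qualified = Agent("annotator-1", "annotator", {"medical"})
unqualified = Agent("annotator-2", "annotator", {"chat"})

print(pool.claim(unqualified, "t1"))  # gate rejects the claim
print(pool.claim(qualified, "t1"))    # first valid claim wins
print(pool.claim(qualified, "t1"))    # task is no longer available
```

A real deployment at 100k+ tasks/day would need atomic claims (for example a database compare-and-set); this single-process sketch leaves that out.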

Five-stage evaluation pipeline: wiki, generation, refine, eval, analyse. Wiki-RAG enhanced Rubric generation plus Propose-Evaluate-Revise self-iteration.

A five-stage decoupled evaluation pipeline for medical-consultation agents, covering 50+ clinical conditions, with Cohen’s κ = 0.78.

Evaluation proof | Asset: Case-specific rubric | Loop: Propose · Evaluate · Revise | Agreement: Cohen’s κ = 0.78
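For readers unfamiliar with the agreement metric quoted above, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a generic implementation of the standard formula, not the pipeline's own code; which two raters were compared (for example an LLM judge and a human expert) is not stated in the source, so the sample labels are purely illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (1 = pass, 0 = fail).
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 0, 0, 0]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```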

Specification-Driven evaluation for virtual companion dialogue agents across WenXiaoYan, Shoubai, and in-car products.

-1/0/1 Likert scoring + 11-dimension Analytic Rubric; 21+ batch runs; label consistency 97%.

Rubric proof | Scoring: -1 / 0 / 1 Likert | Reuse: 3 product lines | Quality: 97% consistency
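One way to picture the -1/0/1 analytic rubric is as a per-dimension score sheet that is validated and then aggregated. The sketch below rests on assumptions: the dimension names (`dim_0` to `dim_10`) are placeholders because the real 11 dimensions are not listed here, the unweighted mean is just one possible aggregation, and "label consistency" is read as the share of items given the same label in two labelling passes, which may differ from the definition actually used.

```python
from typing import Dict, Sequence

ALLOWED_SCORES = {-1, 0, 1}


def aggregate_rubric(scores: Dict[str, int], n_dimensions: int = 11) -> float:
    """Validate a per-dimension score sheet and return its unweighted mean."""
    if len(scores) != n_dimensions:
        raise ValueError(f"expected {n_dimensions} dimensions, got {len(scores)}")
    for dimension, score in scores.items():
        if score not in ALLOWED_SCORES:
            raise ValueError(f"{dimension}: score {score} is not in -1/0/1")
    return sum(scores.values()) / n_dimensions


def label_consistency(pass_a: Sequence[int], pass_b: Sequence[int]) -> float:
    """Share of items that received the same label in two labelling passes."""
    if len(pass_a) != len(pass_b) or len(pass_a) == 0:
        raise ValueError("passes must cover the same non-empty set of items")
    return sum(a == b for a, b in zip(pass_a, pass_b)) / len(pass_a)


# Placeholder dimension names; all scores here are illustrative.
sheet = {f"dim_{i}": 1 for i in range(11)}
sheet["dim_3"] = -1
print(aggregate_rubric(sheet))
print(label_consistency([1, 0, -1, 1], [1, 0, -1, 0]))  # 0.75
```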

End-to-end automated pipeline for SFT / DPO / preference data across 15+ business lines, with Bad Case feedback loops as a standard correction mechanism.

Online Bad Case backflow → attribution → data supplementation, establishing a standard correction mechanism for probabilistic Agent products.

Data proof | Scope: 15+ business lines | Throughput: 100k+ daily | Mechanism: Bad Case flywheel

A multi-agent system modeling real companies — AI agents communicate peer-to-peer through Feishu/Lark with LLM-planned discussion phases.

View GitHub →

Builder proof | Medium: Feishu / Lark | Pattern: Peer-to-peer agents | Mode: Personal lab

A knowledge base for LLM evaluation: methodologies, benchmarks, model comparisons, industry trends, and hands-on experience.

View GitHub →

Knowledge proof | Domain: LLM evaluation | Format: Wiki / field notes | Mode: Personal practice

Daily AI intelligence digest — automatically fetches trending AI projects from GitHub, performs deep analysis via LLMs, and delivers daily digest emails.

View GitHub →

Automation proof | Source: GitHub trends | Analysis: LLM deep summary | Delivery: Daily email
Methodology

AI Native
Product Principles.

The original card grid is reframed as an operating model: specs define good work, agents execute open-ended work, workflows keep deterministic control, and evaluation closes the loop.

AI Native
Operating Model
Agent as First-class Citizen: API / CLI first. GUI as wrapper.
Specification-Driven: Rubrics replace hardcoded rules.
Workflow + Agent Hybrid: Determinism + autonomy.
Evaluation-First: Evaluation drives iteration.
Vibe Coding: PMs ship MVPs themselves.
Capability over Feature: Emergence beats stacked modules.
Resume

Two tracks.
One operator.