Product · Agents · Data
Building systems where AI does the work

SenWei.

AI Product Manager
Agent & Data Strategy

Building AI-Native products at Baidu — Agent platforms, multi-scenario evaluation systems, and training-data strategy across 15+ business lines.

About

4 years building
AI-Native products.

Currently leading Agent platforms, multi-scenario evaluation systems, and training-data strategy at Baidu — covering medical agents, virtual-companion agents, and 15+ business lines.

I practice an AI Native methodology: Agent as a first-class citizen, Specification-Driven, Workflow + Agent hybrid, Evaluation-First, Vibe Coding.

Selected Work

What I’m building.

A compact case index: each row exposes the outcome first, then expands into the system logic behind it.

Agent-Native platform with claim-based task dispatching, pre-qualification gating, and heterogeneous role collaboration across annotator and QA agents.

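Claim-based task dispatching + qualification gating + heterogeneous role collaboration, a Plan-Act + Reflection reasoning paradigm, and a Workflow + Agent hybrid architecture.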

System proof | Role: 0→1 PM / architect | Loop: Plan · Act · Reflect | Scale: 100k+ tasks/day
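As a rough illustration of claim-based dispatching with a qualification gate, the sketch below lets agents pull tasks rather than having a scheduler push them. Everything in it (`ClaimBoard`, `Task`, the skill and agent names) is hypothetical and not taken from the actual platform; it only shows the general shape of the pattern.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Task:
    task_id: str
    required_skill: str               # hypothetical gating key
    claimed_by: Optional[str] = None  # agent id once the task is claimed


class ClaimBoard:
    """Hypothetical in-memory board: agents pull tasks instead of being assigned."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}
        self._qualified: Dict[str, Set[str]] = {}
        self._lock = threading.Lock()

    def add_task(self, task: Task) -> None:
        with self._lock:
            self._tasks[task.task_id] = task

    def qualify(self, agent_id: str, skill: str) -> None:
        """Record that an agent passed pre-qualification for a skill."""
        with self._lock:
            self._qualified.setdefault(agent_id, set()).add(skill)

    def claim(self, agent_id: str, task_id: str) -> bool:
        """Return True only if the agent is qualified and the task is still open."""
        with self._lock:
            task = self._tasks.get(task_id)
            if task is None or task.claimed_by is not None:
                return False
            if task.required_skill not in self._qualified.get(agent_id, set()):
                return False
            task.claimed_by = agent_id
            return True


board = ClaimBoard()
board.add_task(Task("t-1", required_skill="medical-annotation"))
board.qualify("annotator-a", "medical-annotation")

print(board.claim("qa-b", "t-1"))         # unqualified agent is gated out
print(board.claim("annotator-a", "t-1"))  # qualified agent wins the claim
print(board.claim("annotator-a", "t-1"))  # task is already taken
```

The lock makes the check-and-claim step atomic, so two agents racing for the same task cannot both succeed. A production system would presumably use a database or queue for this rather than process memory.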

Five-stage evaluation pipeline: wiki, generation, refine, eval, analyse. Wiki-RAG enhanced Rubric generation plus Propose-Evaluate-Revise self-iteration.

A five-stage decoupled evaluation pipeline for medical-consultation agents, covering 50+ clinical conditions, with Cohen’s κ = 0.78.

Evaluation proof | Asset: Case-specific rubric | Loop: Propose · Evaluate · Revise | Agreement: Cohen’s κ = 0.78
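For readers unfamiliar with the agreement metric, Cohen’s κ compares observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal standard-library implementation. The two label lists are made-up illustration data, not results from this pipeline, and which pair of raters the reported 0.78 refers to is not stated here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Illustrative labels only (hypothetical rater outputs).
rater_1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(rater_1, rater_2), 3))
```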

Specification-Driven evaluation for virtual companion dialogue agents across WenXiaoYan, Shoubai, and in-car products.

-1/0/1 Likert scoring + 11-dimension Analytic Rubric; 21+ batch runs; label consistency 97%.

Rubric proof | Scoring: -1 / 0 / 1 Likert | Reuse: 3 product lines | Quality: 97% consistency

End-to-end automated pipeline for SFT / DPO / preference data across 15+ business lines, with Bad Case feedback loops as a standard correction mechanism.

Online Bad Case return → attribution → data supplementation, establishing a standard correction mechanism for probabilistic Agent products.

Data proof | Scope: 15+ business lines | Throughput: 100k+ daily | Mechanism: Bad Case flywheel

Full-stack HR operations platform for data-annotation teams: nine-dimension roster management, cross-project dispatch with approval flows and timelines, unified person-day performance metrics, and natural-language AI queries. React 18 + Express 5 + Prisma 6 + PostgreSQL 16, shipped via Docker Compose.

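A project staffing dashboard and collaboration platform for data-annotation teams: roster, cross-project dispatch approvals, performance and growth-score statistics, natural-language AI queries, and a three-tier RBAC permission system. Built with React + Express + Prisma + PostgreSQL and deployed via Docker Compose.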

System proof | Role: 0→1 full-stack | Scale: 30K+ LOC · 40+ APIs | Access: 3-tier RBAC
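A three-tier RBAC check can be sketched as an ordered set of tiers plus a minimum tier per action. The tier names and actions below are assumptions made for illustration; the page does not say what HRBoard's three tiers are, and the real platform is built on Express rather than Python.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical tier names; higher value means broader access.
    MEMBER = 1
    MANAGER = 2
    ADMIN = 3


# Hypothetical mapping from action to the minimum tier allowed to perform it.
REQUIRED_TIER = {
    "view_roster": Tier.MEMBER,
    "approve_dispatch": Tier.MANAGER,
    "manage_permissions": Tier.ADMIN,
}


def is_allowed(user_tier: Tier, action: str) -> bool:
    """Allow an action only when the user's tier meets the action's minimum."""
    required = REQUIRED_TIER.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    return user_tier >= required


print(is_allowed(Tier.MANAGER, "approve_dispatch"))
print(is_allowed(Tier.MEMBER, "approve_dispatch"))
```

Denying unknown actions by default keeps a newly added endpoint closed until someone assigns it a tier.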

Three-layer evaluation for a micro-expression labeling system (spec v2.7): S0–S5 clip screening with hard filters and VLM quality gates, IAA-based spec stability, VLM-vs-human pipeline accuracy, and a label→generate→relabel semantic-fidelity loop, backed by a seven-factor ablation framework.

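Three-layer evaluation of the micro-expression label system: S0–S5 video-clip screening (hard filters + VLM quality gates), spec stability (IAA), pipeline accuracy (VLM vs. human), and an end-to-end semantic-fidelity loop, supported by a 7-factor ablation framework.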

Evaluation proof | Spec: v2.7 · 42 emotions | Screening: S0–S5 hard + soft gates | Ablation: 7 factors · 4 runs

Professional tarot-reading MVP — a GSAP-driven shuffle-and-draw ritual, a rules engine computing elemental dignities and numeric arcs for free readings, two-tier paid AI readings via DeepSeek, and a complete international payment loop with Creem, Cloudflare Functions and KV.

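Professional tarot-reading MVP: a GSAP shuffle-and-draw ritual, free readings driven by a rules engine based on elemental dignities and numeric arcs, two-tier paid AI readings via DeepSeek, and a complete overseas payment flow through Creem + Cloudflare Functions + KV.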

Builder proof | Engine: 78 cards + dignities | Monetize: $1.99 / $2.99 tiers | Stack: React 19 · CF Functions

A multi-agent system modeling real companies — AI agents communicate peer-to-peer through Feishu/Lark with LLM-planned discussion phases.

View GitHub →

Builder proof | Medium: Feishu / Lark | Pattern: Peer-to-peer agents | Mode: Personal lab

A knowledge base for LLM evaluation: methodologies, benchmarks, model comparisons, industry trends, and hands-on experience.

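A knowledge base for LLM evaluation work: evaluation methodology, benchmark tests, model comparisons, industry news, and practical evaluation experience.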

View GitHub →

Knowledge proof | Domain: LLM evaluation | Format: Wiki / field notes | Mode: Personal practice

Daily AI intelligence digest — automatically fetches trending AI projects from GitHub, performs deep analysis via LLMs, and delivers daily digest emails.

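Daily AI intelligence digest system: automatically collects trending projects in the AI field, analyzes them in depth with LLMs, and sends a daily digest email.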

View GitHub →

Automation proof | Source: GitHub trends | Analysis: LLM deep summary | Delivery: Daily email

Methodology

AI Native
Product Principles.

The original card grid is reframed as an operating model: specs define good work, agents execute open-ended work, workflows keep deterministic control, and evaluation closes the loop.

AI Native
Operating Model

Agent as First-class Citizen: API / CLI first. GUI as wrapper.

Specification-Driven: Rubrics replace hardcoded rules.

Workflow + Agent Hybrid: Determinism + autonomy.

Evaluation-First: Evaluation drives iteration.

Vibe Coding: PMs ship MVPs themselves.

Capability over Feature: Emergence beats stacked modules.

Resume

One track.
AI-Native Agent.

Resume Dossier

AI-Native Agent PM

Agent data-strategy product manager practicing a complete AI-Native methodology — Agent as first-class citizen, Specification-Driven, Workflow + Agent hybrid architecture, Evaluation-First, and full-stack Vibe Coding. From 0→1 Agent data-production platform to multi-scenario agent evaluation systems.

View Resume →

What I’m building.

Multi-Agent Data Production Platform

Medical Agent Evaluation Pipeline

Virtual Persona Agent Evaluation

Agent Training Data Strategy

HRBoard — Staffing & Ops Platform

Micro-Expression Evaluation Suite

Moonlit Cards — AI Tarot Studio

OPC — Multi-Agent Company

LLM Evaluation Wiki

Daily AI Digest
