PaddlePaddle Open Source Community Blog

Wonderful stories from PaddlePaddle contributors

Published on
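October 16, 2025
[Paper Sharing] | A Survey of KV Compression: KV Cache Optimization for Efficient LLM Inference
This post surveys recent mainstream approaches to KV compression and eviction (prefill vs. decoding) and compares the strategies and performance of representative methods such as H2O, PyramidKV, SnapKV, and Quest.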

Published on
October 15, 2025
PaddlePaddle & ERNIE Open Source Campus Tour: Peking University
Published on
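September 29, 2025
PaddlePaddle Visits Zhejiang University's School of Software Technology: A ZJU-Exclusive Open Source Adventure 🚀
On September 23, the PaddlePaddle team, together with the ERNIE large model, visited Zhejiang University's School of Software Technology for a special open source sharing event: "OpenSource in Paddle: A ZJU-Exclusive Open Source Adventure".

Several senior PaddlePaddle engineers and product managers met face to face with the school's faculty and students to share cutting-edge open source practices and large model applications. The atmosphere was lively, and the mix of technology and creativity prompted deeper thinking about open source.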

Published on
September 18, 2025
[Paper Sharing] | Medusa: Accelerating Serverless LLM Inference with Materialization (ASPLOS '25)
This work addresses the cold start problem in serverless LLM inference. Cold start latency severely degrades a key user-experience metric: Time-To-First-Token (TTFT).

Published on
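September 17, 2025
Post-training of LLM (A Layperson's Primer for Product Managers)
This post gives an overview of post-training methods for large language models (LLMs), mainly Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (Online RL). It explains the technical details in plain language wherever possible and is aimed at readers who are interested in LLMs but are not specialists.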

Published on
September 2, 2025
[Paper Sharing] | vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Paper link: vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

Keywords: KV cache, CUDA
Published on
August 12, 2025
[Paper Sharing] | CosyVoice Speech Synthesis
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Paper link: https://arxiv.org/pdf/2407.05407

Project link: https://github.com/FunAudioLLM/CosyVoice
Published on
August 9, 2025
A Practical Guide to Document Translation with ERNIE 4.5 and PaddleOCR 3.0
Published on
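July 30, 2025
GLCC Rising Star Lu Qi: A Technical Journey from Open Source Newcomer to PaddlePaddle Core Framework Contributor

"I used to think GLCC was just big companies handing leftover topics to students for practice. After taking part, I found that PaddlePaddle's tasks are seriously hardcore: their difficulty, complexity, and completeness far exceeded my expectations. In the end, what I gained far exceeded my expectations as well."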

Published on
July 29, 2025
[Paper Sharing] | I/O-Related Optimization Strategies for MoE Layer Training and Inference
