10 articles in total
2025
[Paper Reading] ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
[Paper Reading] ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling
[Paper Reading] The Llama 3 Herd of Models (Section 3: Pre-Training)
[Paper Reading] Reducing Activation Recomputation in Large Transformer Models
A Detailed Guide to Backpropagation and Optimizers in Deep Learning
A First Look at PyTorch torch.distributed and NCCL
An Overview of GPU Architecture
A Brief Analysis of GPU Memory Usage in Large Models
A Brief Analysis of the Transformer KV Cache
Differences Between Decoder-only, Encoder-only, and Encoder-Decoder Architectures in Transformers