滑滑蛋

13 posts in total


2025

12-28
[Paper Reading] ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000
12-22
[Paper Reading] ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling
12-13
[Paper Reading] The Llama 3 Herd of Models (Section 3 Pre-Training)
12-08
[Paper Reading] Reducing Activation Recomputation in Large Transformer Models
12-07
[Paper Reading] Megatron-LM Paper Reading

2024

08-18
[Paper Reading] MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
08-17
[Paper Reading] Fluid: Dataset Abstraction and Elastic Acceleration for Cloud-native Deep Learning Training Jobs
05-01
[Paper Reading] Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance
03-14
[Paper Reading] Not All Resources are Visible: Exploiting Fragmented Shadow Resources in Shared-State Scheduler Architecture

2023

12-26
[Paper Reading] In Search of an Understandable Consensus Algorithm