28) How DeepSeek Rewrote Quantization Part 2 Accumulation Precision Online Quantization (5 views, 9 days ago)
27) How DeepSeek Rewrote Quantization Part 1 Mixed Precision Fine-grained quantization (3 views, 9 days ago)
20) Mixture of Experts Balancing Techniques Auxiliary Loss Load Balancing Capacity Factor (5 views, 10 days ago)
15) All about Sinusoidal Positional Encodings What’s with the weird sin-cos formula (1 view, 10 days ago)
14) Integer and Binary Positional Encodings Journey towards Rotary Positional Encodings (RoPE) (3 views, 10 days ago)
12) Multi-Head Latent Attention From Scratch One of the major DeepSeek innovation (4 views, 11 days ago)
11) Understand Grouped Query Attention (GQA) The final frontier before latent attention (5 views, 11 days ago)