Blog-post figures — source index

Each PNG in this folder is exported from a PDF in the paper’s figure/ tree. This file maps every blog figure back to its source.

Quick regenerate

cd blog_post/figures
python3 regenerate.py

Requires PyMuPDF (pip install pymupdf). The script reads from the paper’s figure/ folder (resolved as ../../figure/ relative to itself), renders the first page of each source PDF at the listed DPI with fitz.Page.get_pixmap, and writes the PNG into the current folder.
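The flow described above can be sketched roughly as follows. This is not the contents of regenerate.py: the shape of the FIGS entries, the helper names (zoom_for_dpi, resolve_source, render_first_page, regenerate), and the choice to store source paths relative to figure/ are all assumptions made for illustration. Only fitz.open, get_pixmap, fitz.Matrix, and Pixmap.save are real PyMuPDF calls.

```python
from pathlib import Path

# Hypothetical entry format: (blog PNG name, source PDF relative to figure/, DPI).
FIGS = [
    ("fixed_lr_llm_capacity.png",
     "paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf", 200),
]


def zoom_for_dpi(dpi):
    """PDF user space has 72 units per inch, so DPI/72 is the render scale."""
    return dpi / 72.0


def resolve_source(script_dir, rel_pdf):
    """Resolve a source PDF against ../../figure/ relative to the script folder."""
    return (Path(script_dir) / ".." / ".." / "figure" / rel_pdf).resolve()


def render_first_page(src, dst, dpi):
    """Render page 0 of src to a PNG file at the requested DPI."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it

    zoom = zoom_for_dpi(dpi)
    doc = fitz.open(str(src))
    try:
        pix = doc[0].get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
        pix.save(str(dst))
    finally:
        doc.close()


def regenerate(script_dir, out_dir):
    """Render every FIGS entry into out_dir."""
    for png_name, rel_pdf, dpi in FIGS:
        src = resolve_source(script_dir, rel_pdf)
        render_first_page(src, Path(out_dir) / png_name, dpi)
```

Under these assumptions, calling regenerate(Path(__file__).resolve().parent, Path.cwd()) from the script would reproduce the "read from ../../figure/, write into the current folder" behaviour.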

If you add, remove, or rename a blog figure, edit the FIGS list in regenerate.py and also update the mapping table below.
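A small check can catch the case where the FIGS list and the PNGs on disk drift apart after such an edit. The sketch below is an illustration only: diff_names and png_names are hypothetical helpers, and it assumes you can obtain the PNG names listed in FIGS as plain strings.

```python
from pathlib import Path


def png_names(folder):
    """Return the set of PNG file names found directly in folder."""
    return {p.name for p in Path(folder).glob("*.png")}


def diff_names(listed, on_disk):
    """Compare listed PNG names against names found on disk.

    Returns (missing, unlisted): names that are listed but absent on disk,
    and names present on disk that no entry lists.
    """
    listed, on_disk = set(listed), set(on_disk)
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

For example, diff_names(names_from_FIGS, png_names(".")) run inside blog_post/figures should return two empty lists when everything is in sync.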

Mapping

Paths are relative to the repo root (mue_git/).

| Blog PNG | Source PDF | DPI | Used in index.html |
| --- | --- | --- | --- |
| dense_sweep_lm_lr.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: LM, learning-rate scan |
| dense_sweep_lm_wd.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_w_decay_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: LM, weight-decay scan |
| dense_sweep_lm_init.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_init_std_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: LM, init-std scan |
| dense_sweep_df_lr.png | figure/blog_post/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_final_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: DF, learning-rate scan |
| dense_sweep_df_wd.png | figure/blog_post/dim_128_bs256_25k_dense_moe_weight_decay_scan_final_avg_loss_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: DF, weight-decay scan |
| dense_sweep_df_init.png | figure/blog_post/dim_128_bs256_25k_dense_moe_init_std_scan_final_avg_loss_until_3e-1_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep”: DF, init-std scan |
| activated_experts_lr_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align”: LR sweep, LM |
| activated_experts_lr_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_final.pdf | 200 | “Activated-experts: optima align”: LR sweep, DF |
| activated_experts_wd_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_w_decay_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align”: weight-decay sweep, LM |
| activated_experts_wd_df.png | figure/diffusions/df_std_wdecay/dim_128_bs256_25k_dense_moe_weight_decay_scan_final_avg_loss.pdf | 200 | “Activated-experts: optima align”: weight-decay sweep, DF |
| activated_experts_init_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_init_std_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align”: init-std sweep, LM |
| activated_experts_init_df.png | figure/diffusions/df_std_wdecay/dim_128_bs256_25k_dense_moe_init_std_scan_final_avg_loss_until_3e-1.pdf | 200 | “Activated-experts: optima align”: init-std sweep, DF |
| activated_experts_initial_loss_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_initial_iterations_2_seed_avg_lr_1e-3_max_step_100.pdf | 200 | “Activated-experts: optima align”: initial training-loss curves (first 100 steps), LM |
| activated_experts_initial_loss_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_initial_iterations_4_seed_avg_lr_1p6e-3_max_step_100.pdf | 200 | “Activated-experts: optima align”: initial training-loss curves (first 100 steps), DF |
| lr_sweep_capacity_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_4_moe_sigmoid_capacity_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: capacity scaling, LM |
| lr_sweep_capacity_df.png | figure/diffusions/MoE_capacity/dim_128_bs256_it100k_moe_sigmoid_capacity_final.pdf | 200 | “MoE architectural axes”: capacity scaling, DF |
| lr_sweep_granularity_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_4_moe_sigmoid_granularity_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: granularity, LM |
| lr_sweep_granularity_df.png | figure/diffusions/granularity/dim_128_bs256_it100k_moe_sigmoid_granularity_final_with_ffn4.pdf | 200 | “MoE architectural axes”: granularity, DF |
| lr_sweep_shared_lm.png | figure/llms/gen7_text_dim_128_layer_32_ffn_1_moe_sigmoid_shared_expert_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: shared-expert sweep, LM |
| lr_sweep_shared_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_shared_expert_scan_final.pdf | 200 | “MoE architectural axes”: shared-expert sweep, DF |
| lr_sweep_grouped_lm.png | figure/llms/gen7_text_dim_128_layer_32_ffn_1_moe_sigmoid_group_global_norm_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: group-balanced routing, LM |
| lr_sweep_grouped_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_gn_grouped_experts_scaling_final.pdf | 200 | “MoE architectural axes”: group-balanced routing, DF |
| lr_sweep_depth_lm.png | figure/llms/gen7_text_dim_128_layer_4_8_16_32_ffn_1_moe_16e4a_model_layer_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: depth, LM |
| lr_sweep_depth_df.png | figure/diffusions/depth_width/dim_128_bs256_it100k_layer_lr_scaling_final_ft.pdf | 200 | “MoE architectural axes”: depth, DF |
| lr_sweep_width_lm.png | figure/llms/gen7_text_dim_128_256_512_1024_layer_32_ffn_1_moe_16e4a_model_width_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes”: backbone width, LM |
| lr_sweep_width_df.png | figure/diffusions/depth_width/dim_128_bs256_it100k_moe_width_scaling_final_ft.pdf | 200 | “MoE architectural axes”: backbone width, DF |
| fixed_lr_llm_activated.png | figure/paper_loss_scaling/fixed_lr_llm_activated_experts_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: LM, activated experts |
| fixed_lr_df_activated.png | figure/paper_loss_scaling/fixed_lr_diffusion_activated_experts_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: DF, activated experts |
| fixed_lr_llm_capacity.png | figure/paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: LM, capacity |
| fixed_lr_df_capacity.png | figure/paper_loss_scaling/fixed_lr_diffusion_capacity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: DF, capacity |
| fixed_lr_llm_granularity.png | figure/paper_loss_scaling/fixed_lr_llm_granularity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: LM, granularity |
| fixed_lr_df_granularity.png | figure/paper_loss_scaling/fixed_lr_diffusion_granularity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: DF, granularity |
| fixed_lr_llm_layers.png | figure/paper_loss_scaling/fixed_lr_llm_layer_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: LM, layers |
| fixed_lr_df_layers.png | figure/paper_loss_scaling/fixed_lr_diffusion_layer_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes”: DF, layers |
| large_loss_256p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_256_256_train_loss.pdf | 200 | “Large-scale evidence”: 256P image training loss, dense vs. MoE |
| large_loss_512p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_512_512_train_loss.pdf | 200 | “Large-scale evidence”: 512P image training loss, dense vs. MoE |
| large_loss_240p_keyframe.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_4_240_432_train_loss.pdf | 200 | “Large-scale evidence”: 240P key-frame training loss, dense vs. MoE |
| large_loss_240p_video.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_57_240_432_train_loss.pdf | 200 | “Large-scale evidence”: 240P 5s video training loss, dense vs. MoE |
| large_speedup_256p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_256_256_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale evidence”: 256P image convergence speedup |
| large_speedup_240p_video.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_57_240_432_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale evidence”: 240P 5s video convergence speedup |
| large_llm_train_loss.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_smoothed_train_loss.pdf | 200 | “Large-scale” LLM trio: smoothed training loss |
| large_llm_val_loss.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_val_loss.pdf | 200 | “Large-scale” LLM trio: validation loss on C4 |
| large_llm_speedup.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale” LLM trio: convergence speedup vs. dense |

Regenerating one figure manually

If you only want to refresh a single PNG, the per-figure recipe is:

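import fitz  # PyMuPDF

SRC = "/path/to/figure/paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf"
DST = "fixed_lr_llm_capacity.png"
DPI = 200

# PDF coordinates use 72 units per inch, so DPI/72 is the render scale.
zoom = DPI / 72

doc = fitz.open(SRC)
# Render the first page only, without an alpha channel.
pix = doc[0].get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
pix.save(DST)
doc.close()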

For the full set, use regenerate.py, which applies the same steps to every entry in its FIGS list.
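The DPI/72 factor in the recipe also tells you roughly how large the exported PNG will be. The helper below is a sketch of that arithmetic; rendered_size_px is a hypothetical name, the 432 x 288 pt page size in the usage note is an invented example rather than the size of any figure listed here, and the real pixmap may differ by a pixel because of how the renderer rounds the page rectangle.

```python
def rendered_size_px(width_pt, height_pt, dpi):
    """Approximate pixel size of a PDF page rendered at the given DPI.

    PDF page dimensions are in points (1/72 inch), so pixels = points * dpi / 72.
    """
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)
```

For instance, rendered_size_px(432, 288, 200) gives (1200, 800), so a 6 x 4 inch page at 200 DPI comes out at about 1200 x 800 pixels.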

Notes on the teaser

The big top-of-page teaser in index.html is not in this folder — it’s the inline <svg viewBox="0 0 1900 1080">…</svg> block in the post body. It was authored directly in HTML so it scales as vector and uses the same Inter font family as the surrounding page. No raster version is needed.

(If you later want a social-card preview image for og:image / twitter:image, export the inline SVG — or the original figure/teaser_figure/teaser.pdf — to a ~1200×630 PNG, add it here, and reinstate the meta tags in index.html.)
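If you do export a social card, the teaser's aspect ratio does not match the card's, so it has to be scaled to fit and padded. The sketch below works out that geometry only. fit_into_card is a hypothetical helper, and it assumes the source has the same 1900 x 1080 proportions as the inline SVG viewBox; the page size of teaser.pdf is not stated here and may differ.

```python
def fit_into_card(src_w, src_h, card_w=1200, card_h=630):
    """Scale a source of size (src_w, src_h) to fit inside the card, centered.

    Returns (scale, out_w, out_h, pad_x, pad_y), where pad_x and pad_y are the
    left and top margins needed to center the scaled image in the card.
    """
    scale = min(card_w / src_w, card_h / src_h)
    out_w = round(src_w * scale)
    out_h = round(src_h * scale)
    pad_x = (card_w - out_w) // 2
    pad_y = (card_h - out_h) // 2
    return scale, out_w, out_h, pad_x, pad_y
```

With the 1900 x 1080 viewBox, the height is the limiting side: the teaser scales to 1108 x 630 and needs 46 pixels of padding on the left and right to fill a 1200 x 630 card.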