Blog-post figures — source index
Each PNG in this folder is exported from a PDF in the paper’s figure/ tree. This file maps every blog figure back to its source.
Quick regenerate
cd blog_post/figures
python3 regenerate.py
Requires PyMuPDF (pip install pymupdf). The script reads from the paper’s figure/ folder (resolved as ../../figure/ relative to itself), renders the first page of each source PDF at the listed DPI with fitz.Page.get_pixmap, and writes the PNG into the current folder.
If you add, remove, or rename a blog figure, edit the FIGS list in regenerate.py and also update the mapping table below.
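For orientation, here is a minimal sketch of how a script of this kind could be organised. It is not a copy of regenerate.py: the tuple layout of `FIGS`, the helper names `resolve`, `render`, and `main`, and the two sample entries are assumptions made for illustration. Only the `../../figure/` resolution, the first-page rendering, and the `get_pixmap` call are taken from the description above.

```python
from pathlib import Path

# Hypothetical FIGS layout: (blog PNG name, source PDF relative to figure/, DPI).
FIGS = [
    ("fixed_lr_llm_capacity.png",
     "paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf", 200),
    ("fixed_lr_df_capacity.png",
     "paper_loss_scaling/fixed_lr_diffusion_capacity_scaling.pdf", 200),
]


def resolve(script_dir: Path, out_dir: Path, figs=FIGS):
    """Yield (source PDF, destination PNG, DPI) for every FIGS entry."""
    # The figure/ tree sits two levels above the script's own folder.
    figure_root = (script_dir / ".." / ".." / "figure").resolve()
    for png_name, pdf_rel, dpi in figs:
        yield figure_root / pdf_rel, out_dir / png_name, dpi


def render(src: Path, dst: Path, dpi: int) -> None:
    """Render the first page of src to dst at the requested DPI."""
    import fitz  # PyMuPDF; imported lazily so resolve() works without it

    with fitz.open(str(src)) as doc:
        zoom = dpi / 72  # PDF user space is 72 units per inch
        pix = doc[0].get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
        pix.save(str(dst))


def main(script_dir: Path) -> None:
    """Regenerate every PNG into the current folder, skipping missing PDFs."""
    for src, dst, dpi in resolve(script_dir, Path.cwd()):
        if not src.is_file():
            print(f"skip (missing source): {src}")
            continue
        render(src, dst, dpi)
        print(f"wrote {dst.name} at {dpi} DPI")
```

In a layout like this, adding a figure means appending one tuple to `FIGS`, and the script would call `main(Path(__file__).resolve().parent)` from its entry point.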
Mapping
Paths are relative to the repo root (mue_git/).
| Blog PNG | Source PDF | DPI | Used in index.html |
|---|---|---|---|
dense_sweep_lm_lr.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — LM, learning-rate scan |
dense_sweep_lm_wd.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_w_decay_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — LM, weight-decay scan |
dense_sweep_lm_init.png | figure/blog_post/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_init_std_val_loss_step25k_line_only_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — LM, init-std scan |
dense_sweep_df_lr.png | figure/blog_post/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_final_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — DF, learning-rate scan |
dense_sweep_df_wd.png | figure/blog_post/dim_128_bs256_25k_dense_moe_weight_decay_scan_final_avg_loss_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — DF, weight-decay scan |
dense_sweep_df_init.png | figure/blog_post/dim_128_bs256_25k_dense_moe_init_std_scan_final_avg_loss_until_3e-1_dense_ffn_1_only.pdf | 200 | “The dense calibration sweep” — DF, init-std scan |
activated_experts_lr_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align” — LR sweep, LM |
activated_experts_lr_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_final.pdf | 200 | “Activated-experts: optima align” — LR sweep, DF |
activated_experts_wd_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_w_decay_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align” — weight-decay sweep, LM |
activated_experts_wd_df.png | figure/diffusions/df_std_wdecay/dim_128_bs256_25k_dense_moe_weight_decay_scan_final_avg_loss.pdf | 200 | “Activated-experts: optima align” — weight-decay sweep, DF |
activated_experts_init_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_init_std_val_loss_step25k_line_only.pdf | 200 | “Activated-experts: optima align” — init-std sweep, LM |
activated_experts_init_df.png | figure/diffusions/df_std_wdecay/dim_128_bs256_25k_dense_moe_init_std_scan_final_avg_loss_until_3e-1.pdf | 200 | “Activated-experts: optima align” — init-std sweep, DF |
activated_experts_initial_loss_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_8_moe_sigmoid_activated_experts_lr_initial_iterations_2_seed_avg_lr_1e-3_max_step_100.pdf | 200 | “Activated-experts: optima align” — initial training-loss curves (first 100 steps), LM |
activated_experts_initial_loss_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_sigmoid_activated_experts_scaling_initial_iterations_4_seed_avg_lr_1p6e-3_max_step_100.pdf | 200 | “Activated-experts: optima align” — initial training-loss curves (first 100 steps), DF |
lr_sweep_capacity_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_4_moe_sigmoid_capacity_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — capacity scaling, LM |
lr_sweep_capacity_df.png | figure/diffusions/MoE_capacity/dim_128_bs256_it100k_moe_sigmoid_capacity_final.pdf | 200 | “MoE architectural axes” — capacity scaling, DF |
lr_sweep_granularity_lm.png | figure/llms/gen7_text_dim_128_layer_32_dense_ffn_1_4_moe_sigmoid_granularity_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — granularity, LM |
lr_sweep_granularity_df.png | figure/diffusions/granularity/dim_128_bs256_it100k_moe_sigmoid_granularity_final_with_ffn4.pdf | 200 | “MoE architectural axes” — granularity, DF |
lr_sweep_shared_lm.png | figure/llms/gen7_text_dim_128_layer_32_ffn_1_moe_sigmoid_shared_expert_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — shared-expert sweep, LM |
lr_sweep_shared_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_shared_expert_scan_final.pdf | 200 | “MoE architectural axes” — shared-expert sweep, DF |
lr_sweep_grouped_lm.png | figure/llms/gen7_text_dim_128_layer_32_ffn_1_moe_sigmoid_group_global_norm_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — group-balanced routing, LM |
lr_sweep_grouped_df.png | figure/diffusions/activated_shared_grouped_experts/dim_128_bs256_it100k_moe_gn_grouped_experts_scaling_final.pdf | 200 | “MoE architectural axes” — group-balanced routing, DF |
lr_sweep_depth_lm.png | figure/llms/gen7_text_dim_128_layer_4_8_16_32_ffn_1_moe_16e4a_model_layer_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — depth, LM |
lr_sweep_depth_df.png | figure/diffusions/depth_width/dim_128_bs256_it100k_layer_lr_scaling_final_ft.pdf | 200 | “MoE architectural axes” — depth, DF |
lr_sweep_width_lm.png | figure/llms/gen7_text_dim_128_256_512_1024_layer_32_ffn_1_moe_16e4a_model_width_lr_val_loss_step25k_line_only.pdf | 200 | “MoE architectural axes” — backbone width, LM |
lr_sweep_width_df.png | figure/diffusions/depth_width/dim_128_bs256_it100k_moe_width_scaling_final_ft.pdf | 200 | “MoE architectural axes” — backbone width, DF |
fixed_lr_llm_activated.png | figure/paper_loss_scaling/fixed_lr_llm_activated_experts_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — LM, activated experts |
fixed_lr_df_activated.png | figure/paper_loss_scaling/fixed_lr_diffusion_activated_experts_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — DF, activated experts |
fixed_lr_llm_capacity.png | figure/paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — LM, capacity |
fixed_lr_df_capacity.png | figure/paper_loss_scaling/fixed_lr_diffusion_capacity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — DF, capacity |
fixed_lr_llm_granularity.png | figure/paper_loss_scaling/fixed_lr_llm_granularity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — LM, granularity |
fixed_lr_df_granularity.png | figure/paper_loss_scaling/fixed_lr_diffusion_granularity_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — DF, granularity |
fixed_lr_llm_layers.png | figure/paper_loss_scaling/fixed_lr_llm_layer_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — LM, layers |
fixed_lr_df_layers.png | figure/paper_loss_scaling/fixed_lr_diffusion_layer_scaling.pdf | 200 | “Fixed-LR scaling across MoE axes” — DF, layers |
large_loss_256p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_256_256_train_loss.pdf | 200 | “Large-scale evidence” — 256P image training loss, dense vs. MoE |
large_loss_512p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_512_512_train_loss.pdf | 200 | “Large-scale evidence” — 512P image training loss, dense vs. MoE |
large_loss_240p_keyframe.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_4_240_432_train_loss.pdf | 200 | “Large-scale evidence” — 240P key-frame training loss, dense vs. MoE |
large_loss_240p_video.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_57_240_432_train_loss.pdf | 200 | “Large-scale evidence” — 240P 5s video training loss, dense vs. MoE |
large_speedup_256p.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_1_256_256_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale evidence” — 256P image convergence speedup |
large_speedup_240p_video.png | figure/diffusions/large_run/gr6_conv_mini_512p_dense_vs_moe_multiresolution_57_240_432_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale evidence” — 240P 5s video convergence speedup |
large_llm_train_loss.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_smoothed_train_loss.pdf | 200 | “Large-scale” LLM trio — smoothed training loss |
large_llm_val_loss.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_val_loss.pdf | 200 | “Large-scale” LLM trio — validation loss on C4 |
large_llm_speedup.png | figure/llms/large_run/gen7_text_100b_dense_vs_moe_convergence_speedup_vs_dense_step.pdf | 200 | “Large-scale” LLM trio — convergence speedup vs. dense |
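
Because the table is maintained by hand, it can drift from the files on disk. The sketch below shows one way to check it, assuming the rows keep the `png | pdf | dpi | usage` shape used above. The function names and the idea of passing the README text in as a string are illustrative assumptions, not part of the existing tooling.

```python
from pathlib import Path


def parse_mapping(readme_text: str):
    """Return (png, pdf, dpi) for each table row naming a .png and a .pdf."""
    rows = []
    for line in readme_text.splitlines():
        # Leading/trailing pipes are optional in the table, so strip them first.
        cells = [c.strip().strip("`") for c in line.strip().strip("|").split("|")]
        if len(cells) >= 3 and cells[0].endswith(".png") and cells[1].endswith(".pdf"):
            rows.append((cells[0], cells[1], int(cells[2])))
    return rows


def find_problems(rows, repo_root: Path, png_dir: Path):
    """List duplicate PNG names, missing source PDFs, and missing exported PNGs."""
    problems = []
    seen = set()
    for png, pdf, _dpi in rows:
        if png in seen:
            problems.append(f"duplicate PNG name: {png}")
        seen.add(png)
        # Source paths in the table are relative to the repo root.
        if not (repo_root / pdf).is_file():
            problems.append(f"missing source PDF: {pdf}")
        if not (png_dir / png).is_file():
            problems.append(f"missing exported PNG: {png}")
    return problems
```

A possible use is to read this file, call `parse_mapping` on its text, and pass the result to `find_problems` with the repo root and `blog_post/figures` as the two directories. An empty list means every row points at files that exist.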
Regenerating one figure manually
If you only want to refresh a single PNG, the per-figure recipe is:
import fitz # PyMuPDF
SRC = "/path/to/figure/paper_loss_scaling/fixed_lr_llm_capacity_scaling.pdf"
DST = "fixed_lr_llm_capacity.png"
DPI = 200
doc = fitz.open(SRC)
pix = doc[0].get_pixmap(matrix=fitz.Matrix(DPI/72, DPI/72), alpha=False)
pix.save(DST)
doc.close()
For the full set, use regenerate.py, which applies the same steps to every entry in its FIGS list (kept in sync with the mapping table above).
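The `DPI/72` factor in the manual recipe follows from PDF page sizes being expressed in points, with 72 points to the inch. The helpers below are hypothetical names used only to work through that arithmetic and to estimate the output size of a render. PyMuPDF's own rounding may differ by a pixel for page sizes that do not divide evenly.

```python
def zoom_for_dpi(dpi: float) -> float:
    """Scale factor that maps 72-per-inch PDF units to the requested DPI."""
    return dpi / 72.0


def expected_pixels(width_pt: float, height_pt: float, dpi: float):
    """Approximate pixel size of a page of the given point size at dpi."""
    # Multiply before dividing so exact cases (e.g. 612 * 200 / 72) stay exact.
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# A US Letter page is 612 x 792 points.
print(expected_pixels(612, 792, 200))  # (1700, 2200)
print(zoom_for_dpi(72))                # 1.0
```

Doubling the DPI doubles each dimension, so the pixel count (and roughly the file size) grows fourfold.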
Notes on the teaser
The big top-of-page teaser in index.html is not in this folder — it’s the inline <svg viewBox="0 0 1900 1080">…</svg> block in the post body. It was authored directly in HTML so it scales as vector and uses the same Inter font family as the surrounding page. No raster version is needed.
(If you later want a social-card preview image for og:image / twitter:image, export the inline SVG — or the original figure/teaser_figure/teaser.pdf — to a ~1200×630 PNG, add it here, and reinstate the meta tags in index.html.)
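If that export is ever needed, the main decision is the scale factor, since the 1900×1080 viewBox is narrower in proportion than a 1200×630 card. The sketch below computes a uniform scale that fits a source inside the card and shows how it could be passed to PyMuPDF. The function names are hypothetical, and it assumes the teaser PDF's single page has roughly the same proportions as the inline SVG, which has not been checked here.

```python
def fit_scale(src_w: float, src_h: float, box_w: float = 1200, box_h: float = 630) -> float:
    """Largest uniform scale at which the source still fits inside the box."""
    return min(box_w / src_w, box_h / src_h)


def fitted_size(src_w: float, src_h: float, box_w: float = 1200, box_h: float = 630):
    """Pixel size of the source after scaling it to fit inside the box."""
    s = fit_scale(src_w, src_h, box_w, box_h)
    return round(src_w * s), round(src_h * s)


def render_card(pdf_path: str, png_path: str) -> None:
    """Render page 1 of pdf_path scaled to fit within 1200x630 (no padding added)."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        page = doc[0]
        s = fit_scale(page.rect.width, page.rect.height)
        page.get_pixmap(matrix=fitz.Matrix(s, s), alpha=False).save(png_path)


print(fitted_size(1900, 1080))  # (1108, 630)
```

At the viewBox proportions the result is 1108×630, so about 92 pixels of horizontal padding would still have to be added (or the image cropped) to reach an exact 1200×630 card.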
