memo
to be added to the awesome list of papers:
"Towards Understanding Grokking: An Effective Theory of Representation Learning","2022-05-20","https://arxiv.org/abs/2205.10343","Ziming Liu; Ouail Kitouni; Niklas Nolte; Eric J. Michaud; Max Tegmark; Mike Williams"
papers about the benefits of very large stepsizes in gradient descent / edge of stability (EoS); a minimal numerical sketch follows after these entries:
"Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency","2024-02-24","https://arxiv.org/abs/2402.15926","Jingfeng Wu; Peter L. Bartlett; Matus Telgarsky; Bin Yu"
"Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization","2024-06-12","https://arxiv.org/abs/2406.08654","Yuhang Cai; Jingfeng Wu; Song Mei; Michael Lindsey; Peter L. Bartlett"
"Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability","2023-05-19","https://arxiv.org/abs/2305.11788","Jingfeng Wu; Vladimir Braverman; Jason D. Lee"
"On the Noisy Gradient Descent that Generalizes as SGD","2019-06-18","https://arxiv.org/abs/1906.07405","Jingfeng Wu; Wenqing Hu; Haoyi Xiong; Jun Huan; Vladimir Braverman; Zhanxing Zhu"
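A minimal sketch of the setting these papers study, assuming a toy linearly separable dataset and illustrative stepsize and iteration counts (not the authors' code or hyperparameters): full-batch gradient descent on the logistic loss with a large stepsize, where the loss can be non-monotone over the early iterations before it settles into a decreasing phase.

```python
# Minimal sketch (not the papers' code): full-batch GD on logistic loss with a
# deliberately large stepsize. Dataset, stepsize, and iteration count are
# illustrative; the early loss curve can be non-monotone (EoS-like) before the
# stable, decreasing phase.
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels in {-1, +1}.
n, d = 32, 2
X = rng.normal(size=(n, d))
y = np.where(X @ np.array([1.0, -1.0]) > 0, 1.0, -1.0)

def sigmoid(z):
    # Numerically stable logistic function.
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def loss(w):
    # mean log(1 + exp(-y <x, w>)), computed stably.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    margins = y * (X @ w)
    s = sigmoid(-margins)                 # per-sample derivative weight
    return -(X * (y * s)[:, None]).mean(axis=0)

w = np.zeros(d)
eta = 20.0                                # "large" stepsize (assumption; tune per problem)
for t in range(60):
    w = w - eta * grad(w)
    if t % 10 == 0:
        print(f"iter {t:3d}   logistic loss {loss(w):.4f}")
```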
to-read list:
"Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling","2024-09-23","https://arxiv.org/abs/2409.15156","Lechao Xiao"
"Learning From Biased Soft Labels","2023-02-16","https://arxiv.org/abs/2302.08155","Hua Yuan; Ning Xu; Yu Shi; Xin Geng; Yong Rui"
"Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks","2022-06-08","https://arxiv.org/abs/2206.03826","Jiachun Pan; Pan Zhou; Shuicheng Yan"
"How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?","2023-10-12","https://arxiv.org/abs/2310.08391","Jingfeng Wu; Difan Zou; Zixiang Chen; Vladimir Braverman; Quanquan Gu; Peter L. Bartlett"
miscellaneous:
https://github.com/xjdr-alt/entropix: Entropy Based Sampling and Parallel CoT Decoding (a generic sketch of entropy-based sampling follows at the end of this list)
works related to "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning"
Scaling Laws in Linear Regression: Compute, Parameters, and Data
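A generic sketch of the entropy-based sampling idea named in the entropix repo description, not the repo's actual implementation; the threshold and temperature below are illustrative assumptions. The idea: decode greedily when the next-token distribution has low entropy (the model is confident), and sample stochastically when entropy is high.

```python
# Generic sketch of entropy-based sampling (illustrative; not the entropix code).
# Greedy decoding when the next-token distribution has low entropy, temperature
# sampling when entropy is high.
import numpy as np

def entropy_based_sample(logits, rng, threshold=2.0, temperature=1.0):
    """logits: 1-D array of next-token logits.
    threshold / temperature are hypothetical hyperparameters for this sketch."""
    shifted = logits - logits.max()
    probs = np.exp(shifted)
    probs /= probs.sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()   # entropy in nats
    if entropy < threshold:
        return int(probs.argmax())                      # confident: greedy token
    scaled = np.exp(shifted / temperature)              # uncertain: sample
    scaled /= scaled.sum()
    return int(rng.choice(len(logits), p=scaled))

rng = np.random.default_rng(0)
print(entropy_based_sample(np.array([3.0, 0.5, 0.2, 0.1]), rng, threshold=1.0))  # low entropy: greedy
print(entropy_based_sample(np.array([1.0, 0.9, 0.8, 0.7]), rng, threshold=1.0))  # high entropy: sampled
```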