On the Emergent Structures of Deep Neural Networks
Speaker: Tianyu He
Event Type: Thesis Defense
Dissertation Committee Chair: Dr. Maissam Barkeshli
Committee:
Dr. Andrey Gromov
Dr. Victor V. Albert
Dr. Daniel P. Lathrop
Dr. Tom Goldstein (Dean’s Representative)
Abstract: Despite the remarkable success of artificial intelligence (AI), we still lack a fundamental understanding of why deep learning works—why certain architectural designs are preferred, how optimization navigates complex landscapes, and what data characteristics enable learning. This dissertation addresses these questions through a phenomenological approach, abstracting essential features from complex real-world experiments and constructing minimal setups that faithfully capture them. The investigation unfolds in three interconnected parts.
In the first part, we study signal propagation in randomly initialized networks in the infinite-width limit, revealing a phase transition from ordered to chaotic dynamics at large depth. We introduce the inter-layer Jacobian as an observable that precisely characterizes this transition. Through extensive empirical validation, we demonstrate that this observable provides both universal guidance and an efficient algorithm for selecting network initializations across diverse architectures, offering a principled alternative to heuristic initialization schemes.
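To make the setting concrete, the sketch below propagates a signal through a randomly initialized deep tanh network and tracks the spectral norm of the accumulated input-to-layer Jacobian across depth; whether this norm decays or grows distinguishes the ordered from the chaotic phase. The widths, depths, and weight scales are illustrative choices, not the dissertation's experimental settings.

```python
import numpy as np

def jacobian_norms(x, weights, phi=np.tanh):
    """Propagate x through a randomly initialized MLP and record the spectral norm
    of the accumulated Jacobian d h_l / d x after every layer (chain rule)."""
    h, J, norms = x, np.eye(len(x)), []
    for W in weights:
        pre = W @ h
        h = phi(pre)
        D = np.diag(1.0 - np.tanh(pre) ** 2)   # derivative of tanh at the preactivations
        J = D @ W @ J
        norms.append(np.linalg.norm(J, 2))     # largest singular value of the Jacobian
    return norms

rng = np.random.default_rng(0)
width, depth = 256, 40
x = rng.standard_normal(width)

# Illustrative weight scales: below, near, and above the order-to-chaos transition
# for a bias-free tanh network.
for sigma_w in (0.8, 1.0, 2.0):
    weights = [sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)
               for _ in range(depth)]
    norms = jacobian_norms(x, weights)
    print(f"sigma_w = {sigma_w}:  ||J|| after {depth} layers = {norms[-1]:.3e}")
```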
In the second part, we elucidate the optimization landscape of neural networks by carefully analyzing the dynamics of the largest eigenvalue of the loss Hessian (the sharpness) under gradient descent. Remarkably, we show that even a minimal two-layer linear network trained on a single example exhibits all the essential sharpness phenomenology observed in real-world scenarios. By analyzing the structure of dynamical fixed points in function space, we reveal the fundamental mechanisms governing sharpness evolution during training, thereby clarifying the interplay between initialization, parameterization, and gradient descent dynamics at large learning rates.
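As a hedged illustration of this minimal setting, the sketch below runs gradient descent on a two-layer linear network f(x) = w2*w1*x with scalar weights and a single example, tracking the largest eigenvalue of the 2x2 loss Hessian (the sharpness). The initialization, target, and learning rate are made-up values chosen so the run converges; sweeping the learning rate in this toy model is how the large-learning-rate regimes described above can be probed.

```python
import numpy as np

def hessian(w1, w2, x, y):
    """Hessian of L = 0.5 * (w2*w1*x - y)**2 with respect to (w1, w2)."""
    off = x * (2 * w1 * w2 * x - y)
    return np.array([[(w2 * x) ** 2, off],
                     [off, (w1 * x) ** 2]])

x, y = 1.0, 2.0          # the single training example (illustrative)
w1, w2 = 0.3, 0.4        # small initialization (illustrative)
lr = 0.4                 # gradient-descent stability threshold is 2/lr = 5.0

for step in range(121):
    r = w2 * w1 * x - y                  # residual of the current prediction
    if step % 20 == 0:
        sharpness = np.linalg.eigvalsh(hessian(w1, w2, x, y)).max()
        print(f"step {step:3d}  loss {0.5 * r**2:.4f}  sharpness {sharpness:.3f}")
    # simultaneous gradient-descent update of both weights
    w1, w2 = w1 - lr * r * w2 * x, w2 - lr * r * w1 * x
```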
In the third part, we investigate how data characteristics and network architecture choices give rise to the emergent capabilities of neural networks. We begin by analyzing models trained on modular arithmetic tasks. Across different setups, we establish necessary data-diversity thresholds and minimal network sizes for successful task performance, demonstrating that certain computational capabilities emerge only when both architectural capacity and data coverage exceed critical values. Furthermore, we extend this understanding to a more practical setting by tackling pseudo-random number generator (PRNG) tasks.
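For concreteness, here is one common way the modular-arithmetic setup is posed: enumerate all pairs for (a + b) mod p, reveal only a fraction of them during training, and sweep that fraction to locate the data-coverage threshold. The modulus and fraction below are illustrative; the dissertation's specific operations and splits may differ.

```python
import itertools
import random

def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """All pairs (a, b) labeled with (a + b) % p, split into train/test by a fixed fraction."""
    pairs = [((a, b), (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]
    random.Random(seed).shuffle(pairs)
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]

# Sweeping train_fraction is the basic experiment: below a critical coverage the
# network can only memorize the training set; above it, test accuracy jumps.
train, test = modular_addition_dataset(p=97, train_fraction=0.4)
print(len(train), len(test))
```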
Finally, synthesizing insights from the preceding parts, we challenge conventional architectural design paradigms. We propose a new (proto-)architecture that enables arbitrary, learned inter-layer connectivity patterns. Through careful analysis of trained models in both vision and language domains, we observe the emergence of distinct connectivity structures, accompanied by interpretable layer specialization and meaningful correlations between the nature of the data and the network’s internal computational patterns.
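The proto-architecture itself is not specified in this abstract, so the sketch below shows only a generic way that learned inter-layer connectivity could be realized: each block reads a trainable softmax-weighted mixture of all earlier activations, making the wiring pattern a learned quantity rather than a fixed design choice. This is an assumption-laden illustration of the idea, not the proposed architecture.

```python
import numpy as np

def forward_learned_connectivity(x, weights, mix_logits, phi=np.tanh):
    """Each block reads a learned convex mixture of all earlier activations (input included),
    so the inter-layer connectivity is itself a trainable parameter."""
    activations = [x]
    for l, W in enumerate(weights):
        logits = mix_logits[l][: l + 1]
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()                      # softmax over the l+1 earlier blocks
        h_in = sum(g * h for g, h in zip(gates, activations))
        activations.append(phi(W @ h_in))
    return activations[-1]

rng = np.random.default_rng(0)
width, depth = 64, 6
weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
mix_logits = [rng.standard_normal(depth) for _ in range(depth)]   # would be learned in practice
out = forward_learned_connectivity(rng.standard_normal(width), weights, mix_logits)
print(out.shape)
```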