
Vedant Puri
PhD Candidate, Carnegie Mellon University
Efficient Transformer Architectures | Scientific Machine Learning
I design transformer architectures with a focus on scaling and memory efficiency. My recent work, FLARE, scales self-attention to million-token sequences on a single GPU. I implement new architectures directly in PyTorch and Triton. My background spans high-performance computing, numerical analysis, and computational fluid dynamics.
Research Interests
- Efficient attention architectures
- Numerical methods for ML and for PDEs
- Scientific machine learning
Featured Work
FLARE - Fast Low-Rank Attention Routing Engine
- Unified low-rank reformulation of self-attention
- O(NM) memory scaling
- Scales to 1M tokens on a single GPU
- Benchmarked on PDE, NLP, and vision tasks
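
A rough illustration of why O(NM) memory matters at the 1M-token scale: the sketch below compares the storage for a dense N x N attention score matrix with that of an N x M one. It assumes N is the sequence length and M is the low-rank dimension. The value M = 64, the 2-byte entries, and the function names are illustrative choices only, not FLARE's actual configuration or API.

```python
# Back-of-the-envelope memory for attention score storage.
# Assumptions (illustrative, not FLARE's real settings):
#   N = sequence length, M = low-rank dimension,
#   2-byte (fp16/bf16) entries, a single head and a single batch element.

BYTES_PER_ENTRY = 2  # assumed fp16/bf16 storage


def dense_attention_bytes(n_tokens: int) -> int:
    """Bytes needed to store a full N x N attention score matrix."""
    return n_tokens * n_tokens * BYTES_PER_ENTRY


def low_rank_attention_bytes(n_tokens: int, n_latents: int) -> int:
    """Bytes needed to store an N x M score matrix (O(NM) scaling)."""
    return n_tokens * n_latents * BYTES_PER_ENTRY


N = 1_000_000  # the 1M-token regime
M = 64         # hypothetical rank, chosen only for illustration

dense = dense_attention_bytes(N)
low_rank = low_rank_attention_bytes(N, M)

print(f"dense N x N scores   : {dense / 1e12:.1f} TB")
print(f"low-rank N x M scores: {low_rank / 1e6:.1f} MB")
print(f"reduction factor     : {dense // low_rank}x")
```

Under these assumptions the dense score matrix would need about 2 TB per head, far beyond a single GPU, while the N x M version needs about 128 MB. The reduction factor is N / M.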

SNF-ROM
SNF-ROM is a projection-based nonlinear reduced-order modeling framework with smooth neural fields for advection-dominated PDEs.
- Combines projection-based ROM with continuous neural field representations
- Targets challenging transport-dominated PDE regimes
- Implemented in Julia with experiment suites for 1D and 2D advection and Burgers systems
- Includes reproducible pipelines for dataset generation, training, and model comparison
Project page | JCP paper | Code | Slides | Talk

Previous Work: Computational fluid dynamics on HPC systems
I previously worked on turbulence simulation and analysis workflows in high-performance computing settings, with emphasis on spectral element methods and large-scale post-processing. This background in numerical methods and PDE solvers informs how I design stable and efficient transformer architectures for scientific ML.
Velocity magnitude for flow past a wall-mounted cube at a Reynolds number of 3900 based on cube height. Computed with the spectral element code NEK5000 at the Argonne Leadership Computing Facility.
Not Work
Not So Up-to-Date Photography Portfolio
For the past decade, I have used a Canon DSLR as an excuse to walk around and photograph people, geometry, and city texture.
Open Source
FLARE
FLARE.py: Fast Low-Rank Attention Routing Engine for scalable transformer attention.
Julia Open Source Tools
- SciMLOperators.jl: operator abstractions for SciML and PDE workflows
- LinearSolve.jl: linear solver interface for scientific machine learning
Other Julia repositories I have contributed to include OrdinaryDiffEq.jl, NonlinearSolve.jl, Optimization.jl, SciMLBase.jl, SciMLSensitivity.jl, DiffEqFlux.jl, StochasticDiffEq.jl, and DiffEqBase.jl.
KolmogorovArnold.jl
KolmogorovArnold.jl: Julia implementation of Kolmogorov-Arnold Networks with custom gradients for faster training.
NekTools
NekTools: FORTRAN 77 utilities for turbulence statistics and post-processing in NEK5000.