A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30, 2024 • 11
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published 5 days ago • 10
Deep Ignorance Collection This collection contains the model and data artifacts from O'Brien et al. (2025). https://deepignorance.ai • 32 items • Updated 26 days ago • 6
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published 29 days ago • 174
AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models Paper • 2508.02269 • Published Aug 4 • 1
Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity Paper • 2507.13423 • Published Jul 17 • 1
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • Aug 5 • 489
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 356
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 646
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published Nov 5, 2024 • 69