Transformer
The neural network architecture introduced in 2017 that revolutionized AI by processing entire sequences of data simultaneously using 'attention'.
The Transformer is a neural network architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need". It is the underlying architecture for virtually all modern Large Language Models, including GPT, Claude, and Gemini.
Prior to Transformers, the dominant sequence models (recurrent networks such as LSTMs) processed text sequentially, one token at a time, which limited parallelism during training and made long-range context hard to retain. Transformers process an entire sequence in parallel using a mechanism called "Self-Attention": for each token (roughly, a word or word piece), the model computes a weight for every other token in the sequence and uses those weights to build a representation of that token that reflects its context.
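As a rough illustration of the weighting step, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is deliberately simplified: the queries, keys, and values are the raw token vectors themselves, whereas a real Transformer uses learned projection matrices, multiple attention heads, and positional information. The function names and the toy vectors are assumptions made for this example only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Simplified self-attention over a list of token vectors.

    Assumption: queries, keys, and values are all the input vectors
    (no learned projections), so this only shows the weighting idea.
    """
    d = len(x[0])  # vector dimension, used to scale the dot products
    weights, outputs = [], []
    for q in x:
        # Score this token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # The output is the weighted average of all token vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


# Three toy 2-dimensional "token" vectors (made up for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of weights sums to 1 and says how much one token draws on every token in the sequence, itself included. Every row is computed independently of the others, which is why the whole sequence can be processed in parallel rather than one step at a time.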