OpenAI GPT-3: Language Models are Few-Shot Learners
**ERRATA**: OpenAI's GPT-3 DOES NOT USE Microsoft's ZeRO/DeepSpeed for training.

Discord: https://discord.gg/4H8xxDF

In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI's GPT-3 language model. OpenAI trained a 175 BILLION-parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning.

00:00:00 Intro
00:00:54 ZeRO 1+2 (model + data parallelism) [GPT-3 DOES *NOT* USE THIS] (Connor)
00:03:17 Recent history of NLP (Tim)
00:06:04 Yannic "Light-speed" Kilcher's brief overview of GPT-3
00:14:25 Reviewing Yannic's YT comments on his GPT-3 video (Tim)
00:20:26 Main show intro
00:23:03 Is GPT-3 reasoning?
00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
00:36:18 Utility of GPT-3 in industry
00:43:03 Can GPT-3 do math? (reasoning/system 1/system 2)
00:51:03 Generalisation
00:56:48 Esoterics of language models
00:58:46 Architectural trade-offs
01:07:37 Memorization machines and interpretability
01:17:16 Nearest neighbour probes / watermarks
01:20:03 YouTube comments on GPT-3 video
01:21:50 GPT-3 news article generation issue
01:27:36 Sampling data for language models / bias / fairness / politics
01:51:12 Outro

The paper divides task adaptation into zero-, one- and few-shot learning. Zero-shot learning is the extreme case in which we expect a language model to perform a task such as sentiment classification or extractive question answering without any additional supervision. One- and few-shot learning give the model some examples. However, GPT-3's definition of these terms diverges somewhat from the conventional literature: GPT-3 receives its one- and few-shot examples through "In-Context Learning". Instead of being fine-tuned on a few examples, the model has to infer the downstream task from the examples placed in its input.
For example, the GPT-3 transformer has a context window of 2048 tokens. Demonstrations of a task such as Yelp review sentiment classification therefore have to fit into this input sequence together with the new review to be classified.

**ERRATA-continued**: It has come to our attention that there was a serious factual error in our video. GPT-3 DOES NOT USE Microsoft's ZeRO/ZeRO-2 or DeepSpeed for training, and there is no reference to this in either their blog post or paper. We are really sorry about this mistake and will be more careful to fact-check in future.

Thanks for watching! Please Subscribe!

Paper Links:
GPT-3: https://arxiv.org/abs/2005.14165

#machinelearning #naturallanguageprocessing #deeplearning #gpt3
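The following Python sketch makes the in-context learning setup above concrete. It shows how zero-, one- and few-shot prompts could be assembled when demonstrations and the new review must share a fixed 2048-token budget. This is an illustration under stated assumptions, not OpenAI's code. The demonstrations, the prompt template, and the helper names (`build_prompt`, `count_tokens`, `format_example`) are hypothetical. The token counter is a crude whitespace split standing in for GPT-3's actual BPE tokenizer, so real token counts would be higher.

```python
from typing import List, Optional, Tuple

# Hypothetical Yelp-style demonstrations, invented for illustration only.
DEMOS: List[Tuple[str, str]] = [
    ("The tacos were fresh and the staff were friendly.", "positive"),
    ("Waited an hour and the food arrived cold.", "negative"),
    ("Best espresso in town, will be back.", "positive"),
]

CONTEXT_WINDOW = 2048  # GPT-3's input length in tokens


def count_tokens(text: str) -> int:
    """Assumed stand-in for a BPE tokenizer: counts whitespace-separated words."""
    return len(text.split())


def format_example(review: str, label: Optional[str] = None) -> str:
    """Render one review; demonstrations carry a label, the query does not."""
    block = f"Review: {review}\nSentiment:"
    return block if label is None else f"{block} {label}"


def build_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    k: int,
    budget: int = CONTEXT_WINDOW,
    reserve: int = 1,
) -> str:
    """k=0 gives a zero-shot prompt, k=1 one-shot, k>1 few-shot.

    Demonstrations are added in order until k are used or the next one would
    push the prompt (plus `reserve` tokens for the answer) past the budget.
    """
    header = "Classify the sentiment of each review as positive or negative.\n\n"
    query_block = format_example(query)
    used = count_tokens(header) + count_tokens(query_block) + reserve
    if used > budget:
        raise ValueError("the query alone does not fit in the context window")

    blocks: List[str] = []
    for review, label in demos[:k]:
        block = format_example(review, label)
        cost = count_tokens(block)
        if used + cost > budget:
            break  # no room left: stop adding demonstrations
        blocks.append(block)
        used += cost

    # The new review always goes last, so the model's continuation is its label.
    return header + "\n\n".join(blocks + [query_block])


if __name__ == "__main__":
    print(build_prompt(DEMOS, "Great pizza, slow service.", k=2))
```

With `k=2`, the prompt contains the instruction line, two labelled demonstrations and the unlabelled query. The model's continuation after the final `Sentiment:` would be read off as its prediction. No weights are updated in any of this: moving between zero-, one- and few-shot only changes which text is placed in the context window, which is the contrast with fine-tuning drawn above.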