Large Language Models

The “talking computers” (ChatGPT, Bard, etc.) are essentially predictive text generators. Neural nets are essentially function estimators. That is, if you have a set of 10 features, your 10-input, 2-output feed-forward neural network is essentially a giant function estimator:

(target1, target2) = f(feature1, feature2, …, feature10).

Whether you are training a thermostat, figuring out whether a picture is a cat or an orange, or anything else, your neural network is really just a giant mathematical function taking in numbers and outputting (in this case) probabilities representing a desired output.
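To make that concrete, here is a minimal sketch of such a network in Python with NumPy. The weights are random placeholders (a real network learns them from training data); the point is just that the whole thing is one function from 10 numbers to 2 probabilities.

```python
import numpy as np

# A minimal sketch of a 10-input, 2-output feed-forward network.
# Weights here are random placeholders; in practice they are learned.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 16)), np.zeros(16)   # hidden layer
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)     # output layer

def f(features):
    """The network really is just a function: 10 numbers in, 2 out."""
    hidden = np.maximum(0, features @ W1 + b1)     # ReLU activation
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())            # softmax -> probabilities
    return exp / exp.sum()

target1, target2 = f(rng.normal(size=10))
print(target1, target2)  # two probabilities summing to 1
```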

LLMs work by tokenizing words. Orange and beige are different words with different spellings and mostly different letters. However, they are more similar to each other in meaning than they are to, say, tomato or inquisition. LLMs capture this by tokenizing those words and mapping each token to a vector representation. For instance, colors could be represented with RGB values. “Tomato” might have vector amplitudes for types of food, “upset” might have amplitudes for emotion, and “likely” might have an amplitude for probability of 0.8, while “very likely” might be 0.9, “possibly” 0.4, and “rarely” 0.1.
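A toy illustration of the idea: the vectors below are invented by hand purely for this example (real embeddings are learned and have hundreds or thousands of dimensions), but they show how vector similarity captures “orange is more like beige than like upset.”

```python
import numpy as np

# Hand-made toy "embeddings" -- values are invented for illustration only.
# Dimensions here: [food-ness, color-ness, emotion-ness]
vectors = {
    "orange": np.array([0.4, 0.9, 0.0]),
    "beige":  np.array([0.0, 0.8, 0.0]),
    "tomato": np.array([0.9, 0.3, 0.0]),
    "upset":  np.array([0.0, 0.0, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors["orange"], vectors["beige"]))   # high: both are colors
print(cosine(vectors["orange"], vectors["upset"]))   # low: unrelated concepts
```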

Essentially, LLMs tokenize one (or more) words into a vector representation and then feed that into a neural network to predict the next word.
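A sketch of that loop, using a random stand-in for the trained network: look up the token’s vector, run the network, take the most probable next token, and feed it back in. Everything here (the tiny vocabulary, the random weights) is a placeholder for illustration.

```python
import numpy as np

# Toy next-word predictor: map tokens to vectors, run them through a
# "network" (a random placeholder matrix), pick the most probable token.
vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(len(vocab), 8))  # one vector per token
W = rng.normal(size=(8, len(vocab)))           # stand-in for a trained net

def predict_next(token):
    logits = embeddings[vocab.index(token)] @ W
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]

# Generate a few words by repeatedly feeding the prediction back in.
word = "the"
for _ in range(4):
    word = predict_next(word)
    print(word, end=" ")
```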

LLM elements

Tuning:

ChatGPT is a multi-billion-parameter model trained on billions of examples. ‘Basic’ LLM training can cost millions of dollars and requires an enormously powerful computer (more powerful than anything you have at home). The resulting model can also talk about anything from recipes for tomato soup to the Hundred Years’ War. Companies that want to implement chatbots will generally start with a “pre-trained” model and then ‘fine-tune’ it on specific datasets.

The main ways to tune are:

Feeding in a specific prompt and an anticipated completion, e.g., Prompt: “When does flight 17 leave New York?” Completion: “Flight #flightnum leaves #flightnumdepartcity at #flightnumdeparttime”
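As a sketch, such prompt/completion pairs are often stored as JSONL, one example per line. The layout below mirrors the legacy OpenAI fine-tuning format as an example; the filename and the single training pair are illustrative only, and the #flightnum placeholders would be filled in from the airline’s own data.

```python
import json

# Illustrative prompt/completion training pairs for fine-tuning,
# written out in JSONL (one JSON object per line).
examples = [
    {"prompt": "When does flight 17 leave New York?",
     "completion": "Flight #flightnum leaves #flightnumdepartcity at #flightnumdeparttime"},
]

with open("tuning_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```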
