Large Language Models

The “talking computers” (ChatGPT, Bard, etc.) are essentially predictive text generators. Neural networks are essentially function estimators. That is, if you have a set of 10 features, a 10-input, 2-output feed-forward neural network is essentially a giant estimator of the function

(target1, target2) = f(feature1, feature2, …, feature10).

Whether you are training a thermostat, figuring out whether a picture shows a cat or an orange, or anything else, your neural network is really just a giant mathematical function taking in numbers and outputting (in this case) probabilities that represent the desired output.
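
As a rough sketch of that idea in C (the weights are untrained toy values; everything here is illustrative rather than from any real model), a 10-input, 2-output feed-forward network is just nested loops of multiply, add, and squash:

#include <math.h>
#include <stdio.h>

#define N_IN  10
#define N_HID  4
#define N_OUT  2

/* Toy (untrained) weights; in a real network these come from training. */
static double wHid[N_HID][N_IN];    /* input  -> hidden weights */
static double wOut[N_OUT][N_HID];   /* hidden -> output weights */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* (target1, target2) = f(feature1, ..., feature10) */
void forward(const double feature[N_IN], double target[N_OUT])
{
    double hidden[N_HID];

    for (int h = 0; h < N_HID; h++) {        /* hidden layer */
        double sum = 0.0;
        for (int i = 0; i < N_IN; i++)
            sum += wHid[h][i] * feature[i];
        hidden[h] = sigmoid(sum);
    }
    for (int o = 0; o < N_OUT; o++) {        /* output layer */
        double sum = 0.0;
        for (int h = 0; h < N_HID; h++)
            sum += wOut[o][h] * hidden[h];
        target[o] = sigmoid(sum);            /* 0..1, probability-like */
    }
}

int main(void)
{
    double feature[N_IN] = {0.5};            /* one made-up input vector */
    double target[N_OUT];

    forward(feature, target);
    printf("target1=%.3f target2=%.3f\n", target[0], target[1]);
    return 0;
}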

LLMs work by tokenizing words. “Orange” and “beige” are different words with different spellings and mostly different letters; however, they are more similar to each other than either is to, say, “tomato” or “inquisition.” LLMs tokenize those words and come up with vector representations that capture such similarity. For instance, colors could be represented with RGB values. “Tomato” might have vector amplitudes along food-related dimensions, “upset” might have amplitudes along emotional dimensions, and along a probability dimension “likely” might carry an amplitude of 0.8 while “very likely” might be 0.9, “possibly” 0.4, and “rarely” 0.1.
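
To make “more similar to each other” concrete, here is a small C sketch. The three-component embeddings are invented toy values (real models use hundreds of dimensions); cosine similarity scores how closely two word vectors point in the same direction:

#include <math.h>
#include <stdio.h>

#define DIM 3

/* Invented toy embeddings: (color-ness, food-ness, emotion-ness). */
static const double orange[DIM] = {0.9, 0.6, 0.1};
static const double beige[DIM]  = {0.8, 0.1, 0.1};
static const double tomato[DIM] = {0.2, 0.9, 0.1};

/* Cosine similarity: 1.0 = same direction, 0.0 = unrelated. */
double cosineSimilarity(const double a[DIM], const double b[DIM])
{
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < DIM; i++) {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (sqrt(normA) * sqrt(normB));
}

int main(void)
{
    /* orange scores closer to beige than to tomato with these toy values. */
    printf("orange vs beige:  %.3f\n", cosineSimilarity(orange, beige));
    printf("orange vs tomato: %.3f\n", cosineSimilarity(orange, tomato));
    return 0;
}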

Essentially, LLMs tokenize one (or more) words into vector representations and then feed those into a neural network to predict the next word.
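
A minimal C sketch of that last step (the four-word vocabulary and the raw scores are invented): the network emits one raw score (“logit”) per vocabulary word, softmax turns those scores into probabilities, and the highest-probability word is the predicted next word:

#include <math.h>
#include <stdio.h>

#define VOCAB 4

int main(void)
{
    /* Invented vocabulary and raw network output scores. */
    const char *word[VOCAB] = {"soup", "war", "orange", "likely"};
    const double logits[VOCAB] = {2.1, 0.3, 1.4, -0.5};
    double prob[VOCAB];
    double sum = 0.0;
    int best = 0;

    for (int i = 0; i < VOCAB; i++)       /* softmax: exponentiate...      */
        sum += exp(logits[i]);
    for (int i = 0; i < VOCAB; i++) {     /* ...then normalize to sum to 1 */
        prob[i] = exp(logits[i]) / sum;
        if (prob[i] > prob[best])
            best = i;
    }

    printf("predicted next word: %s (p=%.2f)\n", word[best], prob[best]);
    return 0;
}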

LLM elements

Tuning:

ChatGPT is a multi-billion-parameter model trained on billions of examples. ‘Basic’ LLM training can cost millions of dollars and requires an enormously powerful computer (more powerful than anything you have at home). The resulting model can talk about anything from recipes for tomato soup to the Hundred Years’ War. Companies that want to implement chatbots will generally start with a “pre-trained” model and then ‘fine-tune’ it on specific datasets.

The main ways to tune are:

Feeding a specific prompt and an anticipated completion, e.g. “Prompt”: “When does flight 17 leave New York?” “Completion”: “Flight #flightnum leaves #flightnumdepartcity at #flightnumdeparttime”

Machine Learning Approaches

Most machine learning courses start with linear regression. The table below summarizes common algorithms and their typical uses:

Algorithm                     | Type                       | Uses
------------------------------|----------------------------|-----------------------------------------------------------
k-Nearest Neighbor            | Supervised Classification  | Classifying items based on the most common label among the k nearest neighbors.
k-Means                       | Unsupervised Clustering    | Classifying items into one of k groups.
Recommender                   | Clustering                 | Picking movies/foods/music based on either (a) your prior favorites or (b) the prior favorites of people like you.
Neural Networks               | Linear Regression          | Finding the price of a house based on neighborhood and square footage.
Neural Networks               | Logistic Regression        | Finding the probability that a tumor with a certain size, proteins, and shape is cancerous.
Convolutional Neural Networks | Logistic Regression        | Finding the probability that a picture is of a dog, cat, or house.
Recurrent Neural Network      | Logistic Regression        | Finding a probability where there is a prior-state dependence (i.e. not just the immediate inputs).
Restricted Boltzmann Machine  | Unsupervised               | Finding hidden relationships in data.
Autoencoders                  | Unsupervised Regression    | Encoding a picture down to a small number of bytes (say, 30), then decoding it back. Good for identifying the essential dimensions of an image.
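
As a concrete instance of the first table row, here is a minimal k-Nearest Neighbor sketch in C. The training points and labels are invented toy data: the classifier measures the distance to every training point and takes a majority vote among the K closest.

#include <math.h>
#include <stdio.h>

#define N_TRAIN 6
#define K 3

/* Invented 2-D training points, labeled class 0 or class 1. */
static const double pt[N_TRAIN][2] = {
    {1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1},   /* class 0 cluster */
    {4.0, 4.2}, {4.1, 3.9}, {3.8, 4.0},   /* class 1 cluster */
};
static const int label[N_TRAIN] = {0, 0, 0, 1, 1, 1};

int classify(double x, double y)
{
    double dist[N_TRAIN];
    int idx[N_TRAIN];

    for (int i = 0; i < N_TRAIN; i++) {   /* distance to every training point */
        dist[i] = hypot(pt[i][0] - x, pt[i][1] - y);
        idx[i] = i;
    }
    for (int i = 0; i < N_TRAIN; i++)     /* selection-sort indices by distance */
        for (int j = i + 1; j < N_TRAIN; j++)
            if (dist[idx[j]] < dist[idx[i]]) {
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }

    int votes = 0;                        /* majority vote among the K nearest */
    for (int i = 0; i < K; i++)
        votes += label[idx[i]];
    return votes > K / 2;
}

int main(void)
{
    printf("(1.1, 0.9) -> class %d\n", classify(1.1, 0.9));
    printf("(3.9, 4.1) -> class %d\n", classify(3.9, 4.1));
    return 0;
}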

Whatever the algorithm, the main design choices are:

  • Source data manipulation (e.g. transforming a list of strings into numbers, doing one-hot encoding, etc.).
  • Model (LR, kNN, neural network).
  • Width/depth of the neural network model, or the k value.
  • Activation functions (sigmoid, tanh, ReLU; sketched in C after this list).
  • Distance functions (the distance between elements in a kNN model).
  • Cost/error functions.
  • Epoch count and learning rate.
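
For reference, the three activation functions named above, written out in plain C (a sketch, not a tuned implementation):

#include <math.h>

/* Sigmoid: squashes any input into (0, 1). */
double sigmoidActivation(double x) { return 1.0 / (1.0 + exp(-x)); }

/* tanh: squashes any input into (-1, 1); tanh() comes with the C library. */
double tanhActivation(double x) { return tanh(x); }

/* ReLU: passes positive inputs through, clamps negatives to zero. */
double reluActivation(double x) { return x > 0.0 ? x : 0.0; }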

Linux Embedded LCD1602

LCD 1602 Character Driver

This is a simple character driver for the Raspberry Pi.
It allows users to write to an LCD1602 driven by a PCF8574 I2C expander module using echo commands.

Steps:

1) Update the device tree as follows:

&i2c1 {
        [...]
        lcd1602 {
                compatible = "arrow,lcd1602";
                reg = <0x27>;
        };
};

2) Pull the git repo. Compile and copy lcd1602DeviceDriver.ko to your Raspberry Pi.
3) Hook up the LCD1602 (power, GND, SCL, and SDA should map directly to pins on the Pi board).
4) insmod lcd1602DeviceDriver.ko
5) echo "Roger D. Pease\\Houston Aug 2020" > /dev/lcd160200

6) The screen should show the message (the sample image shown here was from another implementation).

Linux Simple Character Driver

Background/Usage

This is a simple character driver meant as a reference design. A simple character string can be written to the driver and then recalled multiple times.

For example:

echo "Hello Universe" > /dev/simpleCharDevice0
cat /dev/simpleCharDevice0 

The cat command will play back the string that was written. The driver is robust enough to handle the fact that a single read() call doesn’t always return the exact number of bytes in the string.

On the user side, the echo command performs a write() system call, so this is tantamount to calling open() and write() from C. Likewise, cat amounts to an open() and read().
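
In C, the same round trip might look like the sketch below (the device path matches the examples above); note the read loop, since a single read() call may return fewer bytes than were written:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *msg = "Hello Universe\n";
    char buf[64];
    ssize_t n;
    int fd;

    fd = open("/dev/simpleCharDevice0", O_WRONLY);   /* like echo > ... */
    if (fd < 0) return 1;
    write(fd, msg, strlen(msg));
    close(fd);

    fd = open("/dev/simpleCharDevice0", O_RDONLY);   /* like cat ...    */
    if (fd < 0) return 1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)     /* loop: read() may    */
        fwrite(buf, 1, (size_t)n, stdout);           /* return short counts */
    close(fd);
    return 0;
}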

Usage

Tested on Ubuntu 20.

%  git clone https://github.com/rogerpease/LinuxKernelSimpleCharacterDriver
% cd LinuxKernelSimpleCharacterDriver/src 
% make
% insmod SimpleCharacterDriver.ko  # (if you get an Operation Not Permitted error you may need to sign the module)
% mknod /dev/simpleCharDevice0 c 228 0         
% echo "Hello Universe" > /dev/simpleCharDevice0
% cat /dev/simpleCharDevice0 

Notes

The hardcoded major device number is a bit antiquated; I know there are ways of dynamically allocating those device numbers (a sketch follows the code below). Otherwise, this is essentially a set of standard file operations:

// Declare the file operations structure.
static const struct file_operations my_dev_fops = {
    .owner          = THIS_MODULE,
    .read           = charDriverFileRead,  // Called when read() is called.
    .write          = charDriverFileWrite, // Called when write() is called.
    .open           = charDriverFileOpen,  // Called when open() is called.
    .release        = charDriverFileClose, // Called when close() is called.
    .unlocked_ioctl = charDriverFileIOCTL, // Called when ioctl() is called.
};
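
On the dynamic-allocation point: below is a minimal sketch of requesting the device number at load time instead of hardcoding it. The init-function and variable names here are hypothetical, but alloc_chrdev_region(), cdev_init(), and cdev_add() are the standard kernel calls.

#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/module.h>

static dev_t devNum;            /* major/minor pair assigned by the kernel */
static struct cdev myCdev;

static int __init charDriverInit(void)
{
    int rc;

    /* Ask the kernel for one free device number instead of hardcoding 228. */
    rc = alloc_chrdev_region(&devNum, 0, 1, "simpleCharDevice");
    if (rc)
        return rc;

    cdev_init(&myCdev, &my_dev_fops);     /* bind the fops table above */
    rc = cdev_add(&myCdev, devNum, 1);
    if (rc)
        unregister_chrdev_region(devNum, 1);
    return rc;
}

The mknod step would then use the major number the kernel assigned (MAJOR(devNum)) instead of the fixed 228.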

Each open filehandle is passed in as context. For example, the read function

ssize_t charDriverFileRead(struct file *filp, char __user *buf, size_t byteCount, loff_t *fileOffset)

can be used to keep information about where this individual file access left off, just as multiple processes can read the same disk file while keeping separate location pointers.
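
A hedged sketch of what such a read function might do in this driver (storedString/storedLength are hypothetical names for the buffer the write call fills in; copy_to_user() is the real kernel call). It serves bytes starting at *fileOffset, so each handle resumes where it left off and short reads work out naturally:

/* Requires <linux/fs.h> and <linux/uaccess.h>. */
static char storedString[256];   /* hypothetical: filled in by the write call */
static size_t storedLength;

ssize_t charDriverFileRead(struct file *filp, char __user *buf,
                           size_t byteCount, loff_t *fileOffset)
{
    size_t remaining, toCopy;

    if (*fileOffset >= (loff_t)storedLength)   /* past the end: signal EOF */
        return 0;

    remaining = storedLength - (size_t)*fileOffset;
    toCopy = (byteCount < remaining) ? byteCount : remaining;

    if (copy_to_user(buf, storedString + *fileOffset, toCopy))
        return -EFAULT;

    *fileOffset += toCopy;   /* this handle resumes here on the next read() */
    return toCopy;
}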

Relevance to work experience

I have done plenty of work on TMS320C6X devices (DMA, HPIs, etc.), including device drivers in assembly, C, and C with SYS/BIOS.

Machine Learning Experience

My first encounter with machine learning was a COMPUTE!’s Gazette article providing a Commodore 64 program that attempted to predict which of two keys you would press based on your prior inputs. That was probably a simple recurrent neural network. I’ve always had an interest in the topic, but the applications for ML have exploded in the past five years.

I am still in the learning phases on this topic, so please do not take anything I say here as authoritative or as a claim of expert-level ML experience. However, much as one would expect a programmer in the 1990s to be familiar with basic object-oriented methods, I would expect a programmer in the 2020s to be familiar with basic machine learning.

Potential Projects

  • FPGA-Based Neural Network: Using FPGA fabric to implement a set of neurons. Addition and multiplication would be reasonably simple, although floating-point multipliers tend to take a fair amount of time and area. The ReLU activation function could be done with a sign bit, while tanh and sigmoid could be approximated with line segments (see the sketch below).
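
As a model of that approximation idea, here is a short C sketch (the breakpoints are invented, not error-tuned): ReLU reduces to a sign check, and sigmoid becomes three line segments that an FPGA could evaluate with a comparator and one multiply:

/* ReLU in hardware is just a sign check: negative inputs become zero. */
float reluApprox(float x) { return x > 0.0f ? x : 0.0f; }

/* Sigmoid approximated with three line segments: saturate outside +/-4,
 * linear ramp in between. A real design would pick the segments to bound
 * the approximation error. */
float sigmoidApprox(float x)
{
    if (x <= -4.0f) return 0.0f;
    if (x >=  4.0f) return 1.0f;
    return 0.5f + x * 0.125f;   /* line through (0, 0.5) with slope 1/8 */
}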