Large Language Models

The “talking computers” (ChatGPT, Bard, etc.) are essentially predictive text generators. Neural networks are essentially function estimators. That is, if you have a set of 10 features, a 10-input, 2-output feed-forward neural network is essentially a giant estimator of the function

(target1, target2) = f(feature1, feature2, …, feature10).

Whether you are training a thermostat, figuring out whether a picture shows a cat or an orange, or anything else, your neural network is really just a giant mathematical function taking in numbers and outputting (in this case) probabilities that represent the desired output.
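
As a rough sketch of that idea in C (the weights are untrained toy values; everything here is illustrative rather than from any real model), a 10-input, 2-output feed-forward network is just nested loops of multiply, add, and squash:

#include <math.h>
#include <stdio.h>

#define N_IN  10
#define N_HID  4
#define N_OUT  2

/* Toy (untrained) weights; in a real network these come from training. */
static double wHid[N_HID][N_IN];    /* input  -> hidden weights */
static double wOut[N_OUT][N_HID];   /* hidden -> output weights */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* (target1, target2) = f(feature1, ..., feature10) */
void forward(const double feature[N_IN], double target[N_OUT])
{
    double hidden[N_HID];

    for (int h = 0; h < N_HID; h++) {        /* hidden layer */
        double sum = 0.0;
        for (int i = 0; i < N_IN; i++)
            sum += wHid[h][i] * feature[i];
        hidden[h] = sigmoid(sum);
    }
    for (int o = 0; o < N_OUT; o++) {        /* output layer */
        double sum = 0.0;
        for (int h = 0; h < N_HID; h++)
            sum += wOut[o][h] * hidden[h];
        target[o] = sigmoid(sum);            /* 0..1, probability-like */
    }
}

int main(void)
{
    double feature[N_IN] = {0.5};            /* one made-up input vector */
    double target[N_OUT];

    forward(feature, target);
    printf("target1=%.3f target2=%.3f\n", target[0], target[1]);
    return 0;
}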

LLMs work by tokenizing words. “Orange” and “beige” are different words with different spellings and mostly different letters; however, they are more similar to each other than either is to, say, “tomato” or “inquisition.” LLMs tokenize those words and come up with vector representations that capture such similarity. For instance, colors could be represented with RGB values. “Tomato” might have vector amplitudes along food-related dimensions, “upset” might have amplitudes along emotional dimensions, and along a probability dimension “likely” might carry an amplitude of 0.8 while “very likely” might be 0.9, “possibly” 0.4, and “rarely” 0.1.
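
To make “more similar to each other” concrete, here is a small C sketch. The three-component embeddings are invented toy values (real models use hundreds of dimensions); cosine similarity scores how closely two word vectors point in the same direction:

#include <math.h>
#include <stdio.h>

#define DIM 3

/* Invented toy embeddings: (color-ness, food-ness, emotion-ness). */
static const double orange[DIM] = {0.9, 0.6, 0.1};
static const double beige[DIM]  = {0.8, 0.1, 0.1};
static const double tomato[DIM] = {0.2, 0.9, 0.1};

/* Cosine similarity: 1.0 = same direction, 0.0 = unrelated. */
double cosineSimilarity(const double a[DIM], const double b[DIM])
{
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < DIM; i++) {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (sqrt(normA) * sqrt(normB));
}

int main(void)
{
    /* orange scores closer to beige than to tomato with these toy values. */
    printf("orange vs beige:  %.3f\n", cosineSimilarity(orange, beige));
    printf("orange vs tomato: %.3f\n", cosineSimilarity(orange, tomato));
    return 0;
}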

Essentially, LLMs tokenize one (or more) words into vector representations and then feed those into a neural network to predict the next word.
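
A minimal C sketch of that last step (the four-word vocabulary and the raw scores are invented): the network emits one raw score (“logit”) per vocabulary word, softmax turns those scores into probabilities, and the highest-probability word is the predicted next word:

#include <math.h>
#include <stdio.h>

#define VOCAB 4

int main(void)
{
    /* Invented vocabulary and raw network output scores. */
    const char *word[VOCAB] = {"soup", "war", "orange", "likely"};
    const double logits[VOCAB] = {2.1, 0.3, 1.4, -0.5};
    double prob[VOCAB];
    double sum = 0.0;
    int best = 0;

    for (int i = 0; i < VOCAB; i++)       /* softmax: exponentiate...      */
        sum += exp(logits[i]);
    for (int i = 0; i < VOCAB; i++) {     /* ...then normalize to sum to 1 */
        prob[i] = exp(logits[i]) / sum;
        if (prob[i] > prob[best])
            best = i;
    }

    printf("predicted next word: %s (p=%.2f)\n", word[best], prob[best]);
    return 0;
}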

LLM elements

Tuning:

ChatGPT is a multi-billion-parameter model trained on billions of examples. ‘Basic’ LLM training can cost millions of dollars and requires an enormously powerful computer (more powerful than anything you have at home). The resulting model can talk about anything from recipes for tomato soup to the Hundred Years’ War. Companies that want to implement chatbots will generally start with a “pre-trained” model and then ‘fine-tune’ it on specific datasets.

The main ways to tune are:

Feeding a specific prompt and an anticipated completion, e.g. “Prompt”: “When does flight 17 leave New York?” “Completion”: “Flight #flightnum leaves #flightnumdepartcity at #flightnumdeparttime”

Machine Learning Approaches

Most machine learning courses start with linear regression. The table below summarizes common algorithms and their typical uses:

Algorithm                     | Type                       | Uses
------------------------------|----------------------------|-----------------------------------------------------------
k-Nearest Neighbor            | Supervised Classification  | Classifying items based on the most common label among the k nearest neighbors.
k-Means                       | Unsupervised Clustering    | Classifying items into one of k groups.
Recommender                   | Clustering                 | Picking movies/foods/music based on either (a) your prior favorites or (b) the prior favorites of people like you.
Neural Networks               | Linear Regression          | Finding the price of a house based on neighborhood and square footage.
Neural Networks               | Logistic Regression        | Finding the probability that a tumor with a certain size, proteins, and shape is cancerous.
Convolutional Neural Networks | Logistic Regression        | Finding the probability that a picture is of a dog, cat, or house.
Recurrent Neural Network      | Logistic Regression        | Finding a probability where there is a prior-state dependence (i.e. not just the immediate inputs).
Restricted Boltzmann Machine  | Unsupervised               | Finding hidden relationships in data.
Autoencoders                  | Unsupervised Regression    | Encoding a picture down to a small number of bytes (say, 30), then decoding it back. Good for identifying the essential dimensions of an image.
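
As a concrete instance of the first table row, here is a minimal k-Nearest Neighbor sketch in C. The training points and labels are invented toy data: the classifier measures the distance to every training point and takes a majority vote among the K closest.

#include <math.h>
#include <stdio.h>

#define N_TRAIN 6
#define K 3

/* Invented 2-D training points, labeled class 0 or class 1. */
static const double pt[N_TRAIN][2] = {
    {1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1},   /* class 0 cluster */
    {4.0, 4.2}, {4.1, 3.9}, {3.8, 4.0},   /* class 1 cluster */
};
static const int label[N_TRAIN] = {0, 0, 0, 1, 1, 1};

int classify(double x, double y)
{
    double dist[N_TRAIN];
    int idx[N_TRAIN];

    for (int i = 0; i < N_TRAIN; i++) {   /* distance to every training point */
        dist[i] = hypot(pt[i][0] - x, pt[i][1] - y);
        idx[i] = i;
    }
    for (int i = 0; i < N_TRAIN; i++)     /* selection-sort indices by distance */
        for (int j = i + 1; j < N_TRAIN; j++)
            if (dist[idx[j]] < dist[idx[i]]) {
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }

    int votes = 0;                        /* majority vote among the K nearest */
    for (int i = 0; i < K; i++)
        votes += label[idx[i]];
    return votes > K / 2;
}

int main(void)
{
    printf("(1.1, 0.9) -> class %d\n", classify(1.1, 0.9));
    printf("(3.9, 4.1) -> class %d\n", classify(3.9, 4.1));
    return 0;
}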

Whatever the algorithm, the main design choices are:

  • Source data manipulation (e.g. transforming a list of strings into numbers, doing one-hot encoding, etc.).
  • Model (LR, kNN, neural network).
  • Width/depth of the neural network model, or the k value.
  • Activation functions (sigmoid, tanh, ReLU; sketched in C after this list).
  • Distance functions (the distance between elements in a kNN model).
  • Cost/error functions.
  • Epoch count and learning rate.
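
For reference, the three activation functions named above, written out in plain C (a sketch, not a tuned implementation):

#include <math.h>

/* Sigmoid: squashes any input into (0, 1). */
double sigmoidActivation(double x) { return 1.0 / (1.0 + exp(-x)); }

/* tanh: squashes any input into (-1, 1); tanh() comes with the C library. */
double tanhActivation(double x) { return tanh(x); }

/* ReLU: passes positive inputs through, clamps negatives to zero. */
double reluActivation(double x) { return x > 0.0 ? x : 0.0; }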

Linux Embedded LCD1602

LCD 1602 Character Driver

This is a simple character driver for the Raspberry Pi.
It allows users to write to an LCD1602 driven by a PCF8574 I2C expander module using echo commands.

Steps:

1) Update the device tree as follows:

&i2c1 {
        [...]
        lcd1602 {
                compatible = "arrow,lcd1602";
                reg = <0x27>;
        };
};

2) Pull the git repo. Compile and copy lcd1602DeviceDriver.ko to your Raspberry Pi.
3) Hook up the LCD1602 (power, GND, SCL, and SDA should map directly to pins on the Pi board).
4) insmod lcd1602DeviceDriver.ko
5) echo "Roger D. Pease\\Houston Aug 2020" > /dev/lcd160200

6) The screen should show the message (the sample image shown here was from another implementation).

Linux Simple Character Driver

Background/Usage

This is a simple character driver meant as a reference design. A simple character string can be written to the driver and then recalled multiple times.

For example:

echo "Hello Universe" > /dev/simpleCharDevice0
cat /dev/simpleCharDevice0 

The cat command will play back the string that was written. The driver is robust enough to handle the fact that a single read() call doesn’t always return the exact number of bytes in the string.

On the user side, the echo command performs a write() system call, so this is tantamount to calling open() and write() from C. Likewise, cat amounts to an open() and read().
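
In C, the same round trip might look like the sketch below (the device path matches the examples above); note the read loop, since a single read() call may return fewer bytes than were written:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *msg = "Hello Universe\n";
    char buf[64];
    ssize_t n;
    int fd;

    fd = open("/dev/simpleCharDevice0", O_WRONLY);   /* like echo > ... */
    if (fd < 0) return 1;
    write(fd, msg, strlen(msg));
    close(fd);

    fd = open("/dev/simpleCharDevice0", O_RDONLY);   /* like cat ...    */
    if (fd < 0) return 1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)     /* loop: read() may    */
        fwrite(buf, 1, (size_t)n, stdout);           /* return short counts */
    close(fd);
    return 0;
}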

Usage

Tested on Ubuntu 20.

%  git clone https://github.com/rogerpease/LinuxKernelSimpleCharacterDriver
% cd LinuxKernelSimpleCharacterDriver/src 
% make
% insmod SimpleCharacterDriver.ko  # (if you get an Operation Not Permitted error you may need to sign the module)
% mknod /dev/simpleCharDevice0 c 228 0         
% echo "Hello Universe" > /dev/simpleCharDevice0
% cat /dev/simpleCharDevice0 

Notes

The hardcoded major device number is a bit antiquated; I know there are ways of dynamically allocating those device numbers (a sketch follows the code below). Otherwise, this is essentially a set of standard file operations:

// Declare the file operations structure.
static const struct file_operations my_dev_fops = {
    .owner          = THIS_MODULE,
    .read           = charDriverFileRead,  // Called when read() is called.
    .write          = charDriverFileWrite, // Called when write() is called.
    .open           = charDriverFileOpen,  // Called when open() is called.
    .release        = charDriverFileClose, // Called when close() is called.
    .unlocked_ioctl = charDriverFileIOCTL, // Called when ioctl() is called.
};
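
On the dynamic-allocation point: below is a minimal sketch of requesting the device number at load time instead of hardcoding it. The init-function and variable names here are hypothetical, but alloc_chrdev_region(), cdev_init(), and cdev_add() are the standard kernel calls.

#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/module.h>

static dev_t devNum;            /* major/minor pair assigned by the kernel */
static struct cdev myCdev;

static int __init charDriverInit(void)
{
    int rc;

    /* Ask the kernel for one free device number instead of hardcoding 228. */
    rc = alloc_chrdev_region(&devNum, 0, 1, "simpleCharDevice");
    if (rc)
        return rc;

    cdev_init(&myCdev, &my_dev_fops);     /* bind the fops table above */
    rc = cdev_add(&myCdev, devNum, 1);
    if (rc)
        unregister_chrdev_region(devNum, 1);
    return rc;
}

The mknod step would then use the major number the kernel assigned (MAJOR(devNum)) instead of the fixed 228.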

Each open filehandle is passed in as context. For example, the read function

ssize_t charDriverFileRead(struct file *filp, char __user *buf, size_t byteCount, loff_t *fileOffset)

can be used to keep information about where this individual file access left off, just as multiple processes can read the same disk file while keeping separate location pointers.
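
A hedged sketch of what such a read function might do in this driver (storedString/storedLength are hypothetical names for the buffer the write call fills in; copy_to_user() is the real kernel call). It serves bytes starting at *fileOffset, so each handle resumes where it left off and short reads work out naturally:

/* Requires <linux/fs.h> and <linux/uaccess.h>. */
static char storedString[256];   /* hypothetical: filled in by the write call */
static size_t storedLength;

ssize_t charDriverFileRead(struct file *filp, char __user *buf,
                           size_t byteCount, loff_t *fileOffset)
{
    size_t remaining, toCopy;

    if (*fileOffset >= (loff_t)storedLength)   /* past the end: signal EOF */
        return 0;

    remaining = storedLength - (size_t)*fileOffset;
    toCopy = (byteCount < remaining) ? byteCount : remaining;

    if (copy_to_user(buf, storedString + *fileOffset, toCopy))
        return -EFAULT;

    *fileOffset += toCopy;   /* this handle resumes here on the next read() */
    return toCopy;
}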

Relevance to work experience

I have done plenty of work on TMS320C6X devices (DMA, HPIs, etc.), including device drivers in assembly, C, and C with SYS/BIOS.

Machine Learning Experience

My first encounter with machine learning was a COMPUTE!’s Gazette article providing a Commodore 64 program that attempted to predict which of two keys you would press based on your prior inputs. That was probably a simple recurrent neural network. I’ve always had an interest in the topic, but the applications for ML have exploded in the past five years.

I am still in the learning phases on this topic, so please do not take anything I say here as authoritative or as a claim of expert-level ML experience. However, much as one would expect a programmer in the 1990s to be familiar with basic object-oriented methods, I would expect a programmer in the 2020s to be familiar with basic machine learning.

Potential Projects

  • FPGA-Based Neural Network: Using FPGA fabric to implement a set of neurons. Addition and multiplication would be reasonably simple, although floating-point multipliers tend to take a fair amount of time and area. The ReLU activation function could be done with a sign bit, while tanh and sigmoid could be approximated with line segments (see the sketch below).
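
As a model of that approximation idea, here is a short C sketch (the breakpoints are invented, not error-tuned): ReLU reduces to a sign check, and sigmoid becomes three line segments that an FPGA could evaluate with a comparator and one multiply:

/* ReLU in hardware is just a sign check: negative inputs become zero. */
float reluApprox(float x) { return x > 0.0f ? x : 0.0f; }

/* Sigmoid approximated with three line segments: saturate outside +/-4,
 * linear ramp in between. A real design would pick the segments to bound
 * the approximation error. */
float sigmoidApprox(float x)
{
    if (x <= -4.0f) return 0.0f;
    if (x >=  4.0f) return 1.0f;
    return 0.5f + x * 0.125f;   /* line through (0, 0.5) with slope 1/8 */
}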