
Computer Vision for Single Board Computers: Nvidia Jetson, Orange Pi 5, LattePanda, Raspberry Pi 4B

Computer vision and edge computing are very promising application areas for single board computers (SBCs) such as the Nvidia Jetson, LattePanda, Orange Pi, Raspberry Pi, etc. Intelligent surveillance and monitoring functions can be embedded into SBC applications, achieving edge computing. To see what applications you can develop with computer vision, you need to understand what computer vision models can do. It is also important to understand that current computer vision technology has limitations, so only certain applications are possible. Common computer vision modeling types include:

  • Image classification: Given an image, this provides classification information as a probability for each object class, for example bird/35%, car/15%, horse/5%, etc. The object with the highest probability is the predicted class/type of the image.
  • Image regression: Given an image, this gives one or more numerical output values, for example the probability of being cancerous, a temperature, or left/right moves.
  • Object detection: Given an image, this gives bounding box information for detected objects, such as probability, X/Y coordinates, width, and height. It can also provide the detected object class/type (see the sketch after this list).
  • Similarity regression of two images: Given two images, this gives a similarity probability for the pair. One such application is face recognition: deciding whether two images show the same person.
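
For illustration only, the sketch below shows what the outputs of these model types typically look like as C++ data structures. The struct and field names are hypothetical (they are not CMSR's actual API); they simply mirror the fields described above.

#include <vector>

// Hypothetical output structures (illustration only, not the CMSR API).

// Image classification: one score per class.
struct ClassScore {
    int   classIndex;     // e.g. 0 = bird, 1 = car, 2 = horse, ...
    float probability;    // e.g. 0.35 for bird/35%
};

// Object detection: one bounding box per detected object.
struct Detection {
    int   classIndex;     // detected object class/type
    float probability;    // detection confidence
    float x, y;           // top-left corner of the bounding box
    float width, height;  // box size in pixels
};

// A full object detection result is simply a list of boxes.
using DetectionResult = std::vector<Detection>;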

[Image: Vehicle/car detection convolutional neural network example]

[Image: Person detection convolutional neural network example]

Computer Vision Modeling

Convolutional Neural Networks (CNNs) are used to model computer vision tasks, so a fairly good understanding of CNNs is essential. Computer vision model development involves several stages: preparing training images, training the model, and embedding the generated model code into your application.

Computer vision machine learning is a complex process, so you need machine learning software that is powerful yet easy to learn and use. CMSR Machine Learning Studio is a fully GUI-based machine learning model development platform: users train machine learning models without writing any code. You don't code anything until you embed CMSR-generated model code into your application projects, where you simply call a model function from your main program. It is easy to use and comes with powerful features. It provides the following types of computer vision modeling tools:

  • CNN: Convolutional neural network for image classification and class probabilities.
  • FCN: Fully convolutional network for image classification and class probabilities; used the same way as CNN.
  • M-CNN: Multi-value output CNN; a regression modeling algorithm.
  • OD-CNN: Object detection CNN. It detects objects and provides bounding box information, very similar to YOLO; you can develop your own YOLO-style models with this.
  • T-CNN: Twin CNN for similarity prediction, for applications such as face recognition.

Free Codingless Computer Vision Development / Modeling Software Download

A free download of CMSR Machine Learning Studio is available to computer vision developers, with free (limited) technical support.

For free downloads, please visit CMSR Download/Install.

A Computer with a Powerful GPU and Large RAM is Essential!

Computer vision is extremely compute intensive. Training in particular takes a huge amount of time even on powerful computers, and you cannot do it on an Nvidia Jetson SBC! A computer with a powerful GPU and large RAM is essential; high-performance gaming computers are ideal for machine learning. If you don't have such a machine, you can still develop computer vision models, just smaller ones. More GPU shader cores are better, as CMSR Studio can take advantage of large shader core counts such as Nvidia CUDA cores. CMSR employs fine-grained data parallelism. On 896 Nvidia CUDA cores, we observed model training 165 times faster than on a single CPU core. With more CUDA cores, it can be over 1,000 times faster.

Model training is done with images in randomized order; otherwise, models develop a skew towards later images. To read images in random order, all training images must be held in main memory, or training will be extremely slow. You can estimate the total RAM needed, in bytes, with the following formula:

   total image dataset size = ((image width) * (image height) * 3 + 3) * (number of images)

This is the memory footprint of your image dataset. Your computer's RAM should be about twice this size, since the OS also uses RAM. If you don't have enough memory, you will have to make do with smaller image training datasets. Note that this is CPU RAM; GPU VRAM is different and can be much smaller, since it stores only the model parameters and some extras.
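
As a quick sanity check of the formula, here is a minimal sketch that computes the dataset size and suggested RAM for an assumed example (64x64 RGB images and 100,000 training images; both numbers are placeholders, substitute your own):

#include <cstdio>

int main() {
    // Formula from above: ((image width) * (image height) * 3 + 3) * (number of images)
    long long width  = 64;        // example image width (assumption)
    long long height = 64;        // example image height (assumption)
    long long images = 100000;    // example number of training images (assumption)

    long long datasetBytes = (width * height * 3 + 3) * images;
    double datasetGB = datasetBytes / (1024.0 * 1024.0 * 1024.0);

    // Rule of thumb from the text: have about twice the dataset size in CPU RAM,
    // since the OS also needs memory.
    printf("Image dataset size: %.2f GB\n", datasetGB);
    printf("Suggested CPU RAM : %.2f GB\n", 2.0 * datasetGB);
    return 0;
}

For this example the dataset works out to roughly 1.1 GB, so about 2.3 GB of RAM should be available for training on top of whatever the OS itself uses.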

Model Code Generation for Embedded Applications

Forget about asking ChatGPT to write it for you: CMSR ML Studio can generate highly efficient AI/ML code.

CMSR ML Studio makes embedding into applications easy: just generate the program code, compile it with your project code, and call a function from your main program. CMSR can generate the following types of code:

  • Single CPU thread: Java, C, Swift.
  • Multicore CPU: C++, Java.
  • GPU: OpenCL (Java, C++), Cuda (Java, C++), OpenGL ES3 (C++), Metal (Swift, Objective-C).

For a generated program example, please see CMSR Generated Program Example: C++ OpenCL.

Nvidia Jetson

Nvidia Jetson single board computers are all about GPU power; their CPU cores are average. If your applications demand huge GPU power, Nvidia Jetson SBCs are the ones to go with. However, Jetson units are bulky, consume more power, and generate more heat, so there are tradeoffs. If your applications need to process large images more than several times a second, Nvidia should be considered seriously; otherwise an Orange Pi 5 may do the job.

AMD Ryzen Embedded with Vega GPU

AMD's embedded processors with Vega GPU cores are quite impressive. Vega 8 GPU has 512 shader cores while Vega 3 has 192 cores. They are powerful for computer vision applications. There are a number of companies producing single board computers based on AMD Ryzen Embedded processors.

Orange Pi 5

The Orange Pi 5 is very impressive and well suited to edge computing and computer vision. We tested a 25-million-parameter object detection computer vision model on Raspberry Pi 4B and Orange Pi 5 computers, with the following results:

  • Raspberry Pi 4B: 27 seconds (using 4 CPU threads).
  • Orange Pi 5: 14.5 seconds (using 4 CPU threads).
  • Orange Pi 5: 13 seconds (using 8 CPU threads).
  • Orange Pi 5: 0.672 seconds (using OpenCL GPU).

Orange Pi 5 OpenCL GPU performance is very impressive. If you don't need Nvidia Jetson-class GPU power, the Orange Pi 5 is the one to go with, and it is suitable for large computer vision models. Note that you can enable OpenCL on Debian and Ubuntu for the Orange Pi 5; see the linked installation guide for details. As for power, with 4 or 8 CPU threads the average consumption is 11 watts over the 14.5 or 13 second run, while with the OpenCL GPU it is only 8 watts over 0.672 seconds, making the GPU more than 26 times more power efficient than the CPUs. So the winner is the Orange Pi 5 OpenCL GPU.
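
The power-efficiency factor comes from comparing energy per run (power times time) rather than instantaneous power; the small sketch below simply reproduces that arithmetic from the measurements above:

#include <cstdio>

int main() {
    // Measured figures from the Orange Pi 5 test above.
    double cpuWatts = 11.0, cpuSeconds = 13.0;    // 8 CPU threads
    double gpuWatts = 8.0,  gpuSeconds = 0.672;   // OpenCL GPU

    double cpuJoules = cpuWatts * cpuSeconds;     // ~143 J per detection run
    double gpuJoules = gpuWatts * gpuSeconds;     // ~5.4 J per detection run

    printf("CPU energy per run: %.1f J\n", cpuJoules);
    printf("GPU energy per run: %.1f J\n", gpuJoules);
    printf("GPU advantage     : %.1fx\n", cpuJoules / gpuJoules);   // ~26.6x
    return 0;
}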

LattePanda

LattePandas are Intel-based single board computers, so they can run Windows as well as Linux, and Windows and Linux applications run natively. The internal Intel GPUs also perform well, with performance similar to the Orange Pi 5 GPU, and OpenCL should definitely be supported, which is a very important factor for computer vision applications. The downside is that Intel CPUs tend to produce a lot of heat and consume more power, so Intel-based SBCs are much bigger than the Orange Pi 5.

Raspberry Pi 4B

For the Raspberry Pi 4B, the best option is C++ with 4 CPU threads; it gives the maximum speed. We tested performance on a Raspberry Pi 4B 4GB. With a single CPU thread, a 25-million-parameter object detection deep neural network takes 87 seconds to complete; with 4 CPU threads it takes 27 seconds. That is about a 70% reduction in elapsed time, about 5 percentage points short of the perfect 75% reduction. If you use small neural networks with fewer than 1 million parameters, it will take only about a second or less, which is not bad performance. So keeping the neural network small is the way to go. Note that elapsed time is roughly proportional to the number of trainable parameters, as the estimate sketch below illustrates.
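
Since elapsed time scales roughly with the parameter count, you can turn the 4-thread measurement above into a rough estimator for other model sizes. The sketch below is only an approximation built on that single data point (25 million parameters, 27 seconds on 4 Raspberry Pi 4B threads):

#include <cstdio>

// Rough estimate: elapsed time is approximately proportional to the number of
// trainable parameters. Baseline: 25M parameters -> 27 s on 4 Pi 4B CPU threads.
double estimateSeconds(double parameters) {
    const double baselineParams  = 25e6;
    const double baselineSeconds = 27.0;
    return baselineSeconds * (parameters / baselineParams);
}

int main() {
    printf("1M-parameter model  : ~%.1f s\n", estimateSeconds(1e6));    // ~1.1 s
    printf("0.5M-parameter model: ~%.1f s\n", estimateSeconds(0.5e6));  // ~0.5 s
    return 0;
}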

The major drawback of Raspberry Pi SBCs is the lack of GPU compute support such as OpenCL and OpenGL ES3. Because of this, computer vision performance suffers, and we have to rely on multiple CPU cores.

Embedding Models into Applications

CMSR-generated code is highly efficient, and in edge computing especially, efficiency and speed are among the most important factors. The only code you need to write is the code that actually uses the model. The following code shows the usage of a CNN classification model in C++: you create a couple of arrays to receive results, then call the main evaluate function with its parameters. You can call the "evaluate" function as many times as you need in your application. Of course, your application should get image data from the onboard camera!

#include <iostream>
#include "CMSRModel.hpp"
using namespace std;

int main(void) {
	char filename[] = "data/modelfile.cnn";
	char imagefile[] = "data/cnnimages.rgb";

	int IMAGEARRAY[64*64*3];
	int outLabelCount = 4;
	int outLabelIndices[5];
	float outLabelProbabilities[5];
	int blackandwhite = 0;
	int r0g1b2 = 1;

	CMSRModel *model = new CMSRModel();
	model->verbose = true;

	// initialize model;
	model->initializeModel(4, filename);

	// The following steps can be repeated many times;
	model->populateImageArray((int*)IMAGEARRAY, imagefile, 64*64*3); // you can get data from camera!
	model->evaluate (
		outLabelCount, /* result label count */
		outLabelIndices, /* ordered result output label indices */
		outLabelProbabilities, /* ordered result output label relative probabilities */
		blackandwhite, /* 1 if black and white, otherwise 0 */
		r0g1b2,        /* 1 if IMAGEARRAY[][][0] is red, otherwise 0 */
		IMAGEARRAY   /* [row/height][column/width][colors] */
	);
	cout << "Results;\n";
	for (int i=0; i < outLabelCount; i++) {
		cout << i << ": " << outLabelIndices[i] << " / " << outLabelProbabilities[i] << "\n";
	}

	// release memory resources;
	model->releaseMemoryResources();
	delete model;

	cout << "End.\n";

	return 0;
}