Sophisticated SSH server with AI and camera capabilities

A Look at Emerging Technologies

This project combines several camera and computer-vision technologies into one cohesive demonstration. You can connect through either a CLI shell or a web-based GUI to send commands to an SSH server, which lets you interact with a connected camera or analyze captured video and image data. It is not intended to be a complete commercial product or a production-ready solution. Rather, it is a proof of concept: a novel way to demonstrate emerging technologies in a realistic setting.

Here I will demonstrate some of the more novel features implemented in this project. The features have been divided into three categories:

  1. Network Connectivity and Development

  2. AI (more specifically, features that take advantage of LLMs and machine vision running locally on the server)

  3. Classic DVR/NVR security features, such as recording video and capturing images

Initially, this was going to be just a secure text-based application; the web-based GUI (WebUI) came later. That being the case, let’s look at the text-based side first. The server end can be accessed across a network via SSH, an industry standard for interacting securely with computers and equipment, usually in a text-based environment. This gets the job done, but I wanted a dashboard that makes the features easy to reach without having to remember and type a bunch of commands.

A WebUI dashboard makes it easy to mock up a closer approximation of what a final product could look like, and it could feasibly run on a phone or tablet.
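To illustrate the idea of putting a dashboard in front of typed commands, here is a minimal Python sketch of a shared command table that both the SSH shell and a WebUI button could call into. Everything in it is an assumption made for illustration: the command names (capture, record), the handler functions, and the dispatch helper are hypothetical and are not taken from the project's code, and the handlers only return placeholder strings instead of touching a camera.

```python
from typing import Callable, Dict

# Hypothetical command table: maps a command name to the function that runs it.
COMMANDS: Dict[str, Callable[[], str]] = {}


def command(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that registers a handler under a command name."""
    def register(func: Callable[[], str]) -> Callable[[], str]:
        COMMANDS[name] = func
        return func
    return register


@command("capture")
def capture_image() -> str:
    # Placeholder: a real handler would grab a frame from the camera.
    return "image captured"


@command("record")
def record_video() -> str:
    # Placeholder: a real handler would start a video recording.
    return "recording started"


def dispatch(line: str) -> str:
    """Run the handler for a typed command or a dashboard button press."""
    name = line.strip().lower()
    handler = COMMANDS.get(name)
    if handler is None:
        available = ", ".join(sorted(COMMANDS))
        return f"unknown command: {name!r}; available: {available}"
    return handler()


if __name__ == "__main__":
    print(dispatch("capture"))
    print(dispatch("zoom"))
```

With a table like this, a dashboard button only needs to send a command name, so the text interface and the WebUI share one set of handlers.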

I am not a programmer by trade, but I have always been fascinated by coding and development. The client end is built in Python, while the server side uses Python for the Flask backend and JavaScript and HTML for the frontend WebUI.

My process included a lot of LLM assistance, which is becoming more and more common in coding and development work. Although this is controversial and shunned in some circles, I don’t think I could have gotten this off the ground without these tools. I have the utmost respect for the experts in this field who don’t have to rely on LLMs.

The term AI means many different things to different people today. It has become a catch-all buzzword that is overused and vague. However, for the sake of simplicity, I’ll use it here to refer to features that take advantage of local LLMs and computer/machine vision.

Here is a list of the AI features in the app:

  • Facial Analysis (Age, gender, emotion)

  • Image Interrogation - This gives a detailed textual description of an image

  • Face Detection - Ability to recognize a face

  • Object Recognition - Differentiate objects in an image and name them
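Object recognition typically yields a label and a confidence score for each object found. As a rough sketch of how such results could be turned into the short text summary a shell or dashboard might show, the example below filters by confidence and counts labels. The Detection structure, the summarize function, and the 0.5 threshold are assumptions for illustration only; the project's actual models and output format are not described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    """Hypothetical result for one recognized object."""
    label: str
    confidence: float  # assumed to range from 0.0 to 1.0


def summarize(detections: Iterable[Detection], min_confidence: float = 0.5) -> str:
    """Count labels at or above the threshold and format them as one line."""
    counts = Counter(
        d.label for d in detections if d.confidence >= min_confidence
    )
    if not counts:
        return "no objects detected"
    # Sort by label so the summary is stable from run to run.
    return ", ".join(f"{n} x {label}" for label, n in sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        Detection("person", 0.91),
        Detection("person", 0.84),
        Detection("cup", 0.30),  # below the threshold, so it is dropped
        Detection("dog", 0.62),
    ]
    print(summarize(sample))
```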

Fundamental DVR/NVR Security/Surveillance features:

  • Camera Image Capture

  • Video Recording

This is a work in progress. To conclude, here are some of the features I would like to add or expand upon.

Future feature set:

  • Complete overhaul of the handling of username and password credentials

  • Testing for use across remote networks

  • Pose Estimation

  • Expanded Emotion Recognition (Numerical Score)

  • Improved object detection

  • Action detection (what the person is doing)

  • Hand tracking (could be linked to commands or used for analytics/telemetry)

  • Improved face detection

  • WebUI ported to an Android or iOS app

  • Server side could be de-bloated to run on SBCs, low-powered devices, and/or motorized equipment

Currently, the whole stack can run on a single machine or on two machines on the same network. If the code is not already posted by the time this is published, I intend to upload it to a Git repository.