Machine Learning as a Video Filter

Machine learning has possibilities in almost any kind of software going forward. You could use it to learn your preferences in apps, to train a system to understand your voice, and to do a lot of clever things with imagery.

I started looking into using machine learning as a video filtering tool.

There is a good rundown and sample code here: https://github.com/affinelayer/pix2pix-tensorflow, which is what I used for this session. That page says it requires Linux, but you can use it on either Windows or macOS as well. You need a CUDA-capable GPU, which most recent NVIDIA video cards are. If you’ve been experimenting with VR, you likely have a compatible card.
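Before kicking off a long training run, it's worth checking that TensorFlow can actually see your CUDA GPU. A minimal sanity check looks something like this (a sketch only; it assumes TensorFlow is already installed, and uses the TF 1.x-era call that pix2pix-tensorflow was written against):

```python
# Quick check that TensorFlow can see a CUDA-capable GPU.
import tensorflow as tf

# On TF 1.x this returns True when a CUDA GPU is usable;
# on TF 2.x, tf.config.list_physical_devices('GPU') is the equivalent check.
print("GPU available:", tf.test.is_gpu_available())
```

If this prints False, training will still run, but on the CPU, which is painfully slow for this kind of work.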

The basic premise of my usage of pix2pix was to give the algorithm pairs of frames from a video to compare, have it build a “model” (which is essentially a list of learned rules) from those comparisons, and then feed it fresh images to run those rules on. In other words: compare frame 1 of a video against what has changed in frame 2 of that same video, do the same for the rest of the frames (2>3, 3>4, and so on), and once they have all been looked at, apply what was learned from those changes to your new image. A rough sketch of how I pair the frames up is below.
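pix2pix-tensorflow expects each training example as a single image with the “before” and “after” halves side by side, so pairing consecutive frames ends up looking something like this (a rough sketch; it assumes the frames were already exported as numbered PNGs like frame_0001.png, and uses Pillow for the image work):

```python
import os
from PIL import Image

frames_dir = "frames"       # exported video frames: frame_0001.png, frame_0002.png, ...
pairs_dir = "train_pairs"   # combined side-by-side images to train on
os.makedirs(pairs_dir, exist_ok=True)

frames = sorted(f for f in os.listdir(frames_dir) if f.endswith(".png"))

for a_name, b_name in zip(frames, frames[1:]):
    a = Image.open(os.path.join(frames_dir, a_name))
    b = Image.open(os.path.join(frames_dir, b_name))

    # Frame N goes on the left, frame N+1 on the right, so the model
    # learns the "what changed between frames" mapping described above.
    pair = Image.new("RGB", (a.width + b.width, a.height))
    pair.paste(a, (0, 0))
    pair.paste(b, (a.width, 0))
    pair.save(os.path.join(pairs_dir, a_name))
```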

So far this merely adds stripe artifacts and some glitching appropriate for a black metal music video. Subtle, and not terribly useful yet.

The sample code I was using has a lot of extra pieces attached in order to save out a web gallery, which I don’t need, as well as image conversion steps meant to format people’s images so they show better results. Since I already know how to get my images into the format and size I want, and don’t want a web site of my results, I’m working on stripping all of that out.
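For the sizing, I just handle the crop and resize myself before training instead of letting the sample code do it. Something along these lines works (a sketch, again using Pillow, and assuming 256x256 since that is the default crop size in pix2pix-tensorflow):

```python
from PIL import Image

def to_square(path, size=256):
    """Center-crop a frame to a square, then resize it to the size
    the network expects (256x256 by default in pix2pix-tensorflow)."""
    img = Image.open(path)
    side = min(img.width, img.height)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.LANCZOS)

# Example: to_square("frames/frame_0001.png").save("resized/frame_0001.png")
```

Running the frames through something like this before pairing them up keeps the rest of the pipeline simple.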