Facebook Challenges Google, Microsoft In AI, Builds Vision Algorithms For Video

Facebook refuses to stand still while Google and Microsoft excel at artificial intelligence (AI), so it is working on applying its machine vision technology to video, where objects move, interact and continually change.

Engineers in the Facebook AI Research (FAIR) group call the technique "real-time classification," which they believe can help people search for relevant and important Live videos on Facebook. The technology will be able to detect scenes, objects, and actions, which could one day allow real-time narration of what is happening in a video.

The ability to search for and identify moving images and audio in videos will allow for more precise ad targeting.

The project uses the source code for three of Facebook's machine vision algorithms, which the company recently made open and accessible to all in hopes of spurring rapid adoption.

Computer vision technology makes it easier to search for specific images without an explicit tag on each photo, according to Facebook. Those with vision loss can understand what is in the photos that friends share because the technology will describe it to them through audio, regardless of the caption posted alongside the image.

The AI behind the technology identifies and tags photos and helps read the content of images to visually impaired users who are visiting the site. The three new algorithms driving the social network's advances are DeepMask and SharpMask, which enable Facebook AI Research (FAIR) machine vision technology to detect and precisely delineate each object in an image, and MultiPathNet, which labels each object. Those three algorithms, along with the related research papers, are now available.

The algorithms find patterns in the pixels to identify objects. The image is encoded as numbers representing color values for each pixel. Rather than relying on hand-defined rule-based systems, neural networks with millions of parameters are trained to identify the objects, learning from each image they see.
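To make the pixel encoding concrete, here is a minimal Python sketch of a tiny image written out as color values and flattened into the list of numbers a network would take as input. The 2x3 image, the `flatten` helper and the scaling to the 0-1 range are illustrative assumptions, not details of Facebook's code.

```python
# A tiny 2x3 image: each pixel is an (R, G, B) triple of color values from 0 to 255.
# The image contents are made up for illustration.
image = [
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(255, 0, 0), (0, 0, 255), (0, 0, 255)],
]

def flatten(image):
    """Turn the pixel grid into one flat list of numbers, scaled to the range 0..1."""
    return [channel / 255 for row in image for pixel in row for channel in pixel]

features = flatten(image)
print(len(features))  # 2 rows * 3 columns * 3 color channels = 18 numbers
```

A real photo works the same way, only with millions of numbers instead of 18.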

Very simply put, DeepMask identifies the objects as rough "blobs," and SharpMask refines each one into a sharper mask that more accurately defines the object's boundaries. MultiPathNet then identifies the object that each mask delineates.
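The flow of data through those three stages can be sketched with toy stand-ins. The real DeepMask, SharpMask and MultiPathNet are trained neural networks; the simple brightness thresholds, the area-based label and every function name below are hypothetical, chosen only to show how a rough blob becomes a sharper mask and then a label.

```python
# Toy stand-ins only: none of this is Facebook's code. Every function name
# and threshold here is hypothetical.

def coarse_mask(image, block=2, threshold=128):
    """Stage 1 stand-in: mark whole blocks that contain any bright pixel."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            if any(image[r][c] > threshold for r in rows for c in cols):
                for r in rows:
                    for c in cols:
                        mask[r][c] = 1
    return mask

def refine_mask(image, mask, threshold=128):
    """Stage 2 stand-in: tighten the blob to the pixels that are actually bright."""
    return [
        [1 if m and pixel > threshold else 0 for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

def label_mask(mask):
    """Stage 3 stand-in: attach a label to the masked region (here, just by area)."""
    area = sum(map(sum, mask))
    return "large object" if area >= 4 else "small object"

# A 4x4 grayscale image with one bright, L-shaped object in the top-left corner.
image = [
    [200, 210, 0, 0],
    [220,   0, 0, 0],
    [  0,   0, 0, 0],
    [  0,   0, 0, 0],
]

blob = coarse_mask(image)          # rough 2x2 blob around the object
sharp = refine_mask(image, blob)   # mask that follows the object's outline
label = label_mask(sharp)          # label for the masked region
```

Here the blob covers four pixels, the refined mask keeps only the three that belong to the object, and the label is attached to that refined region rather than to the whole image.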
