Google Video Intelligence API film labelling

The Video Intelligence API allows you to analyze the content of videos. For my use case this is super useful, because now I can:
  • Label videos with their content, and use those labels to navigate to the points in the video where they appear
  • Detect scene changes, and navigate or jump to each scene in a video
  • OCR text discovered in a video. For TV advertisements, this helps to identify the product being advertised
  • Get labels for each segment in the video, to identify the general content of the video
  • Deduplicate alternate cuts or formats of the same film
  • Identify films with similar content
  • There are other capabilities, such as object and logo detection, as well as transcription, which I may use in the future.

Modes of analysis

You can analyze content either by uploading it to Cloud Storage (I cover that in More cloud streaming) or by analyzing it in real time (Streaming Video Intelligence). Streaming Video Intelligence is a little newer, and doesn’t yet offer the same range of capabilities as analysis of a file in Cloud Storage. It also returns less detailed results, so I’m sticking with the regular variant.

The UI

For this example, I’m using an ad from the Guardian, publicly available here.
Here’s the result of the analyzed video in the UI of my app.
Clicking any label takes you to that point in the video.
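The UI code isn’t shown here, but the navigation idea is simple: keep each label’s segment start times and seek the player to one of them. Here’s a minimal Python sketch, assuming the label segments have already been reduced to (description, start_seconds) pairs. `build_jump_index` and `seek_target` are hypothetical names, and cycling to the next occurrence on each click is my own assumption about the behaviour, not necessarily how the app does it.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers: an illustration, not the app's actual UI code.


def build_jump_index(segments: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Map each label description to a sorted list of segment start times (seconds)."""
    index: Dict[str, List[float]] = defaultdict(list)
    for description, start in segments:
        index[description].append(start)
    for starts in index.values():
        starts.sort()
    return dict(index)


def seek_target(index: Dict[str, List[float]], label: str, current_time: float) -> float:
    """Return the first start time for `label` after `current_time`,
    wrapping round to the earliest occurrence if there is none later."""
    starts = index[label]
    i = bisect_right(starts, current_time)  # first start strictly after current_time
    return starts[i] if i < len(starts) else starts[0]
```

For example, with occurrences of "car" at 2s and 12s, clicking "car" while the player is at 5s seeks to 12s, and clicking again from 12s wraps back to 2s.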

How it’s done

The workflow is:
  • If no labels are available yet, the user can choose to analyze a film. The UI fires off a GraphQL mutation, and the GraphQL API publishes a Pub/Sub message. It’s a longish-running task, so it can’t really be done interactively.
  • A process running in my Kubernetes cluster picks up the message via a subscription and kicks off a job that finds the best-quality version of the film it knows about (you get more labels, with higher confidence, from a good-quality copy), uploads it to Cloud Storage, runs it through Video Intelligence and writes the results back with a GraphQL mutation.
  • In the meantime, the UI watches the progress of the workflow via a GraphQL subscription and keeps the user updated on where it’s up to.
  • Eventually the labels become available to the UI.
  • Once a film has been analyzed, its labels are available for all future accesses.
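The worker’s code isn’t shown in this article, so here’s a minimal Python sketch of the workflow’s shape under stated assumptions: the stage functions, the dict standing in for the database and the `report` callback standing in for progress published to the GraphQL subscription are all hypothetical. The sketch also re-checks for existing labels in the worker, not just in the UI.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, fn) pair; fn receives the previous stage's output.
Stage = Tuple[str, Callable[[Any], Any]]


def handle_analyze_message(
    film_id: str,
    label_store: Dict[str, Any],
    stages: List[Stage],
    report: Callable[[str, str], None],
) -> Any:
    """Handle one 'analyze this film' message (hypothetical sketch).

    label_store stands in for the database (film_id -> labels), and report
    stands in for publishing progress events that the UI's GraphQL
    subscription would pick up.
    """
    if film_id in label_store:
        # Already analyzed: reuse the stored labels rather than re-running.
        report(film_id, "already-analyzed")
        return label_store[film_id]

    value: Any = film_id
    for name, fn in stages:
        report(film_id, f"started:{name}")
        value = fn(value)
        report(film_id, f"finished:{name}")

    label_store[film_id] = value  # stored, so future requests skip analysis
    report(film_id, "done")
    return value
```

Checking for existing labels in the worker as well as the UI is a cheap guard against the same message being delivered twice, which Pub/Sub’s at-least-once delivery allows.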

Doing the labelling

This is the entire process from receiving a message, but I’m only going to cover the labelling part in this article. I’ll cover the rest in other articles.

There are a couple of wrappers to make the results from each stage consistent. The secrets object contains credentials for a GCP service account that is authorized to run the Video Intelligence API and access Cloud Storage.
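The wrappers themselves aren’t reproduced here. As a rough idea of what "consistent results" can mean, here’s a hypothetical Python decorator that catches exceptions and gives every stage the same result shape; the `stage` name and the field names are my own, not taken from the app.

```python
import functools
import time
from typing import Any, Callable, Dict


def stage(name: str) -> Callable:
    """Hypothetical decorator: make every stage return the same shape,
    {"stage", "ok", "data", "error", "elapsed"}, whether it succeeds or not."""

    def decorate(fn: Callable) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            started = time.monotonic()
            try:
                data, error = fn(*args, **kwargs), None
            except Exception as exc:  # report the failure instead of raising
                data, error = None, f"{type(exc).__name__}: {exc}"
            return {
                "stage": name,
                "ok": error is None,
                "data": data,
                "error": error,
                "elapsed": time.monotonic() - started,
            }

        return wrapper

    return decorate
```

Because a failed stage still returns a result rather than raising, the caller can report progress and errors to the UI through one code path.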

The labelling

You can request several kinds of feature detection in the same run. Here I’m doing shot change, label and text detection. A couple of points of interest:
  • The segment object is much the same for each result type, and consists of a start offset, an end offset and a confidence value.
  • This is a Google long-running operation, which has a pretty funky interface for getting the result. See here for my write-up on it, and below for it in action.
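The essential long-running operation pattern is: start the annotation, get back an operation name, and poll until the operation reports `done`, at which point it holds either an `error` or a `response`. Here’s a minimal Python sketch of that loop, assuming a hypothetical `get_operation` callable that fetches the operation resource as a dict (for example from the REST operations endpoint). It treats a missing `done` field as not finished.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    get_operation: Callable[[str], Dict[str, Any]],
    name: str,
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a long-running operation until it is done, then return its response.

    get_operation is a hypothetical fetcher returning the operation as a dict:
    "done" once finished, plus either "error" or "response".
    """
    deadline = time.monotonic() + timeout
    while True:
        op = get_operation(name)
        if op.get("done"):  # "done" may simply be absent while still running
            if "error" in op:
                err = op["error"]
                raise RuntimeError(
                    f"operation {name} failed: {err.get('code')} {err.get('message')}"
                )
            return op.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {name} not done after {timeout}s")
        sleep(interval)
```

Injecting `sleep` keeps the loop easy to test. With the client libraries you would normally let them do the polling instead (for example `operation.result(timeout=...)` in the Python client).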
And that’s about it. Just pass the Cloud Storage URI, tell it which features you need, unravel the rather complex response and you’re done.
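Here’s a minimal Python sketch of those steps against the REST/JSON form of the API, as an illustration rather than my actual code: `build_request` produces the body for a `videos:annotate` call, and `flatten` unravels an `AnnotateVideoResponse` into one row per segment. It assumes durations arrive as JSON strings like `"12.5s"`; the client libraries return structured duration objects instead. The row format and helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Features requested in one run: shot changes, labels and on-screen text.
FEATURES = ["SHOT_CHANGE_DETECTION", "LABEL_DETECTION", "TEXT_DETECTION"]


def build_request(input_uri: str) -> Dict[str, Any]:
    """JSON body for a REST videos:annotate call on a Cloud Storage file."""
    return {"inputUri": input_uri, "features": list(FEATURES)}


def seconds(offset: Optional[str]) -> float:
    """Convert a JSON-encoded protobuf Duration such as '12.5s' to seconds.

    A missing offset is treated as 0.
    """
    return float(offset[:-1]) if offset else 0.0


def _row(kind: str, value: Optional[str], segment: Dict[str, str],
         confidence: Optional[float]) -> Dict[str, Any]:
    # Hypothetical flat row format: one per segment, whatever its type.
    return {
        "kind": kind,
        "value": value,
        "start": seconds(segment.get("startTimeOffset")),
        "end": seconds(segment.get("endTimeOffset")),
        "confidence": confidence,
    }


def flatten(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Unravel an AnnotateVideoResponse (JSON form) into flat rows."""
    rows = []
    for result in response.get("annotationResults", []):
        for label in result.get("segmentLabelAnnotations", []):
            for seg in label.get("segments", []):
                rows.append(_row("label", label["entity"]["description"],
                                 seg["segment"], seg.get("confidence")))
        for text in result.get("textAnnotations", []):
            for seg in text.get("segments", []):
                rows.append(_row("text", text["text"],
                                 seg["segment"], seg.get("confidence")))
        # Shot changes are bare segments with no confidence attached.
        for shot in result.get("shotAnnotations", []):
            rows.append(_row("shot", None, shot, None))
    return rows
```

Each row carries a kind, a value, start and end seconds and an optional confidence, which is a convenient shape to store and filter later.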

Confidence

By default, the UI’s confidence threshold is 75%, but all annotation labels are stored in the database, whatever their confidence. Moving the slider down to 20% shows a lot more labels. I’m not sure what the best setting is yet.
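Since everything is stored, the slider only has to filter what’s displayed. A minimal Python sketch, assuming each stored label is a dict with `value` and `confidence` keys (hypothetical field names) and the slider maps to a threshold between 0 and 1:

```python
from typing import Any, Dict, Iterable, List


def visible_labels(rows: Iterable[Dict[str, Any]], threshold: float = 0.75) -> List[Dict[str, Any]]:
    """Keep labels whose confidence meets the slider threshold, highest first.

    Nothing is deleted: the stored rows stay as they are, and this only
    decides what the UI shows. Rows without a confidence are skipped.
    """
    kept = [
        r for r in rows
        if r.get("confidence") is not None and r["confidence"] >= threshold
    ]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)
```

Counting the survivors at a few thresholds, such as `{t: len(visible_labels(rows, t)) for t in (0.2, 0.5, 0.75)}`, is one way to get a feel for where a sensible default might sit.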

What else

Dealing with different versions of a video and working out whether they really are the same film can be very hard. If they have the same encoding and exactly the same cut (in other words, they are the exact same file), you can simply compare MD5 digests, but more often than not they won’t be. Labels, confidence scores and shot changes can be a good way to de-duplicate different formats or even different cuts of the same film. But that’s for another post.
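As a taste of the idea, here’s a minimal Python sketch that first checks for byte-identical files via MD5, then falls back to comparing the sets of confident labels with Jaccard similarity (intersection size over union size). This heuristic, the 0.8 cut-off and the row shape (`kind`, `value`, `confidence`) are my own assumptions for illustration, not the approach the follow-up post will describe.

```python
import hashlib
from typing import Any, Dict, Iterable, Set


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """MD5 hex digest of a byte stream supplied in chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large videos aren't loaded whole."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def label_set(rows: Iterable[Dict[str, Any]], threshold: float = 0.75) -> Set[str]:
    """Distinct label values at or above a confidence threshold."""
    return {
        r["value"] for r in rows
        if r.get("kind") == "label" and (r.get("confidence") or 0) >= threshold
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size: 1.0 if identical, 0.0 if disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_same_film(md5_a: str, md5_b: str, labels_a: Set[str],
                     labels_b: Set[str], min_similarity: float = 0.8) -> bool:
    """Hypothetical heuristic: identical bytes, or very similar label sets."""
    if md5_a == md5_b:
        return True  # the exact same file
    return jaccard(labels_a, labels_b) >= min_similarity
```

Label sets alone ignore timing, so comparing shot-change positions as well would be a natural refinement.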

More

Since G+ has closed, you can now star and follow post announcements and discussions on GitHub, here.