How to measure video quality perception

Update 2(01/06/2016): Fixed reference video bitrate unit from Kbps to KBps

Update 1(10/16/2016): Anne Aaron presented the VMAF at the Demuxed 2016.

When working with videos, you should be focusing all your efforts on best quality of streaming, less bandwidth usage, and low latency in order to deliver the best experience for the users.

This is not an easy task. You often need to test different bitrates, encoder parameters, fine tune your CDN and even try new codecs. You usually run a process of testing a combination of configurations and codecs and check the final renditions with your naked eyes. This process doesn’t scale, can’t we just trust computers to check that?

bit rate (bitrate): is a measure often used in digital video, usually it is assumed the rate of bits per seconds, it is one of the many terms used in video streaming.

screen-shot-2016-10-08-at-9-30-26-am
same resolution, different bitrates.

codec: is an electronic circuit or software that compresses or decompresses digital content. (ex: H264 (AVC), VP9, AAC (HE-AAC), AV1 and etc)

We were about to start a new hack day session here at Globo.com and since some of us learned how to measure the noise introduced when encoding and compressing images, we thought we could play with the stuff we learned by applying the methods to measure video quality.

We started by using the PSNR (peak signal-to-noise ratio) algorithm which can be defined in terms of the mean squared error (MSE) in decibel scale.

PSNR: is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise.

First, you calculate the MSE which is the average of the squares of the errors and then you normalize it to decibels.

For 3D signals (colored image), your MSE needs to sum all the means for each plane (ie: RGB, YUV and etc) and then divide by 3 (or 3 * MAX ^ 2).

To validate our idea, we downloaded videos (720p, h264) with the bitrate of 3400 kbps from distinct groups like News, Soap Opera and Sports. We called this group of videos the pivots or reference videos. After that, we generated some transrated versions of them with lower bitrates. We created 700 kbps, 900 kbps, 1300 kbps, 1900 kbps and 2800 kbps renditions for each reference video.

Heads Up! Typically the pivot video (most commonly referred to as reference video), uses a truly lossless compression, the bitrate for a YUV420p raw video should be 1280x720x1.5(given the YUV420 format)x24fps /1000 = 33177.6KBps, far more than what we used as reference (3400KBps).

We extracted 25 images for each video and calculate the PSNR comparing the pivot image with the modified ones. Finally, we calculate the mean. Just to help you understand the numbers below, a higher PSNR means that the image is more similar to the pivot.

700 kbps 900 kbps 1300 kbps 1900 kbps 2800 kbps 3400 kbps
Soap Op. 35.0124 36.5159 38.6041 40.3441 41.9447
News 28.6414 30.0076 32.6577 35.1601 37.0301
Sports 32.5675 34.5158 37.2104 39.4079 41.4540
screen-shot-2016-10-08-at-9-15-24-am
A visual sample.

We defined a PSNR of 38 (from our observations) as the ideal but then we noticed that the News group didn’t meet the goal. When we plotted the News data in the graph we could see what happened.

The issue with the video from the News group is that they’re a combination of different sources: External traffic camera with poor resolution, talking heads in a studio camera with good resolution and quality, some scenes with computer graphics (like the weather report) and others. We suspected that the News average was affected by those outliers but this kind of video is part of our reality.

kitbcrnx2uuu4
The different video sources are visible in clusters. (PSNR(frames))

We needed a better way to measure the quality perception so we searched for alternatives and we reached one of the Netflix’s posts: an approach toward a practical perceptual video quality metric (VMAF). At first, we learned that PSNR does not consistently reflect human perception and that Netflix is creating ways to approach this with the VMAF model.

They created a dataset with several videos including videos that are not part of the Netflix library and put real people to grade it. They called this score of DMOS. Now they could compare how each algorithm scores against DMOS.

netflix
FastSSIM, PSNRHVS, PSNR and SSIM (y) vs DMOS (x)

They realized that none of them were perfect even though they have some strength in certain situations. They adopted a machine-learning based model to design a metric that seeks to reflect human perception of video quality (a Support Vector Machine (SVM) regressor).

The Netflix approach is much wider than using PSNR alone. They take into account more features like motion, different resolutions and screens and they even allow you train the model with your own video dataset.

“We developed Video Multimethod Assessment Fusion, or VMAF, that predicts subjective quality by combining multiple elementary quality metrics. The basic rationale is that each elementary metric may have its own strengths and weaknesses with respect to the source content characteristics, type of artifacts, and degree of distortion. By ‘fusing’ elementary metrics into a final metric using a machine-learning algorithm – in our case, a Support Vector Machine (SVM) regressor”

Netflix about VMAF

The best news (pun intended) is that the VMAF is FOSS by Netflix and you can use it now. The following commands can be executed in the terminal. Basically, with Docker installed, it installs the VMAF, downloads a video, transcodes it (using docker image of FFmpeg) to generate a comparable video and finally checks the VMAF score.

You saved around 1.89 MB (37%) and still got the VMAF score 94.

Using a composed solution like VMAF or VQM-VFD proved to be better than using a single metric, there are still issues to be solved but I think it’s reasonable to use such algorithms plus A/B tests given the impractical scenario of hiring people to check video impairments.

A/B tests: For instance, you could use X% of your user base for Y days offering them the newest changes and see how much they would reject it.

Testing your lua scripts for nginx is easy

A friend and I were extending Nginx by using  lua scripts on it. Just in case you don’t know, to enable lua scripting at nginx you can use a lua module you can read more about how to push Nginx to its limits with Lua.

Anyway we were in a cycle:

  • We did some lua coding.
  • Restart nginx.
  • Test it manually.

And this cycle was repeating until we have what we want. This was time consuming and pretty boring as well.

We then thought we could try to do some test unit with the scripts. And it was amazingly simple. We created a file called tests.lua and then we import the code we were using on nginx config.

package.path = package.path .. ";puppet/modules/nginx/functions.lua.erb"
require("functions")

We also created a simple assertion handler which outputs function name when it fails or pass.

function should(assertive)
local test_name = debug.getinfo(2, "n").name
assert(assertive, test_name .. " FAILED!")
print(test_name .. " OK!")
end

Then we could create test suit to run.

function it_sorts_hls_playlist_by_bitrate()
local unsorted_playlist = [[#EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1277952
    stream_1248/playlist.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=356352
    stream_348/playlist.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=485376
    stream_474/playlist.m3u8]]

local expected_sorted = [[#EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=356352
    stream_348/playlist.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=485376
    stream_474/playlist.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1277952
    stream_1248/playlist.m3u8]]

should(sort_bitrates(unsorted_playlist) == expected_sorted)
end

I think that helped us speeding up a lot our cycle. Once again, by isolating a component and testing it, it’s a great way to make us productive.

Testing in python can be fun too!

Humble beginning

I come from (lately) Ruby,  Java and Clojure and currently I’m working with some python projects and I was missing the way I used to test my code with rspec.  After a quick research, I found three great projects that helps to make more readable tests, they are: py.test, Mock and sure.

Better assertions

I was missing a better way to make asserts into my tests. The option about using the plain assert is good but not enough and by using sure I could do some awesome code looking similar to rspec.

#instead of plain old assert
assert add(1, 4) == 5
#I'm empowered by
add(1, 4).should.be.equals(5)
[].should.be.empty
['chip8', 'schip8', 'chip16'].shouldnt.be.empty
[1, 2, 3, 4].should.have.length_of(4)

And this makes all the difference, my test code now is more expressive.

Struggling with monkeypatch

The other challenge I was facing was to understand and use monkeypatch for mocks and stubs.  I find easier to use the Mock library even though its @patch looks similar to monkeypatch but I could grasp it quickly.

#Stubing
def test_simple_math_remotely_stubed():
  server = Mock()
  server.computes_add = Mock(return_value=3)

  add_remotely(1, 2, server).should.be.equals(3)

#Mocking
def test_simple_math_remotely_mocked():
  server = Mock()

  add_remotely(1, 2, server)

  server.computes_add.assert_called_once_with(1, 2)

#Stubing an internal dependency
@patch('cmath.nasa.random')
def test_simple_math_with_randon_generated_by_nasa(nasa_random_generator):
  nasa_random_generator.configure_mock(return_value=42)

  add_and_sum_with_rnd(3, 9).should.be.equals(54)

#Mocking an internal dependency
@patch('cmath.mailer.send')
def test_simple_math_that_sends_email(mailer_mock):
  add_and_sends_email(3, 9)

  mailer_mock.assert_called_once_with(
          to='master@math.com',
          subject='Complex addition',
          body='The result was 12')

Make sure you

  1. Are using virtualenv for better lib version managment
  2. Have installed pytest, sure and mock
  3. Git cloned the project above to understand it.

Will we only create and use dynamic languages in the future?

Since I’ve playing with some dynamic languages (Ruby and Clojure), I have been thinking about why would anybody create a new static typed language?! And I didn’t get the answer.

I started programming in Visual Basic and I taste its roots, which are almost all full of procedure commands (bunch of do, goto and end), then I moved to C#, sharper it changes the end’s for }’s and give us a little more power based on some premises: we can treat two different things in the same way, polymorphism. The last static language, but not the least, I used (and I use it) Java, abusing of his new way of treating a set of things equality, the interfaces and using its “powers” on reflections.

Although when I started to use Ruby I saw that I could treat a group of things equality without doing any extra work. I still need to code models and composed types, even though we can create or change them dynamically using “real power” of metaprogramming.

When I start to study and apply the Clojure and its principles, my first reaction was the rejection, how can I go on without my formal objects, how can I design software without a model in the head and so on. I wasn’t thinking about how actually I do software, currently I use TDD to design software and I don’t think what models I need to have, I do think in terms of “what I want”. At minimum, Clojure make me think about, do we really need object to design software?! .  A three days ago I saw an amazing video about similar thoughts: Some thoughts on Ruby after 18 months of Clojure.

Summarising: With my limited knowledge of theses languages, let’s suppose we use a function (which we don’t have source code) and we want to do something before that function is executed (intercept) using: VB I’ll need to check every single piece of code which we call this function and call another one, in Java we can use a AOP framework, in Ruby we can use the spells of metaprogramming. It seems that some frameworks, patterns and extra work aren’t needed more because of this dynamic language evolution.

My conclusions using dynamic languages (Clojure/Ruby) for now it’s: I write less code and reuse them more easy, so I don’t see any reason to create/use a new static typed language, would you see any motivation to do that?

PS: When I use C# (.Net Framework 1.3 – 2.0) it was not so super cool as today.

Clojure resources

Always that I start to learn a new language, I promise to keep the best resources links I found, but it never works. This post suppose to be updated often. Any broken link or suggestion, just comment and I’ll try to fix, add or remove it.

Links, tutorials, guides, documentations, screencasts and etc.

  1. Clojure official site
  2. Installing Clojure, clojure-contrib and setup EMACS.
  3. Vim and “Slime”
  4. VIM for Clojure
  5. Quick-start with examples
  6. VIDEO – Great short introduction videos for Clojure!
  7. VIDEO – Introduction to logic programming with Clojure
  8. VIDEO – Clojure for Java Programmers 1 of 2
  9. VIDEO – Functional programming by UCBerkeley 
  10. VIDEO – Great tutorial for Clojure focused on concurrency
  11. Try code clojure online
  12. Leiningen tutorial for beginners
  13. Midje – A test framework for Clojure
  14. Clojars – Community repository for open source clojure libraries

Books

Testing JavaScript with Jasmine and Jessie and node.js

Testing
Testing JS, can you imagine?

Javascript

JavaScript is a prototype-based, object-oriented scripting language that is dynamic, weakly typed and has first-class functions. It is also considered a functional programming language like Scheme and OCaml because it has closures and supports higher-order functions. Cool language isn’t? A new feeling inside me tells me that everything (related to I.T.) I would like to learn I should start writing a tests. However I thought that JavaScript test was stucked on Alert windows and then I found out Jasmine framework.

How could we write tests to JavaScript?

Using Jasmine in a BDD style.

describe("Jasmine", function() {
  it("makes testing JavaScript awesome!", function() {
    expect(yourCode).toBeLotsBetter();
  });
});

To run it and get feedback you have some options whose I take two: seeing the html file on browser and seeing the terminal result as rspec way. We will do the second way. For that we will use: node.js, npm and jessie.

node.jsevented i/o server (and also you can use it as an interpreter) for javascript vm (specifically v8)
Installing node.js

git clone --depth 1 git://github.com/joyent/node.git
cd node
git checkout v0.4.11 #opt, note that master is unstable.
export JOBS=2 #opt, sets number of parallel commands.
mkdir ~/local
./configure --prefix=$HOME/local/node
make
make install
echo 'export PATH=$HOME/local/node/bin:$PATH' >> ~/.profile
echo 'export NODE_PATH=$HOME/local/node:$HOME/local/node/lib/node_modules' >> ~/.profile
source ~/.profile

npm – node package manager, as its own name suggest. You can use it to install and publish your node programs. It manages dependencies and does other cool stuff.
Installing npm.

curl http://npmjs.org/install.sh | sh

jessie – Jessie is a Node runner for Jasmine. It was created to provide better reporting on failures, more modular design, easier creation of formatters and optional syntactic sugar.
Installing jessie.

npm install jessie

With all these binaries on your path you can just run your tests from terminal typing:

jessie folder_with_specs/ -f nested

In fact I’ve been using the gitub and jessie to learn and apply javascript.

RSpec and Watir to test web applications

Testing is cool

Software testing

Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Test techniques include, but are not limited to, the process of executing a program or application with the intent of finding software bugs (errors or other defects) – Wikipedia

The main intend of this post is, introduce you to UI tests over some ruby toys. In fact you could create an entire project (new) in ruby just to test your legacy web project. It’s cool, you can learn new language and work for the improvement of your legacy product. If you are totally new for ruby maybe a ruby overview can help you. (or might confuse you more)

Installing ruby, watir and rspec

Instead of installing the ruby directly, we are going to install the RVM (Ruby Version Manager) to then install any ruby we need. The steps described here were made on Ubuntu 11.04. On your terminal do the magic to install RVM.

bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)
echo ‘[[ -s “$HOME/.rvm/scripts/rvm” ]] && . “$HOME/.rvm/scripts/rvm” # Load RVM function’ >> ~/.bash_profile
source .bash_profile

And from now on, your life will be better on ruby interpreters versions. Let’s install the ruby 1.9.2. (terminal again)

rvm install 1.9.2

And if we want to see the rubies installed on our machine?

rvm list

And now, how can we chose one ruby to work on the terminal session?

rvm use 1.9.2

For the test purpose we will use Watir and RSpec, great tools for testing, make fun with BDD and the best thing is install them it’s very easy.

gem install watir-webdriver
gem install rspec

Hands-on

Since we have all things installed, we can move for the example. The feature I want to test is the search system of  Amazon. Being more precise, I want to search for ‘Brazil’ and see if the ‘Brazil on the Rise’ is within the results as I want to be sure when I search for ‘semnocao‘ the Amazon doesn’t provide any result. Now, we can write the spec.

require 'amazon_page'

describe AmazonPage do
 before(:each) do
   @page = AmazonPage.new
 end
 after(:each) do
   @page.close
 end
 it "should show 'Brazil on the Rise' when I query for [Brazil]" do
  @page.query 'Brazil'
  @page.has_text('Brazil on the Rise').should == true
 end
 it "should bring no result when I search for [semnocao]" do
  @page.query 'semnocao'
  @page.results_count.should == 0
 end
end

The specification is very simple, it will create a page before each test calling and close the page after each test calling. There is only two tests: test when you search for Brazil  and  when you search for semnocao. We will design the tests using page object pattern. The class bellow is the page which represents the Amazon page and all testable behaviors should be inside of it.

require 'watir-webdriver'

class AmazonPage
 def initialize
  @page = Watir::Browser::new :firefox
  @page.goto 'http://www.amazon.com'
 end
 def close
  @page.quit
 end
 def query(parameter)
  @page.text_field(:id=>'twotabsearchtextbox').set parameter
  @page.send_keys :enter
 end
 def has_text(text)
  @page.text.include? text
 end
 def results_count
    if @page.text.include? 'did not match'
     0
    else
     @page.div(:id=>'resultCount').text.split(' ')[5].gsub(',','').to_i
    end
 end
end

To run this you just need to type on your terminal.

rspec spec/

The final code can be downloaded or viewed at github.

Additions

  • We could improve our story readibility  with Cucumber.
  • We could send the browser execution to an Xvfb server. (A.K.A. running headless) The browser pops up really bothers me.
  • We could integrate it with our CI.
  • We could design a base Page class for provide common operations as mixin or something

ps: the post was very inspired by KK post and Saush.