Olympic Games Rio 2016


Motivated by a friend, we’ll share bits of our experience during the Olympic Games Rio 2016. Before starting, I would like to clarify that Globo.com only had the rights to stream the content to Brazil.

We used around 5.5 TB of memory and 1,056 CPUs across two PoPs located in the southeast of the country.

Audience during the game BRA x SWE.


Not so long; I’ll read it

The live streaming infrastructure for the Olympics was an incremental improvement over the architecture we used for the 2014 FIFA World Cup.


The ingest point receives an RTMP input using nginx-rtmp and then forwards the RTMP to the segmenter. This extra layer provides mostly scheduling, resource sharing and security.

The segmenter uses EvoStream to generate HLS in a known folder, which is watched by a Python daemon. This daemon sends the video data and metadata to a Cassandra cluster, which is used mostly as a queue.
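To make the watcher idea concrete, here is a minimal sketch of a polling pass over the segment folder. It is not our daemon: the `.ts` naming, the function names and the plain list standing in for the Cassandra write are all assumptions made for illustration.

```python
import os
import tempfile


def find_new_segments(folder, seen):
    """Return the .ts files in `folder` not reported before, sorted by name."""
    current = sorted(name for name in os.listdir(folder) if name.endswith(".ts"))
    fresh = [name for name in current if name not in seen]
    seen.update(fresh)
    return fresh


def publish(folder, names, queue):
    """Append (name, bytes) pairs to `queue`, a stand-in for the Cassandra write."""
    for name in names:
        with open(os.path.join(folder, name), "rb") as handle:
            queue.append((name, handle.read()))


def demo():
    """Run two polling passes over a temporary folder and return the queue."""
    queue, seen = [], set()
    with tempfile.TemporaryDirectory() as folder:
        for name in ("seg_0001.ts", "seg_0002.ts"):
            with open(os.path.join(folder, name), "wb") as handle:
                handle.write(b"video-bytes")
        publish(folder, find_new_segments(folder, seen), queue)
        # A second pass with no new files publishes nothing.
        publish(folder, find_new_segments(folder, seen), queue)
    return queue
```

A real watcher would also have to make sure a segment is completely written before reading it, which this sketch ignores.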

Now let’s move to the user’s point of view. When the player wants to play a video, it needs to get a video chunk, so it requests a file from our front-end, which uses nginx to provide caching, security and load balancing.

Network tip:

Modern network cards offer multiple queues: pin each queue (along with its XPS and RPS settings) to a specific CPU.
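As a rough illustration of the pinning, the sketch below only computes which hexadecimal CPU mask would go into each queue's `rps_cpus` and `xps_cpus` sysfs file; it writes nothing. The device name `eth0`, the queue count and the "queue N goes to CPU N" policy are assumptions, not our production setup.

```python
def cpu_mask(cpu_index):
    """Return the hexadecimal bitmask that selects a single CPU."""
    return format(1 << cpu_index, "x")


def pinning_plan(device, queue_count):
    """Map each queue's rps_cpus/xps_cpus sysfs file to a one-CPU mask."""
    plan = {}
    for queue in range(queue_count):
        mask = cpu_mask(queue)  # assumed policy: queue N goes to CPU N
        plan[f"/sys/class/net/{device}/queues/rx-{queue}/rps_cpus"] = mask
        plan[f"/sys/class/net/{device}/queues/tx-{queue}/xps_cpus"] = mask
    return plan


plan = pinning_plan("eth0", 4)  # "eth0" and 4 queues are made-up values
```

Applying the plan would mean writing each mask to its file as root. On machines with more than 32 CPUs the mask format needs comma-separated groups, which this sketch does not produce.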


When the front-end does not have the requested chunk, it goes to the backend, which uses nginx with Lua to generate the playlist and serve the video chunks from Cassandra.
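Our backend does this in Lua inside nginx; purely as an illustration of what "generating the playlist" means, here is a small Python sketch that builds an HLS live media playlist from a list of chunks. The chunk names, durations and sequence number are invented.

```python
def build_live_playlist(segments, first_sequence, target_duration):
    """Build an HLS media playlist for a live window.

    `segments` is a list of (uri, duration_in_seconds) tuples, oldest first.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST tag: its absence tells the player the stream is live.
    return "\n".join(lines) + "\n"


playlist = build_live_playlist(
    [("chunk_120.ts", 5.0), ("chunk_121.ts", 5.0), ("chunk_122.ts", 5.0)],
    first_sequence=120,
    target_duration=5,
)
```

In a live stream the window slides: each new chunk is appended, the oldest one is dropped and the media sequence number goes up by one.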

Caching tip:

Use RAM to cache: a dual-layer caching solution, keeping the hot content (most recent) on tmpfs and the colder content (older) on disk, might decrease the CPU load, disk IOPS and response time.
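The lookup order of such a dual-layer cache can be sketched as below. This is only a toy model: two dictionaries stand in for tmpfs and disk, and the "demote the oldest entry" policy and the capacity of 2 are assumptions chosen to keep the example short.

```python
class TwoTierCache:
    """Look up the small hot tier first, then the larger cold tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}   # stand-in for tmpfs (RAM); dicts keep insertion order
        self.cold = {}  # stand-in for the on-disk cache

    def put(self, key, value):
        self.hot[key] = value
        # Demote the oldest hot entries once the RAM tier is full.
        while len(self.hot) > self.hot_capacity:
            oldest = next(iter(self.hot))
            self.cold[oldest] = self.hot.pop(oldest)

    def get(self, key):
        if key in self.hot:
            return self.hot[key], "hot"
        if key in self.cold:
            return self.cold[key], "cold"
        return None, "miss"


cache = TwoTierCache(hot_capacity=2)
for number in range(1, 5):
    cache.put(f"chunk_{number}.ts", f"data-{number}")
```

With live video most requests ask for the newest chunks, so they are answered from the hot tier and never touch the disk.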

You can find a more detailed view of our nginx usage in a two-part article posted at nginx.com (caching and micro-services) and in a summary from Juarez Bochi.

This is just a macro view. We also had to provide and scale many microservices to offer things like live thumbnails, an electronic program guide, better usage of ISP bandwidth, geofencing and others. We deployed them either on bare metal or on tsuru.

In the near future we might investigate other adaptive streaming formats like DASH, explore other kinds of input (not only RTMP), increase the number of bitrates, make better use of our farm and distribute the content closer to the end user.

Thanks @paulasmuth for pointing out some errors.

From LXC to docker-machine and cloudery

Attention: this post provides a very quick and simplistic (but functional) vision of the promised title.


In the beginning

Linux is a fantastic OS, it has more than we imagine and it still manages to get better. There is a feature called cgroups:

which provides a mechanism for easily managing and monitoring system resources, by partitioning things like cpu time, system memory, disk and network bandwidth, into groups, then assigning tasks to those groups

Let’s say we created a cgroup with 50% of the CPU, 20% of the memory, 2% of the disk and a virtual network with 100% of the bandwidth. Now we can run our application under that cgroup’s restrictions.

Another cool feature of Linux is LXC (linux-containers):

which combines kernel’s cgroups and support for isolated namespaces to provide an isolated environment for applications

Now we’re able to provide a Linux machine capable of running multiple applications in isolation (as if there were an isolated OS for each application). This sounds like what we achieved with virtualization (app-level, OS-level, CPU-level and so on), but faster, cheaper and without the overhead of running multiple kernels.



Docker is:

an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux.

This is what Docker is, but remember: it is not perfect.

The “additional layer of abstraction” part is very interesting: Docker gives you a layer of abstraction that lets you create and deploy your application within a container (an isolated, resource-managed place to run processes) in a standardized way.

Docker machine, compose and so on

Life almost always gets easier with abstractions. We (developers) don’t worry about how disks work (drivers) or even how a packet leaves our PC and reaches another one (we should know how this works :P). Our productivity has increased a lot since we started relying on these abstractions.

The same goes for the Docker ecosystem. As we use it more often, we create best practices, solve issues with workarounds and so on, and some of these become part of the Docker solution.

  • docker-machine: An application needs a machine to run on, regardless of whether it’s local, physical, virtual or in the cloud.
  • docker-compose: An application needs a way to declare its dependencies, either packages or distinct services like a datastore.

Step 0: get ready

  1. If you’re on MacOS/Windows you’ll need to install VirtualBox or VMware
  2. If you’re on MacOS/Windows install Docker Toolbox; otherwise apt-get them all

Step 1: create the app

Let’s say we’ll create a rails 4 application with mongo.

Step 2: declare the app and its dependencies

We declare our dependencies using two files: docker-compose.yaml and Dockerfile. In the Dockerfile we describe how our machine should look (that is, all the needed packages and so on).

Then we can move on to its broader service dependencies, like a database or even a web server. We’ll use mongo as the datastore and nginx as the web server.

Step 3: deploy it locally

We need to create a machine for it and then we need to run it.

Step 4: deploy in the cloud

The same way we created a machine to run our app locally, we can create any number of machines to run this application, even in cloud environments such as DigitalOcean, AWS, Azure, Google and so on.

That’s it 🙂 For a more detailed Rails app Docker workflow, read this great post or see this fresh example of docker-compose.yaml.

// TODO: some things

Let’s suppose we just created a staging environment and another developer comes to help us. It seems there is no official way to share the machine we created (Amazon, Google App Engine, Azure, DigitalOcean…) with team members. There are some workarounds, but it would be nice to see this become a feature.


  • Useful commands for troubleshooting, exploration and debugging:
    • To enter a machine: $ docker-machine ssh staging (either local or cloud)
    • To enter a container: $ docker-compose run db bash (either local or cloud)
    • To list files within a container: $ docker-compose run db ls -lah data/db
    • To edit/add/remove data on mongo: $ mongo --host DOCKER_IP
  • If you face any error like E: Failed to fetch … during the docker-compose build try it again
  • If you face any error like “Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded” during any deployment, try to download docker-toolbox again and install it.

Google is your friend.

presentation – Live Video Platform for FIFA World Cup

In this talk, we will describe globo.com’s live video stream architecture, which was used to broadcast events such as the FIFA World Cup (with a peak of 500K concurrent users), Brazilian election debates (27 simultaneous streams) and BBB (10 cameras streaming 24/7 for 3 months).

NGINX is one of the main components of our platform, as we use it for content distribution, caching, authentication, and dynamic content. Besides our architecture, we will also discuss the NGINX and operating system tuning that was required for a 19 Gbps throughput in each node, the open source Cassandra driver for NGINX that we developed, and our recent efforts to migrate to nginx-rtmp.

presentation QCon 2015 – ptBR



In this presentation you’ll see how we developed (what we used) the live video platform for the FIFA World Cup 2014. It shows how we made it scalable using lots of open source solutions.

Keywords: linux, cassandra, nginx, redis, BGP, logstash, graphite, python, ruby, lua

How To Optimize Nginx Configuration for HTTP/2 TLS (SSL)


HTTP/2 over TLS with nginx is already a reality. How can we get the best performance out of it? Check the example configuration.


We all know that HTTP/2 is here, and although it doesn’t impose TLS usage, the major browsers have already taken sides (that is, they only support HTTP/2 over TLS).

The support for http/2 was released with nginx 1.9.5 (except for “Server Push”). But isn’t HTTPS a lot slower than good old HTTP? Well, this is not easy to answer but we can fine tune nginx to do much better than the default configuration.

I really believe that the biggest fight is against latency, not CPU load. The tips you’ll see here are mostly about reducing the number of round trips (RTT) in order to decrease latency.

[Figure: TLS/SSL does not need CPU]

Before we move on to the practical tips, let’s review the simple tasks you must do first:

  1. upgrade to the latest kernel (3.7+)
  2. upgrade to the latest openssl (1.0.1j)
  3. upgrade to the latest nginx (1.9.5)

The upgrades above will already bring a lot of improvement, but let’s go to the optimization tips:

TLS session resumption


When you want to use HTTPS, your browser needs to negotiate the session (certificate, cipher, hash algorithm, TLS version, key…). In a very simplistic way, it follows these steps:

  1. Establish a TCP connection (SYN, SYN/ACK, ACK)
  2. Negotiate and establish the TLS session

When you leave the site and come back later, the browser will need to renegotiate the session. TLS session resumption is the technique to partially skip this negotiation by persisting the session for later usage.

The left graph represents an oversimplified version of a full TLS handshake (skipping the TCP handshake), and on the right side you can see how TLS resumption works. The point is to skip a round trip.

[Figures: TLS negotiation (left) and TLS negotiation with resumption (right)]


If we skip part of the session negotiation we’ll deliver content faster.


We have two ways of solving this issue: saving the TLS session on the server (session cache) or, preferably, on the client (session ticket).

session cache

ssl_session_cache   shared:SSL:10m;
ssl_session_timeout 1h;

In this case, when the client tries to reconnect, the server will try to recover the previously persisted session, partially skipping the negotiation. With this 10 MB shared cache, nginx will be able to hold about 10 x 4,000 = 40,000 sessions, and the sessions will be valid for 1 hour.
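The sizing arithmetic can be written down as a tiny helper. The figure of roughly 4,000 sessions per megabyte is the approximation given in the nginx documentation; the function names and the 100,000-session example are made up for illustration.

```python
import math

SESSIONS_PER_MEGABYTE = 4000  # approximate figure from the nginx documentation


def approx_session_capacity(zone_megabytes):
    """Estimate how many TLS sessions a shared zone of this size can hold."""
    return zone_megabytes * SESSIONS_PER_MEGABYTE


def zone_megabytes_for(sessions):
    """Estimate the zone size (in whole megabytes) needed for `sessions`."""
    return math.ceil(sessions / SESSIONS_PER_MEGABYTE)


capacity = approx_session_capacity(10)    # the shared:SSL:10m zone
needed_mb = zone_megabytes_for(100_000)   # hypothetical target
```

Treat the results as estimates only, since the real number depends on the size of each stored session.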

However, there are problems with this approach:

  1. sessions are stored on the server;
  2. the “shared session” is saved in each server, so multiple nginx’s will not share the same session;

For the second problem, the great project openresty is about to release a new feature (ssl_session_store_by_lua) which will enable us to save these sessions in a “central” repository (like redis).

session ticket

# $> openssl rand 48 > file.key
ssl_session_tickets on; 
ssl_session_ticket_key file.key;

In this case, the server creates a ticket and sends it to the client. When the client tries to connect again, it presents the ticket and the server just resumes the session.

Nginx comes with session tickets enabled by default, but if you deploy your application on more than one box (bare metal, cloud, virtual machines, containers…) you’ll also need to specify the same key (used to create the tickets) on all of them. You should rotate this key often.
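If you would rather generate the key from a deployment script than from the shell, the sketch below is one possible equivalent of the openssl command: it writes 48 random bytes to a file that only the owner can read. The function name and the temporary folder are assumptions for the example; copying the file to every box is left out.

```python
import os
import tempfile


def write_ticket_key(path, size=48):
    """Write `size` random bytes to `path`, readable only by the owner."""
    descriptor = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(descriptor, "wb") as handle:
        handle.write(os.urandom(size))
    return path


with tempfile.TemporaryDirectory() as folder:
    key_path = write_ticket_key(os.path.join(folder, "file.key"))
    key_size = os.path.getsize(key_path)
```

For rotation, nginx accepts several ssl_session_ticket_key directives: the first key encrypts new tickets and the others are only used to decrypt old ones, so a previous key can stay listed for a while after a new one is deployed.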

Although this approach is much better than session cache, not all browsers support this so you might need to offer both solutions.

TLS false start


How about getting the same benefit (skipping a round trip) as in TLS resumption, but when the browser first negotiates with the server?! This is possible by using and enforcing forward secrecy.

Instead of waiting for the last handshake step from the server, the browser sends the data (request) right away and the server replies with the data (response). This technique is known as TLS false start.

[Figures: TLS negotiation and TLS false start]


Less RTT means faster site/video/image/data to final users.


This is possible because we can make the TLS negotiation use specific ciphers.

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;

OCSP stapling and certificate chain


Creating trust is a hard task, and part of the responsibility of TLS is to enforce it. In order to establish this trust, the browser (or your OS) needs to have at least one point to trust.

In a very simplistic way, your browser believes that http://www.example.com is who it claims to be based on a chain of trust. It checks by:

  1. looking at its certificate and checking whether its signature is valid (checking all the certificates up to the ROOT)
  2. checking that the certificate is not revoked, either by searching for it in a CRL (certificate revocation list) or by issuing a new OCSP (Online Certificate Status Protocol) request

[Figure: TLS chain]

Both steps might force your browser to make more round trips. If your server doesn’t provide the intermediate certificates, the browser will need to download them, and it might even issue an OCSP request (requiring more round trips: DNS, TCP_HANDSHAKE, TLS_HANDSHAKE).


Less RTT means faster site/video/image/data to final users 2.


You can concatenate your certificate with its chain (except the ROOT, which is not necessary; in fact the browser won’t trust a ROOT you send), thus avoiding the extra round trips to download the missing certificates.

$ cat mysite.cert ca1.cert > full.cert
ssl_certificate /path/to/full.cert;

You can set up nginx to staple the OCSP response on your server (this “staple” is digitally signed, which makes it possible for your browser to check its authenticity), so the browser avoids the extra round trips for OCSP.

ssl_stapling on; 
ssl_stapling_verify on; 
ssl_trusted_certificate /path/to/cas.pem;

TLS record size optimization


Although the maximum size of a TCP packet is 64K, a new TCP connection starts with much less than this maximum.

Each TLS record can hold at most 16K (which is the default size for nginx). Adding the TCP and IP headers to this size, the server might need 2 round trips to serve the first bytes. And that’s not cool.

TCP is great but it has limitations: it is not ideal for all kinds of applications, and there are even efforts such as QUIC to make the web faster by experimenting with UDP instead of TCP.
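A back-of-the-envelope sketch of why a 16K record can cost an extra round trip on a fresh connection follows. It assumes a 1460-byte MSS, an initial congestion window of 10 segments that doubles on every round trip, and it counts only the 5-byte TLS record header (the MAC or authentication tag is ignored), so take the numbers as an illustration rather than a measurement.

```python
MSS_BYTES = 1460              # assumed: 1500-byte MTU minus IPv4 and TCP headers
INITIAL_WINDOW_SEGMENTS = 10  # assumed initial congestion window
TLS_RECORD_HEADER = 5         # bytes of record header; MAC/tag overhead ignored


def flights_needed(payload_bytes, mss=MSS_BYTES, segments=INITIAL_WINDOW_SEGMENTS):
    """Count the round trips needed to deliver `payload_bytes` on a new connection."""
    window = mss * segments
    sent, flights = 0, 0
    while sent < payload_bytes:
        sent += window
        window *= 2  # simplified slow start: the window doubles every round trip
        flights += 1
    return flights


full_record = flights_needed(16 * 1024 + TLS_RECORD_HEADER)
small_record = flights_needed(4 * 1024 + TLS_RECORD_HEADER)
```

The browser cannot decrypt a record until all of it has arrived, so under these assumptions the first 16K record needs two flights while a 4K record fits in the first one.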

Since we’ve reached our current speed limits, light speed, (who knows what “quantum” can do) we’re moving to avoid extra RTT.

*you can’t use QUIC on nginx yet.


Less RTT means faster site/video/image/data to final users. 🙂 Again!!!


There is a tradeoff here: you can choose either throughput (TLS record size at the maximum) or latency (a small record size). It would be great if nginx could offer an adaptive option, starting small (4K, to speed up the first bytes) and increasing to 16K after 1 minute or 1 MB.

ssl_buffer_size 16k; #for throughput, video applications
#ssl_buffer_size 4k; # for quick first byte delivery

HSTS (HTTP Strict Transport Security)


HSTS “converts” your site to strict HTTPS-only. It eliminates unnecessary HTTP-to-HTTPS redirects by shifting this responsibility to the client, and most browsers support it. Even if you forget to change http to https, the browser will do that for you.


Redirects mean more round trips. Yeah, I know it’s getting repetitive, but it’s all about reducing latency.


A simple http header to instruct the browser.

add_header Strict-Transport-Security "max-age=31536000; includeSubdomains"; 
# the 'max-age' value specifies for how long (in seconds) the browser should follow this rule.
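To double-check what the header value above means, here is a small sketch that parses it and converts max-age to days. The parser is deliberately simplistic and the function name is made up; it only understands the two directives used in this post.

```python
def parse_hsts(value):
    """Split a Strict-Transport-Security value into (max_age_seconds, include_subdomains)."""
    max_age, include_subdomains = None, False
    for directive in value.split(";"):
        name, _, argument = directive.strip().partition("=")
        if name.lower() == "max-age":
            max_age = int(argument.strip('"'))
        elif name.lower() == "includesubdomains":
            include_subdomains = True
    return max_age, include_subdomains


max_age, include_subdomains = parse_hsts("max-age=31536000; includeSubdomains")
days = max_age // 86400  # 86400 seconds in a day
```

So 31536000 seconds is 365 days: a browser that has seen the header keeps going straight to HTTPS for a year, subdomains included.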

Why Chrome doesn’t show/accept http2?

Users of the Google Chrome web browser are seeing some sites that they previously accessed over HTTP/2 falling back to HTTP/1. You can check what, why and how at nginx site.


It’s all about making the web faster by avoiding round trips (later I’ll post tips specific to HTTP/2), so here’s a checklist:

  • Upgrade to the latest: kernel, openssl and nginx.
  • Use TLS resumption and TLS false start
  • `cat` your certificate with the intermediates
  • Think about the best size (hard) for your TLS record
  • Enforce HSTS 😉

Here’s the full config example:

# command to generate dhparams.pem
# openssl dhparam -out /etc/nginx/conf.d/dhparams.pem 2048

limit_conn_zone $binary_remote_addr zone=conn_limit_per_ip:10m;
limit_req_zone $binary_remote_addr zone=req_limit_per_ip:10m rate=5r/s;
limit_req_status 444;
limit_conn_status 503;

proxy_cache_path /var/lib/nginx/proxy levels=1:2 keys_zone=backcache:8m max_size=50m;
proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
proxy_cache_valid 404 1m;

upstream app_server {
  server unix:/tmp/unicorn.myserver.sock fail_timeout=0;
}

server {
  listen 80;
  server_name *.example.com;
  limit_conn conn_limit_per_ip 10;
  limit_req zone=req_limit_per_ip burst=10 nodelay;
  # $request_uri already includes the query string
  return 301 https://$host$request_uri;
}

server {
  # "ssl" and "http2" on the listen directive (nginx 1.9.5+) replace the older "ssl on;"
  listen 443 ssl http2;
  server_name _;

  limit_conn conn_limit_per_ip 10;
  limit_req zone=req_limit_per_ip burst=10 nodelay;

  ssl_stapling on;
  ssl_stapling_verify on;
  ssl_trusted_certificate /etc/nginx/conf.d/ca.pem;

  ssl_certificate /etc/nginx/conf.d/ssl-unified.crt;
  ssl_certificate_key /etc/nginx/conf.d/private.key;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_dhparam /etc/nginx/conf.d/dhparams.pem;
  ssl_prefer_server_ciphers on;
  ssl_session_cache shared:SSL:10m;
  ssl_session_timeout 10m;

  root /home/deployer/apps/example.com/current/public;

  gzip_static on;
  gzip_http_version   1.1;
  gzip_proxied        expired no-cache no-store private auth;
  gzip_disable        "MSIE [1-6]\.";
  gzip_vary           on;

  client_body_buffer_size 8K;
  client_max_body_size 20m;
  client_body_timeout 10s;
  client_header_buffer_size 1k;
  large_client_header_buffers 2 16k;
  client_header_timeout 5s;

  add_header Strict-Transport-Security "max-age=31536000; includeSubdomains";

  keepalive_timeout 40;

  location ~ \.(aspx|php|jsp|cgi)$ {
    return 404;
  }

  location ~* ^/assets/ {
    root /home/deployer/apps/example.com/current/public;
    # Per RFC2616 - 1 year maximum expiry
    # http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
    expires 1y;
    add_header Cache-Control public;
    access_log  off;
    log_not_found off;

    # Some browsers still send conditional-GET requests if there's a
    # Last-Modified header or an ETag header even if they haven't
    # reached the expiry date sent in the Expires header.
    add_header Last-Modified "";
    add_header ETag "";
  }

  try_files $uri $uri/index.html $uri.html @app;

  location @app {
    proxy_set_header X-Url-Scheme $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # enable this if you forward HTTPS traffic to unicorn,
    # this helps Rack set the proper URL scheme for doing redirects:
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_set_header Host $host;
    proxy_redirect off;
    proxy_pass http://app_server;
  }

  error_page 500 502 503 504 /500.html;
  location = /500.html {
    root /home/deployer/apps/example.com/current/public;
  }
}

With this configuration I was able to get an A+ at SSL Labs. This is a useful tool where you can check what you need to do to make your SSL site better; it gives you specific tips (Apache, nginx, IIS).

[Figure: TLS A+]
I strongly recommend you:

Put the bricks together on gnu linux

Why Linux?

It is easy, flexible, very secure, up to date, “free”, open source… and the list goes on. However, what makes me happiest in this little *unix world is its flexibility.

The power

Let’s start with a simple command like ps, which shows you a list of the current processes and their details.

ps -aux

Now I would like to show only the output lines that contain usr. To do that I will use the output of the ps -aux command as input for grep to filter the data. On Linux you can pass the output of one command to another by using a pipe.

ps -aux | grep usr

Now we have the output from ps filtered to only the lines that contain ‘usr’. But what if we want only the process IDs? There is a program for that too: cut.

ps -aux | grep usr | cut -d " " -f 8

The cut program simply creates fields (-f) delimited (-d) by a space (" ") and I ask it to pick field 8, which holds the process IDs. But if you look closely, there are some lines without a number. The next step is to remove these lines without values. We can do it using the sed program.

ps -aux | grep usr | cut -d " " -f 8 | sed '/^$/d'

Now our output has only lines with numbers: sed receives a simple regex and deletes the lines matching that pattern. Nice, but I just need the numbers ending with 2. We can grep again.

ps -aux | grep usr | cut -d " " -f 8 | sed '/^$/d' | grep '2$'

I also want to sum all these numbers and show the result on the screen. To achieve that we are going to use the awk program.

ps -aux | grep usr | cut -d " " -f 8 | sed '/^$/d' | grep '2$' | awk '{ sum += $1; print "+" $1} END {print "_____" ; print sum}'

The first block (surrounded by braces) creates a variable called sum and also prints a plus sign followed by the first field of each line. This block is called for every line we pass in (already processed by all the previous programs). It is followed by END, whose block prints a separator line and finally the sum of all the numbers.
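If awk is new to you, the same logic can be sketched in Python, which may make the two blocks easier to see. This is only an illustration: the input numbers are invented, and it assumes every line starts with an integer.

```python
def sum_report(lines):
    """Mimic the awk program: "+<n>" for every line, then a rule and the total."""
    total = 0
    output = []
    for line in lines:
        first_field = line.split()[0]   # awk's $1
        total += int(first_field)       # sum += $1
        output.append("+" + first_field)
    # awk's END block runs once, after the last line.
    output.append("_____")
    output.append(str(total))
    return output


report = sum_report(["12", "432", "1002"])
```

The loop body plays the role of the first awk block, and the two lines after the loop play the role of END.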


You can connect programs to produce output that no single program could produce on its own. PS: I know I could do all this with fewer programs/commands, but the intent here was not to teach Linux/GNU programs; it was to show how powerful Linux can be.

Bonus round

You can save this output to a file just by redirecting it ( > to overwrite or >> to append).

ps -aux | grep usr | cut -d " " -f 8 | sed '/^$/d' | grep '2$' | awk '{ sum += $1; print "+" $1} END {print "_____" ; print sum}' > file.txt