Friday, December 5, 2014

Docker: Containers != Images


So, you're learning Docker, but every video you watch and blog you read uses the terms containers and images seemingly interchangeably. They say things like, "Docker allows you to share your container throughout your environments as an immutable application." However, this isn't accurate.

You can't actually share a container at all. A container exists only as an instance on a node. That instance can be running or stopped, but it is an instance nonetheless. You can "export" a container to a tar file, but when you "import" that file, you get an image, because the tar file holds only a filesystem snapshot and is no longer an instance of anything. A container is an instance of an image. The image is what you share.
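
The export/import round trip can be sketched as a small shell helper. This is only a sketch, assuming a container already exists on the node; the function name flatten_container and the names web and web:flat are made up for illustration.

```shell
# Hypothetical helper: turn a container's filesystem into a new image.
# `docker export` writes the container's filesystem as a tar stream to stdout;
# `docker import -` reads a tar stream from stdin and creates an image from it.
flatten_container() {
    container=$1   # name or ID of an existing container
    image=$2       # repository[:tag] for the resulting image
    docker export "$container" | docker import - "$image"
}

# Example call (assumes a container named "web" exists):
#   flatten_container web web:flat
```

What comes out is an image, not a container: it has to be started with docker run before there is an instance again.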

An image is actually a stack of one or more layers, each of which is itself an image. So, a base image may consist of a single layer, while an application image may be built from dozens. Each command in a Dockerfile creates a new layer beneath the final image. These layers always remain with the final image unless you "export" a container, which flattens all of the layers into one.

The benefit of this layering is that you can base all of your apps on the same base image, and that base image exists only once on the system, which saves storage. The images are also uploaded to Docker Hub, but layers the Hub already holds, such as the base image, can be skipped. This speeds up deployments, as only the changes need to be pulled down.

Containers are instances of these images, and they consume very little space because they share the image's read-only layers rather than copying them. The only space they consume is what is written during operation.

Images are the saved state of a filesystem, composed of layers of filesystem states. If a file is added in one image layer, that file will always exist as part of that layer, and the final image will be that much larger. So, if you add a large tar file, unpack it, and then delete it in a separate command, the deletion saves nothing: the file still exists in an earlier layer and still consumes that space. It is best to combine such commands into a single Dockerfile instruction, and to make similar optimizations whenever you import data that doesn't need to persist.
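
To make the layering point concrete, here is a rough sketch that writes two Dockerfiles side by side. The base image, the URL, and the paths are placeholders I picked for illustration; nothing here is built, the script only writes the files and counts the RUN instructions in each.

```shell
# Write two illustrative Dockerfiles into a scratch directory for comparison.
# The URL and paths are placeholders for illustration only.
workdir=$(mktemp -d)

# Separate RUN instructions: the tar file is stored in the layer created by
# the download step, so removing it in a later layer does not shrink the image.
cat > "$workdir/Dockerfile.layered" <<'EOF'
FROM debian:stable
RUN apt-get update && apt-get install -y curl
RUN curl -fsSL https://example.com/app.tar.gz -o /tmp/app.tar.gz
RUN mkdir -p /opt/app && tar -xzf /tmp/app.tar.gz -C /opt/app
RUN rm /tmp/app.tar.gz
EOF

# One RUN instruction: download, unpack, and delete all happen before the
# layer is saved, so the tar file never becomes part of any layer.
cat > "$workdir/Dockerfile.combined" <<'EOF'
FROM debian:stable
RUN apt-get update && apt-get install -y curl \
 && mkdir -p /opt/app \
 && curl -fsSL https://example.com/app.tar.gz -o /tmp/app.tar.gz \
 && tar -xzf /tmp/app.tar.gz -C /opt/app \
 && rm /tmp/app.tar.gz
EOF

# Count the RUN instructions in each file.
grep -c '^RUN' "$workdir/Dockerfile.layered" "$workdir/Dockerfile.combined"
```

Running docker history on an image built from each file should show where the space goes, layer by layer.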

I hope this helps differentiate between containers and images as I found them a bit confusing when I first started learning Docker in the early days.

Monday, December 1, 2014

Docker: Adding HAProxy and Fig to my docker websites


In a previous post, I discussed moving my websites into Docker containers with their own separate httpd servers from their previous setup as virtual hosts on a single httpd server. This post will discuss integrating HAProxy and Fig into my installation. This will allow for load balancing, proper routing, and easy deployments.

To add HAProxy, I simply used the library haproxy image. You just need to create your own Dockerfile and copy in your haproxy.cfg, as the image's instructions describe. The first part of the config file is fairly standard:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option forwardfor
    option http-server-close
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    stats enable
    stats uri /haproxy?stats

It is the rest of the file that does the work for us. I have multiple websites served from the same server, and each website is also reachable through multiple domains (e.g. cafezvous.com is also cafezvous.net). There may be a less verbose way of doing what I've done, but I haven't found it. We need to declare our frontend and bind it to port 80 on any incoming IP address. We then define ACLs (Access Control Lists) for each domain and associate them with the appropriate generic host name. We then associate each host with the backend cluster that will serve requests for that website. Finally, we use cafezvous as the default cluster if no ACL is matched.

frontend http-in
    bind *:80

    # Define hosts
    acl host_cafezvous hdr(host) -i cafezvous.com
    acl host_cafezvous hdr(host) -i cafezvous.co
    acl host_cafezvous hdr(host) -i cafezvous.info
    acl host_cafezvous hdr(host) -i cafezvous.org
    acl host_cafezvous hdr(host) -i cafezvous.net
    acl host_cafezvous hdr(host) -i www.cafezvous.com
    acl host_cafezvous hdr(host) -i www.cafezvous.co
    acl host_cafezvous hdr(host) -i www.cafezvous.info
    acl host_cafezvous hdr(host) -i www.cafezvous.org
    acl host_cafezvous hdr(host) -i www.cafezvous.net
    acl host_dbdevs hdr(host) -i dbdevs.com
    acl host_dbdevs hdr(host) -i dbdevs.co
    acl host_dbdevs hdr(host) -i dbdevs.info
    acl host_dbdevs hdr(host) -i dbdevs.org
    acl host_dbdevs hdr(host) -i dbdevs.net
    acl host_dbdevs hdr(host) -i www.dbdevs.com
    acl host_dbdevs hdr(host) -i www.dbdevs.co
    acl host_dbdevs hdr(host) -i www.dbdevs.info
    acl host_dbdevs hdr(host) -i www.dbdevs.org
    acl host_dbdevs hdr(host) -i www.dbdevs.net
    acl host_danpluslaura hdr(host) -i danpluslaura.com
    acl host_danpluslaura hdr(host) -i danpluslaura.co
    acl host_danpluslaura hdr(host) -i danpluslaura.info
    acl host_danpluslaura hdr(host) -i danpluslaura.org
    acl host_danpluslaura hdr(host) -i danpluslaura.net
    acl host_danpluslaura hdr(host) -i www.danpluslaura.com
    acl host_danpluslaura hdr(host) -i www.danpluslaura.co
    acl host_danpluslaura hdr(host) -i www.danpluslaura.info
    acl host_danpluslaura hdr(host) -i www.danpluslaura.org
    acl host_danpluslaura hdr(host) -i www.danpluslaura.net

    ## figure out which one to use
    use_backend cafezvous_cluster if host_cafezvous
    use_backend dbdevs_cluster if host_dbdevs
    use_backend danpluslaura_cluster if host_danpluslaura

    default_backend cafezvous_cluster
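
Since the ACL lines follow a strict pattern, one way to cut down on the typing is to generate them. The following is a rough sketch, not what I actually did: gen_acls is a made-up helper, and it assumes every site uses the same five top-level domains with and without the www prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of HAProxy): print one "acl" line per
# domain variant of a site, bare domains first and then the www variants.
gen_acls() {
    site=$1
    for tld in com co info org net; do
        printf '    acl host_%s hdr(host) -i %s.%s\n' "$site" "$site" "$tld"
    done
    for tld in com co info org net; do
        printf '    acl host_%s hdr(host) -i www.%s.%s\n' "$site" "$site" "$tld"
    done
}

# Emit the ACL section for all three sites; paste or redirect the
# output into the frontend section of haproxy.cfg.
for site in cafezvous dbdevs danpluslaura; do
    gen_acls "$site"
done
```

HAProxy also accepts several patterns on a single acl line, so the generated output could be condensed further to one line per site.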

Normally our backends would be associated with IP addresses, but we don't know a container's IP address until it is created. We could wait until all of the servers are started and then add each IP address and port individually, but that is neither reasonable nor easy. Also, we'd be creating a new container for each restart. We could also map each website to a particular port on the host and point HAProxy at the host IP address (or localhost) and that port. That is less of a problem, but it won't scale well.

Plus, the smart guys at Docker have thought of a clever way to associate a name with a container's IP address. When the --link option is used to link containers together, Docker creates an entry in the linking container's /etc/hosts file. So, the cafezvous container can be referenced as cafezvous with a trailing port number, and dbdevs is referenced the same way. All of them can continue to use the common port 80, and the setup can be deployed anywhere without checking whether a port is already mapped on the host.

backend cafezvous_cluster
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix
    server node1 cafezvous:80 cookie A check

backend dbdevs_cluster
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix
    server node1 dbdevs:80 cookie A check

backend danpluslaura_cluster
    balance leastconn
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix
    server node1 danpluslaura:80 cookie A check

Now we can start our containers:

 $ docker run -dtP --name dbdevs barkerd427/dbdevs  
 ed2ac1134324a7dda48d20567efb52e051d57f844c89b2a98a220a1c8b297a74  
 $ docker run -dtP --name cafezvous barkerd427/cafezvous  
 ec7bfaf06985ace3f617ffc466c93eabfe4cfca916ff8c71c4a125e5d0a7dfae  
 $ docker run -dtP --name danpluslaura barkerd427/danpluslaura  
 8ea8b1261d2ec1e5268e55f7504e4fab308cdaea1a5301e382ccb5de0e0718ee  
 $ docker run -dtP --name haproxy --link cafezvous:cafezvous --link dbdevs:dbdevs --link danpluslaura:danpluslaura barkerd427/haproxy  
 7971754d55d10972370b401c8061218bf53b495aafdc82d366f9232aaa192a03  
 $ docker ps -a -s  
 CONTAINER ID    IMAGE                         COMMAND               CREATED          STATUS          PORTS                                                NAMES         SIZE  
 7971754d55d1    barkerd427/haproxy:0.1        "bash /haproxy-start  18 seconds ago   Up 6 seconds    0.0.0.0:49161->443/tcp, 0.0.0.0:49162->80/tcp  haproxy       0 B  
 8ea8b1261d2e    barkerd427/danpluslaura:0.20  "httpd -DFOREGROUND"  2 minutes ago    Up 2 minutes    0.0.0.0:49158->80/tcp                             danpluslaura  2 B  
 ec7bfaf06985    barkerd427/cafezvous:0.4      "httpd -DFOREGROUND"  2 minutes ago    Up 2 minutes    0.0.0.0:49157->80/tcp                             cafezvous     2 B  
 ed2ac1134324    barkerd427/dbdevs:0.4         "httpd -DFOREGROUND"  3 minutes ago    Up 2 minutes    0.0.0.0:49156->80/tcp                             dbdevs        2 B  
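
With the containers running, the link entries can be checked from inside the HAProxy container. This is a small sketch assuming a Docker version that has the exec command; show_hosts is a made-up helper name.

```shell
# Hypothetical helper: print the /etc/hosts file of a running container.
# For a container started with --link, Docker adds an entry for each
# linked container alias to this file.
show_hosts() {
    docker exec "$1" cat /etc/hosts
}

# Example call (assumes the haproxy container started above is running):
#   show_hosts haproxy
```

The names listed there are what the server lines in the backend sections of haproxy.cfg resolve against.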

It is a bit cumbersome to add all of these links individually, especially as the infrastructure becomes more complex. So, we'll go ahead and add a simple Fig configuration file to the mix and condense all of these commands into one. Here's the fig.yml configuration file:

 haproxy:  
  image: barkerd427/haproxy
  ports:
   - "80:80"
  links:
   - cafezvous  
   - dbdevs  
   - danpluslaura  
 cafezvous:   
  image: barkerd427/cafezvous
  ports:  
   - "80"  
 dbdevs:   
  image: barkerd427/dbdevs
  ports:  
   - "80"  
 danpluslaura:   
  image: barkerd427/danpluslaura
  ports:  
   - "80"  

We explicitly call out the port mapping for HAProxy to the host, but we leave the others to be assigned by Docker. We don't really care what the host ports are for this purpose; we only need container port 80 to be exposed on each website container. We also explicitly call out the links that HAProxy needs to the other containers. Now when I run fig up -d, everything will start in the correct order and run in the background. If you don't add the -d, then everything will shut down when that session ends.

To run fig up, you need to be in the directory with the fig.yml file or reference that file with the -f or --file option. The project name defaults to the directory name, but this can be overridden with -p or --project-name. You can also attach to fig to see the output of the containers by using fig logs.

 $ fig up -d  
 Creating websitefig_danpluslaura_1...  
 Creating websitefig_dbdevs_1...  
 Creating websitefig_cafezvous_1...  
 Creating websitefig_haproxy_1...  
 $ fig ps  
      Name                  Command              State   Ports  
 -------------------------------------------------------------------------------------  
 websitefig_cafezvous_1     httpd -DFOREGROUND   Up      0.0.0.0:49171->80/tcp  
 websitefig_danpluslaura_1  httpd -DFOREGROUND   Up      0.0.0.0:49169->80/tcp  
 websitefig_dbdevs_1        httpd -DFOREGROUND   Up      0.0.0.0:49170->80/tcp  
 websitefig_haproxy_1       bash /haproxy-start  Up      443/tcp, 0.0.0.0:80->80/tcp  
 $ docker ps -a -s  
 CONTAINER ID    IMAGE                         COMMAND               CREATED             STATUS             PORTS                           NAMES                      SIZE  
 81ee5adf1aa3    barkerd427/haproxy:0.1        "bash /haproxy-start  About a minute ago  Up 59 seconds      443/tcp, 0.0.0.0:80->80/tcp  websitefig_haproxy_1       0 B  
 7df01566911e    barkerd427/cafezvous:0.4      "httpd -DFOREGROUND"  About a minute ago  Up 59 seconds      0.0.0.0:49171->80/tcp        websitefig_cafezvous_1     2 B  
 0f7c971a41aa    barkerd427/dbdevs:0.4         "httpd -DFOREGROUND"  About a minute ago  Up About a minute  0.0.0.0:49170->80/tcp        websitefig_dbdevs_1        2 B  
 e02f0cdc37b1    barkerd427/danpluslaura:0.20  "httpd -DFOREGROUND"  About a minute ago  Up About a minute  0.0.0.0:49169->80/tcp        websitefig_danpluslaura_1  2 B  
 $ fig logs  
 Attaching to websitefig_cafezvous_1, websitefig_dbdevs_1, websitefig_danpluslaura_1  
 danpluslaura_1 | [Mon Dec 01 14:30:59.367252 2014] [mpm_event:notice] [pid 1:tid 140438754260864] AH00489: Apache/2.4.10 (Unix) configured -- resuming normal operations  
 danpluslaura_1 | [Mon Dec 01 14:30:59.367462 2014] [core:notice] [pid 1:tid 140438754260864] AH00094: Command line: 'httpd -D FOREGROUND'  
 dbdevs_1    | [Mon Dec 01 14:30:59.708416 2014] [mpm_event:notice] [pid 1:tid 140529524463488] AH00489: Apache/2.4.10 (Unix) configured -- resuming normal operations  
 dbdevs_1    | [Mon Dec 01 14:30:59.708595 2014] [core:notice] [pid 1:tid 140529524463488] AH00094: Command line: 'httpd -D FOREGROUND'  
 cafezvous_1  | [Mon Dec 01 14:31:00.034189 2014] [mpm_event:notice] [pid 1:tid 140501628454784] AH00489: Apache/2.4.10 (Unix) configured -- resuming normal operations  
 cafezvous_1  | [Mon Dec 01 14:31:00.034919 2014] [core:notice] [pid 1:tid 140501628454784] AH00094: Command line: 'httpd -D FOREGROUND'  
 cafezvous_1  | 172.17.0.25 - - [01/Dec/2014:14:31:15 +0000] "GET / HTTP/1.1" 200 18045  
 danpluslaura_1 | 172.17.0.25 - - [01/Dec/2014:14:31:19 +0000] "GET / HTTP/1.1" 200 24280  
 dbdevs_1    | 172.17.0.25 - - [01/Dec/2014:14:31:26 +0000] "GET / HTTP/1.1" 200 11154  
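
The -f and -p options mentioned earlier can be combined to start the project from any directory. A minimal sketch: fig_up is a made-up wrapper, and the path in the example call is a placeholder.

```shell
# Hypothetical wrapper: start a Fig project in the background from any directory.
fig_up() {
    config=$1    # path to the fig.yml file
    project=$2   # project name, used as the prefix of the container names
    fig -f "$config" -p "$project" up -d
}

# Example call (the path is a placeholder):
#   fig_up /srv/websites/fig.yml websitefig
```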

We can then bring everything down by using the command fig stop. We can restart them all by using fig up again, which will actually reuse the containers from the previous run unless there has been an update. In fact, you don't even have to call fig stop to make an update. Simply run fig up and containers with an update will be updated.

Currently this is a good solution for my rarely visited websites, but if I made a serious push to make a business out of these sites, it would not suffice. As it stands, I cannot add another website container for cafezvous to scale its ability to handle additional workload. This is definitely needed, and I will post my experiences in the future with Serf and hopefully Kubernetes. These systems will allow my services to grow dynamically if needed.

Docker: How I dockerized my websites


The Problem:


My current system for deploying my pretty awful websites uses Chef. I'm not a web developer, and I haven't spent much time developing these websites, so don't judge me too much on them. I dabble with them every couple of months to get them closer to something good. They don't get much traffic, so I can run all three on one very tiny AWS instance. I'm still using the free tier, so who knows where I'll go once that's done. Actually, this brings me to the decision behind my current setup. I originally hosted all three sites on my server at home, but power outages had become a problem.

However, I had no reliable, consistent way of deploying these three websites to another server without a rather significant amount of time and effort. I had been using Chef at my day job, and I thought this would be a great opportunity to learn more and complete a task that needed to be finished quickly, as one of the websites was for my wedding and people would soon need to access it for RSVPs and other information. I couldn't have it going down for half the day whenever the power went out. I had already gotten complaints when my ISP randomly changed my IP address, which I had thought was static.

So, I endeavored to set up a deployment method that would allow for fast redeployments and code updates. I wanted this to be a cross-platform implementation, but it never quite got there. I was running Debian at home and RHEL at work, so I went with RHEL as it would be more beneficial for my knowledge base at work. Debian uses apt-get, so I had originally developed my Chef cookbook/roles with apt-get in mind. It didn't take much to switch to yum and the epel repo, but it did take extra effort that I wish hadn't been needed.

This Chef setup did allow multiple websites to be deployed at once using a single httpd server with multiple virtual hosts configured. Updates could be made when one of the websites' code had changed. Everything could be initiated with a simple chef-solo run, but this wasn't portable enough. I still had to manage the dependent cookbooks, but Berkshelf made this much easier. I also had to restart the single running httpd server when any website was updated, which creates some risk if the updates I'm making bring the httpd server down. I didn't have a sandbox environment that I could use to ensure everything would work on an identical AMI. In fact, I didn't even have another RHEL or CentOS box that closely resembled the production AMI. This is not the way to do it.

The Solution (the part you actually care about):


That is when I discovered Docker. We had been discussing it at work for a while before it was officially released as a 1.0 version. I had even created Jenkins slave containers to more efficiently utilize some of our resources. I had also set up a private registry at work so we could share our images when the time came to ramp up our Docker development. However, it hadn't occurred to me that I could run my servers using Docker containers. I had originally thought about just using Chef to provision a container with the exact setup I was already running. That's lazy and really doesn't fit the purpose of Docker.

Chef can be utilized to manage your Docker configuration on a node, but containers should be immutable once created. Containers should also be used in composition of an app or system similar to what developers already do with classes and libraries. If a container is running syslog, then you're probably doing it wrong (for the record, I haven't gotten to the point where I have a container for logging). My goal with Docker was to separate the concerns of my current setup so that I could update the code in one website without needing to restart all the servers. I also wanted an easier way to manage my deployment and upgrades.

I chose to separate each website into its own container running its own httpd server. This allowed a simpler configuration without virtual hosts and a single directory within the container where the code is stored. I also chose to use haproxy to route my website traffic to the appropriate container. To do this, I had to use links between the containers. I'll show how to manually do this below along with the better way using fig.

First, we need to create our website images. With Chef, I had to install git or curl so that I could download the entire website repo during the Chef run. That's not a huge deal, but I didn't need either on my production machine. Neither is needed in the image, but they can be installed if you really want to. I chose to clone locally and then COPY the contents into the image.

The Dockerfile for each website is the same except for the website name:

danpluslaura.com:
FROM httpd:2.4

MAINTAINER Daniel Barker (barkerd@dbdevs.com)

COPY ./danpluslaura/ /usr/local/apache2/htdocs
COPY ./httpd.conf /usr/local/apache2/conf/httpd.conf

dbdevs.com:
FROM httpd:2.4

MAINTAINER Daniel Barker (barkerd@dbdevs.com)

COPY ./dbdevs/ /usr/local/apache2/htdocs
COPY ./httpd.conf /usr/local/apache2/conf/httpd.conf

cafezvous.com:
FROM httpd:2.4

MAINTAINER Daniel Barker (barkerd@dbdevs.com)

COPY ./cafezvous/ /usr/local/apache2/htdocs
COPY ./httpd.conf /usr/local/apache2/conf/httpd.conf

Note that I have also included a custom httpd.conf file, but this is only needed so I can remove the www from the website address and set the ServerName:

ServerName cafezvous.com:80
...
RewriteEngine on
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://cafezvous.com$1 [R=301,L]

With this setup, the only layers each website's image adds to the base image are the same pieces the original implementation needed (the website files and the special configuration), except I can now run three separate httpd servers. This setup also allows me to develop locally using Docker's -v option to mount the files from my host system into the container, so the httpd server serves them directly and sees changes to my code immediately during development. I could also use this method with a data container for production deployments, but that wouldn't be very useful in this case.
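
The local development workflow with -v might look something like the sketch below. It assumes the stock httpd:2.4 image without my custom httpd.conf; dev_serve, the host path, and port 8080 are made up for illustration.

```shell
# Hypothetical helper: serve a local checkout through the stock httpd image,
# so edits on the host show up in the running container immediately.
dev_serve() {
    site_dir=$1   # absolute path to the site's files on the host
    port=$2       # host port to publish the container's port 80 on
    docker run -d -p "$port:80" \
        -v "$site_dir:/usr/local/apache2/htdocs" \
        httpd:2.4
}

# Example call (path and port are placeholders):
#   dev_serve /home/me/src/cafezvous 8080
```

Because the files are mounted rather than copied, nothing needs to be rebuilt between edits; the COPY-based image is only built when it is time to deploy.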

Now you may be asking how I plan to route each website to the appropriate container, since they can't all be published on the host's port 80, and that's probably how I'll want visitors to access them. I could probably run an httpd server on the host system or in another container to route each site to a particular port, but that isn't really a good use of httpd and is more suited to an actual proxy. I chose to use haproxy to route my website traffic, but I'll leave that for another blog post. The current setup will allow you to test your websites locally with hardly any code. If you want to update to a newer version of httpd, then you simply change the FROM instruction and rebuild and push your images. I won't cover this process here, as the Docker docs cover the basics very well and are updated regularly.

Note: I could have installed curl or git or some other program to get my code into the image, but I thought that would just create more layers, with more space taken up by programs that would never be needed again. The only problem with copying from a local clone is ensuring you have the correct version locally. Therefore, I may have done this differently in a corporate environment, where I could download a tar and unpack it in a single RUN command to eliminate most of the overhead. This would also allow versioning to be maintained in source control.