After recently migrating quite a few legacy applications from unmanaged EC2 instances into Elastic Beanstalk, I noticed a few issues when deploying code to an environment under load.

eb-graph-before

A lot of investigation later I discovered that at the point Elastic Beanstalk was deploying the code, Apache started spinning up a lot more child processes and the instances were maxing out the CPU. Once the instances were added back into the ELB it took a few minutes for the CPU usage to drop to a level where they could serve a decent amount of traffic again, which is why the application response time went sky high.
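
One way to watch this happen is to sample the Apache process count and the load average from a second SSH session while a deployment runs. The following is only a minimal sketch, assuming a Linux instance with /proc mounted and an Apache process named httpd. The process name differs between distributions, so PROC_NAME is an assumption to adjust, and count_procs and sample are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: sample the Apache worker count and 1-minute load average.
# PROC_NAME is an assumption: "httpd" on Amazon Linux, often "apache2" elsewhere.
PROC_NAME=${PROC_NAME:-httpd}

# Count running processes whose command name matches exactly (reads /proc).
count_procs() {
  cat /proc/[0-9]*/comm 2>/dev/null |
    awk -v name="$1" '$0 == name { n++ } END { print n + 0 }'
}

# Print "time workers load1" once per interval, for the given number of samples.
sample() {
  samples=$1
  interval=${2:-1}
  i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s %s %s\n' "$(date +%T)" \
      "$(count_procs "$PROC_NAME")" \
      "$(cut -d' ' -f1 /proc/loadavg)"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example: take 120 samples, one per second, while the deployment is running.
# sample 120 1
```

A sudden jump in the second column at the moment of the flip, followed by a slow fall in the third, would match the behaviour described above.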

One way Amazon suggests deploying new application versions is outlined in this guide: Deploying Versions with Zero Downtime. My concern with this approach was that the new instances would not be ‘warmed up’ enough to take all of the traffic immediately.

I wanted to find out exactly what was causing the problem, and it seemed to lie in the way Elastic Beanstalk switches code from /var/app/ondeck/ to /var/app/current/. When a deployment happens, it issues a mv command on the /var/app/current/ directory to back it up before moving /var/app/ondeck/ into its place. After some further research, the solution I found was to do something undocumented (and probably not really recommended): replace the deploy script /opt/elasticbeanstalk/hooks/appdeploy/enact/01_flip.sh, which is what Elastic Beanstalk uses to run the deployment commands.
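
The difference between swapping directories and syncing in place can be seen on scratch directories. This is a minimal sketch, not the Elastic Beanstalk hook itself: the helper names (dir_inode, flip_by_mv, flip_in_place) and the temporary layout are made up for illustration. It shows that after a mv swap the path "current" refers to a different directory, while an in-place sync keeps the same directory and only changes its contents. Whether that is what triggers the extra Apache processes is an assumption, not something established here.

```shell
#!/usr/bin/env bash
# Sketch: compare a directory swap with an in-place sync on scratch directories.
# Everything lives under a temporary directory; nothing touches /var/app.

# Print the inode number of a directory.
dir_inode() { ls -di "$1" | awk '{ print $1 }'; }

# Swap strategy: back up the live directory, then move staging into its place.
flip_by_mv() {
  mv "$2" "$2.old"
  mv "$1" "$2"
}

# In-place strategy: sync staging contents into the existing live directory.
# Falls back to cp when rsync is not installed (cp does not delete stale files).
flip_in_place() {
  if command -v rsync >/dev/null 2>&1; then
    rsync -r --delete "$1/" "$2/"
  else
    cp -R "$1/." "$2/"
  fi
}

# Build two identical scratch layouts, one per strategy.
work=$(mktemp -d)
for name in swap sync; do
  mkdir -p "$work/$name/current" "$work/$name/ondeck"
  echo "old" > "$work/$name/current/index.php"
  echo "new release" > "$work/$name/ondeck/index.php"
done

swap_before=$(dir_inode "$work/swap/current")
flip_by_mv "$work/swap/ondeck" "$work/swap/current"
swap_after=$(dir_inode "$work/swap/current")

sync_before=$(dir_inode "$work/sync/current")
flip_in_place "$work/sync/ondeck" "$work/sync/current"
sync_after=$(dir_inode "$work/sync/current")

echo "swap: inode $swap_before -> $swap_after"
echo "sync: inode $sync_before -> $sync_after"
```

Both strategies end with the new index.php in place. Only the swap changes the inode of the directory itself.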

Warning: this solution is not documented and is definitely not supported by AWS, so use it with care.

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/01_flip.sh":
    mode: "000755"
    owner: root
    group: root
    encoding: plain
    content: |
        #!/usr/bin/env bash

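        set -xe

        # Resolve the staging and deploy directories from the EB container config.
        EB_APP_STAGING_DIR=$(/opt/elasticbeanstalk/bin/get-config container -k app_staging_dir)
        EB_APP_DEPLOY_DIR=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_dir)

        # Keep a temporary copy of the live code while the sync runs.
        if [ -d "$EB_APP_DEPLOY_DIR" ]; then
          cp -r "$EB_APP_DEPLOY_DIR" "$EB_APP_DEPLOY_DIR.old"
        fi

        # Sync the new code into the live directory in place instead of swapping it.
        # A trailing slash (rather than /*) also picks up dotfiles such as .htaccess
        # and lets --delete remove stale top-level files.
        rsync -r --delete "$EB_APP_STAGING_DIR/" "$EB_APP_DEPLOY_DIR/"

        # Remove the temporary copy in the background.
        nohup rm -rf "$EB_APP_DEPLOY_DIR.old" >/dev/null 2>&1 &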

After making this change and running a deployment with roughly 10k requests per second I didn’t see a repeat of the issue I was having before, so it seemed to do the trick.

eb-graph-after