After recently migrating quite a few legacy applications from unmanaged EC2 instances to Elastic Beanstalk, I noticed a few issues when deploying code to an environment under load.

After a lot of investigation, I discovered that at the point Elastic Beanstalk deployed the code, Apache started spinning up many more child processes and the instances maxed out their CPU. Once the instances were added back into the ELB, it took a few minutes for CPU usage to drop to a level where they could serve a decent amount of traffic again, which is why the application response time went sky high.
One way Amazon suggests deploying new application versions is outlined in its guide, Deploying Versions with Zero Downtime. My concern with this approach was that the new instances would not be ‘warmed up’ enough to take all of the traffic immediately.
I wanted to find out exactly what was causing the problem, and it seemed to lie in the way Elastic Beanstalk switches code from /var/app/ondeck/ to /var/app/current/. During a deployment it issues a mv command on the /var/app/current/ directory to back it up, then moves /var/app/ondeck/ into its place. After some further research, I found a solution that is undocumented (and probably not really recommended): replace the deploy script, /opt/elasticbeanstalk/hooks/appdeploy/enact/01_flip.sh, which is what Elastic Beanstalk uses to run the deployment commands.
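As a rough illustration of the move-based flip described above, the sketch below replays the two mv steps on scratch directories. The temporary paths and the file name index.php are made up for the demo and are not Elastic Beanstalk's own. It shows one observable side effect: after the flip, the path "current" refers to a different directory inode. Whether that is what upsets Apache is not established here.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directories standing in for /var/app/current and /var/app/ondeck.
work=$(mktemp -d)
mkdir -p "$work/current" "$work/ondeck"
echo "old release" > "$work/current/index.php"   # hypothetical file name
echo "new release" > "$work/ondeck/index.php"

# Record which directory inode the "current" path refers to before the flip.
inode_before=$(ls -di "$work/current" | awk '{print $1}')

# Move-based flip: back up the live directory, then move staging into its place.
mv "$work/current" "$work/current.old"
mv "$work/ondeck" "$work/current"

inode_after=$(ls -di "$work/current" | awk '{print $1}')
content_after=$(cat "$work/current/index.php")

echo "current inode before flip: $inode_before"
echo "current inode after flip:  $inode_after"
echo "content now in current:    $content_after"

# Clean up the scratch directories.
rm -rf "$work"
```

The two inode numbers differ because the old directory still exists under its backup name, so anything holding on to the old directory is no longer looking at the path the web server resolves.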
Warning: this solution is undocumented and definitely not supported by AWS, so use it with care.
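files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/01_flip.sh":
    mode: "000755"
    owner: root
    group: root
    encoding: plain
    content: |
      #!/usr/bin/env bash
      set -xe

      # Staging and live application directories, as reported by Elastic Beanstalk.
      EB_APP_STAGING_DIR=$(/opt/elasticbeanstalk/bin/get-config container -k app_staging_dir)
      EB_APP_DEPLOY_DIR=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_dir)

      # Back up the live directory by copying it rather than moving it.
      if [ -d "$EB_APP_DEPLOY_DIR" ]; then
        cp -r "$EB_APP_DEPLOY_DIR" "$EB_APP_DEPLOY_DIR.old"
      fi

      # Sync the staged code into the live directory in place.
      # Note: the shell expands /*, so top-level dotfiles are not copied.
      rsync -r --delete "$EB_APP_STAGING_DIR"/* "$EB_APP_DEPLOY_DIR"/

      # Remove the backup in the background so the hook returns promptly.
      nohup rm -rf "$EB_APP_DEPLOY_DIR.old" >/dev/null 2>&1 &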
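
Before relying on a replacement hook like this, it can help to replay its rsync line on scratch directories. The sketch below does that. The directory and file names are hypothetical, and it needs rsync on the PATH (it skips the sync step otherwise). It checks that the live directory keeps its inode, and it surfaces two side effects of handing rsync a shell glob: top-level dotfiles are not copied, and --delete leaves stale top-level files alone because rsync receives individual entries rather than the directory itself.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directories standing in for the staging and live directories.
work=$(mktemp -d)
staging="$work/ondeck"
deploy="$work/current"
mkdir -p "$staging" "$deploy"
echo "new release, version 2" > "$staging/index.php"
echo "RewriteEngine On" > "$staging/.htaccess"      # top-level dotfile
echo "old release" > "$deploy/index.php"
echo "removed in version 2" > "$deploy/stale.php"   # no longer in staging

inode_before=$(ls -di "$deploy" | awk '{print $1}')

if command -v rsync >/dev/null 2>&1; then
  # Same shape as the hook's sync line: the shell expands the glob first.
  rsync -r --delete "$staging"/* "$deploy"/
  synced=yes
else
  echo "rsync not found; skipping the sync step"
  synced=no
fi

inode_after=$(ls -di "$deploy" | awk '{print $1}')
content_after=$(cat "$deploy/index.php")
[ -e "$deploy/.htaccess" ] && dotfile_copied=yes || dotfile_copied=no
[ -e "$deploy/stale.php" ] && stale_kept=yes || stale_kept=no

echo "synced:             $synced"
echo "same directory:     $([ "$inode_before" = "$inode_after" ] && echo yes || echo no)"
echo "index.php content:  $content_after"
echo "dotfile copied:     $dotfile_copied"
echo "stale file kept:    $stale_kept"

# Clean up the scratch directories.
rm -rf "$work"
```

If those side effects matter for your application, syncing the directory itself (a trailing slash on the source instead of /*) is the usual alternative, but that changes the hook's behaviour and has not been tested against a real Elastic Beanstalk deployment here.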
After making this change and running a deployment under roughly 10k requests per second, I didn’t see a repeat of the issue I was having before, so it seemed to do the trick.
