Fixing a mysterious .ebextensions command time out (AWS Elastic Beanstalk)

source link: https://blog.jakubholy.net/2015/07/29/fixing-a-mysterious-ebextensions-command-time-out-aws-elastic-beanstalk/

July 29, 2015

Our webshop, nettbutikk.netcom.no, runs on AWS Elastic Beanstalk and we use .ebextensions/ to customize the environment. I have just been trying to get Gor running on our leader production instance to replay some traffic to our staging environment so that we get much richer feedback from it. However, the container_command I used caused the instance to time out and trash the environment, against all reason. The documentation doesn't help, and troubleshooting is hard and time-consuming because of the lack of feedback. Luckily, I have arrived at a solution.

This is the working solution:

files:
  /opt/gor:
    source: "https://s3-eu-west-1.amazonaws.com/elasticbeanstalk-eu-west-1-<our-id>/our_fileserver/gor"
    authentication: S3Access
    mode: "000755"
    owner: root
    group: root
  # Script to start Gor in the background
  # Beware: We need to intercept port 8080 b/c 80 is redirected there via iptables
  /opt/gor-in-background:
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      pidof /opt/gor || nohup /opt/gor --input-raw :8080 --output-http 'https://our-staging-server|1' >/dev/null 2>&1 </dev/null &

# Only container_commands can access env variables configured in EB (ENV)
# Start gor, limit it to copy max 1 req / sec
container_commands:
  "Start Gor on Prod leader":
    command: test "$ENV" = "production" && /opt/gor-in-background >/dev/null 2>&1 # ! w/o the redirect it will time-out
    leader_only: true
    ignoreErrors: true # returns an "error" when not in Prod

(The S3 private bucket access is a story of its own, requiring the addition of AWS::CloudFormation::Authentication and a change of the bucket policy.)

The key points are starting Gor in the background with & and, the magic ingredient that took me so long to figure out, redirecting the command's output to /dev/null. (The redirection inside the script is likely not needed to avoid this problem; I have it because I don't want any output to accumulate on the disk.)

I do not know whether I need to redirect both stdout and stderr of /opt/gor-in-background, or why I need to do it, but without it I got the infamous

[time N+1] INFO Command execution completed on all instances. Summary: [Successful: 1, TimedOut: 1]. 

[time N] WARN The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own): [i-1e35c2b3].

and the instance continued to time out even when I tried to re-deploy a working version and never managed to deliver logs.
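
A plausible explanation for why the redirect matters (an assumption, not verified against Elastic Beanstalk's deployment tooling) is that the caller waits for end-of-file on the command's output pipe, and a background process that inherits that pipe keeps it open. The following minimal sketch reproduces that general shell behavior, with `$(...)` standing in for such a caller and `sleep 2` standing in for Gor:

```shell
#!/usr/bin/env sh
# Illustrative only: $(...) stands in for a caller that reads a command's
# output until end-of-file. That the Elastic Beanstalk tooling behaves this
# way is an assumption, not something the source confirms.

# Case 1: the background child inherits the output pipe, so the caller
# blocks until the child exits (about 2 seconds here).
start=$(date +%s)
out=$(sleep 2 &)
held=$(( $(date +%s) - start ))

# Case 2: the child's stdout and stderr go to /dev/null, so nothing keeps
# the pipe open and the caller returns almost immediately.
start=$(date +%s)
out=$(sleep 2 >/dev/null 2>&1 &)
detached=$(( $(date +%s) - start ))

echo "pipe inherited: waited ${held}s; redirected: waited ${detached}s"
```

If the same mechanism is at work here, a long-running process such as Gor would hold the pipe open indefinitely, which would match the observed time out.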

Troubleshooting tip: Clone the target env a few times and use the clones to retest a change repeatedly and to try several changes in parallel; this speeds up the process.
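
One way to create the clones is the EB CLI's `eb clone` command. The sketch below only prints the commands (a dry run) so they can be reviewed before running them; the environment name `my-prod-env` and the `-test-N` naming scheme are hypothetical, and it assumes the EB CLI is installed and initialized for the application.

```shell
#!/usr/bin/env sh
# Dry run: print the EB CLI commands that would clone one environment N times.
# "my-prod-env" and the "-test-N" suffix are hypothetical names.
clone_commands() {
  env_name=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    echo "eb clone $env_name --clone_name ${env_name}-test-$i"
    i=$((i + 1))
  done
}

clone_commands my-prod-env 3
```

Piping the output to `sh` would run the commands; remember to terminate the clones afterwards, since each one is a full environment.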

Summary

If your (container) command leads to a time out, try to redirect its stdout and/or stderr to /dev/null.

Thanks to João Abrantes for the redirection idea!
