Recasting an Analytic Pipeline with CloudFormation

Our existing vulnerability prioritization pipeline (VulnPryer) is heavily based on a number of AWS technologies, most notably Data Pipeline and OpsWorks. Both services received numerous updates during 2015: Data Pipeline gained improved tagging support, and OpsWorks rolled out its much-anticipated support for Chef 12. My team’s Kanban board has had a card to update our flow for these new features for some time. I finally tackled it expecting a quick project, and it turned into a much more involved learning opportunity.

I love infrastructure-as-code. The ability to fully automate builds is vital for my small team to keep up with the ever-increasing number and sophistication of our analytic flows, letting us focus on consuming data rather than on the mechanisms that produce it. The incumbent version of our VulnPryer flow uses a small boto2-based script to build both the scheduling system in Data Pipeline and the main worker instances via OpsWorks. Taking advantage of tagging in Data Pipeline would require boto3, which would mean a fairly major rewrite. Rather than invest in a new version that would work only for this process, I decided to move to AWS CloudFormation, which lets any user with the appropriate permissions create their own copy of the stack by supplying only the minimum necessary parameters.
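For readers who prefer to drive this from Python rather than the aws cli, here is a minimal sketch of the boto3 equivalent of `aws cloudformation create-stack`. The stack name, template file, and parameter keys are hypothetical, not VulnPryer's actual ones. The `Capabilities` acknowledgement is needed because the stack creates IAM resources.

```python
from typing import Any, Dict, List, Optional


def to_cfn_parameters(params: Dict[str, str]) -> List[Dict[str, str]]:
    # CloudFormation expects a list of ParameterKey/ParameterValue pairs.
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in sorted(params.items())]


def launch_stack(stack_name: str, template_body: str, params: Dict[str, str],
                 client: Optional[Any] = None) -> str:
    """Create the stack from a template string and return the new StackId."""
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a fake client
        client = boto3.client("cloudformation")
    response = client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(params),
        # Required when the template creates IAM resources; use
        # CAPABILITY_NAMED_IAM instead if those resources have custom names.
        Capabilities=["CAPABILITY_IAM"],
    )
    return response["StackId"]
```

Usage might look like `launch_stack("vulnpryer", open("vulnpryer.json").read(), {"BucketName": "example-vulnpryer-data"})`, where the file name and the `BucketName` parameter are illustrative only. Every other value falls back to the template's defaults.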

Key Learnings

1. Hand-rolling JSON files is straight out of a Lovecraft story - The number of times I had stack creates fail because of a misplaced comma, or because I nested curly brackets where square brackets belonged (or vice versa), outnumbers all the grains of sand. JSONLint and the aws cloudformation validate-template command were my best friends here.
2. Creating custom Lambda-backed CloudFormation resources - The default CloudFormation behavior when creating OpsWorks instances is to launch them immediately. I wanted my OpsWorks stack to be created in a dormant state and started later via Data Pipeline, so this wouldn’t do. I wound up using AWS Lambda to build a custom resource that creates OpsWorks instances in a stopped state.
3. Getting up close and personal with IAM - The IAM web console hides some of IAM’s complexity from you. The gotcha here is that instance profiles are separate from roles, and that both are required, in different parts of the AWS API, to give services and instances the proper permissions.
4. S3 bucket policies can be tricky - While I’m conceptually familiar with the two-stage check of IAM user policies and then S3 bucket policies, S3 bucket policies always seem to require more fiddling than they should. Different clients call the API in different ways, which makes writing rules more complex than I’d prefer.
5. Python object system - I hadn’t had the opportunity to work with Python classes in a substantial way. Creating the Lambda resource by extending Elelsee’s nifty helper object required me to up my game in understanding Python’s take on objects. I find the Python approach a lot simpler than the choose-your-own-adventure (but choose wisely!) that exists in Rstats.
6. Using dots in your S3 bucket names is an anti-pattern - This deserves a write-up of its own. Let’s just say that new users should stay the heck away from periods in their S3 bucket names unless they have a compelling reason (S3 static website hosting being the only one I know of).
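On the hand-rolled JSON pain: a local syntax check that reports the line and column of the first error catches stray commas and mismatched brackets before a stack create ever starts. This is a minimal sketch using only Python's standard `json` module. It checks syntax only. `aws cloudformation validate-template` is still needed to catch CloudFormation-level mistakes.

```python
import json
from typing import Optional


def check_template(text: str) -> Optional[str]:
    """Return None if `text` is well-formed JSON, else a 'line X, column Y' message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno mark where the parser gave up, which is usually at
        # (or just after) the stray comma or mismatched bracket.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_file(path: str) -> Optional[str]:
    # Convenience wrapper for checking a template file on disk.
    with open(path, encoding="utf-8") as handle:
        return check_template(handle.read())
```

Running something like `check_file("vulnpryer.json")` (a hypothetical file name) before each `create-stack` turns a failed stack create into an immediate, located error message.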
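The Lambda-backed custom resource and the subclassing of a helper object can be illustrated together. This is a minimal sketch, not the project's actual code. `CustomResource` is a hypothetical stand-in for a helper base class (it is not Elelsee's helper), and the resource property names `StackId`, `LayerIds` and `InstanceType` are assumptions. The sketch relies on two facts: CloudFormation sends the function an event with `RequestType` and `ResponseURL`, and OpsWorks `create_instance` registers an instance without starting it.

```python
# Sketch only: CustomResource is a hypothetical helper base class, and the
# property names (StackId, LayerIds, InstanceType) are assumptions.
import json
import urllib.request
from typing import Any, Callable, Dict

FAILED_PREFIX = "failed-"  # marks resources whose Create never succeeded


def put_response(url: str, body: Dict[str, Any]) -> None:
    # CloudFormation waits for a JSON document PUT to the pre-signed ResponseURL.
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 method="PUT", headers={"Content-Type": ""})
    urllib.request.urlopen(req)


class CustomResource:
    """Hypothetical base class: dispatch on RequestType, then report back."""

    def __init__(self, event: Dict[str, Any], context: Any,
                 sender: Callable[[str, Dict[str, Any]], None] = put_response):
        self.event = event
        self.context = context
        self.sender = sender
        self.props = event.get("ResourceProperties", {})

    # Subclasses override these; each returns the physical resource id.
    def create(self) -> str:
        raise NotImplementedError

    def update(self) -> str:
        return self.event["PhysicalResourceId"]  # property changes ignored here

    def delete(self) -> str:
        return self.event["PhysicalResourceId"]

    def run(self) -> Dict[str, Any]:
        actions = {"Create": self.create, "Update": self.update, "Delete": self.delete}
        body = {key: self.event[key] for key in ("StackId", "RequestId", "LogicalResourceId")}
        try:
            body["PhysicalResourceId"] = actions[self.event["RequestType"]]()
            body["Status"] = "SUCCESS"
        except Exception as exc:  # always answer, or the stack hangs until timeout
            fallback = FAILED_PREFIX + getattr(self.context, "log_stream_name", "unknown")
            body["PhysicalResourceId"] = self.event.get("PhysicalResourceId", fallback)
            body["Status"] = "FAILED"
            body["Reason"] = str(exc)
        self.sender(self.event["ResponseURL"], body)
        return body


class StoppedOpsWorksInstance(CustomResource):
    def __init__(self, event, context, opsworks=None, sender=put_response):
        super().__init__(event, context, sender)
        if opsworks is None:
            import boto3  # available in the Lambda Python runtime
            opsworks = boto3.client("opsworks")
        self.opsworks = opsworks

    def create(self) -> str:
        # CreateInstance registers a (24/7) instance but does not boot it; it
        # stays stopped until something (e.g. Data Pipeline) calls StartInstance.
        result = self.opsworks.create_instance(
            StackId=self.props["StackId"],
            LayerIds=self.props["LayerIds"],
            InstanceType=self.props["InstanceType"],
        )
        return result["InstanceId"]

    def delete(self) -> str:
        instance_id = self.event["PhysicalResourceId"]
        # Skip the API call when Create failed and no instance exists.
        if not instance_id.startswith(FAILED_PREFIX):
            self.opsworks.delete_instance(InstanceId=instance_id)
        return instance_id


def handler(event, context):
    # Lambda entry point; the custom resource's ServiceToken points at this function.
    return StoppedOpsWorksInstance(event, context).run()
```

The subclass supplies only the OpsWorks-specific `create` and `delete` logic. The response plumbing stays in the base class, which is the kind of division of labor the helper-object approach gives you. Injecting `opsworks` and `sender` keeps the class testable without AWS.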
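The post leaves the dotted-bucket-name problem for another day. One widely cited reason is TLS. S3's wildcard certificate for `*.s3.amazonaws.com` covers exactly one DNS label, so a virtual-hosted-style HTTPS request to a bucket named with periods fails hostname verification. This simplified sketch of single-label wildcard matching shows the effect. It ignores partial-label wildcards and internationalized names, and the bucket names are hypothetical.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified TLS wildcard match: '*' covers exactly one left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never stretches across extra labels
    first, *rest = p_labels
    if first != "*" and first != h_labels[0]:
        return False
    return rest == h_labels[1:]


def virtual_host(bucket: str) -> str:
    # Virtual-hosted-style endpoint for a bucket (classic us-east-1 form).
    return f"{bucket}.s3.amazonaws.com"
```

With these definitions, `wildcard_matches("*.s3.amazonaws.com", virtual_host("vulnpryer-data"))` is true, while the same check for `vulnpryer.data` is false. The extra period adds a label that the wildcard cannot cover.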

Outcome

What started out as a small screenful of JSON wound up as approximately 700 lines of parameters and resources, plus another 100+ lines of Python code for the Lambda function. Now, with a single aws cloudformation create-stack command, I can spin up a dedicated S3 bucket for my app (with mandatory encryption at rest and access control), a Chef 12 OpsWorks stack to do the analytic processing, a scheduled Data Pipeline to run and monitor the flow (with CloudWatch logging and SNS notifications), and tailored IAM users and roles that grant just the right level of permissions to run everything.
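The instance-profile-versus-role gotcha from the learnings above becomes clearer when both pieces sit side by side as template resources. This is a minimal sketch that builds CloudFormation resources as Python dicts. The logical names `InstanceRole`, `InstanceProfile` and `OpsWorksServiceRole` are hypothetical, and real permission policies would still need to be attached. It shows that an OpsWorks stack takes the instance *profile* ARN for its instances but a plain *role* ARN for the service itself. Emitting the template with `json.dumps` also keeps commas and brackets out of human hands.

```python
import json
from typing import Any, Dict


def assume_role(service: str) -> Dict[str, Any]:
    # Trust policy: which AWS service is allowed to assume the role.
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": [service]},
            "Action": ["sts:AssumeRole"],
        }],
    }


def iam_resources() -> Dict[str, Any]:
    """Role, instance profile, and service role (no permission policies attached)."""
    return {
        # The role the EC2 instances run as...
        "InstanceRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {"AssumeRolePolicyDocument": assume_role("ec2.amazonaws.com"),
                           "Path": "/"},
        },
        # ...only reaches an instance once wrapped in an instance profile.
        "InstanceProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Path": "/", "Roles": [{"Ref": "InstanceRole"}]},
        },
        # OpsWorks itself acts through a separate service role.
        "OpsWorksServiceRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {"AssumeRolePolicyDocument": assume_role("opsworks.amazonaws.com"),
                           "Path": "/"},
        },
    }


def opsworks_stack_iam_properties() -> Dict[str, Any]:
    # AWS::OpsWorks::Stack wants the profile ARN for instances, the role ARN for the service.
    return {
        "DefaultInstanceProfileArn": {"Fn::GetAtt": ["InstanceProfile", "Arn"]},
        "ServiceRoleArn": {"Fn::GetAtt": ["OpsWorksServiceRole", "Arn"]},
    }
```

`print(json.dumps({"Resources": iam_resources()}, indent=2))` emits the fragment, which can then be merged into a larger template.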
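Mandatory encryption at rest is one place where bucket policies get fiddly. The post doesn't show its policy, so this is a minimal sketch of one commonly used pattern, assuming SSE-S3 (`AES256`) and a hypothetical bucket name. Two deny statements are needed: one rejects uploads that request a different encryption value, and one rejects uploads whose client never sends the `x-amz-server-side-encryption` header at all. The second statement is a concrete case of the same rule behaving differently depending on how a client calls the API.

```python
import json
from typing import Any, Dict


def require_sse_policy(bucket: str) -> Dict[str, Any]:
    """Bucket policy rejecting PutObject calls that don't request SSE-S3 (AES256)."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny uploads that ask for some other encryption value.
                "Sid": "DenyIncorrectEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
            },
            {
                # Deny uploads from clients that omit the header entirely.
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }
```

In a CloudFormation template this dict would become the `PolicyDocument` of an `AWS::S3::BucketPolicy` resource. Outside CloudFormation, `json.dumps(require_sse_policy("example-vulnpryer-data"))` produces a document that can be pasted into the console.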

What’s Next

While I’ve worked with CloudFormation before, this wound up being a much more ambitious project than I anticipated. It was great to get more familiar with the AWS API (working with the aws cli and CloudFormation is an excellent way to learn how the pieces of the Amazon ecosystem fit together). At the same time, I’m thoroughly convinced (mostly by the hand-rolled JSON pain of point 1) that CloudFormation is not the right long-term tool for creating, documenting, and maintaining our cloud infrastructure. While we’ll continue to write templates to document some currently hand-rolled bits of our environment, there’s a card named “Investigate Terraform” on our board that’s crying out for some time…