My spousal unit’s workplace recently launched an internal data challenge focused on a public OSHA data set. As a side project, I’ve casually explored the data, mostly to offer my partner whatever tips I could, but also as an opportunity to try parts of the AWS stack (which my wife’s workplace is heavily promoting) that I’ve not yet used directly.

One such service is AWS Machine Learning (AWS ML). AWS ML is a largely turnkey solution for creating basic prediction models on data, walking the end user through the data cleaning, model training and evaluation, and deployment steps. I’ve seen the service pop up a number of times recently, and while it looked intriguing, I was skeptical of the offering as a dangerous black box. While a more in-depth analysis of those concerns will have to wait for a future post, one use of AWS ML that interested me was coupling it with AWS Cognito, the Amazon mobile identity solution, to power a web app that makes real-time predictions (first seen via a presentation from Christopher Crosbie). This I had to try for myself!

OSHA Challenge Sample App

I’ve created a sample application hosted on GitHub Pages that:

  • Uses AWS Cognito to get a temporary, limited-privilege set of credentials.
  • Uses the temporary credentials to check whether the AWS ML model is ready to serve real-time predictions (the endpoint is normally shut off to conserve costs).
  • If the real-time endpoint is not ready, the prediction function is disabled and the Start Endpoint function is enabled.
  • The Start Endpoint function spins up the prediction endpoint and sets a 3-minute refresh on the page.
  • Once the endpoint is ready, the user can enter sample parameters in the form and click the Predict button to get the results of the model.
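
The flow in the list above can be sketched roughly as follows. This is a minimal sketch, not the app's actual code. It assumes the AWS SDK for JavaScript (v2), where an `AWS.MachineLearning` client exposes `getMLModel` and `createRealtimeEndpoint`, and where the `GetMLModel` response reports `EndpointInfo.EndpointStatus`. The names `uiStateFor`, `startEndpoint`, and `REFRESH_MS`, and the placeholder IDs in the comments, are hypothetical.

```javascript
// 3-minute page refresh while the endpoint spins up.
const REFRESH_MS = 3 * 60 * 1000;

// Hypothetical helper: map a GetMLModel response to the page's button state.
// Assumes EndpointInfo.EndpointStatus is one of NONE, READY, UPDATING, FAILED.
function uiStateFor(model) {
  const info = (model && model.EndpointInfo) || {};
  const status = info.EndpointStatus || "NONE";
  const ready = status === "READY";
  return {
    status,
    predictEnabled: ready, // Predict only works against a live endpoint
    startEnabled: !ready,  // otherwise offer Start Endpoint instead
  };
}

// Hypothetical helper: request the endpoint, then schedule a page refresh.
// `ml` is anything with an SDK-v2-style createRealtimeEndpoint(params, cb).
function startEndpoint(ml, modelId, scheduleRefresh) {
  ml.createRealtimeEndpoint({ MLModelId: modelId }, (err) => {
    if (err) {
      console.error("Could not start endpoint:", err);
      return;
    }
    scheduleRefresh(REFRESH_MS);
  });
}

// Browser wiring (not run here; the pool and model IDs are placeholders):
//   AWS.config.region = "us-east-1";
//   AWS.config.credentials = new AWS.CognitoIdentityCredentials({
//     IdentityPoolId: "us-east-1:EXAMPLE-POOL-ID",
//   });
//   const ml = new AWS.MachineLearning();
//   ml.getMLModel({ MLModelId: "ml-EXAMPLE" }, (err, model) => {
//     if (!err) render(uiStateFor(model)); // render() is app-specific
//   });
//   startEndpoint(ml, "ml-EXAMPLE",
//     (ms) => setTimeout(() => location.reload(), ms));
```

Keeping the state decision in a plain function, separate from the SDK calls, makes it easy to check without touching AWS.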

The model itself is a toy example that uses all the inspection features in the OSHA public data set and attempts to predict the letter grade (A, B, C, or D) of the final inspection outcome based on factors such as company industry, company location, union status, etc. The currently loaded model is trivial and in no way accurate. This is more a proof of concept of tying the components together, as a prelude to creating and evaluating a proper model.
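
To make the prediction step concrete, here is a hedged sketch of how the form values might be turned into a real-time `Predict` request and how the letter grade could be read back. It assumes the AWS ML `Predict` call takes `MLModelId`, `PredictEndpoint`, and a `Record` of string values, and that a multiclass response carries `Prediction.predictedLabel` and `Prediction.predictedScores`. The helper names and the form field names (`industry`, `state`, `union_status`) are hypothetical, not the model's actual attribute names.

```javascript
// Hypothetical helper: build Predict parameters from the form values.
// Record values are sent as strings, so everything is stringified and
// blank fields are left out of the record.
function buildPredictParams(modelId, endpointUrl, form) {
  const record = {};
  for (const [name, value] of Object.entries(form)) {
    if (value !== undefined && value !== null && value !== "") {
      record[name] = String(value);
    }
  }
  return { MLModelId: modelId, PredictEndpoint: endpointUrl, Record: record };
}

// Hypothetical helper: pull the predicted grade and its score out of a
// multiclass Predict response.
function gradeFrom(response) {
  const prediction = (response && response.Prediction) || {};
  const scores = prediction.predictedScores || {};
  return {
    grade: prediction.predictedLabel,
    score: scores[prediction.predictedLabel],
  };
}

// Browser usage (not run here; `ml` is an AWS.MachineLearning client and
// `endpointUrl` comes from EndpointInfo.EndpointUrl on the model):
//   const params = buildPredictParams("ml-EXAMPLE", endpointUrl, {
//     industry: "construction", state: "OH", union_status: "N",
//   });
//   ml.predict(params, (err, data) => {
//     if (!err) showResult(gradeFrom(data)); // showResult() is app-specific
//   });
```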

This has been a fun side project where I’ve leveraged Terraform to create infrastructure (Redshift clusters, IAM roles, storage buckets, logging, etc.). It’s been some time since I’ve needed to do any web or JavaScript work, and the opportunity to dust off just enough JS and Bootstrap to get a minimum viable product running was good exercise (JavaScript may be eating the world, but it gives me indigestion). I have a future to-do of standing up a Lambda function, triggered via a CloudWatch Alarm, to shut down the endpoint after a period of inactivity. In the meantime, feel free to try this out, with the caveat that this is neither an accurate nor an appropriate model.
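
As a rough idea of what that shutdown to-do might look like, the sketch below checks the endpoint and deletes it only if it is currently serving. This is an assumption-laden sketch of future work, not something the app does today. It assumes SDK-v2-style `getMLModel` and `deleteRealtimeEndpoint` calls on an `AWS.MachineLearning` client; the helper name `shutDownEndpoint` and the model ID are hypothetical, and how the alarm actually reaches the Lambda function is left out.

```javascript
// Hypothetical helper: delete the real-time endpoint if it is READY.
// `ml` is anything exposing getMLModel and deleteRealtimeEndpoint in the
// callback style of the AWS SDK for JavaScript v2.
function shutDownEndpoint(ml, modelId, done) {
  ml.getMLModel({ MLModelId: modelId }, (err, model) => {
    if (err) return done(err);
    const info = (model && model.EndpointInfo) || {};
    const status = info.EndpointStatus || "NONE";
    // Assumption: only a READY endpoint needs deleting; other states are
    // reported back untouched.
    if (status !== "READY") return done(null, { deleted: false, status });
    ml.deleteRealtimeEndpoint({ MLModelId: modelId }, (delErr) => {
      if (delErr) return done(delErr);
      done(null, { deleted: true, status });
    });
  });
}

// Possible Lambda wiring (not run here; the model ID is a placeholder):
//   const AWS = require("aws-sdk");
//   const ml = new AWS.MachineLearning({ region: "us-east-1" });
//   exports.handler = (event, context, callback) =>
//     shutDownEndpoint(ml, "ml-EXAMPLE", callback);
```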

Overall, tying a web front end to a (mostly*) easily deployed back end is a powerful combination. Almost all of my previous work has been in batch predictions or in models deployed via Shiny. It’s nice to have exposure to other deployment models.

* AWS ML’s only supported US region is us-east-1, and neither Terraform nor CloudFormation currently supports AWS ML or AWS Cognito.