Part of my monthly financial reconciliation involves pulling down the latest sale and rental valuation estimates of my house from Zillow. Living in one of the more dynamic real estate markets in the US means these estimates change frequently. I’ve evolved this process slowly over time to be easier to use and more data rich. What first started out as a manual lookup on the Zillow site along with some stare-and-compare summaries became a PowerShell cmdlet using the Zillow API to grab all the relevant numbers and neatly format them. This worked well for a long time, but I wasn’t performing historical trending and the Zillow-powered graph doesn’t have a lot of fidelity. I wanted to capture both the most likely estimates and the high/low values (confidence intervals), and to shift from a pull model to a push model where I’m notified as often as changes occur. Time to break out a bit of Python and code something up for a cloud-deployed solution!
My solution starts off with a CloudWatch Events rule set for a daily invocation of a Lambda function. When called, the Lambda function checks environment variables for the S3 location of a historical CSV file and reads that file into a pandas DataFrame. More environment variables supply the Zillow API key and the ID of the property to track. A call to the Zillow API fetches the current valuations, and the function compares the update date in the response to the most recent date in the CSV. If the API data is newer, the CSV is updated and saved to S3 and a message is published to an SNS topic for push notifications. If there is no new data, the function silently exits.
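The compare-and-append step can be sketched roughly as follows. This is a minimal illustration, not the function from the repository: it is written for Python 3 and uses the standard-library csv module in place of pandas so it runs without dependencies. The column names, the environment variable names, and the `fetch_latest` helper are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical column names; the real CSV layout is not described in the post.
FIELDS = ["date", "estimate", "low", "high"]


def merge_valuation(csv_text, latest):
    """Return (csv_text_after_merge, changed).

    `latest` is a dict keyed by FIELDS. Dates are assumed to be ISO strings
    (YYYY-MM-DD), so plain string comparison orders them correctly.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    newest = max((row["date"] for row in rows), default="")
    if latest["date"] <= newest:
        return csv_text, False  # nothing new, so the caller exits silently

    rows.append({name: str(latest[name]) for name in FIELDS})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), True


def fetch_latest(api_key, property_id):
    """Placeholder for the Zillow API call; response parsing is omitted here."""
    raise NotImplementedError


def handler(event, context):
    """Lambda entry point. Every environment variable name is an assumption."""
    import os
    import boto3  # imported lazily so merge_valuation is testable without AWS

    bucket = os.environ["HISTORY_BUCKET"]
    key = os.environ["HISTORY_KEY"]
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    latest = fetch_latest(os.environ["ZILLOW_API_KEY"],
                          os.environ["ZILLOW_PROPERTY_ID"])
    updated, changed = merge_valuation(body, latest)
    if changed:
        s3.put_object(Bucket=bucket, Key=key, Body=updated.encode("utf-8"))
        boto3.client("sns").publish(
            TopicArn=os.environ["SNS_TOPIC_ARN"],
            Message="New valuation dated {}".format(latest["date"]),
        )
```

Keeping the merge logic in a pure function means it can be exercised locally with a string of CSV text, while the S3 and SNS calls stay confined to the handler.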
Build and Deployment
There are two GitHub repositories involved. The first repository contains the Lambda function code and uses Travis CI to build the deployment artifact. The build simply downloads the various dependencies (mainly pandas, BeautifulSoup, and requests) and packages them into a ZIP file in the format Lambda expects. This artifact is automatically pushed to an artifact staging bucket in my AWS account.
The second repository is a small bit of Terraform code (with help from Terragrunt) for configuring the infrastructure itself. Even for this tiny project there is still a fair amount of configuration to accomplish. The destination bucket for the CSV needs to be created and access permissions set up. The IAM role for executing the Lambda function needs specific permissions to the S3 location, the ability to publish to the SNS topic, and the core Lambda execution and logging permissions. The aforementioned SNS topic must be created, and the CloudWatch Events rule has to be created with the appropriate targets to trigger the daily process. Some of this could be done via shared privileges, but I prefer to keep my accounts and setup as compartmentalized as practical. This isolates concerns across multiple processes, improves visibility for both security and billing, and has the nice benefit of letting others grab this code and tweak/deploy it as they see fit without being tied to the specifics of my cloud setup.
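To make the permission list concrete, here is roughly what the execution role's policy document might contain. It is expressed as a Python dict rather than the Terraform used in the repository, purely for illustration. The bucket, key, and topic values are hypothetical, and the wide-open logs resource is a simplification that a real deployment would likely narrow.

```python
import json


def lambda_policy(bucket, key, topic_arn):
    """Build an illustrative policy document for the function's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read and rewrite only the one historical CSV object
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::{}/{}".format(bucket, key),
            },
            {   # publish notifications to the single topic
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            },
            {   # core Lambda logging to CloudWatch Logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, shown only to demonstrate the output shape.
    doc = lambda_policy("example-history-bucket", "valuations.csv",
                        "arn:aws:sns:us-east-1:123456789012:example-topic")
    print(json.dumps(doc, indent=2))
```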
Even though this is a teeny little project, I hadn’t done much AWS development on this particular desktop machine before, and some nits came up during the development process. In no particular order:
- Lambda Python support - My rage meter is filled every time I create a new Lambda function and see that Python support is still stuck on 2.7. C# gets added to the supported environments before a modern Python 3 implementation?!
- Lambda package management - Getting dependencies available for Lambda means vendoring packages and bundling them up. Not a big issue, but when those dependencies include compiled code it becomes a hassle, as my development environment is Windows and the Lambda runtime is a containerized version of Amazon Linux. Fortunately the Ubuntu-based Travis CI containers seem close enough to work.
- Vagrant - My tests of Python 2.7 and vendored dependencies used a Vagrant box. When I rebuilt my development environment, I picked up the current 1.9.3 build of Vagrant. It turns out this has a fatal networking issue. I was able to work around it, though these sorts of Windows-Vagrant interactions are distressingly common. Vagrant as a project doesn’t get nearly the love I’d like to see. I refuse to even mention the sad state of vagrant-aws.
- Makefiles - The Travis CI build process uses a Makefile to create the deployment ZIP file. I rarely author Makefiles and have to relearn the syntax almost every project. Just can’t seem to make that info stick in my head.
This solution has only been live for a couple of days, so no fun plots or analysis at this time. I’m looking forward to firing up some AWS S3 integration in R and working with this data set as it develops.
Yes, this is a flagrant violation of the XKCD “Is It Worth the Time?” guidance, but I needed a distraction project to take me away from obsessing about a nagging injury that’s been keeping me from running. Not everything needs to be efficient.