I’m a bit of an oddball among many of my data science and devops colleagues for running Windows as my primary desktop OS. While my earliest experiences were on Apple systems, many of the organizations I’ve been a part of are traditional Windows shops. Understanding the challenges and opportunities these organizations face in applying data-driven practices to Windows-based technology stacks is incredibly valuable to me. In recent years, Microsoft itself has embraced and adopted (née “extend”) many of the practices that first came from Unix shops. Examples include the surprise purchase in 2015 of Revolution Analytics, followed quickly by the announcement that support for R functions would be built into SQL Server, then the release of R Tools for Visual Studio. I finally sat down and took a first quick tour of these exciting new offerings.
My guide for this setup was this handy tutorial on creating a predictive model in SQL Server for the number of car rentals occurring on a given day, given features such as snow, holiday, and day of week. The model itself is very straightforward: the tutorial builds both a linear regression and a decision tree, even going so far as to plot the residuals as a means of selecting the better-fitting model. This is more than I would typically expect from a database-focused how-to, so I took it as a good sign that this would be a useful walk-through.
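To make the model-selection step concrete, here is a minimal sketch of the idea in plain Python rather than the R functions the tutorial uses. It fits a one-feature linear regression and a one-split regression tree, then compares them by the size of their residuals (as RMSE instead of a plot). The data, the function names, and the single day-of-week feature are all made up for illustration, and the comparison is in-sample only.

```python
from statistics import mean

# Made-up toy data (not the tutorial's dataset): day of week (1-7) and rentals.
days = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]
rentals = [20, 22, 19, 21, 23, 60, 64, 18, 21, 22, 20, 25, 58, 66]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Depth-one regression tree: pick the split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def rmse(predict, xs, ys):
    """Root mean squared residual of a fitted model on (xs, ys)."""
    return (sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


linear_rmse = rmse(fit_linear(days, rentals), days, rentals)
stump_rmse = rmse(fit_stump(days, rentals), days, rentals)
best_model = "tree" if stump_rmse < linear_rmse else "linear"
print(f"linear RMSE={linear_rmse:.2f}, tree RMSE={stump_rmse:.2f}, best={best_model}")
```

On this toy data the weekend jump is a step rather than a trend, so the tree's residuals come out smaller. A real comparison would score both models on held-out days rather than on the data they were fit to.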
The tutorial does a fine job of getting things up and running. My biggest frustration was with the installation process itself and some of the baggage associated with the development environment. My preferences for IDEs run towards the minimal: the more an IDE stays out of my way, the better. Visual Studio (2015 Community Edition, in my case) feels like a much heavier IDE than I typically enjoy. I have Visual Studio installed mainly to satisfy my curiosity, and I have numerous extensions loaded, including both the AWS and Azure extensions, the Python extension, and the R Tools environment. Installing SQL Server and the R feature was likewise not complicated, but it was slow and required multiple reboots of my Windows 10 environments before everything finally worked. Getting each piece meant launching a web browser to download a stub installer, running the individual installer, and then waiting for all the configuration to take place. Even on relatively powerful machines with SSDs, bringing just one of my systems up to date with all the components and updates took most of an evening.
Installation gripes aside, the core functionality looks quite solid. Microsoft has taken the Revolution products and integrated them fully into both Visual Studio and SQL Server 2016. I had often heard about the enhancements in Revolution’s commercial offerings, particularly around parallelization, but hadn’t had the chance to work with them. The code in the tutorial, while using the unfamiliar Revolution extensions, was easy to understand and behaved in an intuitive fashion.
One design choice I really appreciate is the way that R scripts can be embedded in SQL stored procedures and run as external processes. As stored procedures, the R functions can access entire SQL result sets as data frames, with a flexible parameter scheme defined by the function author. This feels much more intuitive and powerful than the R integration in Tableau, which has limited script authoring functionality and a parameter-passing process that has never felt natural to me.