As I close out my personal financial ledgers for the year, I once again utter curses towards my firm’s 401(k) provider. Rather than just sitting and subjecting my ever-tolerant spouse to the strange sputtering that comes from my office, I’ll strive to be a bit more productive and describe the challenges I have with Guideline and how these concepts of transparency and control apply to the data science work I perform during business hours.

Disclaimer (a.k.a. Record Scratch)

There’s a lot to like with Guideline. Targeting small firms, they have a low-cost, turn-key offering that takes care of a lot of the headaches involved with administering a 401(k) plan. They have a great portfolio of low-cost index funds and (generally) subscribe to passive index investing as the best path for participants – which I endorse. Their website is modern and has good multi-factor authentication without requiring some closed-source commercial authenticator nonsense. That’s all great stuff. That doesn’t mean they’re above criticism, though. Most of my complaints are on issues of transparency and empowering control for sophisticated users, topics which have good connections with data science.


Guideline’s transaction-level reporting is very limited. Details such as transaction dates, the securities involved, whether funds are pre- or post-tax, and whether they are employee or employer contributions are very difficult to get. For much of this, some clever reverse engineering of quarterly statements is necessary to understand what’s going on in your financial data. This is my number one complaint, and it drives me into a frenzy of denied OCD impulses every time I try to reconcile my accounts.

Applying this concept of transparency to data science and research, I and the rest of my team have the privilege to work with some terrific partners. Through these connections, we have access to fascinating data sets that reveal much about the state and practice of The Cyberz(tm). While this data is both valuable and proprietary – and we take great care to protect all data in our trust – when it comes to the reports we write with this data, we strive to be as transparent as possible about the sources, size, shape, and yes, even the limitations of the data. Our reports often target busy professionals, but we work to include enough substance that someone interested in the details can dive into the ways and means of the analysis instead of just relying upon something that could just as well be a proclamation. This provides confidence in the research and the results, but also allows readers to form questions (and often answers) that aren’t directly present in the narrative of the report itself. Sometimes the knowledge of how a finding needs to be adjusted to the special circumstances of your organization can be just as important as the finding itself.


Guideline uses much of the language of Vanguard’s founder, John C. Bogle, and the Bogleheads for its philosophy and recommendations on prudent investing. Key tenets of this outlook include passive index funds, a focus on time-in-the-market, cost efficiency, and an overall asset allocation that tracks the market. This goes a bit off the rails once rebalancing comes into play. If you have funds outside of your 401(k), such as an IRA, a rollover account, or even just a taxable investment account, the rules of setting a percentage allocation on one portion of your investment portfolio no longer work effectively. This results in a lot of micromanaging in order to carry out the same goal. The controls present in the platform are not sufficient for the needs of the consumer.
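To make the rebalancing problem concrete, here’s a minimal sketch (with entirely hypothetical account balances) showing why a percentage target applied inside just one account doesn’t produce that allocation across the whole portfolio – what matters is the aggregate across every account:

```python
# Hypothetical balances for illustration only; each account holds
# some mix of stocks and bonds, in dollars.
accounts = {
    "401k":    {"stocks": 54_000, "bonds": 6_000},   # 90/10 inside the 401(k)
    "ira":     {"stocks": 20_000, "bonds": 0},
    "taxable": {"stocks": 10_000, "bonds": 10_000},
}

def overall_allocation(accounts):
    """Aggregate dollar totals across all accounts into portfolio-wide fractions."""
    totals = {}
    for holdings in accounts.values():
        for asset, dollars in holdings.items():
            totals[asset] = totals.get(asset, 0) + dollars
    grand_total = sum(totals.values())
    return {asset: dollars / grand_total for asset, dollars in totals.items()}

alloc = overall_allocation(accounts)
print({asset: f"{frac:.0%}" for asset, frac in alloc.items()})
# The 401(k) alone sits at a 90/10 split, yet the portfolio as a
# whole lands at 84% stocks / 16% bonds.
```

Hitting a true 90/10 here means either over-weighting stocks in the 401(k) to compensate for the other accounts, or shuffling holdings elsewhere by hand – exactly the micromanaging the per-account percentage control forces on you.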

The theme of control comes into play in data science as well. Once a report or data product is released, the ability to control how it is received by readers is vanishingly small. The previous principle of transparency is often the best defense against misinterpretation and misapplication. While control may be impossible, being as clear as possible about extensions of the work and the different ways it might be interpreted and applied is important when crafting the tone and phrasing of an analysis’s message. This is an area I find especially challenging – getting out of my own headspace and into the perspective of someone who hasn’t spent the weeks, months, or sometimes years with a topic and its supporting data, trying to anticipate their questions and needs and to meet them where they are. It’s also one of the more rewarding aspects, not just of the analysis and writing processes, but of the conversations and discussions that come after the release of a work.

I love those conversations that take place weeks or months after the initial release of a work. Seeing where some research may have informed thinking or influenced actions is both directly rewarding and informative for future work. Now, if I could only influence a bit more strongly how my darned portfolio is allocated…