A stream-based recommendation engine

Published: 2016-08-08 by Lars. Tags: thoughts, architecture

Analyzing a constant stream of new data in close to real time has been difficult to do with traditional data warehousing techniques. Lately, a new stream-based approach has been gaining momentum with tools like Apache Flink. I have been interested in this field for some time, and over the past few months I have started developing a new recommendation engine based on these techniques.

The stream-based approach builds on the assumption that every computation can be done by working with only a small window of the entire data set at a time. This allows for a much more scalable implementation where high-throughput, high-volume data can be analyzed with low latency. As a simple example, think about how you can instantly update the average of a huge data stream without revisiting old data: you only need the sum and the count of the data points seen so far, plus the value of the next data point: average = (sum+value) / (count+1). You can read about this stream-based approach in more detail here. My experiments have confirmed that this is a viable approach for building a general recommendation engine.
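The running-average example can be written down in a few lines. This is only a minimal sketch in plain Python, not Apache Flink code and not the engine's actual implementation; the class name RunningAverage and the sample readings are made up for illustration.

```python
class RunningAverage:
    """Tracks the average of a stream using only a running sum and count.

    Hypothetical helper for illustration: no past data points are stored.
    """

    def __init__(self):
        self.total = 0.0  # sum of all values seen so far
        self.count = 0    # number of values seen so far

    def update(self, value):
        """Fold in the next data point and return the new average."""
        self.total += value
        self.count += 1
        return self.total / self.count


# Feed a small, made-up stream one value at a time.
avg = RunningAverage()
for reading in [4.0, 8.0, 6.0]:
    current = avg.update(reading)
    print(f"after {reading}: average = {current}")
```

Each update costs the same no matter how many data points came before, which is the property that makes this kind of computation a good fit for a stream.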

This approach is generic in the sense that it can be applied to many different real-world domains. For example: recommending energy-saving initiatives for large organizations; improving your chances on the online dating scene; better product recommendations for complex product categories; recommending learning content for improving your own professional skills. I looked into the last of these domains, and I feel it is worth exploring as a potential business model. While I am aware that one or more pivots may lie ahead, for my first shot I have chosen the learning domain.

Today, individual learning efforts are often driven by externally imposed curricula, be it in a corporate setting or in degree-earning programs. These are typically one-size-fits-all approaches that don't always fit the actual needs of each individual very well.

I believe this field would benefit hugely if the roles were turned around, essentially putting the user in the driver's seat and providing them with transparency on their current skill level and the estimated effect of different learning content. The recommendation engine can then present a highly personalized stack of recommended learning content that the user can freely choose from. Initially the algorithms will be based on professional assumptions as to which content objects will improve which capabilities. As the solution evolves and more data is captured, the algorithms will become increasingly intelligent and founded on the users' actual results and achievements. I call this business idea "Triggerz".

To make it valuable as a recommendation engine, Triggerz will be based on partnerships with existing content providers. Think Khan Academy, Coursera, and similar MOOCs, as well as any other providers of online learning or evaluation tools. Triggerz will not provide any teaching programs or online learning software of its own, but will become an independent gateway to the relevant content, across content types and content providers.

As an analogy, Triggerz will be the Hipmunk or Spotify of learning opportunities: In a highly sophisticated way, Triggerz can prioritize content from across a range of providers and point the users to the most relevant items.

I am starting a company, Triggerz ApS, to bring this idea to market. The aim is to develop a Minimum Viable Product in the very near future, and to implement the software with a pilot client hopefully by early 2017. I am looking for highly skilled collaborators (not employees) in the fields of UX design, back-end development, business analysis in the learning domain, content partnerships, and business development.

I am thrilled and looking forward to this venture. If you have any kind of feedback, questions or ideas, please let me know. Stay tuned!
