Getting Started with the Machine Learning Foundation of Systems of Intelligence

Introduction

As Systems of Intelligence (SoI) mature, they will have machine learning at their core. That core is what will enable an e-commerce application to anticipate and influence the end user at the point of interaction. That same core keeps a real-time fraud prevention application up to date without requiring human intervention.

But getting to that level of maturity will require customers to embark on a journey of small steps. This research note imparts some best practices gleaned from leading-edge customers at the Data Science Summit in San Francisco this summer (see the complete interviews from the summit).

  1. With machine learning, the available data determines functionality.
  2. Prioritizing projects requires two roles: one to supply the expected ROI and the other to estimate technical feasibility.
  3. In order to make the predictions most effective, data scientists need to understand how their models fit into users’ end-to-end workflow.

With machine learning, the available data determines functionality

Because SoI are so heavily custom-built, the enterprise is responsible for supplying much of the core raw material: the data. This has to come from formerly siloed applications, partners, and syndicated data. Data scientists need to make sense of all this data by extracting the signal from the noise. Only then can the analytic application functionality be constructed on top of this raw material.

In other words, SoI are all about “bending” the application functionality around the available data. If the data is incomplete, you can only deliver so much functionality. Implementing Systems of Record (SoR) was very different. They came with all the functionality out of the box, but customers had to “bend” their own business practices to fit the application’s built-in business processes.
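
To make the dependency concrete, here is a minimal sketch of a data inventory check. The source names, feature names, and the `assess_feasibility` helper are all hypothetical and chosen only for illustration. The idea is that each candidate piece of functionality lists the data it would need, and only the features whose needs are fully covered by the inventory can be built today.

```python
# Hypothetical inventory: data sources the enterprise can actually supply today.
available_sources = {"web_clickstream", "order_history", "partner_inventory"}

# Hypothetical candidate features and the data each one would need.
candidate_features = {
    "product_recommendations": {"web_clickstream", "order_history"},
    "real_time_fraud_scoring": {"order_history", "payment_events", "device_fingerprints"},
    "stock_aware_promotions": {"order_history", "partner_inventory"},
}


def assess_feasibility(available, candidates):
    """Split candidate features into buildable ones and ones blocked by missing data."""
    buildable, blocked = [], {}
    for feature, required in candidates.items():
        missing = required - available  # set difference: data we would still need
        if missing:
            blocked[feature] = sorted(missing)
        else:
            buildable.append(feature)
    return sorted(buildable), blocked


buildable, blocked = assess_feasibility(available_sources, candidate_features)
print("Buildable now:", buildable)
for feature, missing in blocked.items():
    print(f"Blocked: {feature} (missing {', '.join(missing)})")
```

The blocked list doubles as a shopping list: it shows which internal, partner, or syndicated data would unlock the next piece of functionality.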

Prioritizing projects requires two roles: one to supply the expected ROI and the other to estimate technical feasibility

SoI applications aren’t tackled with big-bang projects the way they were in the SoR era. Custom-built applications get built one project at a time.

Which is best addressed first, a high-impact project or a low-risk one? The answer has more finesse than the question. Prioritizing projects takes both a moderately tech-aware business sponsor and a data scientist.

It might seem sensible to let data scientists prioritize the projects, since they’re closest to the data and can best tell how much effort it would take to deliver any particular solution. But they tend to be drawn to the most challenging problems, which are not necessarily the ones with the greatest business value.

A business sponsor can best estimate not only the ROI but also the potential business impact. You can only really estimate an ROI on a fraud prevention application if you can estimate how much fraud is taking place beyond what you are currently capturing.
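
The fraud example can be worked through as simple arithmetic. The sketch below is one plausible way to frame it, not a prescribed formula: the function name, its parameters, and every figure are assumptions made for illustration. The point is that the return depends directly on the sponsor’s estimate of fraud that is not yet being captured.

```python
def fraud_prevention_roi(estimated_total_fraud, currently_captured,
                         expected_capture_rate, project_cost):
    """Return (incremental_savings, roi) for a fraud prevention project.

    All inputs are estimates from the business sponsor (hypothetical names):
    expected_capture_rate is the share of currently uncaptured fraud that
    the new application is expected to catch.
    """
    uncaptured = estimated_total_fraud - currently_captured
    incremental_savings = uncaptured * expected_capture_rate
    roi = (incremental_savings - project_cost) / project_cost
    return incremental_savings, roi


# Illustrative figures only, not taken from any real customer.
savings, roi = fraud_prevention_roi(
    estimated_total_fraud=5_000_000,
    currently_captured=3_000_000,
    expected_capture_rate=0.25,
    project_cost=200_000,
)
print(f"Incremental savings: {savings:,.0f}")
print(f"ROI: {roi:.0%}")
```

If the sponsor cannot put a number on `estimated_total_fraud`, the uncaptured amount is unknown and the ROI cannot be computed at all, which is exactly the dependency described above.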

The key takeaway is to take an inventory of your data assets, as described in the first section, and then build a roadmap prioritized jointly by business sponsors and data scientists.
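
One way to combine the two roles’ inputs is a simple risk-adjusted ranking, sketched below. The scoring rule (expected value times probability of success, divided by effort) is an assumption chosen for illustration, as are the `Project` fields and the sample backlog; an enterprise may well weight these inputs differently.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    expected_value: float       # business sponsor's estimate of payoff if it works
    success_probability: float  # data scientist's feasibility estimate, 0 to 1
    effort_weeks: float         # data scientist's effort estimate


def risk_adjusted_value_per_week(project):
    """Expected payoff discounted by feasibility, per week of effort."""
    return project.expected_value * project.success_probability / project.effort_weeks


def build_roadmap(projects):
    """Order projects from highest to lowest risk-adjusted value per week."""
    return sorted(projects, key=risk_adjusted_value_per_week, reverse=True)


# Illustrative projects and numbers only.
backlog = [
    Project("churn_prediction", expected_value=400_000, success_probability=0.8, effort_weeks=8),
    Project("real_time_fraud", expected_value=2_000_000, success_probability=0.3, effort_weeks=30),
    Project("recommendations", expected_value=900_000, success_probability=0.6, effort_weeks=12),
]

for p in build_roadmap(backlog):
    print(f"{p.name}: {risk_adjusted_value_per_week(p):,.0f} per week")
```

In this made-up backlog the project with the largest headline payoff ranks last once feasibility and effort are factored in, which illustrates why neither role should set priorities alone.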

In order to make the predictions most effective, data scientists need to understand how their models fit into users’ end-to-end workflow

Deploying a model that a data scientist created or updated to an operational application requires more than management tools. Even if there isn’t a human in the loop of the operational application, the model has to fit an existing workflow with minimal disruption.

That part of the process is more in the realm of user experience or user interface design.  After the analytic model is properly integrated with the operational application, the deployment management tools are responsible for making it easy to keep the model up to date.

Tools such as Spark make it easier to update models on a laptop and move them unmodified to a production cluster. An interchange standard called the Predictive Model Markup Language (PMML) makes it possible to iteratively design models in one tool and deploy them to entirely different operational applications. Cloudera is leading work on Oryx, an open source project that makes it easier to update models and deploy them quickly. It is built on Spark and Kafka.
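
The interchange idea can be illustrated with a small sketch. The code below writes a linear regression as a simplified PMML-style XML document and then scores a record from that document alone, the way a separate operational application might. It is an assumption-laden illustration: the document follows the general shape of a PMML 4.2 RegressionModel but has not been validated against the schema, the `export_linear_model` and `score` functions are hypothetical stand-ins for real modeling and scoring tools, and the field names and coefficients are invented.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"


def export_linear_model(intercept, coefficients, target="y"):
    """Stand-in for the modeling tool: serialize a linear model as PMML-style XML."""
    root = ET.Element("PMML", {"xmlns": PMML_NS, "version": "4.2"})
    ET.SubElement(root, "Header", {"description": "illustrative linear model"})

    data_dict = ET.SubElement(root, "DataDictionary")
    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    table = ET.SubElement(model, "RegressionTable", {"intercept": repr(intercept)})

    # Declare every input field plus the target in the data dictionary.
    for name in list(coefficients) + [target]:
        ET.SubElement(data_dict, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})
    # One mining field and one numeric predictor per input.
    for name, coef in coefficients.items():
        ET.SubElement(schema, "MiningField", {"name": name})
        ET.SubElement(table, "NumericPredictor", {"name": name, "coefficient": repr(coef)})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})
    return ET.tostring(root, encoding="unicode")


def score(pmml_text, row):
    """Stand-in for the operational application: parse the document and score one row."""
    ns = {"p": PMML_NS}
    table = ET.fromstring(pmml_text).find(".//p:RegressionTable", ns)
    prediction = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        prediction += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return prediction


document = export_linear_model(1.0, {"basket_size": 0.5, "visits": 2.0})
print(score(document, {"basket_size": 4.0, "visits": 3.0}))
```

The scoring side never sees the tool that trained the model, only the document. That separation is what lets a model be redesigned in one environment and redeployed in another without rewriting the consuming application.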

Further along in the customer journey and after the initial SoI are in place, enterprises need to rethink the processes or products they automated with the predictive models in the first place.

Action Item

Much work around SoI is heavily custom. But unlike SoR, where all the functionality comes in the box, SoI can only go as far as their data will take them. So enterprises must inventory and integrate their data from internal and external sources. That determines what analytic functionality is possible.

Rolling out the applications is nothing like the big-bang projects of the SoR era. So prioritizing what gets built depends on a combination of ROI and technical feasibility. Doing this with high fidelity requires both a business sponsor for the ROI and a data scientist for the feasibility.

Finally, putting the predictive models into production in operational applications requires one soft skill and one tool. The soft skill is an awareness of the user’s or application’s workflow. The disruption, at least in the early stages, must be small. The tooling is improving: new tools make it faster to update models in one tool and deploy them into production in a different application.
