Subject matter experts in the next decade: part I
Subject Matter Experts (SMEs) play an important role in the software development process because they tell programmers what needs to be built. SMEs can be business owners, stakeholders, or key users on the client side, or permanent members of the development team acting as domain experts. At a minimum, their role consists of identifying, in the form of domain rules, the processes, operations, and constraints that need to be implemented. SMEs with modelling skills add even more value to the project by helping to elaborate the functional requirements, use case scenarios, and data models that support the implementation on the technical side.
The rise and fall of SMEs
Over the past thirty years, two successive revolutions in the software development industry enhanced the role of SMEs in computerization. In the late 1980s, as the capacity of computers increased, a demand appeared for more sophisticated programs. Assembly and procedural languages could technically scale to thousands of lines of code, but human limitations impeded the development of more complex software. The first revolution was the massive adoption of Object-Oriented Programming (OOP) in the early 1990s. New features such as data abstraction, encapsulation, modularity, polymorphism, and inheritance helped programming teams write well-organized and reusable code. The second revolution was the rapid diffusion of the Agile approach in the 2000s. New organizational principles promoting collaboration between cross-functional teams made it possible to accommodate continuously changing requirements and fostered close, daily cooperation between domain experts and developers.
These two revolutions were different in essence, the former being of a technological nature, the latter of a methodological kind, but they both contributed to the development of multi-layer software providing richer functionality. In practice, it became possible to isolate the domain logic from the rest of the program (system infrastructure, database, component connectivity, and interfaces). Because the domain logic encoded real-world rules determining how data should be created, displayed, stored, and updated, its emergence as a concept in its own right enabled SMEs to take an active role in the software development cycle. Simply stated, the domain logic became the shared responsibility of SMEs and software programmers.
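To make that concrete, here is a minimal sketch of a domain rule living in its own layer. The order/discount rule is invented for illustration; the point is that the rule knows nothing about storage or interfaces, which is precisely what lets an SME and a programmer co-own it.

```python
# Hypothetical example of isolated domain logic: the rule below is pure
# business knowledge, with no references to databases, UIs, or networking.
from dataclasses import dataclass

@dataclass
class Order:
    amount: float
    customer_is_vip: bool

def discount_rate(order: Order) -> float:
    """Domain rule (illustrative only): VIPs get 10% off orders above 1000."""
    if order.customer_is_vip and order.amount > 1000:
        return 0.10
    return 0.0

# Infrastructure layers (storage, UI, APIs) call the rule; they don't embed it.
print(discount_rate(Order(amount=1500.0, customer_is_vip=True)))  # 0.1
```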
Over the past ten years, I have met with hundreds of software development and R&D teams, and I can tell you that something big is coming, something that will completely change the way we design and develop software. Obviously I'm talking about machine learning. The purpose of machine learning is to make programs capable of learning to do things without relying on explicitly programmed rules. In this regard, SMEs as defined above are likely to be left behind, but I'll come to that later. First, let's have a look at two notable achievements in machine learning that illustrate how programs can perform exceedingly well without domain knowledge.
Without domain knowledge, everything works just fine
My first example is AlphaGo, Google DeepMind's artificial intelligence program that just scored four victories out of five games against Lee Sedol, a 9-dan professional Go player and one of the world's best. There are no hard-coded rules in AlphaGo, except of course the rules of play. As you probably know, AlphaGo is a neural network, or to be more exact, two neural networks bound together. One of them, called the value network, calculates the probability of winning for a given board configuration. The other, called the policy network, selects the most appropriate move for a given board configuration.
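To make that division of labor concrete, here is a minimal sketch of the two-network idea in PyTorch. The three-plane board encoding and the layer sizes are placeholders of my own; DeepMind's actual networks are far deeper and trained on expert games and self-play.

```python
# A sketch of the value/policy pairing, not DeepMind's architecture:
# both nets consume a board encoding; one scores positions, one ranks moves.
import torch
import torch.nn as nn

BOARD_PLANES, BOARD_SIZE = 3, 19  # hypothetical input encoding

class ValueNetwork(nn.Module):
    """Estimates the probability of winning from a board configuration."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(BOARD_PLANES, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * BOARD_SIZE * BOARD_SIZE, 1),
            nn.Sigmoid(),  # output in [0, 1]: win probability
        )

    def forward(self, board):
        return self.body(board)

class PolicyNetwork(nn.Module):
    """Outputs a probability distribution over the board's points (moves)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(BOARD_PLANES, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * BOARD_SIZE * BOARD_SIZE, BOARD_SIZE * BOARD_SIZE),
        )

    def forward(self, board):
        return torch.softmax(self.body(board), dim=-1)

board = torch.randn(1, BOARD_PLANES, BOARD_SIZE, BOARD_SIZE)
print(ValueNetwork()(board))         # e.g. tensor([[0.52]]): win probability
print(PolicyNetwork()(board).shape)  # torch.Size([1, 361]): one weight per move
```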
I watched the last of the five games last Tuesday live on YouTube, and I found the pre-show extremely interesting. Michael Redmond, the official commentator of the match (a 9-dan professional player himself), and Chris Garlock, managing editor of the American Go e-Journal, interviewed three key members of the AlphaGo development team: David Silver, the AlphaGo project team leader, Arthur Guez, responsible for the value network, and Laurent Sifre, responsible for the policy network. It was fun, and at the same time scientifically amazing, to watch three talented machine learning practitioners explain to two immensely gifted (yet stunned) Go players that there was no expertise in Go within the AlphaGo team, and no domain knowledge in the program itself. You can watch this 15-minute interview here (the interview starts at 10:00):
My second example is an automatic drug discovery program developed in 2012 by a team of computer scientists and mathematicians: Geoff Hinton, Ruslan Salakhutdinov, George Dahl, Navdeep Jaitly, and Christopher Jordan-Squire. None of them had any background in chemistry, biology, or life sciences, yet in two weeks they developed a deep learning application that won the Merck Molecular Activity Challenge.
The objective of the competition, organized by Kaggle (a community of machine learning practitioners), was to identify the best statistical techniques for predicting biological activities of different molecules, given multiple descriptors generated from their chemical structures. What is remarkable here is that their program, in which no domain knowledge was injected, performed better than other programs developed with the help of experts in the field. In their interview published on Kaggle's blog, the team explained: "Since our goal was to demonstrate the power of our models, we did no feature engineering and only minimal preprocessing. The only preprocessing we did was occasionally, for some models, to log-transform each individual input feature/covariate. Whenever possible, we prefer to learn features rather than engineer them."
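To illustrate what "minimal preprocessing" looks like in practice, here is a toy sketch of my own (with synthetic stand-in data, not the winning team's code): the raw descriptors get a per-feature log-transform and nothing else, and the network's hidden layers are left to learn their own features.

```python
# Illustrative only: log-transform raw inputs, then let the model learn
# features on its own, with no chemistry-informed engineering step.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 50))   # stand-in for raw molecular descriptors
y = rng.normal(size=200)            # stand-in for measured biological activities

X_log = np.log1p(X)  # the only preprocessing: a per-feature log-transform

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
model.fit(X_log, y)  # the hidden layers learn features rather than receive them
```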
Speaking of Kaggle and domain-independent algorithms, if you haven’t seen it already, you may find some really interesting stuff in The Wonderful and Terrifying Implications of Computers That Can Learn by Jeremy Howard. This exciting talk emphasizes that machine learning is generic in essence:
By the way, what do we mean exactly by feature engineering? Feature engineering is a process, often expert-guided, that consists of injecting domain knowledge into the data to achieve better performance. Incidentally, it can also degrade performance should the features prove irrelevant or cause overfitting. There is a fully automated alternative to expert-guided feature engineering, based on unsupervised discovery techniques, and by definition this alternative does not require the presence of an expert in the room, as sketched below.
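Here is a small sketch of the contrast, on made-up data: an expert-guided feature asserted by a hypothetical SME rule, next to features discovered automatically by PCA, one common unsupervised technique.

```python
# Two routes to features, side by side (illustrative only).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
raw = rng.normal(size=(500, 20))  # raw measurements, no domain meaning

# Expert-guided feature engineering: an SME asserts that the ratio of
# column 0 to column 1 is domain-relevant (a hypothetical rule).
engineered = (raw[:, 0] / (raw[:, 1] + 1e-9)).reshape(-1, 1)

# Unsupervised discovery: PCA finds the directions of highest variance
# in the data with no expert in the room.
discovered = PCA(n_components=5).fit_transform(raw)

print(engineered.shape, discovered.shape)  # (500, 1) (500, 5)
```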
Tomorrow, I'll review some techniques to inject domain knowledge efficiently from a machine learning perspective. I'll also explore how SMEs could contribute to domain knowledge injection.