Subject matter experts in the next decade: part II
Subject Matter Experts (SMEs) play an important role in the software development process because they provide information to programmers on what needs to be developed. Yesterday I claimed that machine learning works just fine without domain knowledge. But I left two questions hanging in the air. How does machine learning perform with domain knowledge injection? How can SMEs help in the process?
Domain knowledge comes in many flavors and may be represented by means of multiple formalisms. It is often ill-structured, and as such difficult to incorporate into learning systems. I have one particular example in mind of a machine learning application that would benefit from domain knowledge injection. Imagine that you wanted to develop a system capable of understanding phrases in Arabic. You could hire a machine learning practitioner specialised in Natural Language Understanding (NLU) with no knowledge of Arabic at all, and reasonably hope for a pretty decent result. Or you could hire an NLU specialist who is also a native Arabic speaker and raise your expectations. Indeed, this second NLU specialist would not only develop statistical models (as the first one would), but would also develop rule-based grammars to help bootstrap the statistical models. Additionally, he or she would have enough native expertise to validate the quality of the data labels in the training sets and to improve the annotation process.
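To make the bootstrapping idea a little more concrete, here is a minimal sketch of how hand-written rules could produce a first batch of labels for a statistical model to train on. Everything in it is an assumption made for illustration: the rule patterns, the label names, and the helper functions `rule_label` and `bootstrap` are hypothetical, and the phrases are in English only to keep the example readable. A native speaker would write the real rules for the target language.

```python
import re

# Hypothetical hand-written rules (pattern -> label), invented for this sketch.
RULES = [
    (re.compile(r"\b(hello|good morning)\b"), "greeting"),
    (re.compile(r"\b(how much|price)\b"), "price_query"),
]


def rule_label(utterance):
    """Return the label of the first rule that matches, or None if none does."""
    text = utterance.lower()
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None


def bootstrap(unlabelled):
    """Split utterances into a rule-labelled seed set and an unresolved rest."""
    seed, unresolved = [], []
    for utterance in unlabelled:
        label = rule_label(utterance)
        if label is None:
            unresolved.append(utterance)
        else:
            seed.append((utterance, label))
    return seed, unresolved


seed, unresolved = bootstrap(
    ["Hello there", "How much is the ticket?", "See you tomorrow"]
)
```

The seed set could serve as initial training data for a statistical model, while the unresolved utterances are the ones that would still need manual annotation or new rules.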
If you really want to implement domain knowledge
From a general standpoint, and here I’m not talking about NLU specifically but about machine learning at large, I have three straightforward methods in mind that could be used to improve the learning process via domain knowledge injection.
The first method would be to preprocess the training data, and would consist of selecting, cleaning, and transforming the original data.
The second method would be to add artificially generated data so as to bring variations of relevant examples not covered in the original data sets.
The third method would be to inject extra information in the data by assigning weights to clusters of data points.
I know from experience that machine learning practitioners who are familiar with their application domain can implement any of these three methods to achieve domain knowledge injection. But I do not see how SMEs who are not machine learning practitioners could help in this process.
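The three methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the `Example` class, the `SYNONYMS` lexicon, the `PRIORITY_LABELS` table, and the function names are all hypothetical, and grouping examples by label stands in for whatever clustering a real project would use.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: str
    weight: float = 1.0


# Hypothetical domain knowledge, invented for this sketch.
SYNONYMS = {"buy": ["purchase", "order"]}   # used to generate variations
PRIORITY_LABELS = {"cancel_order": 3.0}     # groups the expert wants emphasised


def preprocess(examples):
    """Method 1: select, clean, and transform the original data."""
    cleaned, seen = [], set()
    for ex in examples:
        text = " ".join(ex.text.lower().split())  # normalise case and spacing
        if not text or not ex.label:              # drop unusable rows
            continue
        if (text, ex.label) in seen:              # drop exact duplicates
            continue
        seen.add((text, ex.label))
        cleaned.append(Example(text, ex.label, ex.weight))
    return cleaned


def augment(examples):
    """Method 2: add generated variations missing from the original data."""
    generated = []
    for ex in examples:
        tokens = ex.text.split()
        for i, token in enumerate(tokens):
            for synonym in SYNONYMS.get(token, []):
                variant = tokens[:i] + [synonym] + tokens[i + 1:]
                generated.append(Example(" ".join(variant), ex.label, ex.weight))
    return examples + generated


def reweight(examples):
    """Method 3: assign weights to groups of data points (here, by label)."""
    return [
        Example(ex.text, ex.label, ex.weight * PRIORITY_LABELS.get(ex.label, 1.0))
        for ex in examples
    ]


raw = [
    Example("  Buy two tickets ", "place_order"),
    Example("buy two tickets", "place_order"),   # duplicate after cleaning
    Example("", "place_order"),                  # empty, will be dropped
    Example("Cancel my order", "cancel_order"),
]
prepared = reweight(augment(preprocess(raw)))
```

The resulting weights would typically be handed to the training step as per-example weights, which many learning libraries accept in some form.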
Automate the automation
Not only is machine learning the biggest technological revolution in IT history, but it is also coming with drastic organisational changes that will impact the SMEs’ role in two ways. First, machine learning can do fine with or without domain knowledge. Sometimes it does better with domain knowledge, but not always, as I illustrated yesterday with two well-known examples. Second, when machine learning brings domain knowledge into the equation, it is sometimes by means of direct modifications within the statistical pipeline, and sometimes via the development of ad-hoc rule-based mechanisms that require careful integration with the statistical core.
One way or the other, this is a job for machine learning practitioners with strong skills in statistical modelling and data structures. Simply stated, software programmers will now work closely with machine learning practitioners. The latter will develop models and prototypes (for example by binding exploratory Python code with open-source libraries), while the former will rewrite the final code for performance optimization purposes (for example in Java or C#) and develop clean coupling schemas with the rest of the architecture.
You may think that traditional SMEs with no skills in machine learning will adapt to this upcoming revolution, but most of them won’t. The required level of expertise in probabilistic algorithms and matrix computation is simply too high. And even those who stay in the race are at risk of being relegated to a supporting role, either in the early stage of the development process (such as assessing the quality of the training data sets) or in the final testing phase (such as validating the resulting performance).
As IT practitioners, we are facing a double revolution. Not only will machine learning enable us to automate things that we believed impossible, but it will also change the way we address automation itself from an organizational standpoint.