Part 4: Experimenting to the Bitter End

Creating a Predictive AI Model

Uwe Weinreich, the author of this blog, usually coaches teams and managers on topics related to strategy, innovation and digital transformation. Now he is seeking a direct confrontation with Artificial Intelligence.

The outcome is uncertain.

Already published:

1. AI and Me – Diary of an experiment

2. Maths, Technology, Embarrassment

3. Learning in the deep blue sea - Azure

4. Experimenting to the Bitter End

5. The difficult path towards a webservice

6. Text analysis demystified

7. Image Recognition and Surveillance

8. Bad Jokes and AI Psychos

9. Seven Management Initiatives

10. Interview with Dr. Zeplin (Otto Group)

By now everything is working. The next task is to build a predictive service with the help of Machine Learning that can say, based on the data entered, whether someone probably has diabetes or not.

This feels awkward. Do I really want an algorithm to make a pronouncement on this subject? It will also soon become apparent that in this field the term "predictive" is used very generously. Predictive, in my opinion, would be if the system could tell me the probability that I'll get diabetes in the next 10 years. But no, that's not the aim here. Predictive, in this context, means that data which don't point overtly to the label "diabetic" can generate a statistical conclusion regarding whether the subject is diabetic. It's a prediction that follows the principle: "If it looks like a duck, walks like a duck and quacks like a duck, it must be a duck." Of course I'll only be sure after I've examined a tissue sample to see if it contains the duck genome. I believe some surprises may be in store...


Supervised Machine Learning

It all begins by letting Azure's AI learn from datasets for which it is already known who is diabetic and who is not. To this end, prepared datasets are uploaded and wired together visually in the Azure interface.

In order to test whether the resulting model works reliably, some of the data must be held back for testing. The datasets are therefore split at random in a 70:30 ratio: 70% for learning and the remaining 30% for testing.
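For readers who prefer code to drag-and-drop: the same step can be sketched in a few lines of Python with scikit-learn. This is my own illustration, not the Azure module itself, and it assumes a CSV file in the format of the classic Pima diabetes dataset with a binary "Outcome" column:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical file name; any table with measured variables
    # plus a known 0/1 diabetes label will do.
    data = pd.read_csv("diabetes.csv")
    X = data.drop(columns=["Outcome"])   # the input variables
    y = data["Outcome"]                  # the known label: diabetic or not

    # 70% for learning, 30% held back for testing, assigned at random.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y
    )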

Of course nothing happens automatically from this point. One must tell the system how it should build a model. In our case it is, initially, a logistic regression. The complete process looks like this:

[Screenshot: the complete experiment in Azure Machine Learning Studio]

Logistic regression is a classic and widely used method for establishing relationships in data. Put simply, it weighs how strongly the individual input variables are associated with a binary outcome and combines them into a probability: the stronger the association, the more confidently a record is assigned to one of the two classes.
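Continuing the sketch from above (again my own scikit-learn stand-in for the Azure module), the modelling step itself is short:

    from sklearn.linear_model import LogisticRegression

    # Learn the relationship between the input variables and the label
    # on the 70% training portion.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Roughly what Azure's scoring step reports: the share of
    # correct classifications on the held-back 30%.
    print("accuracy:", model.score(X_test, y_test))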

The best source of mixed-up nonsense

This procedure is useful and grounded in solid mathematics when the aim is to examine whether a suspected relationship really exists. It looks very different, however, if one goes looking for previously unimagined connections. That can conjure up big surprises. One stumbles upon hitherto overlooked connections, for example the many statistical links between nutrition and health. Every week something new comes along in this field.

In the last few years, through the implementation of Big Data Analysis and AI, another aspect has arisen: statistical nonsense, or, in the jargon, artefacts. If you have a great deal of data, it's almost unavoidable that there will be some similarities which can be uncovered with the help of regression analysis or other statistical methods. This produces results that are statistically highly significant but are obvious logical fallacies. These fallacies are not always as easy to recognise as in the example below, focusing on the alleged telekinetic talents of Nicolas Cage (see tylervigen.com/spurious-correlations):

[Chart: a spurious correlation featuring Nicolas Cage, from tylervigen.com]

If the nonsense isn't so obvious, and one also assumes a causal chain of events (which is statistically almost never proven), one can then issue press releases which will immediately hoodwink sensation-seeking journalists.
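The effect is easy to reproduce with nothing but random numbers. The sketch below, my own illustration rather than anything from the experiment, generates 1,000 columns of pure noise and then searches for the best-correlated pair; it will reliably find one that looks impressively "significant":

    import numpy as np

    rng = np.random.default_rng(0)
    noise = rng.normal(size=(30, 1000))  # 30 observations, 1,000 meaningless variables

    # Correlate every column with every other one...
    corr = np.corrcoef(noise, rowvar=False)
    np.fill_diagonal(corr, 0)            # ignore the trivial self-correlations

    # ...and pick out the strongest pair. With roughly half a million
    # pairs to choose from, chance alone produces a striking "relationship".
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    print(f"columns {i} and {j} correlate at r = {corr[i, j]:.2f}")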

Seek Until You Find

OK then, logistic regression didn't exactly cover itself in glory in this experiment, so a second procedure was brought in: a decision tree. This produced better results, as the graphic shows: the regression (blue line) doesn't differentiate nearly as sharply between the two groups as the decision tree (red line). You can see this from the number of false positives and false negatives. Ideally, both values would be zero; the result would then be a perfectly right-angled curve.

[Screenshot: ROC curves of both models – regression in blue, decision tree in red]
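In code, the comparison might look like this. Again a hedged sketch building on the snippets above: a plain scikit-learn decision tree stands in for Azure's module, and the area under the ROC curve quantifies how sharply each model separates the two groups (1.0 would be the perfectly right-angled curve, 0.5 is pure guessing):

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_auc_score

    # The second procedure: a decision tree with a modest depth.
    tree = DecisionTreeClassifier(max_depth=4, random_state=42)
    tree.fit(X_train, y_train)

    # Compare both models on the held-back test data.
    for name, clf in [("logistic regression", model), ("decision tree", tree)]:
        auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")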

At this stage a further principle of Big Data Analysis, or rather of building models with Machine Learning, becomes clear: one tries new procedures until one of them delivers an acceptable result. There's nothing wrong with that, but it naturally increases the risk of finding artefacts (the statistical nonsense mentioned above).

So what's the right procedure?

I don't want to deny that Machine Learning and AI have, with these methods, led to huge gains in knowledge and functionality that could never have been achieved otherwise. The advances in image and speech recognition have been enormous. Error detection and protection against cyber attacks also work much better with well-trained AI. It is essential, however, to deploy these methods with care. This requires attention to be paid to the following points:

 

published: June 22, 2018, © Uwe Weinreich
