Wolf in husky fur? AI and cybersecurity
If you take a critical look behind the advertising messages, the first thing you notice is that marketing departments use the term "Artificial Intelligence" very liberally. "AI-powered" products usually rely on only one aspect of AI, namely machine learning.
Machine learning, however, is neither particularly new nor innovative in the field of cybersecurity.
For more than 10 years, anti-malware vendors have been using machine learning to analyze vast numbers of new malware samples and variants and to generate detection signatures - by now fully automatically.
Machine learning algorithms have been used in spam and phishing detection for 20 years now - although not as the only technique.
It is important to understand that these application areas generally do not involve "deep learning" - i.e., the use of multilayer artificial neural networks. These are still far too memory- and CPU-hungry for use on server or client systems whose main purpose is not running the neural network.
There is no single machine learning algorithm for cybersecurity: machine learning works best within a narrowly defined task domain.
Cybersecurity, and even a small slice like endpoint security, covers a wide range of possible attack vectors and methods. There is no "one-size-fits-all solution" from the AI bag of tricks here.
Machine learning algorithms get better and better "by themselves" over time? It is true that machine learning becomes better and better with large amounts of qualified data - in other words, it "learns".
Qualified data means labeled data: in addition to the actual data, the algorithm also needs to be told whether, for example, a given file is infected or harmless, or whether an e-mail is ham or spam.
This means that, as a rule, the algorithms cannot be trained by the customer alone, because very few "normal users" are able, for example, to distinguish a malware-infected file from a clean file - at least as long as the malware (e.g. ransomware) has not become active.
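The dependence on labels can be made concrete with a small sketch. The following is a minimal naive Bayes text classifier written from scratch, not any vendor's actual method; the four training messages and their labels are invented for illustration. The point is only that training cannot start until someone has supplied the "spam" or "ham" label for every sample.

```python
import math
from collections import Counter

# Hypothetical, hand-labeled training data: every message carries its label.
TRAINING = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]

def train(samples):
    """Count word occurrences per label (multinomial naive Bayes)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.split():
            # +1 smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("win prize money", *model))
print(classify("monday meeting agenda", *model))
```

Without the second element of each tuple, the `train` function has nothing to count against, which is exactly the gap a "normal user" cannot fill for malware samples.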
Can AI solutions already replace classic security solutions today? Only the very, very brave - or very reckless - should back this horse.
Deterministic methods such as classic IP filters and/or pattern matching are still far superior to AI solutions in the vast majority of application fields, in terms of both performance and accuracy. Depending on the area of application, it is also worth weighing whether to apply these deterministic methods as a blacklist or as a whitelist.
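To illustrate what "deterministic" means here, the following sketch shows an IP filter in blacklist and whitelist form plus a simple byte-pattern match. The network ranges are reserved documentation ranges, and the signature string and function names are made up for this example; real products use far larger rule sets.

```python
import ipaddress
import re

# Hypothetical rule sets built from reserved documentation ranges.
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical byte signature, standing in for a real detection pattern.
SIGNATURE = re.compile(rb"EVIL_PAYLOAD_MARKER")

def blocked_by_blacklist(ip):
    """Blacklist mode: everything passes unless it is explicitly listed."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLIST)

def allowed_by_whitelist(ip):
    """Whitelist mode: nothing passes unless it is explicitly listed."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)

def matches_signature(data):
    """Pattern matching: the same bytes always give the same verdict."""
    return SIGNATURE.search(data) is not None

# An address that appears on neither list shows the difference between modes.
unknown = "198.51.100.1"
print(blocked_by_blacklist(unknown))   # not on the blacklist, so it passes
print(allowed_by_whitelist(unknown))   # not on the whitelist, so it is rejected
```

The same input always produces the same answer, and the reason for every verdict can be read directly from the rule that matched. The unknown address makes the trade-off visible: the blacklist lets it through, the whitelist does not.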
Do Machine Learning and Big (Training) Data necessarily improve the results? The quality of the results of a machine learning-based classifier, especially in the area of deep learning, depends not only on the algorithm, but also - or even more crucially - on the data with which it was trained.
Unfortunately, we as users cannot watch a cybersecurity machine learning algorithm make its decisions. If we could, as scientists at the University of Washington did with an image classifier, we might discover cases like that of the husky that was mistakenly identified as a wolf.
The reason for this was that most of the images of wolves used to train the system showed wolves in the snow. The visualization of the algorithm's decision basis consequently showed that the animal in the image played only a minor role in the decision. The presence of snow was decisive.
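The effect can be reproduced in miniature. The sketch below is not the system or the visualization method used in that study; it is a deliberately simple one-feature learner (a "decision stump") trained on invented toy data in which every wolf happens to stand in snow. The feature names and rows are assumptions made purely for illustration.

```python
# Hypothetical toy data: (snow_in_background, looks_wolf_like, label)
TRAINING = [
    (1, 1, "wolf"),
    (1, 0, "wolf"),
    (1, 1, "wolf"),
    (1, 1, "wolf"),
    (0, 0, "husky"),
    (0, 1, "husky"),
    (0, 0, "husky"),
    (0, 0, "husky"),
]
FEATURES = ["snow_in_background", "looks_wolf_like"]

def train_stump(samples):
    """Pick the single feature that best predicts 'wolf' on the training set."""
    best_index, best_accuracy = None, -1.0
    for index in range(len(FEATURES)):
        correct = sum((row[index] == 1) == (row[-1] == "wolf") for row in samples)
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:
            best_index, best_accuracy = index, accuracy
    return best_index

def predict(index, row):
    """Classify using only the chosen feature."""
    return "wolf" if row[index] == 1 else "husky"

chosen = train_stump(TRAINING)
print(FEATURES[chosen])          # the feature the learner relies on
print(predict(chosen, (1, 0)))   # a husky photographed in snow
```

On this data the snow feature separates the training set perfectly, so the learner has no reason to look at the animal at all, and a husky in the snow is labeled a wolf.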