Every day more than 350,000 new types of malware are unleashed on the internet. The scale of the problem is so massive, it is no longer enough to have traditional anti-virus software, solely defending against known threats.
Zero-day threats are particularly hard to spot because the very fact that they will not have been seen before means they will not match any known malware signatures. New, advanced protection technologies, using artificial intelligence, are needed. Leading the charge is machine learning – an area of computer science used successfully in image recognition, searching and decision-making. An ML model, which has been fed updated virus definitions on a weekly basis, will capture the latest previously unknown versions of malware, plugging the gap where traditional signature-match solutions fail.
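To see why an unseen variant slips past signature matching, here is a minimal sketch that assumes the simplest possible kind of signature: a SHA-256 hash of the whole file. The sample bytes and the `signature_db` set are invented for illustration. Real engines also use byte patterns and other rules, but the weakness is similar.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical known-bad sample and the signature database built from it.
known_sample = b"MZ\x90\x00 example-malicious-payload v1"
signature_db = {sha256_hex(known_sample)}


def matches_known_signature(data: bytes) -> bool:
    """A file is flagged only if its hash is already in the database."""
    return sha256_hex(data) in signature_db


# A trivially modified variant: same behaviour in spirit, different bytes.
variant = known_sample.replace(b"v1", b"v2")

print(matches_known_signature(known_sample))  # True
print(matches_known_signature(variant))       # False
```

Changing a single byte produces a completely different hash, so the variant goes unrecognised until someone adds its signature.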
Today, ML can thrive off a wide range of data on host, network and cloud-based anti-malware components, training itself with better accuracy than ever before. This is crucial because malware is growing in sophistication as well as scope, and the risk of it inflicting huge operational and reputational issues on an organisation continues to rise. Frequently it will hide undetected inside business networks for longer than any retention policy, seeking out and infecting all backups, making malware-free recoveries hugely challenging. Even with the very best security countermeasures in place, organisations cannot afford to rule out the prospect that one day soon their live environment will be compromised. Once that happens, it quickly becomes a race against time to detect and halt an attack – before the impact becomes catastrophic.
In today’s on-demand world, making a speedy recovery is of paramount importance – but the race may already be lost if backup data is compromised too. This is where machine learning has a vital role to play.
What exactly is machine learning and how does it work?
According to the classic definition given by Arthur Samuel, a pioneer in artificial intelligence, machine learning is a set of methods that gives computers ‘the ability to learn without being explicitly programmed’. In other words, a machine learning algorithm discovers and formalises the principles that underlie the data it sees. In doing so, it recognises patterns in a large amount of data.
With this knowledge, the algorithm can ‘reason’ about the properties of previously unseen samples. In malware detection, a previously unseen sample could be a new file, and its hidden property is whether it is malicious or benign. The machine learning found in a lot of anti-malware software tries to learn which files are malicious and which are benign, based on databases of both malicious and benign code.
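The idea of learning from labelled examples can be sketched with a toy nearest-centroid classifier. Everything here is an assumption made for illustration: the three features (byte entropy, a count of suspicious API calls, a packed flag) and the six training rows are invented, and production systems use far richer features and models.

```python
from math import dist
from statistics import fmean

# Hypothetical labelled samples: (byte entropy, suspicious API calls, packed flag)
training = [
    ((7.8, 12, 1), "malicious"),
    ((7.5, 9, 1), "malicious"),
    ((7.9, 15, 1), "malicious"),
    ((5.1, 1, 0), "benign"),
    ((4.8, 0, 0), "benign"),
    ((5.6, 2, 0), "benign"),
]


def train(samples):
    """Compute the average feature vector (centroid) for each label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(training)
print(classify(centroids, (7.6, 10, 1)))  # malicious
print(classify(centroids, (5.0, 1, 0)))   # benign
```

Neither test file appears in the training data, yet each is labelled by its resemblance to what the model has already seen. The features are left unscaled here for brevity; a real model would normalise them.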
The AI involved makes decisions about whether or not analysed code is harmful based on a series of traits – some of which rank higher than others. Security tools typically trap threats by matching malware signatures to databases of known harmful code, but more sophisticated threats avoid signature detection. Malicious authors have quickly realised they can wreak havoc by writing single-use malware, never seen before by the security community. To combat this, all good anti-malware software these days employs types of heuristic algorithms. Good heuristics can prevent zero-day attacks, and a fine example of heuristic technology is machine-learning malware analysis. Malware is evolving rapidly, so the algorithms must evolve rapidly as well. It’s a constant, ongoing process.
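A decision based on traits that rank differently can be pictured as a weighted score. The sketch below is only an illustration: the trait names, weights and threshold are hypothetical and hand-picked, whereas a machine-learning system would learn its weights from data.

```python
# Hypothetical traits and weights; higher weights rank as more alarming.
TRAIT_WEIGHTS = {
    "modifies_boot_record": 50,
    "encrypts_user_files": 40,
    "disables_security_tools": 30,
    "packed_executable": 15,
    "unsigned_binary": 5,
}
THRESHOLD = 50  # Assumed cut-off above which a file is treated as suspicious.


def suspicion_score(observed_traits):
    """Sum the weights of every trait observed; unknown traits score zero."""
    return sum(TRAIT_WEIGHTS.get(trait, 0) for trait in observed_traits)


def is_suspicious(observed_traits):
    """Flag the file when its combined score reaches the threshold."""
    return suspicion_score(observed_traits) >= THRESHOLD


print(is_suspicious({"packed_executable", "unsigned_binary"}))            # False
print(is_suspicious({"encrypts_user_files", "disables_security_tools"}))  # True
```

No signature is involved: a never-before-seen file is flagged purely on the combination of traits it exhibits.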
How important is machine learning as an additional layer of protection?
Organisations are now deploying ML to detect and remove malware after every backup from servers, laptops and the cloud. Ringfencing backup data in this way provides additional protection – and much-needed peace of mind. No CEO or head of IT wants to be left waiting nervously for confirmation that backups are in a safe state. So when the future of a business rests firmly on an organisation’s capability to restore mission-critical files, ML can help provide that extra reassurance.
The National Cyber Security Centre strongly recommends deploying a multi-layer security strategy as the best way to thwart the increasing number of attacks that target both primary and backup copies of data. As organisations consider products to keep networks safe, security features that utilise machine learning and artificial intelligence should be high up on the list.
Why is machine learning more prevalent now?
At the beginning of last year, the digital universe consisted of an estimated 44 zettabytes of data – by 2025, more than 10 times that amount is expected to be created every 24 hours. Collecting and filtering such huge volumes of information is already too cumbersome for even a large workforce to undertake. However, this age of ‘big data’ and massive computing allows artificial intelligence to learn through brute force.
Machine-learning anti-malware software can never be client-driven, because even the PCs and mobile devices of the largest corporations are only exposed to small, limited samples of malware. Proper ML requires ‘big data’ processing and cloud-based systems – and it is deployed a lot more frequently these days because effective technology is much cheaper. Now that cloud servers are more available, ML malware analysis is more accessible too.
Is machine learning coming up with new ways of hunting malware?
Machine learning takes a variety of approaches to a problem rather than relying on a single method. One way in which ML improves detection is by hunting malware through behaviour modelling. Bad-behaviour modelling looks at actions such as accessing saved passwords, local documents, browsing history or contacts.
However, bad-behaviour modelling limits malware detection tools to acting only on what they have been programmed to look for, whereas hunting models that use good-behaviour modelling, which learn what normal activity looks like, are much harder to circumvent. For instance, machine learning will determine when an employee is most likely to log in to a network or access certain file shares.
So anything outside the norm will be flagged up, such as when:
- An employee or device transfers huge volumes of data.
- A connection is made to another network or device outside normal use or normal hours.
- An employee uses programs or tools that do not fit with their remit e.g. a finance worker runs a network scan late at night.
- An employee or device uses an excessive amount of computer resources such as CPU, GPU, or memory.
- An employee accidentally deletes data in a way that is out of keeping with their normal behaviour.
For machine learning to be effective, good-behaviour modelling requires the capturing, analysis, and processing of massive amounts of data – and cloud-based services have made the processing power to do that far more affordable.
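Good-behaviour modelling can be sketched in its simplest form as a statistical baseline: learn what is normal for a user, then flag anything that strays too far from it. The figures below are invented, and the three-standard-deviation threshold is an assumption rather than a recommendation. Real systems model many signals at once.

```python
from statistics import mean, stdev

# Hypothetical daily data-transfer volumes (MB) for one employee.
baseline_mb = [120, 95, 130, 110, 105, 125, 115, 100, 135, 90]


def is_anomalous(history, value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


print(is_anomalous(baseline_mb, 118))   # False: an ordinary day
print(is_anomalous(baseline_mb, 4200))  # True: a huge transfer is flagged
```

The same pattern applies to login times, CPU usage or any other measurable activity. The more history there is, the better the baseline, which is why this approach depends on large volumes of data.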
What does the future hold?
While the threat of malware is constantly evolving, the ML to combat it is too. Redstor’s malware detection for backup data utilises the latest technology.
When customers purchase automated malware detection as an added feature, every backup from a server, laptop and any other end-point machine or device will be checked for files that resemble malware in appearance or behaviour. This provides a powerful additional layer of protection that complements existing antivirus software. Users have nothing to configure, install or upgrade, and there is no impact on internal resources. Redstor preserves the sanctity of customer data, which is encrypted at source, in transit and at rest. When a suspicious file is detected, a notification gives the user the option to delete the file, revert to a previous safe version, mark it as safe or leave it in quarantine.
Redstor is available worldwide through a network of resellers. For further information please visit Redstor.