Tag Archives: algorithms

Deep learning rises: New methods for detecting malicious PowerShell – Microsoft Security

Scientific and technological advancements in deep learning, a category of algorithms within the larger framework of machine learning, provide new opportunities for the development of state-of-the-art protection technologies. Deep learning methods impressively outperform traditional methods on tasks such as image and text classification. With these developments, there’s great potential for building novel threat detection methods using deep learning.

Machine learning algorithms work with numbers, so objects like images, documents, or emails are converted into numerical form through a step called feature engineering, which, in traditional machine learning methods, requires a significant amount of human effort. With deep learning, algorithms can operate on relatively raw data and extract features without human intervention.

At Microsoft, we make significant investments in pioneering machine learning that informs our security solutions with actionable knowledge through data, helping deliver intelligent, accurate, and real-time protection against a wide range of threats. In this blog, we present an example of a deep learning technique that was initially developed for natural language processing (NLP) and has now been adapted and applied to expand our coverage of malicious PowerShell scripts, which continue to be a critical attack vector. These deep learning-based detections add to the industry-leading endpoint detection and response capabilities in Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP).

Word embedding in natural language processing

Keeping in mind that our goal is to classify PowerShell scripts, we briefly look at how text classification is approached in the domain of natural language processing. An important step is to convert words to vectors (tuples of numbers) that can be consumed by machine learning algorithms. A basic approach, known as one-hot encoding, first assigns a unique integer to each word in the vocabulary, then represents each word as a vector of 0s with a 1 at the integer index corresponding to that word. Although useful in many cases, one-hot encoding has significant flaws. A major issue is that all words are equidistant from each other, so semantic relations between words are not reflected in geometric relations between the corresponding vectors.
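To make this concrete, here is a minimal one-hot encoding sketch in Python; the toy vocabulary is invented for illustration and is not from the post:

```python
import numpy as np

# Toy vocabulary: each word gets a unique integer index.
vocab = {"download": 0, "file": 1, "string": 2, "invoke": 3}

def one_hot(word, vocab):
    """Return a vector of zeros with a 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("file", vocab))    # [0. 1. 0. 0.]
print(one_hot("string", vocab))  # [0. 0. 1. 0.]
# Every pair of distinct words is the same distance apart,
# so no semantic relationship between words is captured.
```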

Contextual embedding is a more recent approach that overcomes these limitations by learning compact representations of words from data under the assumption that words that frequently appear in similar context tend to bear similar meaning. The embedding is trained on large textual datasets like Wikipedia. The Word2vec algorithm, an implementation of this technique, is famous not only for translating semantic similarity of words to geometric similarity of vectors, but also for preserving polarity relations between words. For example, in Word2vec representation:

Madrid – Spain + Italy ≈ Rome
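This kind of analogy query can be reproduced with off-the-shelf tooling. Below is a minimal sketch using gensim and a publicly available pretrained embedding; the particular model name is our choice for illustration, since the post does not specify one:

```python
import gensim.downloader as api

# Load a publicly available pretrained word embedding (illustrative choice).
wv = api.load("word2vec-google-news-300")

# "Madrid" - "Spain" + "Italy" should land near "Rome".
result = wv.most_similar(positive=["Madrid", "Italy"], negative=["Spain"], topn=1)
print(result)  # e.g. [('Rome', 0.7...)], depending on the vectors used
```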

Embedding of PowerShell scripts

Since training a good embedding requires a significant amount of data, we used a large and diverse corpus of 386K distinct unlabeled PowerShell scripts. The Word2vec algorithm, which is typically used with human languages, provides similarly meaningful results when applied to the PowerShell language. To accomplish this, we split the PowerShell scripts into tokens, which then allowed us to use the Word2vec algorithm to assign a vectorial representation to each token.
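The post does not share its tokenizer or training configuration. As a rough sketch of the general approach, one might tokenize scripts with a naive regular expression and train a gensim Word2vec model; the tokenizer, the sample scripts, and the hyperparameters below are all assumptions for illustration:

```python
import re
from gensim.models import Word2Vec

def tokenize(script: str):
    """Naive PowerShell tokenizer: keep cmdlets, variables, flags, and words.
    Illustrative only; the production tokenizer is not described in the post."""
    return re.findall(r"[\w\-\$]+", script.lower())

# 'scripts' stands in for the large corpus of unlabeled PowerShell scripts.
scripts = [
    "Invoke-WebRequest -Uri $url -OutFile $destfile",
    "IEX (New-Object Net.WebClient).DownloadString($url)",
]
sentences = [tokenize(s) for s in scripts]

# Placeholder hyperparameters, not the ones used by the authors.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
token_vector = model.wv["$url"]          # vectorial representation of a token
model.save("powershell_word2vec.model")  # hypothetical file name, reused below
```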

Figure 1 shows a 2-dimensional visualization of the vector representations of 5,000 randomly selected tokens, with some tokens of interest highlighted. Note how semantically similar tokens are placed near each other. For example, the vectors representing -eq, -ne and -gt, which in PowerShell are aliases for “equal”, “not-equal” and “greater-than”, respectively, are clustered together. Similarly, the vectors representing the allSigned, remoteSigned, bypass, and unrestricted tokens, all of which are valid values for the execution policy setting in PowerShell, are clustered together.

Figure 1. 2D visualization of 5,000 tokens using Word2vec

Examining the vector representations of the tokens, we found a few additional interesting relationships.

Token similarity: Using the Word2vec representation of tokens, we can identify commands in PowerShell that have an alias. In many cases, the token closest to a given command is its alias. For example, the representations of the Invoke-Expression token and its alias IEX are closest to each other. Two additional examples of this phenomenon are the Invoke-WebRequest command and its alias IWR, and the Get-ChildItem command and its alias GCI.

We also measured distances within sets of several tokens. Consider, for example, the four tokens $i, $j, $k, and $true (see the right side of Figure 2). The first three are usually used to represent a numeric variable, while the last naturally represents a Boolean constant. As expected, the $true token was the odd one out – it was the farthest (using the Euclidean distance) from the center of mass of the group.
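Both kinds of checks are easy to reproduce against a trained token embedding. The sketch below assumes a model saved from the earlier training sketch (the file name and the exact rankings are hypothetical); note that gensim’s doesnt_match uses cosine similarity to the mean vector rather than Euclidean distance from the center of mass, but it captures the same odd-one-out idea:

```python
from gensim.models import Word2Vec

# Hypothetical model saved by the earlier training sketch.
wv = Word2Vec.load("powershell_word2vec.model").wv

# Token similarity: the nearest neighbor of a command is often its alias
# (token casing follows whatever the tokenizer produced).
print(wv.most_similar("invoke-expression", topn=3))  # expect "iex" near the top

# Odd one out: the token farthest from the group's mean vector.
print(wv.doesnt_match(["$i", "$j", "$k", "$true"]))  # expect "$true"
```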

More specific to the semantics of PowerShell in cybersecurity, we checked the representations of the tokens bypass, normal, minimized, maximized, and hidden (see the left side of Figure 2). While the first token is a legal value for the ExecutionPolicy flag in PowerShell, the rest are legal values for the WindowStyle flag. As expected, the vector representation of bypass was the farthest from the center of mass of the vectors representing the other four tokens.

Figure 2. 3D visualization of selected tokens

Linear relationships: Since Word2vec preserves linear relationships, computing linear combinations of the vectorial representations yields semantically meaningful results. Below are a few interesting relationships we found:

high – $false + $true ≈ low
-eq – $false + $true ≈ -neq
DownloadFile – $destfile + $str ≈ DownloadString
Export-CSV – $csv + $html ≈ ConvertTo-Html
Get-Process – $processes + $services ≈ Get-Service

In each of the above expressions, the sign ≈ signifies that the vector on the right side is the closest (among all the vectors representing tokens in the vocabulary) to the vector that is the result of the computation on the left side.
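In other words, ≈ denotes a nearest-neighbor query over the vocabulary. Continuing with the wv vectors from the earlier sketches, the computation can be spelled out directly; token casing follows whatever the tokenizer produced, and the expected output is the post’s result, not something guaranteed by this toy setup:

```python
import numpy as np

def closest_token(wv, expression_vector, exclude=()):
    """Return the vocabulary token whose vector is closest (by cosine
    similarity) to an arbitrary vector, optionally excluding input tokens."""
    best_token, best_score = None, -1.0
    for token in wv.index_to_key:
        if token in exclude:
            continue
        v = wv[token]
        score = np.dot(v, expression_vector) / (
            np.linalg.norm(v) * np.linalg.norm(expression_vector) + 1e-9)
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# DownloadFile - $destfile + $str should land near DownloadString.
target = wv["downloadfile"] - wv["$destfile"] + wv["$str"]
print(closest_token(wv, target, exclude={"downloadfile", "$destfile", "$str"}))
```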

Detection of malicious PowerShell scripts with deep learning

We used the Word2vec embedding of the PowerShell language presented in the previous section to train deep learning models capable of detecting malicious PowerShell scripts. The classification model is trained and validated using a large dataset of PowerShell scripts that are labeled “clean” or “malicious,” while the embeddings are trained on unlabeled data. The flow is presented in Figure 3.

Figure 3. High-level overview of our model generation process

Using GPU computing in Microsoft Azure, we experimented with a variety of deep learning and traditional ML models. The best performing deep learning model increases the coverage (for a fixed low FP rate of 0.1%) by 22 percentage points compared to traditional ML models. This model, presented in Figure 4, combines several deep learning building blocks such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). Neural networks are ML algorithms inspired by biological neural systems like the human brain. In addition to the pretrained embedding described here, the model is provided with character-level embedding of the script.
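The exact architecture in Figure 4 is not spelled out in the text, but a rough Keras sketch of a model that combines a frozen pretrained token-embedding branch with a character-level branch built from a CNN and an LSTM could look like the following; every layer size, sequence length, and the way the two branches are merged are assumptions made purely for illustration:

```python
import numpy as np
from tensorflow.keras import layers, Model, initializers

MAX_TOKENS, MAX_CHARS = 2000, 8000   # assumed sequence lengths
VOCAB_SIZE, CHAR_VOCAB = 50000, 128  # assumed vocabulary sizes
EMBED_DIM = 100

# Token branch: initialized from the pretrained Word2vec matrix (frozen here).
pretrained = np.zeros((VOCAB_SIZE, EMBED_DIM))  # placeholder for the Word2vec weights
tok_in = layers.Input(shape=(MAX_TOKENS,), name="tokens")
tok = layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                       embeddings_initializer=initializers.Constant(pretrained),
                       trainable=False)(tok_in)
tok = layers.Conv1D(128, 3, activation="relu")(tok)
tok = layers.GlobalMaxPooling1D()(tok)

# Character branch: embedding learned from scratch, followed by CNN + LSTM.
chr_in = layers.Input(shape=(MAX_CHARS,), name="chars")
chr_ = layers.Embedding(CHAR_VOCAB, 32)(chr_in)
chr_ = layers.Conv1D(128, 5, activation="relu")(chr_)
chr_ = layers.MaxPooling1D(4)(chr_)
chr_ = layers.LSTM(64)(chr_)

# Merge both branches and classify clean vs. malicious.
merged = layers.concatenate([tok, chr_])
merged = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="malicious")(merged)

model = Model(inputs=[tok_in, chr_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```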

Figure 4. Network architecture of the best performing model

Real-world application of deep learning to detecting malicious PowerShell

The best performing deep learning model is applied at scale to the PowerShell scripts observed by Microsoft Defender ATP through the AMSI interface, using Microsoft ML.NET technology and the ONNX format for deep neural networks. This model augments the suite of ML models and heuristics used by Microsoft Defender ATP to protect against malicious usage of scripting languages.
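The production pipeline uses ML.NET; purely as an illustration of how an exported ONNX classifier is scored, here is a Python sketch with onnxruntime, where the model path and the input tensor names are hypothetical:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical exported model and feature encoding.
session = ort.InferenceSession("powershell_classifier.onnx")

token_ids = np.zeros((1, 2000), dtype=np.int64)  # encoded token sequence
char_ids = np.zeros((1, 8000), dtype=np.int64)   # encoded character sequence

score = session.run(None, {"tokens": token_ids, "chars": char_ids})[0]
print("malicious probability:", float(score[0][0]))
```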

Since its first deployment, this deep learning model has detected with high precision many cases of malicious and red team PowerShell activities, some undiscovered by other methods. The signal obtained through PowerShell is combined with a wide range of other ML models and signals in Microsoft Defender ATP to detect cyberattacks.

The following are examples of malicious PowerShell scripts that deep learning can confidently detect but can be challenging for other detection methods:

Figure 5. Heavily obfuscated malicious script

Figure 6. Obfuscated script that downloads and runs payload

Figure 7. Script that decrypts and executes malicious code

Enhancing Microsoft Defender ATP with deep learning

Deep learning methods significantly improve detection of threats. In this blog, we discussed a concrete application of deep learning to a particularly evasive class of threats: malicious PowerShell scripts. We have developed, and will continue to develop, deep learning-based protections across multiple capabilities in Microsoft Defender ATP.

Development and productization of deep learning systems for cyber defense require large volumes of data, computations, resources, and engineering effort. Microsoft Defender ATP combines data collected from millions of endpoints with Microsoft computational resources and algorithms to provide industry-leading protection against attacks.

Stronger detection of malicious PowerShell scripts and other threats on endpoints using deep learning means richer and better-informed security through Microsoft Threat Protection, which provides comprehensive security for identities, endpoints, email and data, apps, and infrastructure.

Shay Kels and Amir Rubin
Microsoft Defender ATP team


The uphill battle of beating back weaponized AI

Artificial intelligence isn’t just for the law-abiding. Machine learning algorithms are as freely available to cybercriminals and state-sponsored actors as they are to financial institutions, retailers and insurance companies.

“When we look especially at terrorist groups who are exploiting social media, [and] when we look at state-sponsored efforts to influence and manipulate, they’re using really powerful algorithms that are at everyone’s disposal,” said Yasmin Green, director of research and development at Jigsaw, a technology incubator launched by Google to try to solve geopolitical problems.

Criminals need not develop new algorithms or new AI, Green said at the recent EmTech conference in Cambridge, Mass. They can, and do, exploit what is already out there to manipulate public opinion.

The good news about weaponized AI? The tools to combat these nefarious efforts are also advancing. One promising lead, according to Green, is that bad actors don’t exhibit the same kinds of online behavior that typical users do. And security experts are hoping to exploit the behavioral “tells” they’re seeing — with the help of machines, of course.

Variations on weaponized AI

Cybercriminals and internet trolls are adept at using AI to simulate human behavior and trick systems or peddle propaganda. The online test used to tell humans from machines, CAPTCHA, is continuously bombarded by bad guys trying to trick it.

In an effort to stay ahead of cybercriminals, CAPTCHA, which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart, has had to evolve, creating some unanticipated consequences, according to Shuman Ghosemajumder, CTO at Shape Security in Mountain View, Calif. Recent data from Google shows that humans solve CAPTCHAs just 33% of the time. That’s compared to state-of-the-art machine learning optical character recognition technology that has a solve rate of 99.8%.

“This is doing exactly the opposite of what CAPTCHA was originally intended to do,” Ghosemajumder said. “And that has now been weaponized.”

He said advances in computer vision technology have led to weaponized AI services such as Death By CAPTCHA, an API plug-in that promises to solve 1,000 CAPTCHAs for $1.39. “And there are, of course, discounts for gold members of the service.”

A more aggressive attack is credential stuffing, where cybercriminals use stolen usernames and passwords from third-party sources to gain access to accounts.

Sony was the victim of a credential-stuffing attack in 2011. Cybercriminals culled a list of 15 million credentials stolen from other sites and then used a botnet to test whether they worked on Sony’s login page. Today, an outfit going by the good-guy-sounding name of Sentry MBA — the MBA stands for Modded By Artists — provides cybercriminals with a user interface and automation technology, making it easy to test whether stolen usernames and passwords work and even to bypass security features like CAPTCHAs.

“We see these types of attacks responsible for tremendous amounts of traffic on some of the world’s largest websites,” Ghosemajumder said. In the case of one Fortune 100 company, credential-stuffing attacks made up more than 90% of its login activity.

Shuman Ghosemajumder shares a snippet of traffic from a Fortune 100 retailer. ‘We see that on a 24/7 basis, more than 90% of the login activity was coming from credential-stuffing attacks,’ he said.

Behavioral tells in weaponized AI

Ghosemajumder’s firm Shape Security is now using AI to detect credential-stuffing efforts. One method is to use machine learning to identify behavioral characteristics that are typical of cybercriminal exploits.

When cybercriminals simulate human interactions, they will, for example, move the mouse from the username field to the password field quickly and efficiently — in an unhumanlike manner. “Human beings are not capable of doing things like moving a mouse in a straight line — no matter how hard they try,” Ghosemajumder said.
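Shape Security’s actual features are not described, but as a hedged illustration of the kind of behavioral signal involved, one could measure how closely a recorded mouse path follows a straight line; the function, the sample trajectories, and any threshold one might apply are assumptions for this sketch, not the company’s method:

```python
import numpy as np

def path_straightness(points):
    """Ratio of straight-line distance to total path length for a mouse
    trajectory given as (x, y) samples; 1.0 means a perfectly straight path."""
    pts = np.asarray(points, dtype=float)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    total = steps.sum()
    direct = np.linalg.norm(pts[-1] - pts[0])
    return direct / total if total > 0 else 0.0

# A scripted mouse move tends to be almost perfectly straight,
# while human trajectories wobble and overshoot.
bot_path = [(0, 0), (50, 25), (100, 50), (200, 100)]
human_path = [(0, 0), (40, 31), (95, 44), (160, 88), (198, 104), (200, 100)]
print(path_straightness(bot_path))    # ~1.0
print(path_straightness(human_path))  # noticeably lower
```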

Jigsaw’s Green said her team is also looking for “technical markers” that can distinguish truly organic campaigns from coordinated ones. She described state-sponsored actors who peddle propaganda and attempt to spread misinformation through what she called “seed-and-fertilizer campaigns.”


“The goal of these state-sponsored campaigns is to plant a seed in social conversations and to have the unwitting masses fertilize that seed for it to actually become an organic conversation,” she said.

“There are a few dimensions that we think are promising to look at. One is the temporal dimension,” she said.

Looking across the internet, Jigsaw began to understand that coordinated attacks tend to move together, last longer than organic campaigns, and pause as state-sponsored actors wait for instructions on what to do. “You’ll see a little delay before they act,” she said.

Other dimensions include network shape and semantics. State-sponsored actors tend to be more tightly linked together than communities within organic campaigns, and they tend to use “irregularly similar” language in their messaging.

The big question is whether behavioral tells — identified by machines and combined with automated detection — can be used to effectively identify state-sponsored campaigns. No doubt, time will tell.

Go Universal! Now your ad campaigns can reach users across Microsoft premium surfaces like MSN, Outlook and Skype

Here’s your New Year’s gift from the Windows Store team. The “Promote Your App” ad campaigns that delivered ad creatives in other apps across the Windows ecosystem just got a major update.

We are happy to announce the launch of universal campaigns that will deliver your ads across Microsoft premium surfaces such as MSN.com, Outlook.com, and Skype, as well as games like the Microsoft Solitaire Collection.

Why should I use universal campaigns?

The first and most important reason is wider reach. Microsoft premium surfaces such as MSN, Outlook, Skype, and the Microsoft Solitaire Collection are used by millions of users daily, and your ad campaigns now have a chance to showcase your app to those users as they interact with these surfaces.

Second, your ad campaigns will now reach users across a much wider variety of touch points. Studies have shown that a media mix spanning multiple channels can multiply a company’s returns; in the context of app promotion, it is app installs and app engagements that can grow multi-fold.

MSN is primarily used in the context of news and entertainment, while Skype is used for personal communication and Outlook is used for email. Many Windows customers spend a fair amount of time playing popular games like Microsoft Solitaire. Your ad can now capture the attention of users in all these different situations, as well as improve recall and conversion rates.

Finally, the allocation of budget among the Microsoft surfaces is driven by powerful machine-learning algorithms. These algorithms start with finding the right set of users who should be exposed to your ad campaign, and then measure the effectiveness of each surface to adjust the target user profiles and budget allocations. Rest assured, these algorithms are working to give you the best possible ROI.

How do I get started?

  1. All campaigns that use auto-targeting will be universal campaigns by default. To use auto-targeting, make sure Automatic is selected for the Audience section of your campaign settings in the Dev Center dashboard.
  2. If you wish to create a manually targeted campaign, make sure you choose Universal for the Surface setting in the Audience section of your campaign.

Give it a try! If you’ve got suggestions for how we can make these features even more useful, please let us know at Windows Developer Feedback.