
New Spectre variants earn $100,000 bounty from Intel

Researchers found new speculative execution attacks against Intel and ARM chips, and the findings earned them a $100,000 reward under Intel’s bug bounty.

The new methods are themselves variations on Spectre v1 — the bounds check bypass version of Spectre attacks — and are being tracked as Spectre variants 1.1 and 1.2.

Spectre 1.1 has been assigned its own Common Vulnerabilities and Exposures (CVE) number, CVE-2018-3693, because it “leverages speculative stores to create speculative buffer overflows,” according to Vladimir Kiriansky, a doctoral candidate in electrical engineering and computer science at MIT, and Carl Waldspurger of Carl Waldspurger Consulting.

“Much like classic buffer overflows, speculative out-of-bounds stores can modify data and code pointers. Data-value attacks can bypass some Spectre v1 mitigations, either directly or by redirecting control flow. Control-flow attacks enable arbitrary speculative code execution, which can bypass fence instructions and all other software mitigations for previous speculative-execution attacks. It is easy to construct return-oriented-programming gadgets that can be used to build alternative attack payloads,” Kiriansky and Waldspurger wrote in their research paper. “In a speculative data attack, an attacker can (temporarily) overwrite data used by a subsequent Spectre 1.0 gadget.”
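The snippet below is a minimal sketch in C of the kind of gadget the researchers describe; the buffer and length names are hypothetical, and it is an illustration of the pattern rather than code from the paper.

```c
/* Illustrative Spectre 1.1-style gadget (hypothetical names, not code
 * from the paper). If the processor mispredicts the bounds check, the
 * store executes speculatively with an attacker-controlled out-of-bounds
 * index, transiently overwriting adjacent data or code pointers before
 * the misprediction is rolled back. */
#include <stddef.h>
#include <stdint.h>

extern uint8_t c[16];     /* victim buffer */
extern size_t  len_c;     /* its trusted length */

void victim_store(size_t y, uint8_t z)
{
    if (y < len_c) {      /* bounds check that can be speculatively bypassed */
        c[y] = z;         /* speculative out-of-bounds store */
    }
}
```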

Spectre 1.2 does not have a new CVE because it “relies on lazy enforcement” of read/write protections.

“Spectre 1.2 [is] a minor variant of Spectre v1, which depends on lazy PTE enforcement, similar to Spectre v3,” the researchers wrote. “In a Spectre 1.2 attack, speculative stores are allowed to overwrite read-only data, code pointers and code metadata, including v-tables [virtual tables], GOT/IAT [global offset table/import address table] and control-flow mitigation metadata. As a result, sandboxing that depends on hardware enforcement of read-only memory is rendered ineffective.”

As the research paper from Kiriansky and Waldspurger went live, Intel paid them a $100,000 bug bounty for the new Spectre variants. After the initial announcement of the Spectre and Meltdown vulnerabilities in January 2018, Intel expanded its bug bounty program to include rewards of up to $250,000 for similar side-channel attacks.


Nick Bilogorskiy, cybersecurity strategist at Juniper Networks, also noted that the research into these new Spectre variants was partially funded by Intel.

“When implemented properly, bug bounties help both businesses and the research community, as well as encourage more security specialists to participate in the audit and allow CISOs to optimize their security budgets for wider security coverage,” Bilogorskiy wrote via email. “These bugs are new minor variants of the original Spectre variant one vulnerability and have similar impact. They exploit speculative execution and allow speculative buffer overflows. I expect that more variants of Spectre and/or Meltdown will continue to be discovered in the future.”

Neither ARM nor Intel responded to requests for comment by the time of this post. However, ARM updated its FAQ about speculative processor vulnerabilities to reflect the new Spectre variants, and Intel published a white paper on bounds check bypass vulnerabilities at the same time the new variants were disclosed. In the white paper, Intel did not mention plans for a new patch, but it advised developers to ensure bounds checks are implemented properly in software to mitigate the new issues.
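One widely used software pattern for hardening bounds checks is branch-free index clamping, sketched below in C. This is an illustration of the general approach rather than code taken from Intel's white paper, and the names are hypothetical.

```c
/* Sketch of a branch-free index clamp (hypothetical names). Even if the
 * bounds check is mispredicted, the mask forces any speculative access
 * to a safe index. Production implementations, such as the Linux
 * kernel's array_index_nospec(), use inline assembly or compiler
 * barriers so the optimizer cannot fold the mask away. */
#include <stddef.h>
#include <stdint.h>

extern uint8_t table[256];

/* Returns idx when idx < len, and 0 otherwise, without a branch. */
static inline size_t clamp_index(size_t idx, size_t len)
{
    return idx & ((size_t)0 - (size_t)(idx < len));
}

uint8_t safe_read(size_t idx, size_t len)
{
    if (idx >= len)
        return 0;
    return table[clamp_index(idx, len)];
}
```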

Advanced Micro Devices was not directly mentioned by the researchers in connection with the new Spectre variants, but Spectre v1 did affect AMD chips. AMD has not made a public statement about the new research.

TLBleed attack can extract signing keys, but exploit is difficult

An interesting new side-channel attack abuses the Hyper-Threading feature of Intel chips and can extract signing keys with near-perfect accuracy. However, both the researchers and Intel downplayed the danger of the exploit.

Ben Gras, Kaveh Razavi, Herbert Bos and Cristiano Giuffrida, researchers at Vrije Universiteit’s systems and network security group in Amsterdam, said their attack, called TLBleed, takes advantage of the translation lookaside buffer cache of Intel chips. If exploited, TLBleed can allow an attacker to extract the secret 256-bit key used to sign programs, with a success rate of 99.8% on Intel Skylake and Coffee Lake processors and 98.2% accuracy on Broadwell Xeon chips.

However, Gras tweeted that users shouldn’t be too scared of TLBleed, because while it is “a cool attack, TLBleed is not the new Spectre.”

“The OpenBSD [Hyper-Threading] disable has generated interest in TLBleed,” Gras wrote on Twitter. “TLBleed is a new side-channel in that it shows that (a) cache side-channel protection isn’t enough: TLB still leaks information; (b) side-channel safe code that is constant only in the control flow and time but not data flow is unsafe; (c) coarse-grained access patterns leak more than was previously thought.”
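As a rough illustration of point (b), consider the C sketch below; the names and table are hypothetical and not taken from the TLBleed research. Its control flow and timing do not depend on the secret, yet the page it touches does, which is the kind of data-flow leak a TLB side channel can observe.

```c
/* Hypothetical example of code that is constant in control flow and time
 * but not in data flow. The TLB is filled at page granularity, so which
 * page is touched, and therefore which TLB entry is loaded, depends on
 * the secret and can be observed by a sibling hyperthread. */
#include <stddef.h>
#include <stdint.h>

extern uint8_t table[256 * 4096];   /* one 4 KiB page per possible secret byte */

uint8_t constant_flow_lookup(uint8_t secret)
{
    /* no secret-dependent branch, but a secret-dependent address */
    return table[(size_t)secret * 4096];
}
```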

Justin Jett, director of audit and compliance for Plixer LLC, a network traffic analysis company based in Kennebunk, Maine, said TLBleed is “fairly dangerous, given that the flaw allows for applications to gain access to sensitive memory information from other applications.” But he noted that exploiting the issue would prove challenging.

“The execution is fairly difficult, because a malicious actor would need to infect a machine that has an application installed that they want to exploit. Once the machine is infected, the malware would need to know when the application was executing code to be able to know which memory block the sensitive information is being stored in. Only then will the malware be able to attempt to retrieve the data,” Jett wrote via email. “This is particularly concerning for applications that generate encryption keys, because the level of security that the application is trying to create could effectively be reduced to zero if an attacker is able to decipher the private key.”

Intel also downplayed the dangers associated with TLBleed; the company has not assigned a CVE number and will not patch it.

“TLBleed uses the translation lookaside buffer, a cache common to many high-performance microprocessors that stores recent address translations from virtual memory to physical memory. Software or software libraries such as Intel Integrated Performance Primitives Cryptography version U3.1 — written to ensure constant execution time and data independent cache traces — should be immune to TLBleed,” Intel wrote in a statement via email. “Protecting our customers’ data and ensuring the security of our products is a top priority for Intel, and we will continue to work with customers, partners and researchers to understand and mitigate any vulnerabilities that are identified.”

Jett noted that even if Intel isn’t planning a patch, it should do more to alert customers to the dangers of TLBleed.

“Intel’s decision to not release a CVE number is odd at best. While Intel doesn’t plan to patch the vulnerability, a CVE number should have been requested so that organizations could be updated on the vulnerability and software developers would know to write their software in a way that may avoid exploitation,” Jett wrote. “Without a CVE number, many organizations will remain unaware of the flaw.”

The researchers plan to release the full paper this week, and Gras will present the research at Black Hat 2018 in Las Vegas in August.

Microsoft unveils Project Brainwave for real-time AI – Microsoft Research

By Doug Burger, Distinguished Engineer, Microsoft

Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave.  I’m delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency.  Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users.

The Project Brainwave system is built with three main layers:

  1. A high-performance, distributed system architecture;
  2. A hardware DNN engine synthesized onto FPGAs; and
  3. A compiler and runtime for low-friction deployment of trained models.

First, Project Brainwave leverages the massive FPGA infrastructure that Microsoft has been deploying over the past few years.  By attaching high-performance FPGAs directly to our datacenter network, we can serve DNNs as hardware microservices, where a DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop.  This system architecture both reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them.

Second, Project Brainwave uses a powerful “soft” DNN processing unit (or DPU), synthesized onto commercially available FPGAs.  A number of companies—both large companies and a slew of startups—are building hardened DPUs.  Although some of these chips have high peak performance, they must choose their operators and data types at design time, which limits their flexibility.  Project Brainwave takes a different approach, providing a design that scales across a range of data types, with the desired data type being a synthesis-time decision.  The design combines both the ASIC digital signal processing blocks on the FPGAs and the synthesizable logic to provide a greater and more optimized number of functional units.  This approach exploits the FPGA’s flexibility in two ways.  First, we have defined highly customized, narrow-precision data types that increase performance without real losses in model accuracy.  Second, we can incorporate research innovations into the hardware platform quickly (typically a few weeks), which is essential in this fast-moving space.  As a result, we achieve performance comparable to – or greater than – many of these hard-coded DPU chips but are delivering the promised performance today.
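To make the precision trade-off concrete, here is a small, purely hypothetical C sketch of an 8-bit floating-point round trip. It assumes a 1-bit sign, 5-bit exponent, 2-bit mantissa layout with a bias of 15; this is not the layout of Microsoft's own narrow formats (such as the ms-fp8 format mentioned later in the post), only an illustration of the kind of data type a synthesis-time decision can select.

```c
/* Hypothetical narrow-precision float: 1 sign bit, 5 exponent bits and
 * 2 mantissa bits. NOT the actual ms-fp8 layout; it only illustrates the
 * width/accuracy trade-off. Subnormals, NaN and infinity are ignored. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

static uint8_t quantize8(float x)
{
    if (x == 0.0f || !isfinite(x))
        return 0;
    unsigned sign = x < 0.0f;
    int exp;
    float mant = frexpf(fabsf(x), &exp);          /* mant in [0.5, 1) */
    int frac = (int)lrintf((mant - 0.5f) * 8.0f); /* 2 fraction bits, 0..4 */
    if (frac == 4) { frac = 0; exp++; }           /* mantissa rounded up to 1.0 */
    int e = exp + 15;                             /* assumed exponent bias */
    if (e < 0)  e = 0;
    if (e > 31) e = 31;
    return (uint8_t)((sign << 7) | ((unsigned)e << 2) | (unsigned)frac);
}

static float dequantize8(uint8_t q)
{
    if ((q & 0x7f) == 0)
        return 0.0f;
    float mant = 0.5f + (float)(q & 0x3) / 8.0f;
    float val  = ldexpf(mant, (int)((q >> 2) & 0x1f) - 15);
    return (q & 0x80) ? -val : val;
}

int main(void)
{
    float w = 0.3f;   /* an example weight */
    printf("%f -> %f after the 8-bit round trip\n", w, dequantize8(quantize8(w)));
    return 0;
}
```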

At Hot Chips, Project Brainwave was demonstrated using Intel’s new 14 nm Stratix 10 FPGA.

Third, Project Brainwave incorporates a software stack designed to support a wide range of popular deep learning frameworks.  We already support Microsoft Cognitive Toolkit and Google’s TensorFlow, and plan to support many others.  We have defined a graph-based intermediate representation, to which we convert models trained in the popular frameworks, and then compile down to our high-performance infrastructure.
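As a purely hypothetical illustration of what a graph-based intermediate representation can look like (this is not Microsoft's actual IR), each node can record an operator, its operands and its output shape, as in the C sketch below.

```c
/* Hypothetical graph-IR node, for illustration only. Each node names an
 * operator, points at its operand nodes and records its output shape,
 * so a model trained in any framework can be lowered into one common
 * form before being compiled to the hardware. */
#include <stddef.h>

typedef enum { OP_INPUT, OP_MATMUL, OP_ADD, OP_SIGMOID } op_kind;

typedef struct ir_node {
    op_kind         kind;
    struct ir_node *inputs[2];   /* operand nodes; NULL when unused */
    size_t          rows, cols;  /* output tensor shape */
} ir_node;

/* A single GRU-style gate, z = sigmoid(W*x + U*h), written as a tiny
 * graph (weight matrices elided for brevity). */
static ir_node x  = { OP_INPUT,   { NULL, NULL }, 256, 1 };
static ir_node h  = { OP_INPUT,   { NULL, NULL }, 512, 1 };
static ir_node Wx = { OP_MATMUL,  { &x,   NULL }, 512, 1 };
static ir_node Uh = { OP_MATMUL,  { &h,   NULL }, 512, 1 };
static ir_node s  = { OP_ADD,     { &Wx,  &Uh  }, 512, 1 };
static ir_node z  = { OP_SIGMOID, { &s,   NULL }, 512, 1 };
```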

We architected this system to show high actual performance across a wide range of complex models, with batch-free execution.  Companies and researchers building DNN accelerators often show performance demos using convolutional neural networks (CNNs).  Since CNNs are so compute intensive, it is comparatively simple to achieve high performance numbers.  Those results are often not representative of performance on more complex models from other domains, such as LSTMs or GRUs for natural language processing.  Another technique that DNN processors often use to boost performance is running deep neural networks with high degrees of batching.  While this technique is effective for throughput-based architectures—as well as off-line scenarios such as training—it is less effective for real-time AI.  With large batches, the first query in a batch must wait for all of the many queries in the batch to complete.  Our system, designed for real-time AI, can handle complex, memory-intensive models such as LSTMs, without using batching to juice throughput.
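A back-of-the-envelope sketch of that latency argument, using made-up numbers, is below: with batching, the first request waits for the batch to fill and then for the whole batch to run, while batch-free execution serves each request as soon as it arrives.

```c
/* Toy model of the batching trade-off described above; all numbers are
 * hypothetical, chosen only to show why batching inflates per-request
 * latency even when throughput looks good. */
#include <stdio.h>

int main(void)
{
    double arrival_gap_ms = 1.0;  /* assumed: a new request every millisecond */
    double compute_ms     = 5.0;  /* assumed: time to run one batch or one request */
    int    batch          = 32;   /* assumed batch size */

    double batched  = (batch - 1) * arrival_gap_ms + compute_ms; /* 36 ms */
    double realtime = compute_ms;                                /* 5 ms  */

    printf("worst-case latency: batched %.0f ms, batch-free %.0f ms\n",
           batched, realtime);
    return 0;
}
```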

At Hot Chips, Eric Chung and Jeremy Fowers demonstrated the Project Brainwave system ported to Intel’s new 14 nm Stratix 10 FPGA.

Even on early Stratix 10 silicon, the ported Project Brainwave system ran a large GRU model—five times larger than Resnet-50—with no batching, and achieved record-setting performance.  The demo used Microsoft’s custom 8-bit floating point format (“ms-fp8”), which does not suffer accuracy losses (on average) across a range of models.  We showed Stratix 10 sustaining 39.5 Teraflops on this large GRU, running each request in under one millisecond.  At that level of performance, the Brainwave architecture sustains execution of over 130,000 compute operations per cycle, driven by one macro-instruction being issued every 10 cycles.  Running on Stratix 10, Project Brainwave thus achieves unprecedented levels of demonstrated real-time AI performance on extremely challenging models.  As we tune the system over the next few quarters, we expect significant further performance improvements.
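For a rough sense of how those two figures relate, 39.5 teraflops sustained at roughly 130,000 operations per cycle implies a clock rate in the neighborhood of 300 MHz, as the small check below shows.

```c
/* Quick consistency check of the figures quoted above. */
#include <stdio.h>

int main(void)
{
    double ops_per_second = 39.5e12;   /* sustained 39.5 teraflops */
    double ops_per_cycle  = 130000.0;  /* ~130,000 operations per cycle */
    printf("implied clock: %.0f MHz\n", ops_per_second / ops_per_cycle / 1e6);
    /* prints roughly 304 MHz */
    return 0;
}
```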

We are working to bring this powerful, real-time AI system to users in Azure, so that our customers can benefit from Project Brainwave directly, complementing the indirect access through our services such as Bing.  In the near future, we’ll detail when our Azure customers will be able to run their most complex deep learning models at record-setting performance.  With the Project Brainwave system incorporated at scale and available to our customers, Microsoft Azure will have industry-leading capabilities for real-time AI.
