
The Napkin Disrupted: Meet Ink to Code, a Microsoft Garage Project – Microsoft Garage

Urban legend has it that some of the greatest ideas in history started with a napkin. The Gettysburg Address, the poem that became the U.S. National Anthem, and the premise of the Harry Potter series were each reportedly born into the world as sketches on scrap paper—and when app creators put pen to paper for their ideas, the napkin is often a canvas of choice as well. While rapid prototyping with the napkin and the whiteboard holds its charms, less charming is the prospect of translating quick sketches into working code.

Last summer, a group of Garage interns tackled this problem by creating a prototype of their own: meet Ink to Code, a Microsoft Garage project, now available in the United States and Canada. Ink to Code is a Windows app that enables developers to draw wireframe sketches and export them into Visual Studio, expediting the process of prototyping Universal Windows Platform (UWP) and Android user interfaces.

The Garage internship takes a unique, entrepreneurial spin on the traditional big-tech model: rather than embedding with a full-time organization, students work in groups of five or six as a distinct team, building their own standalone project. Microsoft product groups vie for intern teams by pitching proposed projects to the interns at the start of the internship. That summer, at the Microsoft New England Research and Development facility (fondly known as NERD) in Cambridge, MA, six interns found their passion in the pitch for Ink to Code and signed up to work with the Xamarin team sponsoring the idea. Five more interns studying at MIT later joined the Garage team to continue work on the project.

Building a Better Napkin

Ink to Code Guide Feature Screenshot
Ink to Code captures sketches of basic visual elements and translates them into the beginnings of an app in Visual Studio

The sponsoring team and interns were both motivated by a desire to modernize the brainstorming and prototyping process, moving from napkin and whiteboard sketches to an experience that is more automated and cohesive with the Visual Studio suite. “We’ve all been in that situation as developers,” notes Alex Corrado, a Senior Software Engineer on the Xamarin Designer team and one of the originators of the project. “Getting your ideas for a new app or feature onto paper is one of the fastest, most natural parts of brainstorming. But then you ultimately need to turn that sketch into code, and sooner than you know it, 10, 20, 30 iterations of a sketch really add up.” The team turned to the Smart Ink capabilities built into UWP to preserve the natural desire to sketch while bridging the gap between analog and digital with a companion app for Visual Studio. In the Windows 10 Fall Creators Update, Smart Ink improves ink recognition with AI, and the Ink to Code team leveraged this machine learning technology to save months of development time.

Ink to Code translates common design symbols into the beginnings of an app in Visual Studio. The first version supports basic visual elements including labels, text fields, text paragraphs, images, and buttons. While Ink to Code can’t bring a full app vision to life, it significantly cuts down the work of creating an app’s basic foundation through the power of automation. Perhaps even more valuable is the way it enables developers and designers to collaborate differently. Ink to Code can serve as a more productive canvas in brainstorming meetings, or, even more significantly, as a tool that bridges the gap between collaborators with different levels of design or technical knowledge.

A Prototype for Prototypes

A core part of the Garage intern experience is conducting customer development and research, and the Ink to Code team worked with internal developers and designers to get feedback on their prototype. Today, the sponsoring Xamarin team releases the app to drastically expand the pool of feedback. Alex also shares, “Our goal is to hear from a wide variety of app creators, so we know what people like most and what we should add.”

“Developers are crazy diverse, and no experience could serve them 100% on day 1, but their feedback can help us get closer, faster,” adds George Matthews, a Senior Program Manager in the Garage and a key originator of Ink to Code. The gut reaction of any app creator is to make sure the project is polished and perfect before shipping, especially when releasing to an audience of developer peers. The Ink to Code team is embracing the mindset of getting feedback early and developing with, and for, the customer. George continues, “The feedback from our first customers will really help us stack rank our backlog.”

To check out Ink to Code and help shape the future direction of the project, download it from the Microsoft Store and share your thoughts via in-app feedback or UserVoice. Ink to Code works best with Visual Studio 2017.

Welcome to 2018, the year of AI – Asia News Center

By Ralph Haupter, President, Microsoft Asia. This article was originally posted on LinkedIn.

If the history of human advancement has taught us one thing, it is this: genuine step-change progress does not occur because of a single technology breakthrough, but through a combination of multiple complementary factors coming together at the same time.

The Industrial Revolution, which began in the UK around 1760, was driven by an amalgamation of steam power, improvements in iron production and the development of the first machine tools.

Similarly, the PC revolution of the early 1970s was the outcome of simultaneous advancements in microprocessors, memory storage, software programming and other factors.

Now, as we enter 2018, we are at the cusp of a new revolution, one that will ultimately transform every organisation, every industry and every public service across the world.

I’m referring, of course, to Artificial Intelligence – or AI – and I believe 2018 is the year that this will start to become mainstream, to begin to impact many aspects of our lives in a truly ubiquitous and meaningful way.

AI: Over 65 Years In The Making
The concept of AI is not new. In fact, it stretches back to 1950, when early computing pioneer Alan Turing famously posed the question “Can machines think?” It would be another six years, in 1956, before the term “artificial intelligence” was first used.

So it has taken nearly 70 years for the right combination of factors to come together to move AI from concept to an increasingly ubiquitous reality. And there are three innovation trends driving its acceleration and adoption right now.

  • The first is Big Data. The explosion of Internet-connected devices, sensors and objects has exponentially expanded the amount of data the world is now producing. In this increasingly digital era, data is the “new oil” – a source of value and sustainable competitive advantage.
  • The second factor is ubiquitous and powerful Cloud computing. Today, anyone with an idea and a credit card can access the same computing power that, traditionally, only global multinationals or governments have possessed. Cloud computing is democratizing technology and accelerating innovation on a global scale.
  • The third factor driving AI capabilities is breakthroughs in software algorithms and Machine Learning that can identify sophisticated patterns implicit within the data itself. If data is the new oil, Machine Learning is, perhaps, the new combustion engine.

So, it is this combination of powerful industry trends, all maturing at the same time, that is accelerating – and democratizing – AI today.

AI Everywhere
My colleague Harry Shum, who leads our AI & Research Group, refers to the way in which AI will impact our lives as an “invisible revolution”. What he means is that, increasingly, AI will be everywhere—powering your online recommendation engine, acting as a virtual assistant chatbot for your bank account or travel agent, personalizing your newsfeed or guarding your credit card against fraud. AI will be more pervasive – and yet less invasive – than any previous technology revolution.

In particular, AI will be embedded seamlessly into existing, well-established products and services to enhance their capabilities. Let me share a simple example of how AI is helping me work more effectively today. I travel frequently and am often required to present to multinational, multilingual audiences during business trips across Asia Pacific. Now, with a small piece of AI technology called Microsoft Presentation Translator, I can overcome language barriers: PowerPoint shows real-time subtitles in more than 60 languages as I speak during my presentations.

In business, AI will be used by most companies in at least some part of the value chain, whether in research and development, design, logistics, manufacturing, servicing or customer engagement. In fact, leading IT industry analyst IDC believes that by next year, 40% of digital transformation initiatives globally will be supported by AI capabilities[1].

And you do not need to be a start-up or hi-tech company to embrace the possibilities of AI, just have the vision and commitment to make it happen. Take, for example, Mitsubishi Fuso Truck and Bus Corporation (MFTBC), an 85-year-old Japanese auto manufacturer, which has given itself just two years to become a “100% digital operation” – complete with cloud-based capabilities in AI, the Internet of Things (IoT), and Mixed Reality (MR).

One of the initiatives they have recently implemented is an AI-powered chatbot through which all 10,000 of its employees can access the information and assistance they need in a faster, more intuitive and more reliable way. This significantly reduces the time employees spend learning the Intranet site navigation, searching for information or calling each other for help. The company now plans to extend chatbot technology to boost customer service, productivity, and maintenance across the whole company.

AI in 2018
As we stand at the cusp of the new year, I see four key AI developments happening over the next 12 months:

  1. Mass adoption of AI starts in 2018: AI adoption is set to soar in 2018 and beyond as organizations start to see the clear benefits being reaped by AI innovators such as MFTBC. IDC forecasts that worldwide AI revenues will surge past US$46 billion in 2020[2]. Closer to home, AI investment in Asia Pacific is predicted to grow to US$6.9 billion by 2021, expanding rapidly at a 73% CAGR.
  2. Ubiquitous Virtual Assistants: We will begin to see the adoption of broad-scale AI in the form of conversational AI chatbots in both consumer and business scenarios. In fact, Gartner predicts that by 2020 more than 85% of customer interactions with the enterprise will be managed without a human, and AI will be the key technology deployed for customer service[3].
  3. Democratizing data and decision-making: In a world where more data exists than ever before, the ability to deliver meaningful business insights from that data to the maximum number of relevant employees becomes of paramount importance. AI will be the key technology for making that happen by bringing together data from employees, business apps, and the world.
  4. Building trusted foundations for AI: There will be increasingly more discussions at the governmental and industry levels to create formal governance and regulation around the usage of AI. We saw similar discussions with the onset of eCommerce and the advent of cloud technologies. It is critical for transparent public-private conversations to take place, as they will shape how AI can benefit economies and societies in a fair, transparent and trusted way.

The future of AI burns brightly and I see 2018 as the year that will establish a solid foundation for the mass adoption of this exciting and vital technology.


[1] IDC Reveals Worldwide Digital Transformation Predictions, Nov 2017

[2] IDC, Worldwide Spending on Cognitive and Artificial Intelligence Systems Forecast to Reach $12.5 Billion This Year, According to New IDC Spending Guide, April 2017

[3] https://www.forbes.com/sites/ciocentral/2011/09/27/customers-dont-want-to-talk-to-you-either/#ff504bc70dcd

Unclench fists, open arms: Dirk Hohndel on VMware open source shift

For much of its 19-year history, VMware has been stubbornly proprietary in its approach to technology. But over the past year, the company has softened its hard-core stance and become more receptive to embracing open source.

Behind this reformation is Dirk Hohndel, VMware’s vice president and chief open source officer. Since his arrival a little over a year ago, Hohndel has convinced the company to pour more financial and human resources into internal and external open source projects, and work more closely with industry organizations such as the Linux Foundation, of which he is a board member.

Hohndel sat down with senior executive editor Ed Scannell to discuss the evolved VMware open source approach, its relationship with corporate and third-party open source developers, and the future of containers and OpenStack.

Take me through the thinking behind the evolution of VMware open source container products since Project Photon’s introduction two years ago.

Dirk Hohndel: After Photon we released vSphere Integrated Containers, which was a way to deploy containers as workloads in a vSphere environment. But it didn’t deal with the question of orchestration or provide a larger framework on top of it.

One of the key questions we asked ourselves: In the broader spectrum of customers changing the way they deploy and develop applications in-house, what is the best way to provide them with an environment in which IT basically gets out of the way? A way in which developers get the environment they need to create container-based applications?

Dirk Hohndel, VMware's vice president and chief open source officer

After some exploration we concluded that joining forces with Pivotal was the best solution in terms of helping customers get what they were asking for, so we created the Pivotal Container Service. Running Kubernetes on top of Bosh, on top of vSphere, gives them the APIs a developer is looking for.

What do you mean by ‘getting IT out of the way?’

Hohndel: Developers looking at container-based applications often say: ‘I just want to create applications and not have to worry about infrastructure.’ What customers are looking for and what the app developers are looking for is to have an infrastructure that to them looks the same whether they deploy it on their laptops, in their testing environments, to a public cloud or a private cloud. They don’t want to worry about how this is implemented on the back end.


We are trying to provide an environment where developers get to use the APIs in the environment they are comfortable with, and get the IT people to provide it in a way that fits into their architecture that is scalable and secure and deals with the complexities of networking.

What does Pivotal’s technology give to your users and developers they didn’t have before?

Hohndel: Pivotal’s involvement is very much around this integration through Bosh and Kubo and into IaaS. If you look at Cloud Foundry, there is a very opinionated, very tightly managed vision of a cloud-native architecture. [Pivotal Container Service] is a more flexible, broader environment that doesn’t give you the carefully selected environment with services. But it gives you the same solid underpinning as provided by Bosh and vSphere and the flexibility of Kubernetes on top.

That is fundamentally why Google is involved in this project. To them Kubernetes is the interface they promote as the way to orchestrate containers both on premises and through the Google Cloud Engine, and then into the public cloud.

How much open source code is contributed by outside developers to projects like Project Photon versus VMware’s internal development?

Hohndel: It depends on the project. Open vSwitch, for example, is driven by many different contributors besides VMware. We actually moved that project to the Linux Foundation and it’s no longer a VMware-hosted project.

We have other projects that are at the halfway point [in terms of outside versus internal contribution], and I think Clarity, a project we open sourced last year that allows you to develop HTML5 user interfaces based on JavaScript, is a good example of that. It has good momentum and a good number of outside contributors, but it is still predominantly a VMware project.

We have other projects where the number of outside contributions is smaller because of the specificity of the project, or its complexity. We have a fun little tool created in-house for our own development purposes called Chap. It analyzes un-instrumented core files, either live cores or cores from crashes, for leaks and memory usage and corruption. Contributions from the outside have been limited because it’s a narrow, intensely complicated developer tool.

How many VMware programmers contribute code to the community? Do you plan to increase that number of programmers?

Hohndel: Good question, because it raises the question of how should a software company actually interact with the open source community. We use a ton of open source projects for every product we do in some way, shape or form. When we fix bugs we contribute those back to the community. But we must balance that with the creation of our own open source products. Many of them are internal tools like Clarity or Chap, or they are related to products like VIC [vSphere Integrated Containers].

We have a couple of teams in each business unit focused on these tools and components. I’m sure that adds up to a few dozen, maybe 100 engineers who are working on open source projects. Then, there are the other teams working specifically on upstream projects because they are part of our products: think OpenStack, think Kubernetes and the next [Linux] kernel or other broad projects being used like GTK+ in the UI space. This is an area where I am actively hiring people for our cloud-native business unit.

What is the level of acceptance for OpenStack among your users? Some weeks it feels like it’s dying, others it seems to have a future.

Hohndel: I keep trying to grasp why people think it is dying. There are certain market segments where OpenStack is thriving and doing well. Certainly telcos seem to have coalesced around the OpenStack APIs.

A lot of companies that explore enterprise production environments based on OpenStack discover there is a clear tradeoff between capex and opex. You can get OpenStack as an open source project for free, or you can spend your capital expenditure on one of the available OpenStack distributions; either way, it comes at a significant operational cost. It is a very complex piece of software — it is actually many pieces of interdependent software — so setting it up and getting it to run on day one, and especially on day two, is very complex.

What is the long-range trajectory of OpenStack?

Hohndel: I don’t think anyone has a clear answer yet. In the telco sector, it is absolutely what people are looking for. In the enterprise we see some users that are happy and some that are disillusioned. I think the jury is out. With OpenStack, we’re at the point on the Gartner Hype Cycle that Gartner calls the falling edge toward the trough of disillusionment [laughs]. So there is the peak of irrational expectations followed by the trough of disillusionment, and then at the end of the cycle you have the plateau of productivity.

Are there any open source synergies as you go forward with the VMware-AWS deal? AWS is an open source shop internally, but they sell a lot of proprietary software.

Hohndel: The technology they use for their services, like their database service or their search service, is certainly open source, but the underlying technical infrastructure we interact with is fairly proprietary technology. In the vSphere on AWS environment, I don’t think open source has really been a key player. It becomes much more interesting if you look at the services that are provided on top of that. There, we are certainly collaborating with them on some of the same projects.

Ed Scannell is a senior executive editor with TechTarget. Contact him at escannell@techtarget.com.

Modernizing the DOM tree in Microsoft Edge

The DOM is the foundation of the web platform programming model, and its design and performance impacts the rest of the browser pipeline. However, its history and evolution is far from a simple story.

Over the past three years, we’ve embarked on an in-flight refactoring of the DOM in Microsoft Edge, with our eye on a modernized architecture offering better real-world performance and reduced complexity. In this post, we’ll walk you through the history of the DOM in Internet Explorer and Microsoft Edge, and the impact of our recent work to modernize the DOM Tree, which is already resulting in substantially improved performance in the Windows 10 Creators Update.

A diagram of the web platform pipeline, with the DOM Tree and cooperating components (CSS Cascade, DOM API & Capabilities, and Chakra JavaScript) highlighted.

What we think of as “the DOM” is really the cooperation of several subsystems. In Microsoft Edge, this includes JS binding, events, editing, spellchecking, HTML attributes, CSSOM, text, and others, all working together. Of these subsystems, the DOM “tree” is at the center.

Several years ago, we began a long journey to update to a modern DOM “tree” (node connectivity structures). By modernizing the core tree, which we completed in Microsoft Edge 14, we landed a new baseline and the scaffolding to deliver on our promise of a fast and reliable DOM. With Windows 10 Creators Update and Microsoft Edge 15, the journey we started is beginning to bear fruit.

Circular tree map showing "DOM Tree" at the center, surrounded by "JS Binding," "Editing," "Spellcheck," "Events," and "Attributes."

“The DOM” is really the cooperation of several subsystems that make up the web programming model.

We’re just scratching the surface, but want to take this opportunity to geek out a bit, and share some of the internal details of this journey, starting with the DOM’s arcane history and showcasing some of our accomplishments along the way.

The history of the Internet Explorer DOM tree

When web developers today think of the DOM, they usually think of a tree that looks something like this:

A diagram of a simple tree

A simple tree

However nice and simple (and obvious) this seems, the reality of Internet Explorer’s DOM implementation was much more complicated.

Simply put, Internet Explorer’s DOM was designed for the web of the 90s. When the original data structures were designed, the web was primarily a document viewer (with a few animated GIFs and other images thrown in). As such, algorithms and data structures more closely resembled those you might see powering a document viewer like Microsoft Word. Recall in the early days of the web that there was no JavaScript to allow scripting a web page, so the DOM tree as we know it didn’t exist. Text was king, and the DOM’s internals were designed around fast, efficient text storage and manipulation. Content editing (WYSIWYG) was already a feature at the time, and the manipulation paradigm centered around the editing cursor for character insertion and limited formatting.

A text-centric design

As a result of its text-centric design, the principal structure of the DOM was the text backing store, a complex system of text arrays that could be efficiently split and joined with minimal or no memory allocations. The backing store represented both text and tags as a linear progression, addressable by a global index or Character Position (CP). Inserting text at a given CP was highly efficient, and copying or pasting a range of text was centrally handled by an efficient “splice” operation. The figure below illustrates how a simple markup containing “hello world” was loaded into the text backing store, and how CPs were assigned for each character and tag.

Diagram of the text backing store, with special positional placeholders for non-text entities such as tags and the insertion point.

The text backing store, with special positional placeholders for non-text entities such as tags and the insertion point.

To store non-textual data (e.g. formatting and grouping information), another set of objects was separately maintained from the backing store: a doubly-linked list of tree positions (TreePos objects). TreePos objects were the semantic equivalent of tags in HTML source markup – each logical element was represented by a begin and end TreePos. This linear structure made it very fast to traverse the entire DOM “tree” in depth-first pre-order traversal (as required for nearly every DOM search API and CSS/Layout algorithm). Later, we extended the TreePos object to include two other kinds of “positions”: TreeDataPos (for indicating a placeholder for text) and PointerPos (for indicating things like the caret, range boundary points, and eventually for “new” features like generated content nodes).

Each TreePos object also included a CP object, which acted as the tag’s global ordinal index (useful for things like the legacy document.all API). CPs were used to get from a TreePos into the text backing store, easily compare node order, or even find the length of text by subtracting CP indices.

To tie it all together, a TreeNode bound pairs of tree positions together and established the “tree” hierarchy expected by the JavaScript DOM as illustrated below.

Diagram showing the dual representation of the DOM as both text and (possibly overlapping) nodes

The dual representation of the DOM as both text and (possibly overlapping) nodes
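
To make these relationships concrete, here is a purely illustrative C++ sketch of the CP/TreePos/TreeNode concepts described above. The names echo the article’s terms, but the fields and layout are invented for explanation and are not Internet Explorer’s actual code.

// Illustrative sketch only: invented types mirroring the CP/TreePos/TreeNode concepts above.
#include <cstddef>

enum class PosKind { BeginTag, EndTag, Text, Pointer };  // tag boundaries, text runs, caret/range markers

struct TreePos {
    PosKind     kind;
    std::size_t cp;      // global Character Position into the text backing store
    TreePos*    prev;    // doubly-linked list of positions in document order
    TreePos*    next;
};

struct TreeNode {        // binds the begin/end positions of one logical element
    TreePos* begin;
    TreePos* end;
};

// Text length of an element falls out of CP arithmetic, as described above.
std::size_t TextLength(const TreeNode& node) {
    return node.end->cp - node.begin->cp;
}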

Adding layers of complexity

The foundation of CPs caused much of the complexity of the old DOM. For the whole system to work properly, CPs had to be kept up to date. Thus, CPs were updated after every DOM manipulation (e.g. entering text, copy/paste, DOM API manipulations, even clicking on the page—which set an insertion point in the DOM). Initially, DOM manipulations were driven primarily by the HTML parser or by user actions, and the CPs-always-up-to-date model was perfectly rational. But with the rise of JavaScript and DHTML, these operations became much more common and frequent.

To compensate, new structures were added to make these updates efficient, and the splay tree was born, adding an overlapping series of tree connections onto TreePos objects. The added complexity helped with performance—at first: global CP updates could be achieved in O(log n) time. Yet a splay tree is really only optimized for repeated local searches (e.g., changes centered around one place in the DOM tree), and it did not prove to be a consistent benefit for JavaScript and its more random-access patterns.

Another consequence of the design was that the previously mentioned “splice” operations, which originally handled copy/paste, were extended to handle all tree mutations. The core “splice engine” worked in three steps, as illustrated in the figure below.

Timeline diagram of the splice engine algorithm

The splice engine algorithm

In step 1, the engine would “record” the splice by traversing the tree positions from the start of the operation to the end. A splice record was then created containing command instructions for this action (a structure re-used in the browser’s Undo stack). In step 2, all nodes (i.e., TreeNode and TreePos objects) associated with the operation were deleted from the tree. Note that in the IE DOM tree, TreeNode/TreePos objects were distinct from the script-referenced Element objects to facilitate overlapping tags, so deleting them was not a functional problem. Finally, in step 3, the splice record was used to “replay” (re-create) new objects in the target location. For example, to accomplish an appendChild DOM operation, the splice engine created a range around the node (from the TreeNode‘s begin TreePos to its end), “spliced” the range out of the old location, and created new nodes to represent the node and its children in the new location. As you can imagine, this created a lot of memory allocation churn, in addition to the inefficiencies of the algorithm.
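
As a rough sketch of that record, delete, and replay flow (again using invented names and stubbed helpers, not Internet Explorer code), an appendChild routed through such a splice engine could be modeled like this:

// Conceptual sketch of the three-step splice flow described above; all names are invented.
#include <vector>

struct TreePos;                                    // begin/end tag positions (see the earlier sketch)
struct TreeNode { TreePos* begin; TreePos* end; };

struct SpliceRecord {
    std::vector<int> commands;                     // serialized instructions; the same structure also fed the Undo stack
};

// Step 1: walk the positions from start to end and record what was there.
SpliceRecord RecordSplice(TreePos* /*begin*/, TreePos* /*end*/) { return {}; }

// Step 2: delete the TreeNode/TreePos objects in the range. Script-visible Element
// objects were separate, so tearing these down was not a functional problem, but it was costly.
void DeleteRange(TreePos* /*begin*/, TreePos* /*end*/) {}

// Step 3: replay the record at the target, allocating brand-new objects.
void ReplaySplice(const SpliceRecord& /*record*/, TreePos* /*target*/) {}

// Even a plain appendChild was routed through the full record/delete/replay cycle,
// which is where the allocation churn described above came from.
void SpliceAppendChild(TreeNode* parent, TreeNode* child) {
    SpliceRecord record = RecordSplice(child->begin, child->end);
    DeleteRange(child->begin, child->end);
    ReplaySplice(record, parent->end);             // re-create the subtree at the new location
}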

No encapsulation

These are just a few of the examples of the complexity of the Internet Explorer DOM. To add insult to injury, the old DOM had no encapsulation, so code from the Parser all the way to the Display systems had CP/TreePos dependencies, which required many dev-years to detangle.

With complexity comes errors, and the DOM code base was a reliability liability. According to an internal investigation, from IE7 to IE11, approximately 28% of all IE reliability bugs originated from code in core DOM components. This complexity also manifested as a tax on agility: each new HTML5 feature became more expensive to implement because it was harder to retrofit new concepts into the existing architecture.

Modernizing the DOM tree in Microsoft Edge

The launch of Project Spartan created the perfect opportunity to modernize our DOM. Free from platform vestiges like docmodes and conditional comments, we began a massive refactoring effort. Our first, and most critical, target: the DOM’s core tree.

We knew the old text-centric model was no longer relevant; we needed a DOM tree that actually was a tree internally in order to match the expectations of the modern DOM API. We needed to dismantle the layers of complexity that made it nearly impossible to performance-tune the tree and the other surrounding systems. And finally, we had a strong desire to encapsulate the new tree to avoid creating cross-component dependencies on core data structures. All of this effort would lead to a DOM tree with the right model in place, primed and ready for additional improvements to come.

To make the transition to the modern DOM as smooth as possible (and to avoid building a new DOM tree in isolation and attempting to drop and stabilize untested code at the end of the project—a.k.a. the very definition of “big bang integration”), we transitioned the existing codebase in-place in three phases. The first phase of the project defined our tree component boundary with corresponding APIs and contracts. We chose to design the APIs as a set of “reader” and “writer” functions that operated on nodes. Instead of APIs that look like this:

parent.appendChild(child);
element.nextSibling;

our APIs looked like this:

TreeWriter::AppendChild(parent, child);
TreeReader::GetNextSibling(element);

This API design discourages callers from thinking about tree objects as actors with their own state. As a result, a tree object is only an identity in the API, allowing for more robust contracts and hiding representational details, which proved useful in phase 3.
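
A header-style sketch of what such a boundary can look like in C++, modeled on the snippet above (the additional function names are invented here for illustration):

// Illustrative reader/writer component boundary in the spirit of the snippet above.
struct TreeNode;                                   // opaque to callers; only the Tree* APIs see the representation

namespace TreeReader {
    TreeNode* GetParent(const TreeNode* node);
    TreeNode* GetFirstChild(const TreeNode* node);
    TreeNode* GetNextSibling(const TreeNode* node);
}

namespace TreeWriter {
    void AppendChild(TreeNode* parent, TreeNode* child);
    void RemoveChild(TreeNode* parent, TreeNode* child);
}

// Callers treat nodes purely as identities, with no methods and no exposed state,
// so the representation behind TreeNode can change without touching any caller.

Because nothing outside the boundary can see the node layout, the phase-3 data structure replacement described below could happen entirely behind these signatures.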

The second phase migrated all code that depended on legacy tree internals to use the newly established component boundary APIs instead. During this migration, the implementation of the tree API would continue to be powered by the legacy structures. This work took the most time and was the least glamorous; it took several dev-years to detangle consumers of the old tree structures and properly encapsulate the tree. Staging the project this way let us release EdgeHTML 12 and 13 with our fully-tested incremental changes, without disrupting the shipping schedule.

In the third and final phase, with all external code using the new tree component boundary APIs, we began to refactor and replace the core data structures. We consolidated objects (e.g., the separate TreePos, TreeNode, and Element objects), removed the splay tree and splice engine, dropped the concept of PointerPos objects, and removed the text backing storage (to name a few). Finally, we could rid the code of CPs.

The new tree structure is simple and straightforward; it uses four pointers instead of the usual five to maintain connectivity: parent, first-child, next sibling, and previous sibling (last-child is computed as the parent’s first-child’s previous sibling). We could hide this last-child optimization behind our TreeReader APIs without changing a single caller. Re-arranging the tree is fast and efficient, and we even saw some improvements in CPU performance on public DOM APIs, which were nice side effects of the refactoring work.

Diagram of Microsoft Edge’s new DOM tree structure, showing all four possible pointers

Microsoft Edge’s new DOM tree structure, showing all four possible pointers.
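
A hedged sketch of a four-pointer node and the computed last-child lookup follows. The field and function names are invented, and it assumes, for illustration, that the first child’s previous-sibling pointer wraps around to the last child, which is what makes the computation described above work.

// Illustrative four-pointer node in the spirit of the diagram above; not Microsoft Edge source code.
struct TreeNode {
    TreeNode* parent          = nullptr;
    TreeNode* firstChild      = nullptr;
    TreeNode* nextSibling     = nullptr;
    TreeNode* previousSibling = nullptr;           // assumed to wrap to the last child on the first child
};

namespace TreeReader {
    // last-child is never stored; it is derived from the other pointers and
    // hidden behind the reader API, exactly as described above.
    inline TreeNode* GetLastChild(const TreeNode* parent) {
        TreeNode* first = parent ? parent->firstChild : nullptr;
        return first ? first->previousSibling : nullptr;
    }
}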

With the new DOM tree, reliability also improved significantly, dropping from 28% of all reliability issues to just around 10%, and at the same time providing secondary benefits of reducing time spent debugging and improving team agility.

The next steps in the journey

While this feels like the end of our journey, in fact it’s just the beginning. With our DOM tree APIs in place and powered by a simple tree, we turned our attention to the other subsystems that comprise the DOM, with an eye towards two classes of inefficiencies: inefficient implementations inside the subsystems, and inefficient communication between them.

Circular tree map showing "DOM Tree" at the center, surrounded by "JS Binding," "Editing," "Spellcheck," "Events," and "Attributes."

The DOM tree is at the center of many cooperating components that make up the web programming model.

For example, one of our top slow DOM APIs (even after the DOM tree work) has historically been querySelectorAll. This is a general-purpose search API, and uses the selectors engine to search the DOM for specific elements. Not surprisingly, many searches involve particular element attributes as search criteria (e.g., an element’s id, or one of its class identifiers). As soon as the search code entered the attributes subsystem, it ran into a whole new class of inefficiencies, completely unrelated to those addressed by the new DOM tree.

For the attributes subsystem, we are simplifying the storage mechanism for element content attributes. In the early days of the web, DOM attributes were primarily directives to the browser about how to display a piece of markup. A great example of this is the colspan attribute:

<tr>
  <td colspan="2">Total:</td>
  <td>$12.34</td>
</tr>

colspan has semantic meaning to the browser and thus has to be parsed. Given that pages weren’t very dynamic back then and attributes were generally treated like enums, IE created an attribute system that was optimized around eager parsing for use in formatting and layout.

Today’s app patterns, however, heavily use attributes like id, class, and data-*, which are treated less like browser directives and more like generic storage:

<li id="cart" data-customerid="a8d3f916577aeec" data-market="en-us">
     <b>Total:</b>
     <span class="total">$12.34</span>
 </li>

Thus, we’re deferring most work beyond the bare minimum necessary to store the string. Additionally, since UI frameworks often encourage repeated CSS classes across elements, we plan to atomize strings to reduce memory usage and improve performance in APIs like querySelector.
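
String atomization (interning) is a standard technique; a minimal, generic sketch of how repeated class strings could collapse to shared atoms is shown below. This illustrates the idea only and is not Microsoft Edge’s implementation.

// Minimal string-atomization sketch: repeated attribute values such as CSS class
// names map to one shared string, so equality checks become pointer comparisons.
#include <string>
#include <unordered_set>

class AtomTable {
public:
    using Atom = const std::string*;

    Atom Intern(const std::string& value) {
        // References to elements of an unordered_set stay valid across rehashing,
        // so the address of the stored string is a durable identity for the atom.
        return &*atoms_.insert(value).first;
    }

private:
    std::unordered_set<std::string> atoms_;
};

// Usage: every element carrying class="total" shares one atom, so a lookup such as
// querySelector(".total") can compare atoms instead of re-comparing the text.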

Though we still have plenty of work planned, with Windows 10 Creators Update, we’re happy to share that we’ve made significant progress!

Show me the money

Reliably measuring and improving performance is hard, and the pitfalls of benchmarking are well documented. To get the most holistic view of browser performance possible, the Microsoft Edge team uses a combination of user telemetry, controlled measurement of real-world scenarios, and synthetic benchmarks to guide our optimizations.

Venn Diagram with three labelled circles: "User telemetry," "Synthetic benchmarks," and "Performance Lab"

User telemetry “paints with a broad brush”, but by definition measures the most impactful work. Below is an example of our build-over-build tracking of the firstChild API across our user base. This data isn’t directly actionable, since it doesn’t provide all the details of the API call (i.e. what shape and size is the DOM tree) needed for performance tuning, but it’s the only direct measurement of the user’s experience and can provide feedback for planning and retrospectives.

Screen capture showing Build-to-build user performance telemetry for firstChild

Build-to-build user performance telemetry for firstChild (lower is better)

We highlighted our Performance lab and the nitty-gritty details of measuring browser performance a while ago, and while the tests themselves and the hardware in the lab have changed since then, the methodology is still relevant. By capturing and replaying real-world user scenarios in complex sites and apps like Bing Maps and Office 365, we’re less likely to overinvest in narrowly applicable optimizations that don’t benefit users. The graph below is an example of our reports for a simulated user on Bing Maps. Each data point is a build of the browser, and hovering provides details about the statistical distribution of measurements and links to more information for investigating changes.

Screen capture showing build to build telemetry from the Performance Lab

A graph of build-over-build performance of a simulated user on Bing Maps in the Performance Lab

Our Performance lab’s fundamental responsibility is to provide the repeatability necessary to test and evaluate code changes and implementation options. That repeatability also serves as the platform for synthetic benchmarks.

In the benchmark category, our most exciting improvement is in Speedometer. Speedometer simulates using the TodoMVC app as implemented in several popular web frameworks, including Ember, Backbone, jQuery, Angular, and React. With the new DOM tree in place, along with improvements across other browser subsystems like the Chakra JavaScript engine, the time to run the Speedometer benchmark decreased by 30%; in the Creators Update, our performance focus netted another improvement of 35% (note that Speedometer’s scores are a measure of speed and thus an inverse function of time).

Chart showing Microsoft Edge scores on the Speedometer benchmark over the past four releases. Edge 12: 5.44. Edge 13: 37.83. Edge 14: 53.92. Edge 15: 82.67.

Release-over-release performance on the Speedometer benchmark (higher is better)

Of course the most important performance metric is the user’s perception, so while totally unscientific, we’ve been super excited to see others notice our work!

We’re not done yet, and we know that Microsoft Edge is not yet the fastest on the Speedometer benchmark. Our score will continue to improve as a side effect of our performance work and we’ll keep the dev community updated on our progress.

Conclusion

A fast DOM is critical for today’s web apps and experiences. Windows 10 Creators Update is the first of a series of releases focused on performance on top of a re-architected DOM tree. At the same time, we’ll continue to improve our performance telemetry and community resources like the CSS usage and API catalog.

We’re just beginning to scratch the surface of what’s possible with our new DOM tree, and there’s still a long journey ahead, but we’re excited to see where it leads and to share it with you! Thanks!

Travis Leithead & Matt Kotsenas, Program Managers, Microsoft Edge