CAMBRIDGE, Mass. — The U.S. government is adept at collecting massive amounts of data. Efforts to deploy data analytics in government agencies, however, can be weak and disorganized.
At some agencies, officials say there's no cohesive system for government data analytics and management.
“I recently learned that we have no real concept of data archiving, and data backup and protection,” said Bobby Saxon, CTO at the Centers for Medicare & Medicaid Services (CMS).
“We have archived everything in every place,” Saxon said. “It’s really just wasted data right now.”
Data analytics struggles
On a panel about data analytics in government at the annual MIT Chief Data Officer and Information Quality (CDOIQ) Symposium at the university's Tang Center, Saxon described the struggles his agency faces with analytics.
While Saxon said he and his colleagues are working to improve the situation, currently the organization tends to rely on outside vendors to deal with difficult and pressing analytics problems.
“In the world of predictive analytics, typically the average vendor or subject expert will ask what are your questions, and go off and try to solve questions for you, and then ask if you have any more questions,” Saxon said.
Outside help costly
Ultimately, while government analytics problems tend to get fixed to some extent this way, the solutions can take weeks to arrive and often are simply too expensive in the long term, Saxon explained.
In addition, employees aren't learning new data analytics techniques, and they never immerse themselves in the problems at hand deeply enough to discover the root causes of what might be going wrong.
Panel moderator Mark Krzysko of the Department of Defense’s Office of the Under Secretary of Defense for Acquisition, Technology and Logistics, noted a similar problem in his agency.
Krzysko said he had honed a personal strategy in his early years with the agency: “Use the tools they’ve given you.”
When a data dilemma arose, often he might see employees making calls to the Air Force or the Army for answers, instead of relying on their own government analytics tools, he said.
The panel, “Data Analytics to Solve Government Problems,” was part of the 12th Annual MIT CDOIQ Symposium, held July 18 to 20.
CAMBRIDGE, Mass. — In the age of big data, the opportunities to change organizations by using data are many. For a newly minted chief data officer, the opportunities may actually be too vast, making focus the most essential element in the role of CDO.
“It’s about scope,” said Charles Thomas, chief data and analytics officer at General Motors. “You struggle if you do too many things.”
As chief data officer at auto giant GM, Thomas is focusing on opportunities to repackage and monetize data. He called it “whale hunting,” meaning he is looking for the biggest opportunities.
Thomas spoke as part of a panel on the role of CDO this week at the MIT Chief Data Officer and Information Quality Symposium.
At GM, he said, the emphasis is on taking the trove of vehicle data available from today’s highly digitized, instrumented and connected cars. Thomas said he sees monetary opportunities in which GM can “anonymize data and sell it.”
“Companies generate more data than they use, so someone has to approach it from an innovative perspective — not just for internal innovation, but also to be externally driving new income,” he said. “Someone has to [be] accountable for that. It has to be their only job.”
“A lot of industries are interested in how people move around cities. It’s an opportunity to sell [data] to B2B clients,” Thomas added.
Focus is also important in Christina Clark’s view of the role of CDO. But nurturing data capabilities across the organization is the initial prime area for attention, said Clark, who is CDO at industrial conglomerate General Electric’s GE Power subsidiary and was also on hand as an MIT symposium panelist.
Every company should get good at aggregating, analyzing and monetizing data, Clark said.
“You then look at where you want to focus,” she said. The role of CDO, she added, is likely to evolve according to the data maturity of any given organization.
Focusing on data areas in which an organization needs rounding out was also important to symposium panelist Jeff McMillan, chief analytics and data officer at Morgan Stanley’s wealth management unit, based in New York.
It’s about the analytics
“Organizations say, ‘We need a CDO,’ and then bring them in, but they don’t provide the resources they need to be successful,” he said. “A lot of people define the CDO role before they define the problem.”
It’s unwise to suggest a CDO can fix all the data problems of an organization, McMillan said. The way to succeed with data is to drive an understanding of data’s value as deeply into the organization as possible.
“That is really hard, by the way,” he added. At Morgan Stanley, McMillan said, his focus in the role of chief data officer has been around enabling wider use of analytics in advising clients on portfolio moves.
All things data and CDO
Since widely materializing in the aftermath of the 2008 financial crisis, the role of CDO has yet to settle into a consensus definition.
Compliance and regulation tasks have often blended in a broad job description that has come to include big data innovation initiatives. But individual executives’ refinements to chief data officer approaches may be the next step for the role of CDO, longtime industry observer and Babson College business professor Tom Davenport said in an interview.
“Having someone responsible for all things data is not a workable task. So, you really need to focus,” Davenport said. “If you want to focus on monetization, that’s fine. If you want to focus on internal enablement or analytics, that’s fine.”
The advice to the would-be CDO is not unlike that for most any other position. “What you do must be focused; you can’t be all things to all people,” Davenport said.
Industry experts said AWS has no need to build and sell a white box data center switch as reported last week but could help customers by developing a dedicated appliance for connecting a private data center with the public cloud provider.
The Information reported last Friday that AWS was considering whether to design open switches for an AWS-centric hybrid cloud. The AWS switch would compete directly with Arista, Cisco and Juniper Networks and could be available within 18 months if AWS went through with the project. AWS declined to comment.
Industry observers said this week the report could be half right. AWS customers could use hardware dedicated to establishing a network connection to the service provider, but that device is unlikely to be an AWS switch.
“A white box switch in and of itself doesn’t help move workloads to the cloud, and AWS, as you know, is in the cloud business,” said Brad Casemore, an analyst at IDC.
What AWS customers could use isn’t an AWS switch, but hardware designed to connect a private cloud to the infrastructure-as-a-service provider, experts said. Currently, AWS’ software-based Direct Connect service for the corporate data center is “a little kludgy today and could use a little bit of work,” said an industry executive who requested his name not be used because he works with AWS.
“It’s such a fragile and crappy part of the Amazon cloud experience,” he said. “The Direct Connect appliance is a badly needed part of their portfolio.”
AWS could also use a device that provides a dedicated connection to a company’s remote office or campus network, said John Fruehe, an independent analyst. “It would speed up application [service] delivery greatly.”
Indeed, Microsoft recently introduced the Azure Virtual WAN service, which connects the Azure cloud with software-defined WAN systems that serve remote offices and campuses. The systems manage traffic through multiple network links, including broadband, MPLS and LTE.
Connectors to AWS, Google, Microsoft clouds
For the last couple of years, AWS and its rivals Google and Microsoft have been working with partners on technology to ease the difficulty of connecting to their respective services.
In October 2016, AWS and VMware launched an alliance to develop the VMware Cloud on AWS. The platform would essentially duplicate on AWS a private cloud built with VMware software. As a result, customers of the vendors could use a single set of tools to manage and move workloads between both environments.
A year later, Google announced it had partnered with Cisco to connect Kubernetes containers running on Google Cloud with Cisco’s hyper-converged infrastructure, called HyperFlex. Cisco would also provide management tools and security for the hybrid cloud system.
Microsoft, on the other hand, offers a hybrid cloud platform called the Azure Stack. The software runs on third-party hardware and shares its code, APIs and management portal with Microsoft’s Azure public cloud to create a common cloud-computing platform. Microsoft hardware partners for Azure Stack include Cisco, Dell EMC and Hewlett Packard Enterprise.
Microsoft and National Geographic are teaming up to support data scientists who are tackling the “world’s biggest challenges.” The two organizations today announced the AI for Earth Innovation Grant program, a $1 million grant that’ll provide recipients financial assistance, access to AI tools and cloud services, and more to advance conservation research.
The grant program, which is accepting applications until October 8, will support between five and 15 projects in five core areas: agriculture, biodiversity, conservation, climate change, and water. In addition to funding, researchers will gain access to Microsoft’s AI platform and development tools, inclusion in the National Geographic Explorer community, and affiliation with National Geographic Labs, National Geographic’s research incubation and accelerator initiative.
“[I]n Microsoft, we found a partner that is well-positioned to accelerate the pace of scientific research and new solutions to protect our natural world,” Jonathan Baillie, chief scientist and executive vice president at the National Geographic Society, said in a statement. “With today’s announcement, we will enable outstanding explorers seeking solutions for a sustainable future with the cloud and AI technologies that can quickly improve the speed, scope, and scale of their work, as well as support National Geographic Labs’ activities around technology and innovation for a planet in balance.”
The aim is to make trained algorithms broadly available to the global community of environmental researchers, Lucas Joppa, Microsoft’s chief environmental scientist, said in a press release.
“Microsoft is constantly exploring the boundaries of what technology can do, and what it can do for people and the world,” Joppa said. “We believe that humans and computers, working together through AI, can change the way that society monitors, models, and manages Earth’s natural systems. We believe this because we’ve seen it — we’re constantly amazed by the advances our AI for Earth collaborators have made over the past months. Scaling this through National Geographic’s … network will create a whole new generation of explorers who use AI to create a more sustainable future for the planet and everyone on it.”
Selected recipients will be announced in December.
The AI for Earth Innovation Grant is an expansion of Microsoft’s AI for Earth program, announced in June 2017. In December, the Redmond company committed $50 million to an “extended strategic plan” that includes providing advanced training to universities and NGOs and the formation of a “multi-disciplinary” team of AI and sustainability experts.
Microsoft claims that in the past two years, the AI for Earth program has awarded more than 35 grants globally for access to its Azure platform and AI technologies.
Coinciding with its decision to eventually close its data center and migrate most of its workloads to the public cloud, the University of Notre Dame’s IT team switched to cloud-native data protection.
Notre Dame, based in Indiana, began its push to move its business-critical applications and workloads to Amazon Web Services (AWS) in 2014. Soon after, the university chose N2WS Cloud Protection Manager to handle backup and recovery.
Now, 80% of the applications used daily by faculty members and students, as well as the data associated with those services, live in the cloud. The university protects more than 600 AWS instances, and that number is growing fast.
In a recent webinar, Notre Dame systems engineer Aaron Wright talked about the journey of moving a whopping 828 applications to the cloud, and protecting those apps and their data.
Wright said Notre Dame’s main impetus for migrating to the cloud was to lower costs. Moving services to the cloud would reduce the need for hardware. Wright said the goal is to eventually close the university’s on-premises primary data center.
“We basically put our website from on premises to the AWS account and transferred the data, saw how it worked, what we could do. … As we started to see the capabilities and cost savings [of the cloud], we were wondering what we could do to put not just our ‘www’ services on the cloud,” he said.
Wright said Notre Dame plans to move 90% of its applications to the cloud by the end of 2018. “The data center is going down as we speak,” he said.
As a research organization that works on projects with U.S. government agencies, Notre Dame owns sensitive data. Wright saw the need for centralized backup software to protect that data but could not find many good commercial options; he eventually found N2WS Cloud Protection Manager through AWS Marketplace.
“We looked at what it would cost us to build our own backup software and estimated it would cost 4,000 hours between two engineers,” he said. By comparison, Wright said his team deployed Cloud Protection Manager in less than an hour.
Wright said N2WS Cloud Protection Manager has rescued Notre Dame's data at least twice since the installation. One incident came after Linux machines failed to boot once a patch was applied, and engineers restored data from snapshots within five minutes. Wright said his team used the snapshots to find and detach a corrupted Amazon Elastic Block Store volume, and then manually created and attached a new volume.
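The detach-and-replace workflow Wright describes can be sketched with the AWS SDK for Python. This is a minimal illustration, not Notre Dame's actual tooling: the instance, volume and snapshot IDs and the device name are hypothetical, and `ec2` is assumed to be a boto3 EC2 client.

```python
# Minimal sketch (not Notre Dame's actual tooling): swap a corrupted
# EBS volume for a fresh one created from a known-good snapshot.
# `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2");
# all IDs and the device name here are hypothetical.

def replace_volume_from_snapshot(ec2, instance_id, bad_volume_id,
                                 snapshot_id, availability_zone,
                                 device="/dev/xvdf"):
    """Detach the corrupted volume, create a replacement from the
    snapshot, and attach the new volume to the instance."""
    ec2.detach_volume(VolumeId=bad_volume_id)            # drop the bad volume
    new = ec2.create_volume(SnapshotId=snapshot_id,      # restore from snapshot
                            AvailabilityZone=availability_zone)
    ec2.attach_volume(VolumeId=new["VolumeId"],          # attach replacement
                      InstanceId=instance_id,
                      Device=device)
    return new["VolumeId"]
```

In practice, each call would be followed by a boto3 waiter (for the volume to detach, then become available) before moving to the next step.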
In another incident, Wright said the granularity of the N2WS Cloud Protection Manager backup capabilities proved valuable.
“Back in April-May 2018, we had to do a single-file restore through Cloud Protection Manager. Normally, we would have to have taken the volume and recreated a 300-gig volume,” he said. Locating and restoring that single file so quickly allowed him to resolve the incident within five minutes.
Big data continues to be a force for change. It plays a part in the ongoing drama of corporate innovation — in some measure, giving birth to the chief data officer role. But consensus on that role is far from set.
The 2018 Big Data Executive Survey of decision-makers at more than 50 blue-chip firms found 63.4% of respondents had a chief data officer (CDO). That is a big uptick since survey participants were asked the same question in 2012, when only 12% had a CDO. But this year’s survey, which was undertaken by business management consulting firm NewVantage Partners, disclosed that the background for a successful CDO varies from organization to organization, according to Randy Bean, CEO and founder of NewVantage, based in Boston.
For many, the CDO is likely to be an external change agent. For almost as many, the CDO may be a long-trusted company hand. The best CDO background could be that of a data scientist, line executive or, for that matter, a technology executive, according to Bean.
In a Q&A, Bean delved into the chief data role as he was preparing to lead a session on the topic at the annual MIT Chief Data Officer and Information Quality Symposium in Cambridge, Mass. A takeaway: Whatever it may be called, the chief data officer role is central to many attempts to gain business advantage from key emerging technologies.
Do we have a consensus on the chief data officer role? What have been the drivers?
Randy Bean: One principal driver in the emergence of the chief data officer role has been the growth of data.
For about a decade now, we have been into what has been characterized as the era of big data. Data continues to proliferate. But enterprises typically haven’t been organized around managing data as a business asset.
Additionally, there has been a greater threat posed to traditional incumbent organizations from agile data-driven competitors — the Amazons, the Googles, the Facebooks.
Organizations need to come to terms with how they think about data and, from an organizational perspective, come up with a structure and decide who should be the point person for data-related initiatives. That could be the chief data officer.
Another driver for the chief data officer role, you’ve noted, was the financial crisis of 2008.
Bean: Yes, the failures of the financial markets in 2008-2009, to a significant degree, were a data issue. Organizations couldn’t trace the lineage of the various financial products and services they offered. Out of that came an acute level of regulatory pressure to understand data in the context of systemic risk.
Banks were under pressure to identify a single person to regulators to address questions about data’s lineage and quality. As a result, banks took the lead in naming chief data officers. Now, we are into a third or fourth generation in some of these large banks in terms of how they view the mandate of that role.
Isn’t that type of regulatory driver somewhat spurred by the General Data Protection Regulation (GDPR), which recently went into effect? Also, for factors defining the CDO role, NewVantage Partners’ survey highlights concerns organizations have about being surpassed by younger, data-driven upstarts. What is going on there?
Bean: GDPR is just the latest of many previous manifestations of this. There have been the Dodd-Frank regulations, the various Basel reporting requirements and all the additional regulatory requirements that go along with classifying banks as ‘too large to fail.’
That is a defensive driver, as opposed to the offensive and innovation drivers that are behind the chief data officer role. On the offensive side, the chief data officer is about how your organization can be more data-driven, how you can change its culture and innovate. Still, as our recent survey finds, there is a defensive aspect even there. Increasingly, organizations perceive threats coming from all kinds of agile, data-driven competitors.
You have written that big data and AI are on a continuum. That may be worthwhile to emphasize, as so much attention turns to artificial intelligence these days.
Bean: AI has been around for decades. One of the reasons why it hasn't gained traction is that, in its aspects as a learning mechanism, it requires large volumes of data. In the past, data was only available in subsets or samples or in very limited quantities, and the corresponding learning on the part of the AI was slow and constrained.
Now, with the massive proliferation of data and new sources — in addition to transactional information, you also now have sensor data, locational data, pictures, images and so on — that has led to the breakthrough in AI in recent years. Big data provides the data that is needed to train the AI learning algorithms.
So, it is pretty safe to say there is no meaningful artificial intelligence without good data — without an ample supply of big data.
And it seems to some of us, on this continuum, you still need human judgment.
Bean: I am a huge believer in the human element. Data can help provide a foundation for informed decision-making, but ultimately it’s the combination of human experience, human judgment and the data. If you don’t have good data, that can hamper your ability to come to the right conclusion. Just having the data doesn’t lead you to the answer.
One thing I’d say is, just because there are massive amounts of data, it hasn’t made individuals or companies any wiser in and of itself. It’s just one element that can be useful in decision-making, but you definitely need human judgment in that equation, as well.
Rackspace’s latest service welcomes users’ legacy gear into Rackspace data centers and once in place, gives the vendor a golden opportunity to sell these customers additional services.
The Rackspace Colocation program primarily targets midsize and larger IT shops that want to launch their first cloud initiative, or sidestep the rising costs of operating their own internal data centers. Many of these IT shops have just begun to grapple with the realities of their first digital transformation projects. They must choose where to position key applications, from private clouds to microservices that run on Azure and Google Cloud.
Some Rackspace users run applications on customized hardware and operating systems that are not supported by public clouds, while others have heavily invested in hardware and want to hold onto those systems for another five years to get their full value, said Henry Tran, general manager of Rackspace's managed hosting and colocation business.
Customers that move existing servers into Rackspace's data centers gain better system performance from closer proximity to Rackspace's infrastructure. This gives Rackspace a chance to upsell those customers on interconnectivity and other higher-margin services.
“[The Rackspace Colocation services program] is a way to get you in the door by handling all the mundane stuff, but longer term they are trying to get you to migrate to their cloud,” said Cassandra Mooshian, senior analyst at Technology Business Research Inc. in Hampton, N.H.
Green light for greenfield colocation services
There are still many enterprise workloads that run in corporate data centers, so there are a lot of greenfield opportunities to pursue in colocation services. Roughly 60% of enterprises don’t use colos today, and the colocation market should grow around 8% annually through 2021, said Dan Thompson, a senior analyst at 451 Research. “There is still a lot of headroom for companies to migrate to colocation and/or cloud,” he said.
Other colocation service providers have expanded with various higher-margin cloud and other managed services, but Rackspace has chosen a different path.
“They’ve had hosting and cloud services for a while but are now moving in the direction of colocation,” 451 Research’s Thompson said. “This speaks loudly to the multi-cloud and hybrid cloud world we are living in.”
Rackspace's acquisition of Datapipe in late 2017 initiated its march into colocation, with the ability to offer capabilities and services to Datapipe customers through Microsoft's Azure Stack, VMware Cloud on AWS and managed services on Google's Cloud platform. In turn, Rackspace gained access to Datapipe's colocation services and data centers, giving it a market presence on the U.S. West Coast and in Brazil, China and Russia.
Rackspace itself was acquired in late 2016 by private equity firm Apollo Global Management LLC, which gave the company some financial backing and freedom to expand its business.
Companies are dramatically changing the architectures of their private data centers in preparation for eventually running more business applications across multiple cloud providers.
The transformational changes underway include higher network speeds, more server diversity and an increase in software-defined storage, an IHS Markit survey of 151 North American enterprises found. The strategy behind the revamping is to make the data center “a first-class citizen as enterprises build their multi-clouds.”
Companies are increasing network speeds to meet an expected rise in data flowing through enterprise data centers. A total of 68% of companies are increasing the capacity of the network fabric while 62% are buying technology to automate the movement of virtual machines and support network virtualization protocols, London-based IHS reported. The three trends are consistent with the building of cloud-based data center architectures.
IHS also found 49% of the survey respondents planned to increase spending on switches and routers — virtual and physical — to keep up with traffic flow. The top five Ethernet switch vendors were Cisco, Dell, Hewlett Packard Enterprise, Juniper and Huawei.
Companies turning to containers in new data center architectures
Companies are also increasing the use of containers to run applications in cloud computing environments. The shift is affecting the use of hypervisors, which are the platforms for running virtual machines in data centers today. IHS expects the share of servers running hypervisors to fall from 37% today to 30% by 2019.
“That’s a transition away from server virtualization potentially toward software containers,” IHS analyst Clifford Grossner said. “End users are looking to use more container-[based] software.”
IHS found 77% of companies planned to increase spending on servers, with the number of physical devices expected to double on average. Enterprises plan to run hypervisors or containers on 73% of their servers by 2019, up from 70% today.
“We’re seeing that progression where more and more servers are multi-tenant — that is running multiple applications,” Grossner said. “We’re seeing the density being packed tighter on servers.”
High density and multi-tenancy on servers are also attributes of cloud-focused data center architectures.
The rise of the one-socket server
Whenever possible, companies are buying one-socket servers to lower capital expenditures. IHS expects the cheaper hardware to account for 9% of corporate servers by 2019, up from 3% today.
“The one-socket server market is offering more powerful options that are able to satisfy the needs of more workloads at a better price point,” Grossner said.
Finally, IHS found an upswing in storage spending: 53% of companies planned to spend more on software-defined storage, 52% on network-attached storage and 42% on solid-state drives.
As enterprises rearchitect their data centers, they are also spending more on public cloud services and infrastructure. IDC expects spending on the latter to reach $160 billion this year, an increase of more than 23% over last year. By 2021, spending will reach $277 billion, representing an annual increase of nearly 22%.
As the amount of new data created is set to hit the multiple-zettabyte level in the coming years, where will we store it all?
With the release of LTO-8 and recent reports that total tape storage capacity continues to increase dramatically, tape is a strong option for long-term retention. But even tape advocates say it’s going to take a comprehensive approach to storage that includes other forms of media to handle the data influx.
Tape making a comeback?
The annual tape media shipment report released earlier this year by the LTO Program showed that 108,000 petabytes (PB) of compressed tape storage capacity shipped in 2017, an increase of 12.9% over 2016. The total marks a fivefold increase over the capacity of just over 20,000 PB shipped in 2008.
LTO-8, which launched in late 2017, provides 30 TB compressed capacity and 12 TB native, doubling the capacities of LTO-7, which came out in 2015. The 12 TB of uncompressed capacity is equivalent to 8,000 movies, 2,880,000 songs or 7,140,000 photos, according to vendor Spectra Logic.
“We hope now [with] LTO-8 [to see] another increase in capacity [next year],” said Laura Loredo, worldwide marketing product manager at Hewlett Packard Enterprise, one of the LTO Program’s Technology Provider Companies along with IBM and Quantum.
The media, entertainment and science industries have been traditionally strong users of tape for long-term retention. Loredo pointed to more recent uses that have gained traction. Video surveillance is getting digitized more often and kept for longer, and there is more of it in general. The medical industry is a similar story, as records get digitized and kept for long periods of time.
The ability to create digital content at high volumes is becoming less expensive, and with higher resolutions, those capacities are increasing, Quantum product and solution marketing manager Kieran Maloney said. So tape becomes a cost-efficient play for retaining that data.
Tape also brings security benefits. Because it is naturally isolated from a network, tape provides a true “air gap” for protection against ransomware, said Carlos Sandoval Castro, LTO marketing representative at IBM. If ransomware is in a system, it can’t touch a tape that’s not connected, making tapes an avenue for disaster recovery in the event of a successful attack.
“We are seeing customers come back to tape,” Loredo said.
Tape sees clear runway ahead
“There’s a lot of runway ahead for tape … much more so than either flash or disk,” said analyst Jon Toigo, managing partner at Toigo Partners International and chairman of the Data Management Institute.
Even public cloud providers such as Microsoft Azure are big consumers of tape, Toigo said. Those cloud providers can use the large tape storage capacity for their data backup.
However, with IDC forecasting dozens of zettabytes in need of storage by 2025, flash and disk will remain important. One zettabyte is equal to approximately 1 billion TBs.
“You’re going to need all of the above,” Toigo said. “Tape is an absolute requirement for storing the massive amounts of data coming down the pike.”
It’s not necessarily about flash versus tape or other comparisons, it’s about how best to use flash, disk, tape and the cloud, said Rich Gadomski, vice president of marketing at Fujifilm and a member of the Tape Storage Council.
The cloud, for example, is helpful for certain aspects, such as offsite storage, but it shouldn’t be the medium for everything.
“A multifaceted data protection approach continues to thrive,” Gadomski said.
There’s still a lot of education needed around tape, vendors said. So often the conversation pits technologies against each other, Maloney said, but instead the question should be “Which technology works best for which use?” In the end, tape can fit into a tiered storage model that also includes flash, disk and the cloud.
In a similar way, the Tape Storage Council’s annual “State of the Tape Industry” report, released in March, acknowledged that organizations are often best served by using multiple media for storage.
“Tape shares the data center storage hierarchy with SSDs and HDDs and the ideal storage solution optimizes the strengths of each,” the report said. “However, the role tape serves in today’s modern data centers is quickly expanding into new markets because compelling technological advancements have made tape the most economical, highest capacity and most reliable storage medium available.”
LTO-8 uses tunnel magnetoresistance (TMR) for tape heads, a switch from the previous giant magnetoresistance (GMR). TMR provides a more defined electrical signal than GMR, allowing bits to be written to smaller areas of LTO media. LTO-8 also uses barium ferrite instead of metal particles for tape storage capacity improvement. With the inclusion of TMR technology and barium ferrite, LTO-8 is only backward compatible to one generation. Historically, LTO had been able to read back two generations and write back to one generation.
“Tape continues to evolve — the technology certainly isn’t standing still,” Gadomski said.
Tape also has a clearly defined roadmap, with LTO projected out to the 12th generation. Each generation after LTO-8 is projected to double the capacity of its predecessor. As a result, LTO-12 would offer 480 TB compressed tape storage capacity and 192 TB native. It typically takes between two and three years for a new LTO generation to launch.
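Those roadmap figures follow directly from the doubling rule. A quick check, projecting forward from LTO-8's published capacities:

```python
# Project LTO capacities forward from LTO-8 (12 TB native, 30 TB
# compressed), doubling each generation through LTO-12.
native, compressed = 12, 30  # LTO-8, in TB
for generation in range(9, 13):  # LTO-9 through LTO-12
    native *= 2
    compressed *= 2
print(f"LTO-12: {native} TB native, {compressed} TB compressed")
# → LTO-12: 192 TB native, 480 TB compressed
```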
In addition, IBM and Sony have said they developed technology achieving the highest recording areal density yet for tape storage media, resulting in approximately 330 TB uncompressed per cartridge.
On the lookout for advances in storage
Spectra Logic, in its “Digital Data Storage Outlook 2018” report released in June, said it projects much of the future zettabytes of data will “never be stored or will be retained for only a brief time.”
“Spectra’s projections show a small likelihood of a constrained supply of storage to meet the needs of the digital universe through 2026,” the report said. “Expected advances in storage technologies, however, need to occur during this timeframe. Lack of advances in a particular technology, such as magnetic disk, will necessitate greater use of other storage mediums such as flash and tape.”
While the report notes the use of tape for secondary storage has declined as backup moves to disk, the need for tape storage capacity in the long-term archive market is growing.
“Tape technology is well-suited for this space as it provides the benefits of low environmental footprint on both floor space and power; a high level of data integrity over a long period of time; and a much lower cost per gigabyte of storage than all other storage mediums,” the report said.
Imanis Data’s latest upgrade to its data management platform for big data applications includes improved ransomware protection, disaster recovery for Hadoop and DR testing.
Imanis manages and protects distributed databases such as Cassandra, Cloudera, Couchbase, Hortonworks and MongoDB, and cloud-native data for IoT and software-as-a-service applications. The startup, which changed its name from Talena in 2017, leaves backup and recovery of relational databases to data protection stalwarts such as Veritas, Dell EMC, IBM, Veeam and Commvault. It directly competes with smaller companies, mainly Datos IO, which Rubrik acquired in February 2018.
Imanis Data 3.3 — launched in June — strengthened the anomaly detection of the company’s ThreatSense anti-ransomware software, according to chief marketing officer Peter Smails. Smails said ThreatSense builds a baseline of normal data and periodically checks against it to find instances of ransomware encryption or mass deletion, whether malicious or accidental. In 3.3, the software tracks double the metrics of the previous build, allowing for more granular anomaly detection and a lower false positive rate.
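The baseline-and-deviation approach Smails describes can be sketched in a few lines. The example below is illustrative only — not ThreatSense's actual implementation — and flags a backup metric that strays several standard deviations from its history:

```python
# Illustrative baseline anomaly check: flag a metric (e.g. files changed
# between backups) that deviates sharply from its historical norm.
from statistics import mean, stdev

def flag_anomaly(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` standard
    deviations from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Daily counts of files changed between backups (hypothetical data).
changed_files = [1020, 980, 1010, 995, 1005, 990]

flag_anomaly(changed_files, 5400)  # sudden mass change → flagged
flag_anomaly(changed_files, 1000)  # within normal range → not flagged
```

A real system would track many such metrics at once; per the article, doubling the number of tracked metrics is exactly what tightens granularity and cuts false positives.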
Also new in 3.3 is automated data recovery support for Hadoop users.
“We’re basically replicating data from one Hadoop cluster to another Hadoop cluster, at a different data center, and we can do it as aggressively as every 15 minutes,” said Jay Desai, Imanis’ vice president of products.
Smails said Imanis customers often treat data recovery testing as an afterthought. Imanis Data 3.3’s Recovery Sandbox feature can help fix that by allowing administrators to quickly and easily test if a backup is restorable, without disrupting the primary work environment.
Imanis Data 3.3 also brings a user interface refresh to the platform’s existing point-in-time recovery. Whimsically dubbed the “time-travel widget,” the tool lets users search for and navigate to system restore points.
Imanis Data received a $13.5 million Series B funding round earlier this year. Although founder and COO Nitin Donde at the time said there would only be a “modest investment” in research and development, the company released 3.3 just four months later.
Since the funding, there’s been a concerted effort to brand Imanis Data as a company focused on machine learning-based data management. Smails describes Imanis Data’s platform as data-aware, uniquely structured around machine learning and built for scale.
That data awareness is a key factor in what makes Imanis Data’s ransomware detection so powerful. George Crump, founder and president of IT analyst firm Storage Switzerland, describes how ransomware has become more insidious.
“In many cases, it lays dormant, so it’s actually backed up a few times,” Crump said. “Then, it encrypts slowly, like only 500 or 1,000 files a day. And that becomes very hard for a human to detect if that’s out of the ordinary. If you can, through machine learning, detect a much more ‘finer grain’ attack, you can pick it up the first time it happens.”
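One common heuristic behind the kind of fine-grained detection Crump describes is per-file entropy: encrypted data looks nearly random (close to 8 bits per byte), while typical documents score much lower. The sketch below is illustrative only, not any vendor's method:

```python
# Illustrative heuristic: encrypted files have near-maximal Shannon entropy,
# so a slow-burn attack shows up as a creeping rise in high-entropy files.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte, from 0.0 up to 8.0."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

plain = b"the quick brown fox jumps over the lazy dog " * 20
random_like = bytes(range(256)) * 4  # stand-in for ciphertext

low = shannon_entropy(plain)         # well below 8 bits/byte
high = shannon_entropy(random_like)  # at the 8 bits/byte ceiling
```

Scoring a handful of files a day this way is trivial for a machine but, as Crump notes, far too subtle a signal for a human watching backup jobs.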