
Use eseutil for Exchange database repair with care

Admins can make an Exchange outage worse by making a hasty attempt to get the platform back online without careful planning.

Don’t rush in and start immediate repairs with the command-line tool eseutil. Eseutil is a powerful tool for Exchange database repair work, but used incorrectly it can make matters worse. Admins must understand its different functions and know when each is appropriate.

Not every problem requires eseutil

Admins can use eseutil for several significantly different Exchange database repair procedures: to defragment a database, to repair damaged database pages or to perform a roll-forward recovery of a database. The roll-forward option restores the backed-up data, then replays the transaction logs to recover the data written after the backup.

But the best time to run eseutil depends on the circumstances. Use it in repair mode solely as a last resort when several things have gone wrong in the environment. If you can't mount the database and can't restore from backup, it might be your only option.
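
Before turning to repair mode, it's worth confirming the state of the database by dumping its file header. A minimal check, assuming a hypothetical database path of D:\Exchsrvr\MDBDATA\priv1.edb (substitute the real path):

eseutil /MH "D:\Exchsrvr\MDBDATA\priv1.edb"

If the header reports a Dirty Shutdown state, the database still needs its transaction logs replayed; a repair with /P should only come after recovery from the logs or a restore from backup has failed.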

When database and streaming files don’t match

A failure can cause Exchange to dismount a database. This often happens when the streaming file (STM) and the database file (EDB) are not synchronized. When eseutil starts a database repair, it first checks that the STM file is in sync with the EDB file. If eseutil finds those two files do not match, it will error out.

By forcing eseutil to run an Exchange database repair despite this condition, the admin might lose all data held in the streaming file. The following command ignores the mismatch error and runs the repair:

eseutil /P .edb /I

This command has consequences. The STM file primarily holds user data from Post Office Protocol 3 (POP3) and Internet Message Access Protocol (IMAP) clients, so if all clients run Outlook, it's generally safe to ignore an STM file mismatch. Conversely, if a large number of clients connect to Exchange servers with POP3 or IMAP, then forcing a repair through an STM file mismatch usually results in data loss.
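
For clarity, this is what the forced repair looks like with a hypothetical database path filled in (the path below is illustrative only):

eseutil /P "D:\Exchsrvr\MDBDATA\priv1.edb" /I

Without the /I switch, the same command stops with an error when the EDB and STM files are out of sync instead of discarding the streaming data.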

Restore and roll forward

If the Exchange databases have proper backups, restore and roll forward a database rather than attempt a repair.

This process takes less time and comes with a lower risk of data loss. By comparison, even a lightly corrupted database takes around an hour to repair for every 5 GB of data. With the size of databases in most production environments, that's a significant time investment.

To perform a restore and roll forward, an admin needs two things: a good recent backup of the database and all the transaction logs created after that backup. If both conditions are met, run this command to restore the database and roll it forward:
eseutil /CC
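
The /CC switch runs against the temporary folder where the restore job places its restore.env file and the restored transaction logs. A sketch, assuming a hypothetical restore folder of D:\TempRestore:

eseutil /CC "D:\TempRestore"

Eseutil then replays the restored logs followed by the current transaction logs, rolling the database forward to the point of failure.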

Complete these Exchange database repair steps

Once eseutil completes the repair, there are still three tasks to execute before the admin can mount the database, as shown in the command sketch after this list.

  1. Run eseutil /D (defrag) against the database.
  2. Run isinteg -fix, another Exchange utility that checks the integrity of the newly repaired and defragmented database.
  3. Back up the database.
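
Put together, and assuming the same hypothetical database path as above plus an illustrative server name of EXCH01, the first two steps look like this:

eseutil /D "D:\Exchsrvr\MDBDATA\priv1.edb"
isinteg -s EXCH01 -fix -test alltests

Isinteg prompts for the database to check and may need to run more than once until it reports no errors; only then take the fresh backup and mount the database.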

Management will want the Exchange database mounted and operational as soon as possible, but admins shouldn’t skip these steps. While it’s possible to mount the database after the eseutil repair finishes, the database is not stable until you complete the first two steps, and it’s not safe until the backup is done.

Big data systems up ante on data quality measures for users

NEW YORK — In the rush to capitalize on deployments of big data platforms, organizations shouldn’t neglect data quality measures needed to ensure the information used in analytics applications is clean and trustworthy, experienced IT managers said at the 2017 Strata Data Conference here last week.

Several speakers pointed to data quality as a big challenge in their big data environments — one that required new processes and tools to help their teams get a handle on quality issues, as both the volumes of data being fed into corporate data lakes and use of the info by data scientists and other analysts grow.

“The more of the data you produce is used, the more important it becomes, and the more important data quality becomes,” said Michelle Ufford, manager of core innovation for data engineering and analytics at Netflix Inc. “But it’s very, very difficult to do it well — and when you do it well, it takes a lot of time.”

Over the past 12 months, Ufford’s team worked to streamline the Los Gatos, Calif., company’s data quality measures as part of a broader effort to boost data engineering efficiency based on a “simplify and automate” mantra, she said during a Strata session.

A starting point for the data-quality-upgrade effort was “acknowledging that not all data sets are created equal,” she noted. In general, ones with high levels of usage get more data quality checks than lightly used ones do, according to Ufford, but trying to stay on top of that “puts a lot of cognitive overhead on data engineers.” In addition, it’s hard to spot problems just by looking at the metadata and data-profiling statistics that Netflix captures in an internal data catalog, she said.

Calling for help on data quality

To ease those burdens, Netflix developed a custom data quality tool, called Quinto, and a Python library, called Jumpstarter, which are used together to generate recommendations on quality coverage and to set automated rules for assessing data sets. When data engineers run Spark-based extract, transform and load (ETL) jobs to pull in data on use of the company’s streaming media service for analysis, transient object tables are created in separate partitions from the production tables, Ufford said. Calls are then made from the temporary tables to Quinto to do quality checks before the ETL process is completed.

In the future, Netflix plans to expand the statistics it tracks when profiling data and implement more robust anomaly detection capabilities that can better pinpoint “what is problematic or wrong” in data sets, Ufford added. The ultimate goal, she said, is making sure data engineering isn’t a bottleneck for the analytics work done by Netflix’s BI and data science teams and its business units.

Data quality in big data systems was among the topics discussed at the 2017 Strata Data Conference in New York.

Improving data consistency was one of the goals of a cloud-based data lake deployment at Financial Industry Regulatory Authority Inc., an organization in Washington, D.C., that creates and enforces rules for financial markets. Before the big data platform was set up, fragmented data sets in siloed systems made it hard for data scientists and analysts to do their jobs effectively, said John Hitchingham, director of performance engineering at the not-for-profit regulator, more commonly known as FINRA.

A homegrown data catalog, called herd, was “a real key piece for making this all work,” Hitchingham said in a presentation at the conference. FINRA collects metadata and data lineage info in the catalog; it also lists processing jobs and related data sets there, and it uses the catalog to track schemas and different versions of data in the big data architecture, which runs in the Amazon Web Services (AWS) cloud.

To help ensure the data is clean and consistent, Hitchingham’s team runs validation routines after it’s ingested into Amazon Simple Storage Service (S3) and registered in the catalog. The validated data is then written back to S3, completing a process that he said also reduces the amount of ETL processing required to normalize and enrich data sets before they’re made available for analysis.

Data quality takes a business turn

Brendan Aldrich, CDO at Ivy Tech Community College

The analytics team at Ivy Tech Community College in Indianapolis also does validation checks as data is ingested into its AWS-based big data system — but only to make sure the data matches what’s in the source systems from which it’s coming. The bulk of the school’s data quality measures are now carried out by individual departments in their own systems, said Brendan Aldrich, Ivy Tech’s chief data officer.

“Data cleansing is a never-ending process,” Aldrich said in an interview before speaking at the conference. “Our goal was, rather than getting on that treadmill, why not engage users and get them involved in cleansing the data where it should be done, in the front-end systems?”

That process started taking shape when Ivy Tech, which operates 45 campuses and satellite locations across Indiana, deployed the cloud platform and Hitachi Vantara’s Pentaho BI software several years ago to give its business users self-service analytics capabilities. And it was cemented in July 2016 when the college hired a new president who mandated that business decisions be based on data, Aldrich said.

The central role data plays in decision-making gives departments a big incentive to ensure information is accurate before it goes into the analytics system, he added. As a result, data quality problems are being found and fixed more quickly now, according to Aldrich. “Even if you’re cleansing data centrally, you usually don’t find [an issue] until someone notices it and points it out,” he said. “In this case, we’re cleansing it faster than we were before.”