What Is a Data Warehouse?
The computer industry is overflowing with confusing terms that often have several meanings, and “Data Warehousing” is no exception. To some, a data warehouse is a database that contains a large volume of data. To others, it’s a departmental sub-set of data used to support a particular business function. Simply put, a data warehouse is a pool of data that supports business users in making strategic decisions. This data is organized to be easy to use and provide superior reporting performance.
By itself, the data warehouse provides no benefit. It acts as a processing and storage facility for strategic enterprise data. Adding an application that presents this data back to the user in a meaningful way, however, turns the data warehouse into a powerful analysis tool. This type of application is called a Decision Support System -- or DSS for short. Although there are sophisticated types of DSSs that can perform very specialized tasks -- such as customer profiling, revenue management, and demographic analysis -- the most common type of DSS is one that provides ad-hoc strategic reporting on business data.
Typically, when companies talk about a data warehouse, they are referring to both the warehouse and the DSS rolled into one -- the pool of data and the user interface that makes it meaningful to the organization. For the purposes of this white paper, “data warehouse” refers to both the warehouse and the DSS.
The Data Warehouse as a Competitive Advantage
The concept behind data warehousing is not new. In many companies, the IS department has been extracting key data from operational systems and using it for ad-hoc reporting for years. So why the increasing popularity of data warehousing? The reasons vary, but the major factors follow:
- Businesses are becoming more market-driven with increasing competition and deregulation.
- Brand loyalty has all but vanished in mass consumer markets as customers learn to expect more and more from the products they purchase.
- Methods of data warehousing are becoming more formalized and predictable.
- Hardware technology has advanced to the point that medium- and large-scale warehouses are now possible in client/server environments.
- Companies are seeing the strategic value in the transactional data they have accumulated over the years.
Data warehousing is a way to gain a competitive advantage through better strategic decision making and planning. The link between better decision making and improved competitiveness is obvious, but how the warehouse makes this possible might not be so clear. Data warehousing aims to supply the right data at the right time to the right set of business users, providing them with the analytical capability to make better strategic decisions.
In addition to planning and decision making, the data warehouse supports a more dynamic business model, which is a must in rapidly changing industries like telecommunications, transportation, and healthcare. Competition is fierce in these industries and any competitive advantage can directly affect the bottom line.
Implementing a Data Warehouse: The Right Approach
Businesses that have most successfully incorporated data warehousing into their organizations have done so by educating management, business users, and the technical staff before embarking on a large, costly warehousing project. Enabling the organization to see the value of warehousing is best accomplished by initially implementing a scope-reduced data warehouse that provides some fundamental business measurements, such as reporting of sales by customer, region, sales person, and products.
Typically, this small-scale warehouse provides information that is highly valuable and previously unavailable from the transactional systems. It also shows management and users how the data warehouse can add value to the organization. In addition to proving value, the initial warehouse provides ample training for the data warehouse team. The following are other benefits of using this approach:
- Business users begin to think of reporting needs in terms of value to the company rather than in terms of what data is available in a reasonable amount of time from the transactional systems.
- Any outside consulting services can be evaluated before committing to a large-scale project.
- The technical staff becomes more efficient in warehouse implementation, which reduces costs for a more full-scale effort.
- Management can get a better feel for sizing and scheduling data warehousing projects.
- Business users and technical staff gain management buy in, which makes future efforts run more smoothly.
Phases of the Warehouse Project
A typical warehouse project is divided into the five phases: requirements analysis, design, construction, testing, and implementation. The requirements analysis phase determines the most important aspect of the warehouse: the business goals. The ultimate success or failure of the warehouse is determined by how well the data warehouse meets these business goals. During the design phase, a data warehousing architecture that supports the short-term and long-term requirements is created. This architecture is implemented during the construction phase and verified against user requirements during the testing phase. The warehouse is put into production during the implementation phase, when users are trained and knowledge transfer is completed.
Requirements Analysis: Identifying the Business Goals
The requirements analysis phase is easily the most important phase of building a data warehouse. How well this phase is completed affects the quality and cost of the entire project. A data warehouse with detailed, quantitative business goals is more likely to be successful, cost less, and become a valuable strategic tool than a warehouse with loosely defined goals. Requirements should be gathered through the input of key individuals throughout the organization and should be documented in a requirements document that serves as a foundation for a design.
Design: A Blueprint for Construction
The design of the data warehouse ideally begins when the requirements analysis document is stable and has organizational buy-in. The purpose of the design phase is to create a blueprint for the many components that comprise the data warehouse. The detail level of the design should be sufficient for developers to construct the system.
Construction: A Parallel Approach
After the requirements analysis and design phases are complete, warehouse construction begins. A parallel approach to construction -- one that tackles key development tasks in tandem -- is the best approach. It minimizes project time that is otherwise spent waiting for one task to be completed before beginning another. For a parallel approach to work, the design specifications must support this approach.
Testing: Satisfying User Requirements
The first objective of testing is to demonstrate that the system addresses the business problems and satisfies user requirements. Too often, data warehouses do not serve these very important needs. An effective means of ensuring that the business problems are addressed is staging a review of deliverables produced during the analysis and design phases. Once the requirements have been validated, integration testing -- the project team’s final test of the system’s functionality, performance, and fit -- begins. During this phase, all components of the system should be tested together to ensure that the warehouse and its components are functioning correctly.
Implementation: Putting the Warehouse into Production
The goal of the implementation stage is to create a production version of the data warehouse. This keeps the development and test environments separate from the production data warehouse. Creating the production data warehouse involves a variety of tasks, typically including the following:
- Maintaining users and security policies
- Creating a new database
- Loading historical data into the database
- Validating and enhancing load and query performance
A key component of warehouse implementation is training. Business users must be trained to use the warehouse for meaningful analysis and decision support, or the power of the warehouse will be lost. Mentoring users and transferring knowledge helps ensure success going forward.
Back to the Fundamentals
Having the right approach is critical, but it is only half the battle. Several organizational issues have to be addressed before the environment is right for beginning a data warehousing project. Before getting started, organizations should have answers to the following key questions.
Who Is the Sponsor?
A data warehouse that is not properly sponsored will not likely live up to its expectations. Like any information system, the data warehouse needs a strong sponsor to mitigate the many risks associated with such a complex and costly project. Unlike most IS projects, data warehousing requires sponsorship at higher levels of management due to its ability to affect a large portion of an organization.
Most of the obstacles encountered during warehousing projects are political, not technical, and require effective managerial skills -- as well as a great deal of foresight -- to overcome. These obstacles can occur at any level of the organization and can vary greatly depending upon a company’s culture, people, and stability. Several warehouse projects have run into trouble because a department failed to commit the necessary resources, time, or knowledge to the data warehousing team. Each of these took a sponsor with the proper managerial authority and skill to get back on track.
Should We Build a Data Warehouse or Data Mart?
The answer to this question depends on your long-term needs for data warehousing. With an increasing number of off-the-shelf, customizable ad-hoc query and reporting tools flooding the market, it is becoming apparent that developing and maintaining the DSS application itself is fairly trivial compared to developing and maintaining the appropriate source data structures and flows. This is where the data mart enters the picture.
Because each mart is of limited scope and typically contains medium- to high-level data, a decision support system can be easily developed from it. How, then, do we make our data marts fairly easy to design and develop? The best way is to invest some extra planning time up front to design a detailed (i.e. low-level data) warehouse with a scope that is sufficiently broad enough to support feeding any number of future data marts of more limited scope and higher levels of summary. In a nutshell, then, designing and developing DSS applications based on sufficiently integrated marts is greatly simplified by using the detailed data from the data warehouse to design, develop, and feed the data marts.
If you cannot afford the extra time and cost associated with this strategy then you will be tempted to forgo the warehouse design and immediately develop data marts. Data marts can and are sometimes fed directly by operational systems instead of a detailed warehouse, and this can be a viable solution for the short term. In the long run, however, as more marts are developed and incorporated into the organization, there will be increasing data discrepancy problems and each one will not be nearly as trivial and cheap to implement as one that’s fed from a detailed warehouse.
Therefore, a data warehouse that facilitates the creation of data marts is often the best choice for an organization that envisions the use of data warehousing technology well into the future. Since the warehouse is part of the infrastructure, it has a higher initial cost but lower incremental costs.
Who Should Be on the Team?
The extent that a company wants to incorporate data warehousing into the fabric of the organization determines the composition of the warehouse team. Companies embracing data warehousing for the long term should use as many of their internal resources as possible to implement the project. Using consultants who can supplement the knowledge and skills not possessed in-house and train the warehouse team to take over the project is a good approach. This provides benefits for both the company and the employees. The company enjoys reduced costs over time, and the employees are part of an initiative that many perceive as marketable and new.
Creating a separate group that is responsible for only data warehousing has proven to be a good approach for developing the project team. The major justification for separating this team from other groups in the company is to give them the freedom they need to develop and use implementation methods that may be different from those used throughout the rest of the company.
The types of resources needed to implement a data warehouse are the same ones needed to implement an operational database system. These include a project manager, database administrators, data modelers, application developers, database architects, and very importantly, business users. However, these resources must know the specifics behind implementing a data warehouse. For example, the data modeler should know how to create a data model that supports the data warehouse. This requires entirely different modeling techniques than those used to design operational systems.
The skill level of the resources and the size of the data warehouse project affect the actual number of resources needed. Not including the business users, three highly-experienced resources are usually needed for a medium-sized project with a reasonable schedule.
How Much Will It Cost?
There are two unique challenges to overcome when developing a budget for a data warehousing project. First is the initial plasticity of the requirements. It is very difficult to determine this cost before gathering requirements because converting the business needs into requirements (and hence cost) is not a straightforward process.
Of course, management will want to know how much the entire project is going to cost before any work begins. Therefore, it is often helpful to treat the requirements gathering phase of a data warehousing project as it own project. The costs and schedules associated with completing the requirements phase are relatively easy to estimate. At the end of this phase, cost and schedules to complete the remainder of the warehousing project can be easily and accurately estimated.
A related issue is the difficulty in determining return on investment. This is due to the infrastructural nature of data warehousing. Providing ROI figures for infrastructural investments is notoriously difficult. However, it is often possible to determine the ROI of data marts that are fed by a central warehouse by identifying the value they can deliver to the organization.
The Future of Data Warehousing
Ten years from now, data warehouses will likely be as commonplace as transactional systems are today. Due to increases in competitive pressures and consumer expectations, strategic planning and decision making will become a standard and necessary part of doing business. Faster, cheaper, more powerful hardware systems will increase the number and size of data warehouses in the marketplace, further expanding the analysis of business information for a competitive advantage.
Although warehousing projects are large in scope and require organizations to take a good look at their strategic goals and objectives, the hard work is well worth it. Not only is the data warehouse a competitive tool for the near term, but it is likely to be the standard for strategic decision making in the future.
About the Author
John Dennis is founder and president of Galileo Technologies, Inc., a full-service Data Warehousing Consulting firm based in Atlanta, Georgia, that provides data warehouse consulting to medium and large companies in the Southeast. Dennis has focused on data warehousing solutions and the formalization of methods and techniques for the past six years, successfully implementing and managing many large-scale data warehousing projects in a variety of industries, including telecommunications, healthcare, transportation, and retail. Galileo Technologies uses a goal-oriented data warehouse methodology that focuses on the business need rather than the data in the systems. This methodology has been refined to include the best practices and techniques within the data warehouse industry. John Dennis and Galileo Technologies, Inc., can be reached at 770-590-8595 or jdennis@galileous.com.
© Copyright 1998, Galileo Technologies, Inc., All Rights Reserved |