Board of Cooperative Education Services Builds a Virtual Data Warehouse with Analytics/204
The Case for a Data Warehouse
Imagine all your data about a single topic coming together seamlessly from a number of disparate systems throughout your enterprise for decision support and analysis. Furthermore, the data is cleansed and possibly even transformed or aggregated for analytical use. Finally, you provide a variety of query, reporting, data mining, and analytical tools to end-users. They can explore the data freely and answer their own questions without excessive involvement from the IT department. That is the concept of a data warehouse in a nutshell.
However, building a data warehouse isn’t as simple and straightforward as the concept. Consuming a tremendous amount of programmer and financial resources, a full-blown data warehouse architecture requires the precise integration of numerous new and existing software components. As such, building a data warehouse is not a project that is lightly undertaken.
The Virtues of a Virtual Data Warehouse
Now suppose you could quickly build a virtual data warehouse as a working prototype of a data warehouse. And while you worked on the design and implementation of the data warehouse, the virtual data warehouse is up and running, fulfilling the needs of end users.
Unlike a data warehouse, the virtual data warehouse does not store data separately from the operational store. It applies appropriate business intelligence tools against views of operational data, providing the end-user with some decision support capabilities quickly and easily. And because users can satisfy most of their own information requests, the IT department - particularly of a small or mid-size company - can devote more time to the design and implementation of the data warehouse.
As one Model 204 user has discovered, the recently-released Analytics/204 product is the ideal business intelligence tool for building a virtual data warehouse.
The Board of Cooperative Education Services (BOCES) in Syracuse, New York, provides student-oriented research data to approximately 2,500 school principals, administrators and researchers in almost every school district in the state. Every request differs; each involves vast information about any number of students and requires that results be displayed in diverse formats. Recently, despite their best efforts to keep up with the ever-growing demand, IT began to fall behind with reports. They knew they needed a data warehouse.
The Challenge at BOCES
After making the decision to build a data warehouse for their internal clients, the modestly-sized IT department at BOCES had a real challenge before them.
- BOCES has 10 years of research data amounting to 30 million records of 3,000 fields, utilizing 161 different record types.
- All the data is stored in 58 Model 204 databases on the organization’s IBM S390 MVS enterprise server.
- It will take some time just to figure out what data the users would need to access in the data warehouse, let alone integrate the other components such as data cleansing, transformation, and analytical tools.
- Programming staff estimates it will take between 8 and 18 months to fully construct a data warehouse.
- Most troubling is this Catch-22: they still must continue to serve their large end-user base, which leaves very little time to build the data warehouse that they hope will eventually reduce their workload.
Choosing a Business Intelligence Tool
Realizing that this created a no-win situation, BOCES considered an interim, virtual data warehouse solution, hoping to free up IT resources to devote to the data warehouse project. The first task was to choose a business intelligence tool that was up to the challenge of turning BOCES’ operational data into a virtual data warehouse. Among other products, BOCES considered a product from Rocket Software called Analytics/204
Analyzing with Analytics/204
Analytics/204 is a business intelligence tool for analyzing even the largest enterprise databases. It relies on Model 204’s trademark indexing technology to deliver maximum performance for analytical applications. Rocket's database products have used bit mapped indices, which are recognized for their superior performance in query and analytical operations. Any number of fields in the database can be indexed, resulting in the creation of an index table for each one. The practical result is that most analytical operations are executed against the key tables, never accessing the actual data records. Functions such as summaries and value distributions are completed almost instantly, even with exceedingly large databases.
Consider a database that contains name and address information for 50 million U.S. citizens. Summarizing the number of people by state takes less than a second, regardless of the number of data records, since all work is done at the index table level. The State field index table is very small, containing only 50 unique values.
For more complex queries and analyses, Analytics/204 combines index table information through the use of relational and logical operators, producing similarly fast results. In a sense, the index tables ARE the data warehouse, with the actual data records being used only when a full drill-down is required. This, of course, is completely transparent to the user, who doesn’t need to worry about whether their operations are being completed at the index table or detail record level.
Back at BOCES: Choosing Analytics/204
After a brief review of popular business intelligence tools, BOCES chose Analytics/204
"Analytics/204 was attractive to us for several reasons," explains Larry Dismore, director of the Central New York Regional Information Center at BOCES. "It provided all the basic query, reporting, and analytical functions that we felt our users need. It was from our database vendor, so we knew there would be no integration problems and that the learning curve for producing custom functionality would be minimal. The performance was pretty much unbeatable. We were not naive about the risks of throwing large numbers of researchers at operational data, and we were very concerned about the performance of both the analytical and the operational applications. The indexing technology inherent to Analytics/204 made this a non-issue."
Designing the Virtual Warehouse
Like building a data warehouse, constructing a virtual data warehouse requires some understanding of the users’ informational needs. But because a virtual data warehouse consists mainly of index tables and logical views of the data, rather than physical copies of operational data, BOCES was not concerned with getting the views right the first time.
If they built a view that users didn’t need - remove it. If they omitted a view, which later surfaced as a requirement - build it on the fly. The use of the warehouse was undisturbed.
BOCES constructed numerous views of all current databases, providing the users maximum range to analyze the data. This helped their users understand what data was available to them, which translated into more accurate user requirements for the data warehouse. In two weeks, the IT department completed and delivered the initial implementation of the virtual data warehouse.
Accessing the Virtual Data Warehouse
Working in an intuitive, Windows-based environment, pioneer users find it relatively easy to access and analyze the data, given the appropriate database view. Here are some tasks they can accomplish:
- Query and Segment Data - Users can easily create simple or highly complex queries in a point-and-click environment. The values associated with each field are provided to the user upon request, helping the user find the exact data needed. Users can search the data based on the result of a macro or formula, versus a specific value. Because users don’t always know what they’re looking for when they first begin, they can iteratively refine their queries, eventually narrowing down the data in a step-by-step fashion. When the user is satisfied with the selection set of records and fields, the user can name and save the selection criteria for easy reuse later.
- Summarize the Data - Summaries of the data such as counts, averages, and maximum and minimum values are easily gotten. Summaries can be grouped by any number of different fields, or even by computed values. Summaries generally appear instantaneously, regardless of database size, because computations are executed against the index table structures, rather than against the actual database records.
- Display and Manipulate the Data - The virtual data warehouse comes with numerous tools for data display, including reports, charts, and graphs. These output formats are also interactive; the user can manipulate them to form new queries, drill down to underlying data, or view the data from different perspectives. When you require a more specialized view of the data, you can easily export the data into other analytical packages such as spreadsheets or OLAP tools.
Is a Virtual Data Warehouse in Your Future?
Not every application or enterprise is a good candidate for a virtual data warehouse. BOCES’ data and application has several characteristics that contributed to their swift success, including:
- Clean, Usable Data - BOCES’ operational data did not have serious integrity problems that prevented it from being effective for business intelligence purposes prior to cleansing. Their data did not need to be substantially transformed. Aggregations, which will likely be performed for the data warehouse, were not essential due to the ease and speed of dynamic summaries through the index tables.
- Appropriate Hardware Platform - Because BOCES’ operational databases are stored on their IBM OS/390™ enterprise server, they can handle the additional user load. On a smaller server, the number of users accessing the virtual data warehouse would most likely be limited - perhaps not meeting the original requirement.
- Appropriate Database Platform - The BOCES operational databases are stored in Model 204. If a significantly less robust DBMS was in use, the number of users accessing the warehouse would most likely be limited.
- Network Support - The infrastructure was already in place to connect the PCs to the mainframe, thus allowing users fast access to the new virtual data warehouse.
- Homogeneous Data - Although Analytics, with Model 204 structures, can analyze any data source, the BOCES project was particularly efficient since all data was in Model 204.
How did you score?
If several of the previous factors were not positive, the virtual data warehouse solution might have looked more and more like a full-blown data warehouse implementation, and possibly not worth the added effort.
A Virtual Data Warehouse for the Long Term
A virtual data warehouse not only quickly solves a business intelligence need, it also contributes significantly to the effective design of the data warehouse - a benefit that saves both time and money. When the data warehouse is up and running, the virtual data warehouse continues as the testing ground. Even a well-designed data warehouse needs adjustments over time. These areas can be researched by the virtual data warehouse.
And from another viewpoint, the strengths of the data warehouse will undoubtedly generate more interest in its use by other end-users, who in turn require enhancements to the data warehouse to satisfy their needs. These new demands can be immediately satisfied by the ever-flexible virtual data warehouse, always one step ahead.