
1. Introduction

One of the goals of a benchmark is to compare several database management systems on the same class of applications in order to show which one is more efficient for that particular class. Industry benchmarks today are invariably oriented towards exercising a defined, limited set of features that a database must implement efficiently in order to successfully support the class of applications the benchmark was created for.

One of the first industry benchmarks, the TP1 benchmark [1] developed by IBM, purported to measure the performance of a system handling ATM transactions in batch mode. Both TPC-C and TPC-D have gained widespread acceptance as the industry's premier benchmarks in their respective fields (OLTP and Decision Support). TPC-E failed to garner enough support because, being an enterprise benchmark, it was relevant only to the relatively small number of companies competing in that space [11]. The Simple Database Operations Benchmark [9], the Altair Complex-Object Benchmark [14], and the OO1 [6] and OO7 [3, 2] benchmarks were created to provide useful insight for end users evaluating the performance of Object-Oriented Database Management Systems (OODBMS).

Most object-relational DBMSs are built on top of a relational database by adding the following four key features [12]: inheritance, complex object support, an extensible type system, and triggers. The appearance of such databases necessitated the creation of the BUCKY benchmark [4], which tests most of these specific features. It emphasizes those features of object-relational databases that are not covered by pure relational databases.

1.1 Semantic Model

The semantic database models in general, and the Semantic Binary Model SBM [8, 10] in particular, represent information as a collection of elementary facts categorizing objects or establishing relationships of various kinds between pairs of objects. The facts in the database are of three types: facts stating that an object belongs to a category; facts stating that there is a relationship between objects; and facts relating objects to values. The relationships can be multivalued.

The objects are categorized into classes according to their common properties. These classes, called categories, need not be disjoint; that is, one object may belong to several of them. Further, an arbitrary structure of subcategories and supercategories can be defined.

1.2 Why a Different Benchmark

Unfortunately, most benchmarks do not provide a general problem statement; instead they enforce a specific implementation that cannot be efficiently translated to a different data model. For example, the TPC benchmarks compare the efficiency of implementations of the same solution across different relational DBMSs rather than comparing how efficiently DBMSs are able to solve a given problem. The benchmark proposed here does not enforce a specific implementation, thus allowing a native, efficient implementation for any DBMS - semantic, relational or any other - which makes it highly portable [7].

The majority of existing benchmarks are designed to evaluate features native to relational DBMSs, and none of them are suitable for evaluating the performance of features characteristic of semantic database applications. The proposed benchmark evaluates the ability of a DBMS to efficiently support such features, including sparse data, complex inheritance, many-to-many relations and variable field length.

The rest of the paper is organized as follows: The benchmark application and the requirements for execution of transactions are defined in general terms in sections 2, 5, 6 and 7. Sections 3 and 4 describe our implementations of this application for semantic and relational databases respectively. Section 8 presents the results we obtained for a semantic and a relational database and analyses the results. Conclusions are provided in section 9.

2. The Problem Stated

It has recently become a practice for many consumer organizations to conduct consumer surveys. A large number of survey forms are mailed out to people and small companies with questions about shopping preferences in order to determine consumer patterns. Some consumers will fill out the survey and mail it back. The results of the survey should be stored in a database.

Our survey collects information about several types of legal persons. It is mailed to physical persons and corporations, and we keep information about all the legal persons the survey was mailed to. Those who answer the survey and mail it back are considered consumers and categorized into the corresponding category. We also try to collect referrals from people to their friends and mail our survey to the referred people. We remember the information about referrals so that we can explore correlation between groups of people who know each other. A person may refer another person without filling out the survey and thus without becoming a consumer as we define it.

Among the other questions, we will ask a consumer which stores he prefers to shop at. We will thus keep a catalog of stores with their names and types (pharmacy, mass merchandiser, etc.). A consumer may list several stores he usually shops at, so that a many-to-many relationship is established between consumers and stores. We will also ask a consumer to tell us his approximate annual expenditures in thousands of dollars and a list of his hobbies.

We will collect information about ten types of products. Consumers will fall into different categories based on their answers about which products they regularly consume, for example "coffee drinker." We will mail them a survey where they will tell us whether they regularly consume some product and answer questions about the different brands they consume within each product group. Each product group in our survey has 10 brands. A consumer will tell us which brands of the product he uses and rank his preference among the brands with a satisfaction rating of 1 to 4. Some consumers may indicate that they use this type of product, but none of the brands we list; this option should also be accommodated. We will also let a consumer write a comment about any brand he is using (from the ten brands we ask about) if he wishes to do so. In practice, consumers seldom write comments, so this information will be sparse. However, if they do write a comment, it can be of any length and will probably be rather long.

3. A Semantic Schema for the Benchmark Database

The benchmark deals with persons and corporations. Person and corporation are represented by two different categories with the appropriate personal attributes. Both inherit from a category Legal Person. A category Consumer will also inherit from Legal Person, since both persons and corporations may be consumers and there is no need to distinguish between them as consumers. Since the same object cannot be both a person and a corporation, the categories Person and Corporation will be set up as disjoint.

A legal person who answered our survey becomes a consumer. The category Consumer will then be used to capture information about such legal persons as consumers. The attribute "expenditure" is thus an attribute of the category Consumer. The category Consumer is not disjoint with either Person or Corporation. In fact, every consumer must be either a person or a corporation. All of this is supported by a semantic database on the level of schema-enforced integrity constraints. The relationship between categories is shown in Figure 1.

Consumers can further be categorized into inherited categories G0, G1, ... G9, which represent consumers of a particular product. Thus, those consumers who are soap users will be categorized into the category "Soap Users" (one of the Gi). The same consumer may be categorized into several Gi if he is a user of several types of products.

In our initial database population about 50% of all objects in the database will be categorized into each of the following categories: Legal Person, Person, Consumer and several of the G0, G1, ... G9 categories. Some will be categorized into Legal Person, Corporation, Consumer and several of the G0, G1, ... G9. Some will be just Person and Legal Person, or Corporation and Legal Person.

[Figure 1. Relationship between subcategories: Person, Corporation and Consumer as subcategories of Legal Person]

In addition to this, there will be some objects in the category Store. Consumers will be related to the stores they usually shop at. This relation is many-to-many, which means that a consumer may be related to several stores at once and a single store will, with high probability, be related to many consumers.

[Figure 2. Semantic Schema for the Semantic Benchmark: LEGAL PERSON (NAME: STRING TOTAL; ADDRESS: STRING TOTAL) with subcategories PERSON (NAME: STRING; SSN: INTEGER; HOBBY: STRING M:M; ADDRESS: STRING), CORPORATION, and CONSUMER (EXPENDITURE: INTEGER); CONSUMER has subcategories G0, G1, ..., G9, each with attributes A0, A1, ..., A9: INTEGER and C0, C1, ..., C9: STRING; relation KNOWS (m:m) from PERSON to itself; relation Customer-of (m:m) from CONSUMER to STORE (NAME: STRING TOTAL; TYPE: STRING)]

A special relation "knows" goes from the category Person into itself. The semantics of this is that a person may know and refer to us several other persons and will be related to each one of them via this relation. Since a person typically refers (and may be referred by) several other persons, this relation will also be many-to-many. A semantic schema of the database appears in Figure 2.

4. Relational Schemas for the Benchmark Database

LegalPerson (Id, Type, Name, Address, SSN)
    Indexes: Name (Name); SSN (SSN)

LPersonHobby (LegalPersonId, Hobby)
    Indexes: Hobby (Hobby, LegalPersonId)

Consumer (LegalPersonId, Expenditure)

Store (Name, Type)

ConsumerCustomerOf (LegalPersonId, StoreName)
    Indexes: Store (StoreName, LegalPersonId)

LPersonKnowsLPerson (LegalPersonId, KnowsId)
    Indexes: Knows (LegalPersonId, KnowsId)

G0, G1, ..., G9 - one table per group: Gi (LegalPersonId, a1, a2, ..., a9, c1, c2, ..., c9)
    Indexes: a1 (a1, LegalPersonId), ..., a9 (a9, LegalPersonId)

Figure 3. A relational schema for the Sparse model

While the benchmark application has a clear and obvious representation in the semantic schema, creating a relational schema for this application is not as evident. When designing the semantic schema our only concern was the readability of the schema capturing the full semantics of the application. In designing a relational schema, we have to think more about the efficiency of the queries that we expect will be executed on this database and about the tradeoffs between this efficiency, readability, and the size of the database.

Among the many choices for the relational schema we considered (for detailed analysis see [13]), we have chosen two that we believe to be the best candidates for efficient implementation. We call them the Sparse model and the Compact model. The corresponding relational schemas for these two models are shown in Figures 3 and 4, respectively. In the Sparse model the creation of ten indexes per group (consisting of LegalPersonId and ai) is reasonable to facilitate efficient access to the data in G0…G9. Even though our benchmark queries do not access every ai in every Gi, the schema should accommodate all similar queries with the same efficiency, so we have to create all of the indexes. However, creating so many indexes will slow down update operations and take up too much disk space and cache memory.
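The index cost being described can be made concrete. The following is a hypothetical sketch using Python's sqlite3 in-memory engine (the paper does not name its commercial RDBMS); ten rating attributes a0..a9 are assumed per group, matching the semantic schema's A0..A9 and the "ten indexes per group" of the text.

```python
import sqlite3

# Hypothetical sketch of one Sparse-model group table (G0 of G0..G9),
# using an in-memory SQLite database purely for illustration.
conn = sqlite3.connect(":memory:")

a_cols = ", ".join(f"a{i} INTEGER" for i in range(10))
c_cols = ", ".join(f"c{i} TEXT" for i in range(10))
conn.execute(
    f"CREATE TABLE G0 (LegalPersonId INTEGER PRIMARY KEY, {a_cols}, {c_cols})")

# One (ai, LegalPersonId) index per rating attribute: queries may filter
# on any ai with equal efficiency, so all ten indexes must exist.
for i in range(10):
    conn.execute(f"CREATE INDEX G0_a{i} ON G0 (a{i}, LegalPersonId)")

n_indexes = len(conn.execute("PRAGMA index_list('G0')").fetchall())
print(n_indexes)  # 10 secondary indexes on this one table alone
```

Repeated over all ten group tables, this yields a hundred secondary indexes, which is the update-cost and space problem the text points out.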

LegalPerson (Id, Type, Name, Address, SSN)
    Indexes: Name (Name); SSN (SSN)

LPersonHobby (LPId, Hobby)
    Indexes: Hobby (Hobby, LPId)

Consumer (LPId, Expenditure)

Store (Name, Type)

ConsumerCustomerOf (LPId, StoreName)
    Indexes: Store (StoreName, LPId)

Answer (Type, LPId, Id, Value)
    Indexes: AnswerType (Type, Id, Value)

Comment (LPId, Type, Id, Value)

LPersonKnowsLPerson (LPId, KnowsId)
    Indexes: Knows (LPId, KnowsId)

Figure 4. A relational schema for the Compact model

For the Compact model we create two tables, one to keep all ai attributes, the other to keep all ci attributes. Each table has the columns: LegalPersonId, group number, attribute number and the value of this attribute. The primary key will consist of LegalPersonId, group number and attribute number. For this model, we need to create just one additional index consisting of group number, attribute number and value.
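A minimal sketch of the Compact model's two attribute tables, again using sqlite3 purely for illustration. Column names follow Figure 4 ("Type" is the product-group number, "Id" the attribute number within the group); the sample row is invented.

```python
import sqlite3

# Hypothetical sketch of the Compact model's two tables (SQLite for
# illustration), per the schema in Figure 4.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Answer (
    LPId INTEGER, Type INTEGER, Id INTEGER, Value INTEGER,
    PRIMARY KEY (LPId, Type, Id))""")
conn.execute("""CREATE TABLE Comment (
    LPId INTEGER, Type INTEGER, Id INTEGER, Value TEXT)""")
# The single additional index: group number, attribute number, value.
conn.execute("CREATE INDEX AnswerType ON Answer (Type, Id, Value)")

# An invented fact: consumer 7 rates brand 1 of product group 1 as first
# preference. It is one row here, not one column of a wide table.
conn.execute("INSERT INTO Answer VALUES (7, 1, 1, 1)")
match = conn.execute(
    "SELECT LPId FROM Answer WHERE Type = 1 AND Id = 1 AND Value = 1"
).fetchone()
print(match[0])  # 7
```

Because absent answers are simply absent rows, sparse data costs nothing here, at the price of turning per-attribute filters into lookups through the AnswerType index.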

Parameter                                Lower bound   Upper bound   Mean   Variance
Name length                                        5            40     12          5
Address length                                    15           100     35         20
Comment length                                     5           255     30        100
Number of hobbies per consumer                     0            19      0         10
Number of stores per consumer                      1            19      4         10
Expenditure                                        1            89     20         10
Number of groups a consumer belongs to             1            10      5          4
Number of brands a consumer uses                   0             9      1          1

Table 2. Normal distribution of parameters for initial database population

5. Initial Database Population

The number of Legal Persons in the initial population defines the scalability [7] of the database. The results published in this paper were obtained on a database with 200,000 corporations and 800,000 persons; 500,000 of the persons and 100,000 of the corporations are consumers. The data was generated randomly according to the table of Normal (Gaussian) distribution parameters (Table 2). The detailed definition of the initial data set can be found in [13]. A random set of Persons must be pre-chosen for transaction #4. This set consists of 0.1% of all Persons in the database.
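The population can be generated by drawing each parameter from a normal distribution and resampling values that fall outside the bounds. A sketch of that idea, showing only the Expenditure row of Table 2; treating the table's "Variance" column as the standard deviation handed to the generator is our assumption.

```python
import random

# Hypothetical sketch of the population generator: draw from a normal
# distribution, resample until the value lies inside [lower, upper].
# Table 2's "Variance" column is assumed to act as the standard deviation.
def bounded_gauss(rng, lower, upper, mean, sigma):
    while True:
        x = round(rng.gauss(mean, sigma))
        if lower <= x <= upper:
            return x

rng = random.Random(0)  # fixed seed: reproducible population
# The Expenditure row of Table 2: bounds 1..89, mean 20, "variance" 10.
expenditures = [bounded_gauss(rng, 1, 89, 20, 10) for _ in range(1000)]
print(min(expenditures) >= 1 and max(expenditures) <= 89)  # True
```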

6. Database Transactions

Our benchmark consists of five transactions performed independently and sequentially. We expect efficient DBMS-specific implementations for each particular database.

Transaction 1:

The first transaction is simple. The task is to count the number of consumers that belong to every one of the ten product consumer groups in the database. The result is just one number: the cardinality of the intersection of groups G0...G9. The formal definition of the transaction is shown in formula (1).

$R = \bigcap_{i=0}^{9} G_i$ , (1)

where $G_i$ are the groups of consumers that consume a particular product.
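Under the Sparse model, this intersection maps naturally onto a chain of set operations. A minimal sketch, assuming sqlite3 and an invented toy population (the real benchmark runs against hundreds of thousands of consumers):

```python
import sqlite3

# Sketch of Transaction 1 under the Sparse model (SQLite, toy data):
# membership in group Gi is the existence of a row in table Gi, so the
# intersection of G0..G9 is an INTERSECT over the ten key columns.
conn = sqlite3.connect(":memory:")
for i in range(10):
    conn.execute(f"CREATE TABLE G{i} (LegalPersonId INTEGER PRIMARY KEY)")

# Invented population: consumer 1 is in all ten groups, consumer 2 in five.
for i in range(10):
    conn.execute(f"INSERT INTO G{i} VALUES (1)")
for i in range(5):
    conn.execute(f"INSERT INTO G{i} VALUES (2)")

query = " INTERSECT ".join(f"SELECT LegalPersonId FROM G{i}" for i in range(10))
count = conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]
print(count)  # 1: only consumer 1 belongs to every group
```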

Transaction 2:

The second transaction consists of two different physical database transactions. The first one finds all consumers who use brand #1 of product #1 as their first preference among the brands of product #1, and at the same time use brand #2 of product #2 as their second preference among the brands of product #2. Such consumers form a new group, Gnew, in the database. This new group must be created in the database and all the consumers found should be categorized into it. The formal definition of the transaction is shown in formula (2).

$G_{new} = \{\, g \in G_1 \mid g[G_1{::}A_1] = 1 \,\} \cap \{\, g \in G_2 \mid g[G_2{::}A_2] = 2 \,\}$ (2)

The second physical transaction should delete the newly created group from the database. The sum of execution times of both physical transactions is considered the execution time of Benchmark Transaction #2.
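A sketch of the two physical transactions, assuming the Sparse model in sqlite3 with the rating attributes reduced to the two that matter here and an invented population:

```python
import sqlite3

# Sketch of Transaction 2's two physical transactions (Sparse model,
# SQLite, toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G1 (LegalPersonId INTEGER PRIMARY KEY, a1 INTEGER)")
conn.execute("CREATE TABLE G2 (LegalPersonId INTEGER PRIMARY KEY, a2 INTEGER)")
conn.executemany("INSERT INTO G1 VALUES (?, ?)", [(1, 1), (2, 1), (3, 4)])
conn.executemany("INSERT INTO G2 VALUES (?, ?)", [(1, 2), (3, 2)])

# Physical transaction 1: create Gnew and categorize into it the consumers
# with brand #1 of product #1 as first preference (a1 = 1) and brand #2 of
# product #2 as second preference (a2 = 2).
with conn:
    conn.execute("CREATE TABLE Gnew (LegalPersonId INTEGER PRIMARY KEY)")
    conn.execute("""INSERT INTO Gnew
        SELECT LegalPersonId FROM G1 WHERE a1 = 1
        INTERSECT
        SELECT LegalPersonId FROM G2 WHERE a2 = 2""")

size = conn.execute("SELECT COUNT(*) FROM Gnew").fetchone()[0]
print(size)  # 1: only consumer 1 satisfies both conditions

# Physical transaction 2: delete the newly created group again.
with conn:
    conn.execute("DROP TABLE Gnew")
```

In the semantic database the same step is a recategorization of objects rather than the creation and population of a table, which is part of what the transaction is designed to expose.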

Transaction 3:

The third transaction is a complex query counting those consumers who regularly shop at store Y and have hobby X, excluding those who use brand #3 of product #3 as their third preference among the brands of product #3, and at the same time use brand #4 of product #4 as their fourth preference among the brands of product #4. The result is just one number - a count of such consumers. The formal definition of the transaction is shown in formula (3).

$R = (\{\, p \in Person \mid p[Person{::}hobby] = X \,\} \cap \{\, c \in Consumer \mid c[Consumer{::}customer\_of{::}name] = Y \,\}) - (\{\, g \in G_3 \mid g[G_3{::}A_3] = 3 \,\} \cap \{\, g \in G_4 \mid g[G_4{::}A_4] = 4 \,\})$ (3)
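A sketch of this query under the Sparse model (sqlite3, toy data; 'chess' and 'Acme' are invented stand-ins for the hobby and store of the definition):

```python
import sqlite3

# Sketch of Transaction 3 under the Sparse model (SQLite, toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LPersonHobby (LegalPersonId INTEGER, Hobby TEXT)")
conn.execute(
    "CREATE TABLE ConsumerCustomerOf (LegalPersonId INTEGER, StoreName TEXT)")
conn.execute("CREATE TABLE G3 (LegalPersonId INTEGER PRIMARY KEY, a3 INTEGER)")
conn.execute("CREATE TABLE G4 (LegalPersonId INTEGER PRIMARY KEY, a4 INTEGER)")

conn.executemany("INSERT INTO LPersonHobby VALUES (?, ?)",
                 [(1, "chess"), (2, "chess"), (3, "golf")])
conn.executemany("INSERT INTO ConsumerCustomerOf VALUES (?, ?)",
                 [(1, "Acme"), (2, "Acme")])
conn.execute("INSERT INTO G3 VALUES (2, 3)")  # consumer 2 is excluded:
conn.execute("INSERT INTO G4 VALUES (2, 4)")  # a3 = 3 and a4 = 4

# (hobby ∩ store) minus (G3 condition ∩ G4 condition); subqueries keep
# the INTERSECT/EXCEPT grouping explicit.
count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT LegalPersonId FROM (
            SELECT LegalPersonId FROM LPersonHobby WHERE Hobby = 'chess'
            INTERSECT
            SELECT LegalPersonId FROM ConsumerCustomerOf
            WHERE StoreName = 'Acme')
        EXCEPT
        SELECT LegalPersonId FROM (
            SELECT LegalPersonId FROM G3 WHERE a3 = 3
            INTERSECT
            SELECT LegalPersonId FROM G4 WHERE a4 = 4))""").fetchone()[0]
print(count)  # 1: consumer 1 qualifies, consumer 2 is excluded
```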


Transaction 4:

The fourth transaction can be explained by the following: For each person from a given (randomly chosen) set of 0.1% of all persons, expand the relation "knows" to relate this person to all people he has a chain of acquaintance to. Abort the transaction rather than commit. Print the length of the maximal chain from a person. The formal definition of the transaction is shown in formula (4).

$NewDatabase = OldDatabase \cup K$ , where

$K = \{\, \langle s.knows.p \rangle \mid s \in S,\ p \in Person,\ \exists n,\ \exists a_1, a_2, \ldots, a_n \in Person :\ \langle s.knows.a_1 \rangle,\ \forall i = 1..n{-}1\ \langle a_i.knows.a_{i+1} \rangle,\ \langle a_n.knows.p \rangle \,\}$ (4)
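One way to realize this expansion in a relational setting is a recursive query. The sketch below (sqlite3, toy data, and an assumed depth cap of 1000 as a guard against cycles) computes the closure for a single seed person, prints the maximal chain length, and then aborts as the transaction requires:

```python
import sqlite3

# Sketch of Transaction 4 (SQLite, toy data): expand "knows" to the
# transitive closure for one seed person, report the maximal chain
# length, then abort rather than commit.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txns
conn.execute("CREATE TABLE LPersonKnowsLPerson (LPId INTEGER, KnowsId INTEGER)")
conn.executemany("INSERT INTO LPersonKnowsLPerson VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4)])  # acquaintance chain 1->2->3->4

seed = 1
conn.execute("BEGIN")
rows = conn.execute("""
    WITH RECURSIVE reach(pid, depth) AS (
        SELECT KnowsId, 1 FROM LPersonKnowsLPerson WHERE LPId = ?
        UNION
        SELECT k.KnowsId, r.depth + 1
        FROM LPersonKnowsLPerson k JOIN reach r ON k.LPId = r.pid
        WHERE r.depth < 1000)
    SELECT pid, MIN(depth) FROM reach GROUP BY pid""", (seed,)).fetchall()

# Materialize the expanded relation, then roll everything back.
conn.executemany("INSERT INTO LPersonKnowsLPerson VALUES (?, ?)",
                 [(seed, pid) for pid, _ in rows])
max_chain = max(depth for _, depth in rows)
conn.execute("ROLLBACK")  # abort, per the benchmark definition

print(max_chain)  # 3: person 4 is three "knows" steps from person 1
facts = conn.execute("SELECT COUNT(*) FROM LPersonKnowsLPerson").fetchone()[0]
print(facts)      # 3: the expansion did not survive the abort
```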

Transaction 5:

The fifth transaction counts the number of consumers in each one of the ten product consumer groups in the database. The result is ten numbers: the cardinality of each of the groups G0...G9. The formal definition of the transaction is shown in formula (5).

$R_i = G_i ,\ i = 0..9$ (5)

7. Execution of Transactions

The benchmark runs in single-user mode. Only one client runs the benchmark transactions at a time; thus, we are not testing concurrency-control performance with this benchmark. A DBMS is, however, allowed to use any kind of parallelism it can exploit in single-user mode.

The benchmark transactions are executed in two modes: hot and cold. Both "cold time" and "hot time" are collected for each transaction, and both results are included in the final result. The cold time is the time required to perform a transaction immediately after starting the DBMS on a system with an empty cache. This is normally achieved by rebooting the system before executing each transaction. The hot time is the time required to perform a transaction immediately after performing an identical transaction, without clearing any cache or restarting the DBMS. To collect the hot time we run the same transaction in a loop until the execution time stabilizes, which typically happens on the third or fourth run. Once the execution time stabilizes, we compute the arithmetic mean of the following five transactions; this is considered the final hot execution time for the transaction.
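The hot-time procedure can be sketched as a small measurement harness. This is hypothetical: the 5% stability threshold and the warm-up cap are our assumptions, since the text only says the time "stabilizes"; `txn` stands in for any benchmark transaction.

```python
import time

# Sketch of the hot-time procedure: warm up until two consecutive runs
# agree within a tolerance (assumed 5%), then average the next five runs.
def hot_time(txn, tolerance=0.05, samples=5, max_warmup=20):
    prev = None
    for _ in range(max_warmup):            # warm-up until stable
        start = time.perf_counter()
        txn()
        elapsed = time.perf_counter() - start
        if prev is not None and abs(elapsed - prev) <= tolerance * prev:
            break
        prev = elapsed
    timed = []
    for _ in range(samples):               # the five measured transactions
        start = time.perf_counter()
        txn()
        timed.append(time.perf_counter() - start)
    return sum(timed) / len(timed)         # arithmetic mean = final hot time

mean = hot_time(lambda: sum(x * x for x in range(20000)))
print(mean > 0)  # True
```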

8. Results and Analysis

We ran the benchmark for the Sem-ODB Semantic Object-Oriented Database Engine implemented at Florida International University's High Performance Database Research Center (HPDRC) [10]. We also ran the benchmark for one of the leading commercial relational databases. The tests were done on a dual-processor 400 MHz Pentium II with 256 MB of total memory and a 9 GB Seagate SCSI disk drive under Windows NT Server 4.0 Enterprise Edition.

The version of Sem-ODB used for benchmarking did not utilize the multiprocessor architecture of the underlying hardware. We did, however, observe the relational database using both processors for parallel computations. We ran tests with different memory limitations imposed on the DBMSs. Sem-ODB was allowed to use 16 megabytes of memory and never actually used more than 12 megabytes for the benchmark transactions. For the relational database, two different tests were conducted: in one, the relational DBMS was allowed to use 16 megabytes; in the other, 128 megabytes. For some transactions the 16-megabyte quota was enough for efficient execution; for other transactions the relational DBMS was willing to use up to 128 megabytes for better performance.

Both cold and hot times were collected for each memory limitation and for both relational schemas (Sparse and Compact). Thus, eight execution times were collected per transaction for the relational DBMS. This was done to make sure that we have done everything we could to allow the relational database to achieve its best performance. We observed that in some cases the Sparse model was more efficient, but in other cases the Compact model was faster. In order to prevent criticism on the choice of the model, we decided to include all the results in this paper.

We spent a considerable amount of time inventing different implementations and fine-tuning the relational database. We tried different combinations of indexes, keys, DBMS options, and schemas in order to achieve greater performance. The semantic database, on the other hand, did not require any tweaking to optimize its performance. Its performance was acceptable in the very first version.

The semantic DBMS is able to capture the exact semantics of the problem domain and provide a single optimal way to represent it in a semantic schema. All the appropriate indexes are built automatically and follow from the definition of the semantic database. By its design, it can efficiently answer arbitrary queries without the need for an experienced person to spend time tuning it for a particular application.

DBMS Model          Semantic    Relational Sparse      Relational Compact
DB Size (Mb)        406         1046                   382
RAM                 16 Mb       16 Mb       128 Mb     16 Mb       128 Mb

Cold times (seconds)
Transaction 1       1.61        11.52       11.55      16.11       15.94
Transaction 2       1.13        0.53        0.56       0.34        0.36
Transaction 3       0.91        5.95        5.91       5.97        5.88
Transaction 4       55.65       55.63       43.02      55.63       43.02
Transaction 5       8.62        11.66       11.53      15.31       15.17

Hot times (seconds)
Transaction 1       0.04        11.66       5.39       15.81       12.58
Transaction 2       0.07        0.28        0.28       0.09        0.09
Transaction 3       0.33        2.72        2.72       2.72        2.70
Transaction 4       0.23        35.02       2.87       35.02       2.87
Transaction 5       6.85        11.36       2.17       14.92       10.32

Table 3. The benchmark results

One might think that to create enough indexes for the execution of arbitrary queries, the semantic database would have to use an unreasonable amount of disk space. The results, however, show this is not true. In our relational implementation the Sparse model contains a similar number of indexes to the semantic database but requires 2.5 times more disk space. The Compact model uses about the same amount of disk space as the semantic database, but has worse performance on most transactions and is not universal, in the sense that this model would not be possible at all if, for example, the attributes a0..a9 were of different types.

The semantic database is outperformed by the relational on a few transactions in
