1. Introduction
One of the goals of benchmarks is to compare several database management
systems on the same class of applications in order to show which one is more
efficient for this particular class. Benchmarks used in the industry today are always
oriented towards uncovering a defined limited set of features that a database should
implement efficiently in order to successfully support the class of applications the
benchmark was created for.
One of the first industry benchmarks, the TP1 benchmark [1], developed by
IBM, purported to measure the performance of a system handling ATM transactions
in a batch mode. Both TPC-C and TPC-D have gained widespread acceptance as the
industry's premier benchmarks in their respective fields (OLTP and Decision
Support). TPC-E failed to garner enough support because being an enterprise
benchmark, it was only relevant to a relatively small number of companies
competing in that space [11]. The Simple Database Operations Benchmark [9], Altair
Complex-Object Benchmark [14], OO1 [6] and OO7 [3, 2] Benchmarks were created
to provide useful insight for end-users evaluating the performance of Object Oriented
Database Management Systems (OODBMS).
Most object-relational DBMSs are built on top of relational databases by
adding the following four key features [12]: inheritance, complex object support, an
extensible type system, and triggers. The appearance of such databases necessitated
the creation of the BUCKY benchmark [4], which tests most of these specific
features. It emphasizes those features of object-relational databases that are not
covered by pure relational databases.
1.1 Semantic Model
The semantic database models in general, and the Semantic Binary Model
SBM [8, 10] in particular, represent information as a collection of elementary facts
categorizing objects or establishing relationships of various kinds between pairs of
objects. The facts in the database are of three types: facts stating that an object
belongs to a category; facts stating that there is a relationship between objects; and
facts relating objects to values. The relationships can be multivalued.
The objects are categorized into classes according to their common
properties. These classes, called categories, need not be disjoint; that is, one object
may belong to several of them. Further, an arbitrary structure of subcategories and
supercategories can be defined.
1.2 Why a Different Benchmark
Unfortunately, most benchmarks do not provide a general problem statement;
instead they enforce a specific implementation that cannot be efficiently translated
to a different data model. For example, the TPC benchmarks compare the efficiency of
implementations of the same solution on different relational DBMSs rather than
comparing how efficiently DBMSs are able to solve a given problem. The
proposed benchmark does not enforce a specific implementation and thus allows a native,
efficient implementation for any DBMS (semantic, relational or any other), which
makes it highly portable [7].
The majority of existing benchmarks are designed to evaluate features native to
relational DBMSs, and none of them is suitable for evaluating the performance of the
features characteristic of semantic database applications. The proposed benchmark
evaluates the ability of a DBMS to efficiently support such features, including sparse
data, complex inheritance, many-to-many relations and variable field length.
The rest of the paper is organized as follows: The benchmark application
and the requirements for execution of transactions are defined in general terms in
sections 2, 5, 6 and 7. Sections 3 and 4 describe our implementations of this
application for semantic and relational databases, respectively. Section 8 presents the
results we obtained for a semantic and a relational database and analyses the results.
Conclusions are provided in section 9.
2. The Problem Stated
It has recently become a practice for many consumer organizations to
conduct consumer surveys. A large number of survey forms are mailed out to people
and small companies with questions about shopping preferences in order to
determine consumer patterns. Some consumers will fill out the survey and mail it
back. The results of the survey should be stored in a database.
Our survey collects information about several types of legal persons. It is
mailed to physical persons and corporations and we keep information about all the
legal persons the survey was mailed to. Those who answer the survey and mail it
back are considered consumers and categorized into the corresponding category. We
also try to collect referrals from people to their friends and mail our survey to the
referred people. We remember the information about referrals so that we can explore
correlation between groups of people who know each other. A person may refer
another person without filling out the survey and thus without becoming a consumer
as we define it.
Among the other questions, we will ask a consumer which stores he
prefers to shop at. We will thus keep a catalog of stores with their names and types
(pharmacy, mass merchandiser, etc.). A consumer may list several stores he usually
shops at, so a many-to-many relationship is established between consumers and
stores. We will also ask a consumer to tell us his approximate annual expenditures in
thousands of dollars and to list his hobbies.
We will collect information about ten types of products. Consumers will fall
into different categories based on their answers about which products they regularly
consume, for example "coffee drinker." We will mail them a survey where they will
tell us if they regularly consume some product and answer questions about different
brands they consume within each product group. Each product group in our survey
has 10 brands. A consumer will tell us which brands of the product he uses and
show his preference of different brands in terms of a satisfaction rating of 1 to 4.
Some consumers may indicate that they use this type of product, but none of the
brands listed by us. This option should also be accommodated. We will also let a
consumer write any comment about any brand he is using (from the ten brands we
are asking about) if he wishes to do so. In practice, consumers seldom write any
comments, so this information will be sparse. However, if they do write a comment it
can be of any length and will probably be rather long.
3. A Semantic Schema for the Benchmark Database
The benchmark deals with persons and corporations. Person and
corporation are represented by two different categories with the appropriate personal
attributes. They both inherit from a category Legal Person. A category Consumer
will also inherit from Legal Person, since both persons and corporations may be
consumers and there is no need to distinguish between them as consumers. Since the
same object cannot be both a person and a corporation, categories Person and
Corporation will be set up as disjoint.
A legal person who answered our survey becomes a consumer. The category
Consumer will then be used to capture information about such legal persons as
consumers. The attribute "expenditure" is thus an attribute of the category
Consumer. The category Consumer is not disjoint with either Person or
Corporation. In fact, every consumer must be either a person or a corporation.
All of this is supported by a semantic database on the level of schema-enforced
integrity constraints. The relationship between categories is shown in Figure 1.
Consumers can further be categorized into inherited categories G0, G1, ...
G9, which represent consumers of a particular product. Thus, those consumers who
are soap users will be categorized into the category "Soap Users" (one of Gi). The
same consumer may be categorized into several Gis if he is a user of several types of
products.
In our initial database population about 50% of all objects in the database
will be categorized into each of the following categories: Legal Person, Person,
Consumer and several of the G0, G1, ... G9 categories. Some will be categorized into
Legal Person, Corporation, Consumer and several of the G0, G1, ... G9. Some will
be just Person and Legal Person or Corporation and Legal Person.
Figure 1. Relationship between subcategories (diagram nodes: Legal Person, Person, Corporation, Consumer)
In addition to this, there will be some objects in the category Store.
Consumers will be related to the stores they usually shop at. This relation is
many-to-many, which means that a consumer may be related to several stores at once and a
single store will, with high probability, be related to many consumers.
(Diagram. Categories: LEGAL PERSON, PERSON, CORPORATION, CONSUMER, STORE, G0, G1, … G9.
Attributes shown: NAME: STRING TOTAL; ADDRESS: STRING TOTAL; NAME: STRING; ADDRESS: STRING;
SSN: INTEGER; HOBBY: STRING (m:m); EXPENDITURE: INTEGER; TYPE: STRING; and, for each Gi,
A0, A1 … A9: INTEGER and C0, C1 … C9: STRING.
Relations: KNOWS (m:m); Customer-of (m:m).)
Figure 2. Semantic Schema for the Semantic Benchmark
A special relation "knows" goes from the category Person into itself. The
semantics of this is that a person may know, and refer to us, several other persons and
will be related to each one of them via this relation. Since a person typically refers
(and may be referred by) several other persons, this relation will also be many-to-many.
A semantic schema of the database appears in Figure 2.
4. Relational Schemas for the Benchmark Database
LegalPerson
    Columns: Id, Type, Name, Address, SSN
    Indexes: Name (Name); SSN (SSN)
LPersonHobby
    Columns: LegalPersonId, Hobby
    Indexes: Hobby (Hobby, LegalPersonId)
Consumer
    Columns: LegalPersonId, Expenditure
Store
    Columns: Name, Type
ConsumerCustomerOf
    Columns: LegalPersonId, StoreName
    Indexes: Store (StoreName, LegalPersonId)
LPersonKnowsLPerson
    Columns: LegalPersonId, KnowsId
    Indexes: Knows (LegalPersonId, KnowsId)
G0, G1, …, G9 (one table per group, all with the same layout)
    Columns: LegalPersonId, a1, a2, …, a9, c1, c2, …, c9
    Indexes: a1 (a1, LegalPersonId), …, a9 (a9, LegalPersonId)
Figure 3. A relational schema for the Sparse model
While the benchmark application has a clear and obvious representation in
the semantic schema, creating a relational schema for this application is not obvious.
When designing the semantic schema our only concern was the readability of the
schema capturing the full semantics of the application. In designing a relational
schema, we have to think more about the efficiency of the queries that we expect will
be executed on this database and about the tradeoffs between this efficiency,
readability, and the size of the database.
Among the many choices for the relational schema we considered (for
detailed analysis see [13]), we have chosen two that we believe to be the best
candidates for efficient implementation. We call them the Sparse model and the
Compact model. The corresponding relational schemas for these two models are
shown in Figures 3 and 4, respectively. In the Sparse model the creation of ten
indexes per group (consisting of LegalPersonId and ai) is reasonable to facilitate
efficient access to the data in G0…G9. Even though our benchmark queries do not
access every ai in every Gi, the schema should accommodate all similar queries with
the same efficiency, so we have to create all of the indexes. However, creating so
many indexes will slow down update operations and take up too much disk space
and cache memory.
LegalPerson
    Columns: Id, Type, Name, Address, SSN
    Indexes: Name (Name); SSN (SSN)
LPersonHobby
    Columns: LPId, Hobby
    Indexes: Hobby (Hobby, LPId)
Consumer
    Columns: LPId, Expenditure
Store
    Columns: Name, Type
ConsumerCustomerOf
    Columns: LPId, StoreName
    Indexes: Store (StoreName, LPId)
Answer
    Columns: Type, LPId, Id, Value
    Indexes: AnswerType (Type, Id, Value)
Comment
    Columns: LPId, Type, Id, Value
LPersonKnowsLPerson
    Columns: LPId, KnowsId
    Indexes: Knows (LPId, KnowsId)
Figure 4. A relational schema for the Compact model
For the Compact model we create two tables, one to keep all ai attributes,
the other to keep all ci attributes. Each table has the columns: LegalPersonId, group
number, attribute number and the value of this attribute. The primary key will consist
of LegalPersonId, group number and attribute number. For this model, we need to
create just one additional index consisting of group number, attribute number and
value.
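The Compact model described above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the schema used in our tests: the SQLite dialect, the column types, and the reading of the Figure 4 columns Type and Id as "group number" and "attribute number" are assumptions, and the sample rows are invented.

```python
import sqlite3


def create_compact_schema(conn: sqlite3.Connection) -> None:
    """Create the two Compact-model tables (column types are assumed)."""
    conn.executescript(
        """
        CREATE TABLE Answer (
            LPId  INTEGER NOT NULL,   -- legal person who gave the answer
            Type  INTEGER NOT NULL,   -- group number (assumed meaning)
            Id    INTEGER NOT NULL,   -- attribute number (assumed meaning)
            Value INTEGER NOT NULL,   -- value of the a_i attribute
            PRIMARY KEY (LPId, Type, Id)
        );
        -- The one additional index: group number, attribute number, value.
        CREATE INDEX AnswerType ON Answer (Type, Id, Value);

        CREATE TABLE Comment (
            LPId  INTEGER NOT NULL,
            Type  INTEGER NOT NULL,
            Id    INTEGER NOT NULL,
            Value TEXT    NOT NULL,   -- value of the c_i attribute
            PRIMARY KEY (LPId, Type, Id)
        );
        """
    )


conn = sqlite3.connect(":memory:")
create_compact_schema(conn)

# Invented sample rows: (LPId, group number, attribute number, value).
conn.executemany(
    "INSERT INTO Answer (LPId, Type, Id, Value) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 3), (3, 1, 1, 1), (3, 2, 2, 2)],
)

# Legal persons whose value for attribute 1 of group 1 equals 1.
rows = conn.execute(
    "SELECT LPId FROM Answer "
    "WHERE Type = 1 AND Id = 1 AND Value = 1 ORDER BY LPId"
).fetchall()
matching_ids = [row[0] for row in rows]
```

A lookup by group, attribute and value needs only the single AnswerType index, whereas the Sparse model needs one index per attribute in every group table.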
Parameter                                Lower bound  Upper bound  Mean  Variance
Name length                              5            40           12    5
Address length                           15           100          35    20
Comment length                           5            255          30    100
Number of hobbies per consumer           0            19           0     10
Number of stores per consumer            1            19           4     10
Expenditure                              1            89           20    10
Number of groups a consumer belongs to   1            10           5     4
Number of brands a consumer uses         0            9            1     1
Table 2. Normal distribution of parameters for initial database population
5. Initial Database Population
The number of Legal Persons in the initial population defines the scalability
[7] of the database. The results published in this paper were obtained on a database
with 200,000 corporations and 800,000 persons. Of these, 500,000 persons and 100,000
corporations are consumers. The data was generated randomly according to the table
of Normal Distribution (Gaussian) parameters (Table 2). The detailed definition of
the initial data set can be found in [13]. A random set of Persons must be pre-chosen
for transaction #4. This set consists of 0.1% of all Persons in the database.
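One way to draw values that respect the bounds in Table 2 is sketched below. The exact generation procedure is defined in [13]; here it is assumed that out-of-range draws are rejected and redrawn, that values are rounded to integers, and that the "Variance" column is a variance rather than a standard deviation. The function and dictionary names are hypothetical.

```python
import math
import random

# (lower bound, upper bound, mean, variance), copied from Table 2.
TABLE_2 = {
    "name_length": (5, 40, 12, 5),
    "address_length": (15, 100, 35, 20),
    "comment_length": (5, 255, 30, 100),
    "hobbies_per_consumer": (0, 19, 0, 10),
    "stores_per_consumer": (1, 19, 4, 10),
    "expenditure": (1, 89, 20, 10),
    "groups_per_consumer": (1, 10, 5, 4),
    "brands_per_consumer": (0, 9, 1, 1),
}


def draw_parameter(rng: random.Random, name: str) -> int:
    """Draw one integer from a Gaussian, redrawing until it is within bounds.

    Rejection sampling is an assumption; clamping would be another option.
    """
    lower, upper, mean, variance = TABLE_2[name]
    sigma = math.sqrt(variance)
    while True:
        value = round(rng.gauss(mean, sigma))
        if lower <= value <= upper:
            return value


rng = random.Random(2024)  # fixed seed so the population is reproducible
sample = {
    name: [draw_parameter(rng, name) for _ in range(1000)]
    for name in TABLE_2
}
```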
6. Database Transactions
Our benchmark consists of five transactions performed independently and
sequentially. We expect efficient DBMS-specific implementations for each particular
database.
Transaction 1:
The first transaction is simple. The task is to count the number of
consumers that belong to every one of the ten product consumer groups in the
database. The result is just one number: the cardinality of the intersection of groups
G0...G9. The formal definition of the transaction is shown in formula (1).
\[ R = \left| \bigcap_{i=0}^{9} G_i \right|, \qquad (1) \]
where Gi are the groups of consumers that consume a particular product.
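As a plain illustration of formula (1), the groups can be modelled as Python sets of consumer identifiers. This in-memory model and the sample data are assumptions made for the example; a real implementation would use the DBMS's native facilities.

```python
def count_in_all_groups(groups):
    """Return the number of consumers that belong to every group."""
    if not groups:
        return 0
    return len(set.intersection(*groups))


# Invented data: consumers 7 and 42 are in all ten groups; each group
# also has one member of its own.
groups = [{7, 42, 100 + i} for i in range(10)]
result = count_in_all_groups(groups)
```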
Transaction 2:
The second transaction consists of two different physical database
transactions. The first one finds all consumers who use brand #1 of product #1 as
their first preference among the brands of product #1, and at the same time use brand
#2 of product #2 as their second preference among the brands of product #2. Such
consumers form a new group in the database Gnew. This new group must be created
in the database and all found consumers should be categorized into it. The formal
definition of the transaction is shown in formula (2).
\[ G_{new} = \{ g \in G_1 \mid g[G_1{::}A_1] = 1 \} \cap \{ g \in G_2 \mid g[G_2{::}A_2] = 2 \} \qquad (2) \]
The second physical transaction should delete the newly created group from
the database. The sum of execution times of both physical transactions is considered
the execution time of Benchmark Transaction #2.
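The two physical transactions and the way their times are summed can be sketched as follows. The dictionary layout (group name to consumer id to a mapping from brand number to rating) and all sample data are assumptions for illustration only.

```python
import time


def create_new_group(db):
    """First physical transaction: find the consumers and store them as Gnew."""
    first = {c for c, ratings in db["G1"].items() if ratings.get(1) == 1}
    second = {c for c, ratings in db["G2"].items() if ratings.get(2) == 2}
    db["Gnew"] = {consumer: {} for consumer in first & second}
    return set(db["Gnew"])


def delete_new_group(db):
    """Second physical transaction: remove the newly created group."""
    del db["Gnew"]


# Invented data: ratings[brand number] = preference given by the consumer.
db = {
    "G1": {1: {1: 1}, 2: {1: 2}, 3: {1: 1, 2: 4}},
    "G2": {1: {2: 2}, 3: {2: 3}, 4: {2: 2}},
}

start = time.perf_counter()
members = create_new_group(db)
delete_new_group(db)
elapsed = time.perf_counter() - start  # time of Benchmark Transaction #2
```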
Transaction 3:
The third transaction is a complex query counting those consumers who
regularly shop at store Y and have hobby X, excluding those who use brand #3 of
product #3 as their third preference among the brands of product #3, and at the same
time use brand #4 of product #4 as their fourth preference among the brands of
product #4. The result is just one number - a count of such consumers. The formal
definition of the transaction is shown in formula (3).
\[ R = \bigl| \bigl( \{ p \in Person \mid p[Person{::}hobby] = X \} \cap \{ c \in Consumer \mid c[Consumer{::}customer\_of{::}name] = Y \} \bigr) - \bigl( \{ g \in G_3 \mid g[G_3{::}A_3] = 3 \} \cap \{ g \in G_4 \mid g[G_4{::}A_4] = 4 \} \bigr) \bigr| \qquad (3) \]
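The set expression for Transaction 3 translates directly into set operations. The sketch below assumes a toy in-memory layout (dictionaries keyed by object id) and invented data; the function and parameter names are hypothetical.

```python
def transaction_3(hobbies, customer_of, g3, g4, hobby, store):
    """Count objects with the given hobby and store, minus the excluded set."""
    has_hobby = {p for p, values in hobbies.items() if hobby in values}
    shops_at = {c for c, stores in customer_of.items() if store in stores}
    excluded = (
        {g for g, ratings in g3.items() if ratings.get(3) == 3}
        & {g for g, ratings in g4.items() if ratings.get(4) == 4}
    )
    return len((has_hobby & shops_at) - excluded)


# Invented data keyed by object id.
hobbies = {1: {"chess"}, 2: {"chess", "golf"}, 3: {"golf"}, 4: {"chess"}}
customer_of = {1: {"Acme"}, 2: {"Acme"}, 3: {"Acme"}, 4: {"Other"}}
g3 = {1: {3: 1}, 2: {3: 3}}
g4 = {2: {4: 4}}

count = transaction_3(hobbies, customer_of, g3, g4, hobby="chess", store="Acme")
```

Objects 1 and 2 have the hobby and shop at the store; object 2 is excluded because it satisfies both brand conditions, which leaves a count of one.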
Transaction 4:
The fourth transaction can be explained by the following: For each person
from a given (randomly chosen) set of 0.1% of all persons, expand the relation
"knows" to relate this person to all people he has a chain of acquaintance to. Abort
the transaction rather than commit. Print the length of the maximal chain from a
person. The formal definition of the transaction is shown in formula (4).
\[ NewDatabase = OldDatabase \cup K, \text{ where} \]
\[ K = \{ \langle s.knows.p \rangle \mid s \in S,\ p \in Person,\ \exists n\ \exists a_1, a_2, \ldots, a_n \in Person : \langle s.knows.a_1 \rangle,\ \forall i = 1..n-1\ \langle a_i.knows.a_{i+1} \rangle,\ \langle a_n.knows.p \rangle \} \qquad (4) \]
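A breadth-first search gives one simple way to compute the set K for a given set S. The sketch below makes several assumptions: "knows" is held as an in-memory dictionary of sets, the abort is simulated by never modifying that dictionary, and "the length of the maximal chain" is read as the greatest breadth-first depth reached from any person in S. The names and data are invented.

```python
from collections import deque


def expand_knows(knows, seeds):
    """Return (new pairs K, longest chain length) without changing `knows`."""
    new_pairs = set()
    max_chain = 0
    for s in seeds:
        direct = knows.get(s, set())
        depth = {}
        queue = deque()
        for b in direct:  # chains of length 1: people s already knows
            depth[b] = 1
            queue.append(b)
        while queue:
            a = queue.popleft()
            for b in knows.get(a, set()):
                if b not in depth:
                    depth[b] = depth[a] + 1
                    queue.append(b)
        # Only pairs not already stored in the database are new.
        new_pairs |= {(s, p) for p in depth if p not in direct}
        max_chain = max(max_chain, max(depth.values(), default=0))
    return new_pairs, max_chain


# Invented acquaintance data.
knows = {
    "ann": {"bob"},
    "bob": {"cat"},
    "cat": {"dan"},
    "dan": set(),
    "eve": {"ann"},
}
new_pairs, max_chain = expand_knows(knows, ["ann"])
print(max_chain)
```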
Transaction 5:
The fifth transaction counts the number of consumers in each one of the ten
product consumer groups in the database. The result is ten numbers: the cardinality
of each of the groups G0...G9. The formal definition of the transaction is shown
in formula (5).
\[ R_i = \left| G_i \right|, \quad i = 0..9 \qquad (5) \]
7. Execution of Transactions
The benchmark runs in single-user mode. Only one client runs
the benchmark transactions at a time. Thus, we are not testing concurrency control
performance with this benchmark. A DBMS is, however, allowed to use any kind of
parallelism it can exploit in single-user mode.
The benchmark transactions are executed in two modes: hot and cold. Both
"cold time" and "hot time" are collected for each transaction. Both results are
included in the final result. The cold time is the time required to perform a
transaction immediately after starting the DBMS on a system with an empty cache.
This is normally achieved by rebooting the system before executing each transaction.
The hot time is the time required to perform a transaction immediately after
performing an identical transaction without clearing any cache or restarting the
DBMS. To collect the hot time we run the same transaction in a loop until the time
of execution stabilizes, which typically happens on the third or fourth run. Once the
execution time stabilizes we compute the arithmetic mean of the following five
transactions and this is considered the final hot execution time for this transaction.
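The hot-time procedure can be sketched as a small measurement loop. The text does not define when the execution time has "stabilized", so the criterion used here (two consecutive runs differing by at most 5%) and the cap on warm-up runs are assumptions, as are all names. A simulated clock is used in the example so that the result is deterministic.

```python
import time
from statistics import mean


def measure_hot_time(run_transaction, tolerance=0.05, max_warmup=20,
                     samples=5, clock=time.perf_counter):
    """Warm up until timings stabilize, then average the next `samples` runs."""

    def timed():
        start = clock()
        run_transaction()
        return clock() - start

    previous = timed()
    for _ in range(max_warmup):
        current = timed()
        # Assumed criterion: relative change between consecutive runs.
        stable = abs(current - previous) <= tolerance * max(previous, 1e-9)
        previous = current
        if stable:
            break
    return mean(timed() for _ in range(samples))


class SimulatedClock:
    """Stand-in for a DBMS call: each run advances the clock by a set time."""

    def __init__(self, durations):
        self.now = 0.0
        self._durations = iter(durations)

    def __call__(self):
        return self.now

    def run_transaction(self):
        self.now += next(self._durations)


# The first runs are slow (cache warming); later runs settle at 0.2 s.
sim = SimulatedClock([1.0, 0.5, 0.2] + [0.2] * 10)
hot = measure_hot_time(sim.run_transaction, clock=sim)
```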
8. Results and Analysis
We ran the benchmark for the Sem-ODB Semantic Object-Oriented
Database Engine implemented at Florida International University's High
Performance Database Research Center (HPDRC) [10]. We also ran the benchmark
for one of the leading commercial relational databases. The tests were done on a
dual-processor Pentium II 400 MHz with 256 MB total memory and a 9 GB Seagate SCSI disk
drive under Windows NT Server 4.0 Enterprise Edition.
The version of Sem-ODB used for benchmarking did not utilize the
multiprocessor architecture of the underlying hardware. We did, however, observe
the relational database using both processors for parallel computations. We have run
tests with different memory limitations imposed on the DBMSs. Sem-ODB was
allowed to use 16 Megabytes of memory and never actually used more than 12
Megabytes for the benchmark transactions. For the relational database, two different
tests were conducted. In one, the relational DBMS was allowed to use 16 Megabytes,
in the other 128 Megabytes. For some transactions, the 16 Megabyte quota was
enough for efficient execution; for other transactions the relational DBMS was
willing to use up to 128 Megabytes for better performance.
Both cold and hot times were collected for each memory limitation and for
both relational schemas (sparse and compact). Thus, eight execution times were
collected per transaction for the relational DBMS. This was done to make sure that
we have done everything we could to allow the relational database to achieve its best
performance. We observed that in some cases the sparse model was more efficient,
but in other cases the compact model was faster. In order to prevent criticism of the
choice of the model, we decided to include all the results in this paper.
We have spent a considerable amount of time inventing different
implementations and fine tuning the relational database. We tried different
combinations of indexes, keys, DBMS options, and schemas in order to achieve
greater performance. The semantic database, on the other hand, did not require any
tweaking to optimize its performance. Its performance was acceptable in the very
first version.
The semantic DBMS is able to capture the exact semantics of the problem
domain and provide a single optimal way to represent it in a semantic schema. All
the appropriate indexes are built automatically and follow from the definition of the
Semantic Database. By its design, it can efficiently answer arbitrary queries without
the need for an experienced person to spend time tuning it for a particular
application.
DBMS Model        Semantic   Relational Sparse     Relational Compact
DB size (MB)      406        1046                  382
RAM (MB)          16         16         128        16         128
Cold times (seconds)
Transaction 1     1.61       11.52      11.55      16.11      15.94
Transaction 2     1.13       0.53       0.56       0.34       0.36
Transaction 3     0.91       5.95       5.91       5.97       5.88
Transaction 4     55.65      55.63      43.02      55.63      43.02
Transaction 5     8.62       11.66      11.53      15.31      15.17
Hot times (seconds)
Transaction 1     0.04       11.66      5.39       15.81      12.58
Transaction 2     0.07       0.28       0.28       0.09       0.09
Transaction 3     0.33       2.72       2.72       2.72       2.70
Transaction 4     0.23       35.02      2.87       35.02      2.87
Transaction 5     6.85       11.36      2.17       14.92      10.32
Table 3. The benchmark results
One might think that to create enough indexes for the execution of arbitrary
queries, the semantic database would have to unnecessarily use too much disk space.
The results, however, show that this is not true. In our relational implementation the sparse
model contains a similar number of indexes to the semantic database but requires 2.5
times more disk space. The compact model uses about the same amount of disk
space as the semantic database, but has worse performance on most transactions and
is not universal in the sense that this model would not be possible at all if, for
example, the attributes a0..a9 were of different types.
The semantic database is outperformed by the relational on a few
transactions in