A database has three views.
The External Schema: What the end user sees.
The Internal Schema: What the programmers of the program see, that is, how the data is physically stored.
The Conceptual Schema: The basic plan of the database. Most of the time this is in paper form as a Conceptual Schema Diagram (CSD).
Entity relationship diagram for railway reservation system?
Oh, what a lovely thought! Imagine the railways as a beautiful landscape painting. In the center, you have entities like 'Passenger,' 'Train,' and 'Reservation.' Connect them with lines showing relationships, like 'Passenger books Reservation,' and 'Train has Reservation.' Just like painting a happy little tree, let your diagram flow with harmony and clarity. Remember, there are no mistakes, just happy little accidents.
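As a rough sketch of how the three entities named above might map to tables, here is a small example using Python's built-in sqlite3 module. Only the entities (Passenger, Train, Reservation) and the two relationships come from the answer above; every column name and the sample data are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical columns; only the three entities and the two
# relationships come from the description above.
SCHEMA = """
CREATE TABLE passenger (
    passenger_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE train (
    train_no INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- 'Passenger books Reservation' and 'Train has Reservation'
-- become two foreign keys on the reservation table.
CREATE TABLE reservation (
    reservation_id INTEGER PRIMARY KEY,
    passenger_id   INTEGER NOT NULL REFERENCES passenger(passenger_id),
    train_no       INTEGER NOT NULL REFERENCES train(train_no),
    travel_date    TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO passenger VALUES (1, 'A. Rider')")
conn.execute("INSERT INTO train VALUES (101, 'Morning Express')")
conn.execute("INSERT INTO reservation VALUES (1, 1, 101, '2024-01-15')")

# Follow both relationships from a reservation back to its passenger and train.
rows = conn.execute(
    """SELECT p.name, t.name, r.travel_date
       FROM reservation r
       JOIN passenger p ON p.passenger_id = r.passenger_id
       JOIN train t ON t.train_no = r.train_no"""
).fetchall()
print(rows)
```

Each line in the diagram ends up as a foreign key, which is why Reservation sits in the middle: it is the entity that ties a passenger to a train.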
What are the advantages of a DBMS over file processing?
A true DBMS (Database Management System) offers several advantages over file processing. The principal advantages of a DBMS are the following:
• Flexibility: Because programs and data are independent, programs do not have to be modified when types of unrelated data are added to or deleted from the database, or when physical storage changes.
• Fast response to information requests: Because data are integrated into a single database, complex requests can be handled much more rapidly than if the data were located in separate, non-integrated files. In many businesses, faster response means better customer service.
• Multiple access: Database software allows data to be accessed in a variety of ways (such as through various key fields) and often, by using several programming languages (both 3GL and nonprocedural 4GL programs).
• Lower user training costs: Users often find it easier to learn such systems and training costs may be reduced. Also, the total time taken to process requests may be shorter, which would increase user productivity.
• Less storage: Theoretically, all occurrences of data items need be stored only once, thereby eliminating the storage of redundant data. System developers and database designers often use data normalization to minimize data redundancy.
Advantages of computerization?
There are various advantages associated with computerization. For example, unlike hard copies, computer files can be backed up at multiple locations.
What are the advantages of a warehouse?
A data warehouse is a pool of a huge amount of data. The data in a data warehouse can be archived, and when the data is needed you can extract it from the archived files.
What is Partial functional dependency?
A partial functional dependency means that, if A and B are attributes of a table, B is partially dependent on A if some attribute can be removed from A and the dependency still holds. For example, consider the following functional dependency that exists in the Tbl_Staff table: (StaffID, Name) -> BranchID. Here BranchID is functionally dependent on a subset of A (StaffID, Name), namely StaffID. Source: http://www.mahipalreddy.com/dbdesign/dbqa.htm
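The definition can be tried out on sample rows. The sketch below is only an illustration: the function names and the three staff rows are made up, and checking an instance can refute a dependency but never prove it for the schema as a whole.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if equal values of the lhs attributes always give the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_determinants(rows, lhs, rhs):
    """Proper, non-empty subsets of lhs that still determine rhs in these rows."""
    return [
        subset
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
        if determines(rows, subset, rhs)
    ]

# Made-up sample rows for the Tbl_Staff example.
staff = [
    {"StaffID": 1, "Name": "Ann", "BranchID": "B1"},
    {"StaffID": 2, "Name": "Bob", "BranchID": "B2"},
    {"StaffID": 3, "Name": "Ann", "BranchID": "B2"},
]
found = partial_determinants(staff, ("StaffID", "Name"), "BranchID")
print(found)  # [('StaffID',)]
```

StaffID alone still determines BranchID, so the dependency on (StaffID, Name) is partial. Name alone does not, because the two rows named Ann sit in different branches.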
What are the three basic steps to creating a database?
The following are the basic steps of creating a database:
Figure out why you need a database: This is the first step in creating a database. It establishes the reason for creating it, for example to store data.
What is Controlled redundancy in a database?
In non-database systems each application has its own private files. This can often lead to redundancy in stored data, with resultant waste in storage space. In a database the data is integrated. The database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated. Data integration is generally regarded as an important characteristic of a database. The avoidance of redundancy should be an aim; however, the vigour with which this aim should be pursued is open to question.
Redundancy is:
• direct if a value is a copy of another
• indirect if the value can be derived from other values:
  - simplifies retrieval but complicates update
  - conversely, integration makes retrieval slow and updates easier
• Data redundancy can lead to inconsistency in the database unless controlled.
• The system should be aware of any data duplication; the system is responsible for ensuring updates are carried out correctly.
• A DB with uncontrolled redundancy can be in an inconsistent state; it can supply incorrect or conflicting information.
• A given fact represented by a single entry cannot result in inconsistency. Few systems are capable of propagating updates, i.e. most systems do not support controlled redundancy.
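One way to picture controlled redundancy is a duplicated value that the system itself keeps in step. The sketch below uses a trigger in SQLite through Python's sqlite3 module. The tables, columns and city names are invented for the example, and a trigger is just one possible mechanism, not the only way a DBMS might propagate updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (
    branch_id INTEGER PRIMARY KEY,
    city      TEXT NOT NULL
);
-- staff.branch_city is a direct (copied) redundancy kept for fast retrieval.
CREATE TABLE staff (
    staff_id    INTEGER PRIMARY KEY,
    branch_id   INTEGER NOT NULL REFERENCES branch(branch_id),
    branch_city TEXT NOT NULL
);
-- The trigger makes the system, not the application, responsible for
-- propagating an update to every copy.
CREATE TRIGGER propagate_city AFTER UPDATE OF city ON branch
BEGIN
    UPDATE staff SET branch_city = NEW.city WHERE branch_id = NEW.branch_id;
END;
INSERT INTO branch VALUES (1, 'Leeds');
INSERT INTO staff VALUES (10, 1, 'Leeds');
INSERT INTO staff VALUES (11, 1, 'Leeds');
""")

# A single update to the original value reaches both copies.
conn.execute("UPDATE branch SET city = 'York' WHERE branch_id = 1")
copies = conn.execute(
    "SELECT branch_city FROM staff ORDER BY staff_id"
).fetchall()
print(copies)  # [('York',), ('York',)]
```

Without the trigger the same update would leave the staff rows saying 'Leeds', which is the inconsistent state described in the list above.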
1. Introduction
One of the goals of benchmarks is to compare several database management
systems on the same class of applications in order to show which one is more
efficient for this particular class. Benchmarks used in the industry today are always
oriented towards uncovering a defined limited set of features that a database should
implement efficiently in order to successfully support the class of applications the
benchmark was created for.
One of the first industry benchmarks, the TP1 benchmark [1], developed by
IBM purported to measure the performance of a system handling ATM transactions
in a batch mode. Both TPC-C and TPC-D have gained widespread acceptance as the
industry's premier benchmarks in their respective fields (OLTP and Decision
Support). TPC-E failed to garner enough support because being an enterprise
benchmark, it was only relevant to a relatively small number of companies
competing in that space [11]. The Simple Database Operations Benchmark [9], Altair
Complex-Object Benchmark [14], OO1 [6] and OO7 [3, 2] Benchmarks were created
to provide useful insight for end-users evaluating the performance of Object Oriented
Database Management Systems (OODBMS).
Most object-relational DBMS are built on top of a relational database by
adding the following four key features [12]: inheritance, complex object support, an
extensible type system, and triggers. The appearance of such databases necessitated
the creation of the BUCKY benchmark [4], which tests most of these specific
features. It emphasizes those features of object-relational databases that are not
covered by pure relational databases.
1.1 Semantic Model
The semantic database models in general, and the Semantic Binary Model
SBM [8, 10] in particular, represent information as a collection of elementary facts
categorizing objects or establishing relationships of various kinds between pairs of
objects. The facts in the database are of three types: facts stating that an object
belongs to a category; facts stating that there is a relationship between objects; and
facts relating objects to values. The relationships can be multivalued.
The objects are categorized into classes according to their common
properties. These classes, called categories, need not be disjoint; that is, one object
may belong to several of them. Further, an arbitrary structure of subcategories and
supercategories can be defined.
1.2 Why a Different Benchmark
Unfortunately, most benchmarks do not provide a general problem statement;
instead they enforce a specific implementation that cannot be efficiently translated
to a different data model. For example, TPC benchmarks compare the efficiency of
implementations of the same solution on different relational DBMS'es rather than
comparing how efficiently DBMS'es are able to solve a given problem. The
benchmark proposed does not enforce a specific implementation and thus allows a native,
efficient implementation for any DBMS - semantic, relational or any other, which
makes it highly portable [7].
The majority of existing benchmarks are designed to evaluate features native to
relational DBMS'es, and none of them is suitable for evaluating the performance of the
features characteristic of semantic database applications. The benchmark proposed
evaluates the ability of a DBMS to efficiently support such features, including sparse
data, complex inheritance, many-to-many relations and variable field length.
The rest of the paper is organized as follows: The benchmark application
and the requirements for execution of transactions are defined in general terms in
sections 2, 5, 6 and 7. Sections 3 and 4 describe our implementations of this
application for semantic and relational databases, respectively. Section 8 presents the
results we obtained for a semantic and a relational database and analyses the results.
Conclusions are provided in section 9.
2. The Problem Stated
It has recently become a practice for many consumer organizations to
conduct consumer surveys. A large number of survey forms are mailed out to people
and small companies with questions about shopping preferences in order to
determine consumer patterns. Some consumers will fill out the survey and mail it
back. The results of the survey should be stored in a database.
Our survey collects information about several types of legal persons. It is
mailed to physical persons and corporations and we keep information about all the
legal persons the survey was mailed to. Those who answer the survey and mail it
back are considered consumers and categorized into the corresponding category. We
also try to collect referrals from people to their friends and mail our survey to the
referred people. We remember the information about referrals so that we can explore
correlation between groups of people who know each other. A person may refer
another person without filling out the survey and thus without becoming a consumer
as we define it.
Among the other questions, we will ask a consumer which stores he
prefers to shop at. We will thus keep a catalog of stores with their names and types
(pharmacy, mass merchandiser, etc.). A consumer may list several stores he usually
shops at, so that a many-to-many relationship is established between consumers and
stores. We will also ask a consumer to tell us his approximate annual expenditures in
thousands of dollars and a list of his hobbies.
We will collect information about ten types of products. Consumers will fall
into different categories based on their answers about which products they regularly
consume, for example "coffee drinker." We will mail them a survey where they will
tell us if they regularly consume some product and answer questions about different
brands they consume within each product group. Each product group in our survey
has 10 brands. A consumer will tell us which brands of the product he uses and
show his preference of different brands in terms of a satisfaction rating of 1 to 4.
Some consumers may indicate that they use this type of product, but none of the
brands listed by us. This option should also be accommodated. We will also let a
consumer write any comment about any brand he is using (from the ten brands we
are asking about) if he wishes to do so. In practice, consumers seldom write any
comments, so this information will be sparse. However, if they do write a comment it
can be of any length and will probably be rather long.
3. A Semantic Schema for the Benchmark Database
The benchmark deals with persons and corporations. Person and
corporation are represented by two different categories with the appropriate personal
attributes. They both inherit from a category Legal Person. A category Consumer,
will inherit from Legal Person, since both persons and corporations may be
consumers and there is no need to distinguish between them as consumers. Since the
same object can not be both a person and a corporation, categories Person and
Corporation will be set up as disjoint.
A legal person who answered our survey becomes a consumer. The category
Consumer will then be used to capture information about such legal persons as
consumers. The attribute "expenditure" is thus an attribute of the category
Consumer. The category Consumer is not disjoint with either Person or Corporation.
In fact, every consumer must be either a person or a corporation. All of this is
supported by a semantic database on the level of schema-enforced integrity constraints.
The relationship between categories is shown in Figure 1.
Consumers can further be categorized into inherited categories G0, G1, ...
G9, which represent consumers of a particular product. Thus, those consumers who
are soap users will be categorized into the category "Soap Users" (one of Gi). The
same consumer may be categorized into several Gis if he is a user of several types of
products.
In our initial database population about 50% of all objects in the database
will be categorized into each of the following categories: Legal Person, Person,
Consumer and several of the G0, G1, ... G9 categories. Some will be categorized into
Legal Person, Corporation, Consumer and several of the G0, G1, ... G9. Some will
be just Person and Legal Person or Corporation and Legal Person.
Figure 1. Relationship between subcategories (diagram showing the categories Legal Person, Person, Corporation and Consumer)
In addition to this, there will be some objects in the category Store.
Consumers will be related to the stores they usually shop at. This relation is many-to-many,
which means that a consumer may be related to several stores at once and a
single store will, with high probability, be related to many consumers.
Categories shown: PERSON, LEGAL PERSON, CORPORATION, CONSUMER, STORE, G0, G1, …, G9
CONSUMER: EXPENDITURE: INTEGER
STORE: NAME: STRING TOTAL; TYPE: STRING
G0, G1, …, G9 (each): A0, A1 … A9: INTEGER; C0, C1 … C9: STRING
Relations: PERSON ->KNOWS (M:M)-> PERSON; CONSUMER ->Customer-of (m:m)-> STORE
Remaining attribute labels (on PERSON, LEGAL PERSON and CORPORATION): NAME: STRING TOTAL; ADDRESS: STRING TOTAL; NAME: STRING; SSN: INTEGER; HOBBY: STRING M:M; ADDRESS: STRING
Figure 2. Semantic Schema for the Semantic Benchmark
A special relation "knows" is going from category Person into itself. The
semantics of this is that a person may know and refer to us several other persons and
will be related to each one of them via this relation. Since a person typically refers
(and may be referred by) several other persons, this relation will also be many-to-many.
A semantic schema of the database appears in Figure 2.
4. Relational Schemas for the Benchmark Database
LegalPerson
  Columns: Id, Type, Name, Address, SSN
  Indexes: Name (Name); SSN (SSN)
LPersonHobby
  Columns: LegalPersonId, Hobby
  Indexes: Hobby (Hobby, LegalPersonId)
Consumer
  Columns: LegalPersonId, Expenditure
Store
  Columns: Name, Type
ConsumerCustomerOf
  Columns: LegalPersonId, StoreName
  Indexes: Store (StoreName, LegalPersonId)
LPersonKnowsLPerson
  Columns: LegalPersonId, KnowsId
  Indexes: Knows (LegalPersonId, KnowsId)
G0, G1, …, G9 (one table per group, identical layout)
  Columns: LegalPersonId, a1, a2, …, a9, c1, c2, …, c9
  Indexes: a1 (a1, LegalPersonId), …, a9 (a9, LegalPersonId)
Figure 3. A relational schema for the Sparse model
While the benchmark application has a clear and obvious representation in
the semantic schema, creating a relational schema for this application is not evident.
When designing the semantic schema our only concern was the readability of the
schema capturing the full semantics of the application. In designing a relational
schema, we have to think more about the efficiency of the queries that we expect will
be executed on this database and about the tradeoffs between this efficiency,
readability, and the size of the database.
Among the many choices for the relational schema we considered (for
detailed analysis see [13]), we have chosen two that we believe to be the best
candidates for efficient implementation. We call them the Sparse model and the
Compact model. The corresponding relational schemas for these two models are
shown in Figures 3 and 4, respectively. In the Sparse model the creation of ten
indexes per group (consisting of LegalPersonId and ai) is reasonable to facilitate
efficient access to the data in G0…G9. Even though our benchmark queries do not
access every ai in every Gi, the schema should accommodate all similar queries with
the same efficiency, so we have to create all of the indexes. However, creating so
many indexes will slow down update operations and take up too much disk space
and cache memory.
LegalPerson
  Columns: Id, Type, Name, Address, SSN
  Indexes: Name (Name); SSN (SSN)
LPersonHobby
  Columns: LPId, Hobby
  Indexes: Hobby (Hobby, LPId)
Consumer
  Columns: LPId, Expenditure
Store
  Columns: Name, Type
ConsumerCustomerOf
  Columns: LPId, StoreName
  Indexes: Store (StoreName, LPId)
Answer
  Columns: Type, LPId, Id, Value
  Indexes: AnswerType (Type, Id, Value)
Comment
  Columns: LPId, Type, Id, Value
LPersonKnowsLPerson
  Columns: LPId, KnowsId
  Indexes: Knows (LPId, KnowsId)
Figure 4. A relational schema for the Compact model
For the Compact model we create two tables, one to keep all ai attributes,
the other to keep all ci attributes. Each table has the columns: LegalPersonId, group
number, attribute number and the value of this attribute. The primary key will consist
of LegalPersonId, group number and attribute number. For this model, we need to
create just one additional index consisting of group number, attribute number and
value.
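To make the Compact layout concrete, here is a minimal sketch of the table holding the ai attributes, using the Answer table and column names from Figure 4. SQLite (via Python's sqlite3 module) stands in for the unnamed commercial DBMS. The column types, the reading of Type as the group number and Id as the attribute number, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per (legal person, group, attribute); an attribute that was
-- not answered simply has no row.
CREATE TABLE Answer (
    LPId  INTEGER NOT NULL,   -- legal person id
    Type  INTEGER NOT NULL,   -- group number (assumed meaning)
    Id    INTEGER NOT NULL,   -- attribute number within the group (assumed)
    Value INTEGER NOT NULL,
    PRIMARY KEY (LPId, Type, Id)
);
-- The one additional index: group number, attribute number, value.
CREATE INDEX AnswerType ON Answer (Type, Id, Value);
""")
conn.executemany(
    "INSERT INTO Answer VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 2, 2, 2), (2, 1, 1, 3), (3, 1, 1, 1)],
)

# Legal persons whose attribute 1 in group 1 has the value 1.
hits = [row[0] for row in conn.execute(
    "SELECT LPId FROM Answer WHERE Type = 1 AND Id = 1 AND Value = 1 "
    "ORDER BY LPId"
)]
print(hits)  # [1, 3]
```

A lookup on any group and attribute uses the same single index, which is the point of the design: the Sparse model needs a separate index for every ai column instead.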
Parameter                                 Lower bound  Upper bound  Mean  Variance
Name length                               5            40           12    5
Address length                            15           100          35    20
Comment length                            5            255          30    100
Number of hobbies per consumer            0            19           0     10
Number of stores per consumer             1            19           4     10
Expenditure                               1            89           20    10
Number of groups a consumer belongs to    1            10           5     4
Number of brands a consumer uses          0            9            1     1
Table 2. Normal distribution of parameters for initial database population
5. Initial Database Population
The number of Legal Persons in the initial population defines the scalability
[7] of the database. The results published in this paper were obtained on a database
with 200,000 corporations and 800,000 persons. 500,000 of the persons and 100,000 of the
corporations are consumers. The data was generated randomly according to the table
of Normal Distribution (Gaussian) parameters (Table 2). The detailed definition of
the initial data set can be found in [13]. A random set of Persons must be pre-chosen
for transaction #4. This set consists of 0.1% of all Persons in the database.
6. Database Transactions
Our benchmark consists of five transactions performed independently and
sequentially. We expect efficient DBMS-specific implementations for each particular
database.
Transaction 1:
The first transaction is simple. The task is to count the number of
consumers that belong to every one of the ten product consumer groups in the
database. The result is just one number: the cardinality of the intersection of groups
G0...G9. The formal definition of the transaction is shown in formula (1).
$$R = \left| \bigcap_{i=0}^{9} G_i \right|, \qquad (1)$$
where Gi are the groups of consumers that consume a particular product.
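Formula (1) is just the size of a set intersection. As a toy sketch, with three made-up groups of consumer ids standing in for the ten real ones:

```python
def count_in_all_groups(groups):
    """Cardinality of the intersection of all groups, as in formula (1)."""
    if not groups:
        return 0
    return len(set.intersection(*groups))

# Toy data: three groups instead of ten; the consumer ids are made up.
groups = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
in_all = count_in_all_groups(groups)
print(in_all)  # 2
```

Only consumers 3 and 4 appear in every group, so the result is 2. How a real DBMS evaluates the intersection is exactly what the benchmark leaves open.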
Transaction 2:
The second transaction consists of two different physical database
transactions. The first one finds all consumers who use brand #1 of product #1 as
their first preference among the brands of product #1, and at the same time use brand
#2 of product #2 as their second preference among the brands of product #2. Such
consumers form a new group in the database Gnew. This new group must be created
in the database and all found consumers should be categorized into it. The formal
definition of the transaction is shown in formula (2).
$$G_{new} = \{\, g \in G_1 \mid g[G_1{::}A_1] = 1 \,\} \cap \{\, g \in G_2 \mid g[G_2{::}A_2] = 2 \,\} \qquad (2)$$
The second physical transaction should delete the newly created group from
the database. The sum of execution times of both physical transactions is considered
the execution time of Benchmark Transaction #2.
Transaction 3:
The third transaction is a complex query counting those consumers who
regularly shop at store Y and have hobby X, excluding those who use brand #3 of
product #3 as their third preference among the brands of product #3, and at the same
time use brand #4 of product #4 as their fourth preference among the brands of
product #4. The result is just one number - a count of such consumers. The formal
definition of the transaction is shown in formula (3).
$$R = \bigl|\, (\{\, p \in Person \mid p[Person{::}hobby] = X \,\} \cap \{\, c \in Consumer \mid c[Consumer{::}customer\_of{::}name] = Y \,\}) - (\{\, g \in G_3 \mid g[G_3{::}A_3] = 3 \,\} \cap \{\, g \in G_4 \mid g[G_4{::}A_4] = 4 \,\}) \,\bigr| \qquad (3)$$
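The set algebra of Transaction 3 can be spelled out on toy data. In this sketch the dictionaries, the function name and all the values are hypothetical; g3_a3 maps a consumer id to that consumer's value of attribute A3 in group G3, and g4_a4 does the same for A4 in G4.

```python
def transaction_3(hobby_of, stores_of, g3_a3, g4_a4, hobby, store):
    """Count consumers with the hobby and the store, minus the excluded set."""
    with_hobby = {p for p, hobbies in hobby_of.items() if hobby in hobbies}
    shops_at = {c for c, stores in stores_of.items() if store in stores}
    excluded = ({g for g, v in g3_a3.items() if v == 3}
                & {g for g, v in g4_a4.items() if v == 4})
    return len((with_hobby & shops_at) - excluded)

# Made-up sample data keyed by consumer id.
hobby_of = {1: {"chess"}, 2: {"chess", "golf"}, 3: {"golf"}, 4: {"chess"}}
stores_of = {1: {"Acme"}, 2: {"Acme"}, 3: {"Acme"}, 4: {"Bolt"}}
g3_a3 = {1: 3, 2: 3}
g4_a4 = {1: 4, 2: 1}

count = transaction_3(hobby_of, stores_of, g3_a3, g4_a4, "chess", "Acme")
print(count)  # 1
```

Consumers 1 and 2 both play chess and shop at Acme, but consumer 1 also meets both exclusion conditions, which leaves only consumer 2.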
Transaction 4:
The fourth transaction can be explained by the following: For each person
from a given (randomly chosen) set of 0.1% of all persons, expand the relation
"knows" to relate this person to all people he has a chain of acquaintance to. Abort
the transaction rather than commit. Print the length of the maximal chain from a
person. The formal definition of the transaction is shown in formula (4).
$$NewDatabase = OldDatabase \cup K, \text{ where}$$
$$K = \{\, \langle s.knows.p \rangle \mid s \in S,\ p \in Person,\ \exists n\ \exists a_1, a_2, \ldots, a_n \in Person : \langle s.knows.a_1 \rangle,\ \forall i = 1..n-1\ \langle a_i.knows.a_{i+1} \rangle,\ \langle a_n.knows.p \rangle \,\} \qquad (4)$$
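Transaction 4 amounts to a reachability search over the "knows" relation. The sketch below uses breadth-first search on a tiny made-up graph. It reads "length of the maximal chain" as the longest of the shortest chains from a selected person, which is an assumption, since the text does not pin the term down. The original graph is never modified, which mirrors aborting instead of committing.

```python
from collections import deque

def chains_from(knows, start):
    """Map each person reachable from start to the length of the shortest chain."""
    dist = {}
    queue = deque((friend, 1) for friend in knows.get(start, ()))
    while queue:
        person, length = queue.popleft()
        if person in dist:
            continue  # already reached by a chain that is no longer
        dist[person] = length
        queue.extend((nxt, length + 1) for nxt in knows.get(person, ()))
    return dist

def expand_knows(knows, selected):
    """Return the new <s.knows.p> facts and the longest chain length found."""
    new_facts = set()
    longest = 0
    for s in selected:
        dist = chains_from(knows, s)
        new_facts |= {(s, p) for p in dist if p not in knows.get(s, ())}
        longest = max(longest, max(dist.values(), default=0))
    return new_facts, longest

# Made-up acquaintance graph: a -> b -> c -> a and c -> d.
knows = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
new_facts, longest = expand_knows(knows, ["a"])
print(sorted(new_facts), longest)
```

Because the graph contains a cycle, person a reaches himself through the chain a, b, c, a, so the pair ("a", "a") is among the new facts, as formula (4) allows.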
Transaction 5:
The fifth transaction counts the number of consumers in each one of the ten
product consumer groups in the database. The result is ten numbers: the cardinality
of each of the groups G0...G9. The formal definition of the transaction is shown
in formula (5).
$$R_i = |G_i|, \quad i = 0..9 \qquad (5)$$
7. Execution of Transactions
The benchmark runs in single-user mode. Only one client is running
the benchmark transactions at a time. Thus, we are not testing concurrency control
performance by this benchmark. A DBMS is, however, allowed to use any kind of
parallelism it can exploit in single user mode.
The benchmark transactions are executed in two modes: hot and cold. Both
"cold time" and "hot time" are collected for each transaction. Both results are
included in the final result. The cold time is the time required to perform a
transaction immediately after starting the DBMS on a system with an empty cache.
This is normally achieved by rebooting the system before executing each transaction.
The hot time is the time required to perform a transaction immediately after
performing an identical transaction without clearing any cache or restarting the
DBMS. To collect the hot time we run the same transaction in a loop until the time
of execution stabilizes, which typically happens on the third or fourth run. Once the
execution time stabilizes we compute the arithmetic mean of the following five
transactions and this is considered the final hot execution time for this transaction.
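The hot-time procedure can be sketched as a small timing harness. The text does not say what "stabilizes" means, so the sketch assumes that two consecutive runs differing by at most 5% count as stable. The function name, the tolerance, the warm-up cap and the injectable clock are all choices made for this example.

```python
import time
from statistics import mean

def hot_time(transaction, tolerance=0.05, max_warmup=20, samples=5,
             clock=time.perf_counter):
    """Warm up until run times stabilize, then average the next `samples` runs."""
    def timed():
        start = clock()
        transaction()
        return clock() - start

    previous = timed()
    for _ in range(max_warmup):
        current = timed()
        # Assumed criterion: relative change between consecutive runs.
        stable = abs(current - previous) <= tolerance * max(previous, current)
        previous = current
        if stable:
            break
    return mean(timed() for _ in range(samples))

# Usage with a stand-in workload instead of a real database transaction.
print(hot_time(lambda: sum(range(100_000))))
```

Cold time needs no loop: it is a single timed run right after a restart with an empty cache, so it is measured outside a harness like this one.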
8. Results and Analysis
We ran the benchmark for the Sem-ODB Semantic Object-Oriented
Database Engine implemented at Florida International University's High
Performance Database Research Center (HPDRC) [10]. We also ran the benchmark
for one of the leading commercial relational databases. The tests were done on a dual
processor Pentium II 400 MHz with 256 MB total memory and a 9 GB Seagate SCSI disk
drive under Windows NT Server 4.0 Enterprise Edition.
The version of Sem-ODB used for benchmarking did not utilize the
multiprocessor architecture of the underlying hardware. We did, however, observe
the relational database using both processors for parallel computations. We have run
tests with different memory limitations imposed on the DBMS'es. Sem-ODB was
allowed to use 16 Megabytes of memory and never actually used more than 12
Megabytes for the benchmark transactions. For the relational database, two different
tests were conducted. In one, the relational DBMS was allowed to use 16 Megabytes,
in the other 128 Megabytes. For some transactions, the 16 Megabyte quota was
enough for efficient execution, for other transactions the relational DBMS was
willing to use up to 128 Megabytes for better performance.
Both cold and hot times were collected for each memory limitation and for
both relational schemas (sparse and compact). Thus, eight execution times were
collected per transaction for the relational DBMS. This was done to make sure that
we have done everything we could to allow the relational database to achieve its best
performance. We observed that in some cases the sparse model was more efficient,
but in other cases the compact model was faster. In order to prevent criticism of the
choice of the model, we decided to include all the results in this paper.
We have spent a considerable amount of time inventing different
implementations and fine tuning the relational database. We tried different
combinations of indexes, keys, DBMS options, and schemas in order to achieve
greater performance. The semantic database on the other hand, did not require any
tweaking to optimize its performance. Its performance was acceptable in the very
first version.
The semantic DBMS is able to capture the exact semantics of the problem
domain and provide a single optimal way to represent it in a semantic schema. All
the appropriate indexes are built automatically and follow from the definition of the
Semantic Database. By its design, it can efficiently answer arbitrary queries without
the need for an experienced person to spend time tuning it for a particular
application.
DBMS model              Semantic   Relational Sparse    Relational Compact
DB size                 406 MB     1046 MB              382 MB
RAM                     16 MB      16 MB     128 MB     16 MB     128 MB
Cold times (seconds)
Transaction 1           1.61       11.52     11.55      16.11     15.94
Transaction 2           1.13       0.53      0.56       0.34      0.36
Transaction 3           0.91       5.95      5.91       5.97      5.88
Transaction 4           55.65      55.63     43.02      55.63     43.02
Transaction 5           8.62       11.66     11.53      15.31     15.17
Hot times (seconds)
Transaction 1           0.04       11.66     5.39       15.81     12.58
Transaction 2           0.07       0.28      0.28       0.09      0.09
Transaction 3           0.33       2.72      2.72       2.72      2.70
Transaction 4           0.23       35.02     2.87       35.02     2.87
Transaction 5           6.85       11.36     2.17       14.92     10.32
Table 3. The benchmark results
One might think that to create enough indexes for the execution of arbitrary
queries, the semantic database would have to unnecessarily use too much disk space.
The results, however, show that this is not true. In our relational implementation the sparse
model contains a similar number of indexes to the semantic database but requires 2.5
times more disk space. The compact model uses about the same amount of disk
space as the semantic database, but has worse performance on most transactions and
is not universal in the sense that this model would not be possible at all if, for
example, the attributes a0..a9 were of different types.
The semantic database is outperformed by the relational on a few
transactions in
Which type of computer is used to operate large corporate systems and databases?
Typically a Mainframe, or smaller Server.
Sometimes supercomputers are utilized, but very rarely for simple data storage and retrieval.
How do companies use databases?
Companies use databases for a myriad purposes:
Accounting
Payroll
Time Sheets
Asset Management
Human Resources
Security
And many more purposes, depending on the organization's use of technology.
What are the basic concepts of a database management system?
This list could easily be expanded:
Tables
Columns
Rows
Data types
Normalization
3rd normal form
Primary key
Foreign key
Relationship
Entity Relationship (ER) diagram
Structured Query Language (SQL)
Table joins
Indexing
Clustered index
Secondary index
Views
An example of a document management system?
I'm familiar with eFileCabinet and their document management system. They offer a great variety of solutions for banking, education, government, HR, legal, etc.
What is dial-up access control and call-back systems?
If users connect to the system remotely via a dial-up line (e.g. from home), access should be restricted by a dial-up access control. Dial-up access controls prevent unauthorized access from remote users who attempt to access a secured environment. These controls range from dial-back controls to remote user authentication. Dial-back controls are used over dial-up telecommunication lines.
Limitations of a manual inventory system?
http://books.google.com/books?id=NV9CVhi0CCEC&pg=PA358&lpg=PA358&dq=%22manual%2Binventory%2Bsystem%22&source=bl&ots=wjIQBMEQ12&sig=DDX-4CqGWK2VNhWoDcK9UKoG3Ls&hl=en&ei=4p3ySuO6CpjxkAWk0KGuAw&sa=X&oi=book_result&ct=result&resnum=7&ved=0CB4Q6AEwBg#v=onepage&q=%22manual%2Binventory%2Bsystem%22&f=false
How do you put SQL Server in hot backup mode?
Use the T-SQL BACKUP DATABASE command to back up an active database.
Difference between contiguous and non contiguous memory allocation?
In a contiguous memory allocation there is no overhead during execution of a program. In a non contiguous memory allocation address translation is performed during execution.
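The address translation mentioned above can be illustrated with paging, which is one common form of non-contiguous allocation. This is only a toy model: the page size, the page table contents and the function name are assumptions, and real hardware does this lookup in the memory management unit, not in software.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate(page_table, logical_address):
    """Map a logical address to a physical address through a page table."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    frame = page_table[page]  # a missing entry would model a page fault
    return frame * PAGE_SIZE + offset

# Pages 0, 1 and 2 of the program live in scattered frames 7, 2 and 9.
page_table = {0: 7, 1: 2, 2: 9}
physical = translate(page_table, 4100)
print(physical)  # 8196
```

Logical address 4100 is offset 4 in page 1, and page 1 sits in frame 2, giving 2 * 4096 + 4 = 8196. This lookup on every access is the run-time overhead that contiguous allocation avoids.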
What are the characteristics of a database management system?
Basic Characteristics of DBMS
• Represents complex relationships between data.
• Controls data redundancy.
• Enforces user defined rules.
• Ensures data sharing.
• It has automatic and intelligent backup and recovery procedures.
• It has a central dictionary to store information pertaining to data and its manipulation.
• It has different interfaces via which user can manipulate the data.
• Enforces data access authorization.
Is there a combat action ribbon database website?
Yes, there is one that the Marine Corps has set up for Marines only that I know of. When it first came online, you could randomly search anyone. It has recently been revised so that you must know the last 4 digits of the SSN and the last and first name. The site is located at: https://www.manpower.usmc.mil/pls/apex/f?p=102:1:876115574353329:SEARCH_PI:NO:RP:P1_SEARCH_PI_VALUE:YES Cut and paste this entire link into your browser.
What is a database management system?
A database management system is a software system that creates, expands, and maintains the database.