Why relational database approach is better than earlier method?

Draw class diagram of placement cell?

General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases. An array DBMS is a kind of NoSQL DBMS that allows modeling, storage, and retrieval of (usually large) multi-dimensional arrays such as satellite images and climate simulation output. In a hypertext or hypermedia database, any word or a piece of text representing an object, e.g., another piece of text, an article, a picture, or a film, can be hyperlinked to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information. For example, they are useful for organizing online encyclopedias, where users can conveniently jump around the text. The World Wide Web is thus a large distributed hypertext database. A knowledge base (abbreviated KB, kb or Δ) is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. Also a collection of data representing problems with their solutions and related experiences.A mobile database can be carried on or synchronized from a mobile computing device. Operational databases store detailed data about the operations of an organization. They typically process relatively high volumes of updates using transactions. Examples include customer databases that record contact, credit, and demographic information about a business's customers, personnel databases that hold information such as salary, benefits, skills data about employees, enterprise resource planning systems that record details about product components, parts inventory, and financial databases that keep track of the organization's money, accounting and financial dealings. A parallel database seeks to improve performance through parallelization for tasks such as loading data, building indexes and evaluating queries.The major parallel DBMS architectures which are induced by the underlying hardware architecture are: Shared memory architecture, where multiple processors share the main memory space, as well as other data storage. Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage. Shared nothing architecture, where each processing unit has its own main memory and other storage.Probabilistic databases employ fuzzy logic to draw inferences from imprecise data. Real-time databases process transactions fast enough for the result to come back and be acted on right away. A spatial database can store the data with multidimensional features. The queries on such data include location-based queries, like "Where is the closest hotel in my area?". A temporal database has built-in time aspects, for example a temporal data model and a temporal version of SQL. More specifically the temporal aspects usually include valid-time and transaction-time. A terminology-oriented database builds upon an object-oriented database, often customized for a specific field. An unstructured data database is intended to store in a manageable and protected way diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging. Connolly and Begg define database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database". Examples of DBMS's include MySQL, PostgreSQL, MSSQL, Oracle Database, and Microsoft Access. The DBMS acronym is sometimes extended to indicate the underlying database model, with RDBMS for the relational, OODBMS for the object (oriented) and ORDBMS for the object-relational model. Other extensions can indicate some other characteristic, such as DDBMS for a distributed database management systems. The functionality provided by a DBMS can vary enormously. The core functionality is the storage, retrieval and update of data. Codd proposed the following functions and services a fully-fledged general purpose DBMS should provide: Data storage, retrieval and update User accessible catalog or data dictionary describing the metadata Support for transactions and concurrency Facilities for recovering the database should it become damaged Support for authorization of access and update of data Access support from remote locations Enforcing constraints to ensure data in the database abides by certain rulesIt is also generally to be expected the DBMS will provide a set of utilities for such purposes as may be necessary to administer the database effectively, including import, export, monitoring, defragmentation and analysis utilities. The core part of the DBMS interacting between the database and the application interface sometimes referred to as the database engine. Often DBMSs will have configuration parameters that can be statically and dynamically tuned, for example the maximum amount of main memory on a server the database can use. The trend is to minimise the amount of manual configuration, and for cases such as embedded databases the need to target zero-administration is paramount. The large major enterprise DBMSs have tended to increase in size and functionality and can have involved thousands of human years of development effort through their lifetime.Early multi-user DBMS typically only allowed for the application to reside on the same computer with access via terminals or terminal emulation software. The client–server architecture was a development where the application resided on a client desktop and the database on a server allowing the processing to be distributed. This evolved into a multitier architecture incorporating application servers and web servers with the end user interface via a web browser with the database only directly connected to the adjacent tier.A general-purpose DBMS will provide public application programming interfaces (API) and optionally a processor for database languages such as SQL to allow applications to be written to interact with the database. A special purpose DBMS may use a private API and be specifically customised and linked to a single application. For example, an email system performing many of the functions of a general-purpose DBMS such as message insertion, message deletion, attachment handling, blocklist lookup, associating messages an email address and so forth however these functions are limited to what is required to handle email. External interaction with the database will be via an application program that interfaces with the DBMS. This can range from a database tool that allows users to execute SQL queries textually or graphically, to a web site that happens to use a database to store and search information. A programmer will code interactions to the database (sometimes referred to as a datasource) via an application program interface (API) or via a database language. The particular API or language chosen will need to be supported by DBMS, possible indirectly via a pre-processor or a bridging API. Some API's aim to be database independent, ODBC being a commonly known example. Other common API's include JDBC and ADO.NET. Database languages are special-purpose languages, which allow one or more of the following tasks, sometimes distinguished as sublanguages: Data control language (DCL) – controls access to data; Data definition language (DDL) – defines data types such as creating, altering, or dropping tables and the relationships among them; Data manipulation language (DML) – performs tasks such as inserting, updating, or deleting data occurrences; Data query language (DQL) – allows searching for information and computing derived information.Database languages are specific to a particular data model. Notable examples include: SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from the relational model as described by Codd (for example, the rows and columns of a table can be ordered). SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. The standards have been regularly enhanced since and is supported (with varying degrees of conformance) by all mainstream commercial relational DBMSs. OQL is an object model language standard (from the Object Data Management Group). It has influenced the design of some of the newer query languages like JDOQL and EJB QL. XQuery is a standard XML query language implemented by XML database systems such as MarkLogic and eXist, by relational databases with XML capability such as Oracle and DB2, and also by in-memory XML processors such as Saxon. SQL/XML combines XQuery with SQL.A database language may also incorporate features like: DBMS-specific configuration and storage engine management Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car) Application programming interface version of the query language, for programmer convenience Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual level and external level from the internal level when needed. Putting data into permanent storage is generally the responsibility of the database engine a.k.a. "storage engine". Though typically accessed by a DBMS through the underlying operating system (and often using the operating systems' file systems as intermediates for storage layout), storage properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database). Some DBMSs support specifying which character encoding was used to store data, so multiple encodings can be used in the same database. Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional storage is row-oriented, but there are also column-oriented and correlation databases. Often storage redundancy is employed to increase performance. A common example is storing materialized views, which consist of frequently needed external views or query results. Storing such views saves the expensive computing of them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with their original updated database data, and the cost of storage redundancy. Occasionally a database employs storage redundancy by database objects replication (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to a same database object, and to provide resiliency in a case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases, the entire database is replicated. Database security deals with all various aspects of protecting the database content, its owners, and its users. It ranges from protection from intentional unauthorized database uses to unintentional database accesses by unauthorized entities (e.g., a person or a computer program). Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or using specific access paths to the former (e.g., using specific indexes or other data structures to access information). Database access controls are set by special authorized (by the database owner) personnel that uses dedicated protected security DBMS interfaces. This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called "subschemas". For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. Data security in general deals with protecting specific chunks of data, both physically (i.e., from corruption, or destruction, or removal; e.g., see physical security), or the interpretation of them, or parts of them to meaningful information (e.g., by looking at the strings of bits that they comprise, concluding specific valid credit-card numbers; e.g., see data encryption). Change and access logging records who accessed which attributes, what was changed, and when it was changed. Logging services allow for a forensic database audit later by keeping a record of access occurrences and changes. Sometimes application-level code is used to record changes rather than leaving this to the database. Monitoring can be set up to attempt to detect security breaches. Database transactions can be used to introduce some level of fault tolerance and data integrity after recovery from a crash. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in database and also other systems. Each transaction has well defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). The acronym ACID describes some ideal properties of a database transaction: atomicity, consistency, isolation, and durability. A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in some situations, it is desirable to migrate a database from one DBMS to another. The reasons are primarily economical (different DBMSs may have different total costs of ownership or TCOs), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should maintain (if possible) the database related application (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation. It may be desired that also some aspects of the architecture internal level are maintained. A complex or large database migration may be a complicated and costly (one-time) project by itself, which should be factored into the decision to migrate. This in spite of the fact that tools may exist to help migration between specific DBMSs. Typically, a DBMS vendor provides tools to help importing databases from other popular DBMSs. After designing a database for an application, the next stage is building the database. Typically, an appropriate general-purpose DBMS can be selected to be used for this purpose. A DBMS provides the needed user interfaces to be used by database administrators to define the needed application's data structures within the DBMS's respective data model. Other user interfaces are used to select needed DBMS parameters (like security related, storage allocation parameters, etc.). When the database is ready (all its data structures and other needed components are defined), it is typically populated with initial application's data (database initialization, which is typically a distinct project; in many cases using specialized DBMS interfaces that support bulk insertion) before making it operational. In some cases, the database becomes operational while empty of application data, and data are accumulated during its operation. After the database is created, initialised and populated it needs to be maintained. Various database parameters may need changing and the database may need to be tuned (tuning) for better performance; application's data structures may be changed or added, new related application programs may be written to add to the application's functionality, etc. Sometimes it is desired to bring a database back to a previous state (for many reasons, e.g., cases when the database is found corrupted due to a software error, or if it has been updated with erroneous data). To achieve this, a backup operation is done occasionally or continuously, where each desired database state (i.e., the values of its data and their embedding in database's data structures) is kept within dedicated backup files (many techniques exist to do this effectively). When it is decided by a database administrator to bring the database back to this state (e.g., by specifying this state by a desired point in time when the database was in this state), these files are used to restore that state. Static analysis techniques for software verification can be applied also in the scenario of query languages. In particular, the *Abstract interpretation framework has been extended to the field of query languages for relational databases as a way to support sound approximation techniques. The semantics of query languages can be tuned according to suitable abstractions of the concrete domain of data. The abstraction of relational database system has many interesting applications, in particular, for security purposes, such as fine grained access control, watermarking, etc. Other DBMS features might include: Database logs – This helps in keeping a history of the executed functions. Graphics component for producing graphs and charts, especially in a data warehouse system. Query optimizer – Performs query optimization on every query to choose an efficient query plan (a partial order (tree) of operations) to be executed to compute the query result. May be specific to a particular storage engine. Tools or hooks for database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.Increasingly, there are calls for a single system that incorporates all of these core functionalities into the same build, test, and deployment framework for database management and source control. Borrowing from other developments in the software industry, some market such offerings as "DevOps for database". The first task of a database designer is to produce a conceptual data model that reflects the structure of the information to be held in the database. A common approach to this is to develop an entity-relationship model, often with the aid of drawing tools. Another popular approach is the Unified Modeling Language. A successful data model will accurately reflect the possible state of the external world being modeled: for example, if people can have more than one phone number, it will allow this information to be captured. Designing a good conceptual data model requires a good understanding of the application domain; it typically involves asking deep questions about the things of interest to an organization, like "can a customer also be a supplier?", or "if a product is sold with two different forms of packaging, are those the same product or different products?", or "if a plane flies from New York to Dubai via Frankfurt, is that one flight or two (or maybe even three)?". The answers to these questions establish definitions of the terminology used for entities (customers, products, flights, flight segments) and their relationships and attributes. Producing the conceptual data model sometimes involves input from business processes, or the analysis of workflow in the organization. This can help to establish what information is needed in the database, and what can be left out. For example, it can help when deciding whether the database needs to hold historic data as well as current data. Having produced a conceptual data model that users are happy with, the next stage is to translate this into a schema that implements the relevant data structures within the database. This process is often called logical database design, and the output is a logical data model expressed in the form of a schema. Whereas the conceptual data model is (in theory at least) independent of the choice of database technology, the logical data model will be expressed in terms of a particular database model supported by the chosen DBMS. (The terms data model and database model are often used interchangeably, but in this article we use data model for the design of a specific database, and database model for the modeling notation used to express that design). The most popular database model for general-purpose databases is the relational model, or more precisely, the relational model as represented by the SQL language. The process of creating a logical database design using this model uses a methodical approach known as normalization. The goal of normalization is to ensure that each elementary "fact" is only recorded in one place, so that insertions, updates, and deletions automatically maintain consistency. The final stage of database design is to make the decisions that affect performance, scalability, recovery, security, and the like, which depend on the particular DBMS. This is often called physical database design, and the output is the physical data model. A key goal during this stage is data independence, meaning that the decisions made for performance optimization purposes should be invisible to end-users and applications. There are two types of data independence: Physical data independence and logical data independence. Physical design is driven mainly by performance requirements, and requires a good knowledge of the expected workload and access patterns, and a deep understanding of the features offered by the chosen DBMS. Another aspect of physical database design is security. It involves both defining access control to database objects as well as defining security levels and methods for the data itself. A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated. The most popular example of a database model is the relational model (or the SQL approximation of relational), which uses a table-based format. Common logical data models for databases include: Navigational databases Hierarchical database model Network model Graph database Relational model Entity–relationship model Enhanced entity–relationship model Object model Document model Entity–attribute–value model Star schemaAn object-relational database combines the two related structures. Physical data models include: Inverted index Flat fileOther models include: Associative model Multidimensional model Array model Multivalue modelSpecialized models are optimized for particular types of data: XML database Semantic model Content store Event store Time series model A database management system provides three views of the database data: The external level defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level. The conceptual level unifies the various external views into a compatible global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators. The internal level (or physical level) is the internal organization of data inside a DBMS. It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as indexes to enhance performance. Occasionally it stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.While there is typically only one conceptual (or logical) and physical (or internal) view of the data, there can be any number of different external views. This allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. For example, a financial department of a company needs the payment details of all employees as part of the company's expenses, but does not need details about employees that are the interest of the human resources department. Thus different departments need different views of the company's database. The three-level database architecture relates to the concept of data independence which was one of the major initial driving forces of the relational model. The idea is that changes made at a certain level do not affect the view at a higher level. For example, changes in the internal level do not affect application programs written using conceptual level interfaces, which reduces the impact of making physical changes to improve performance. The conceptual view provides a level of indirection between internal and external. On one hand it provides a common view of the database, independent of different external view structures, and on the other hand it abstracts away details of how the data are stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice usually a given DBMS uses the same data model for both the external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and depends on its implementation, requires a different level of detail and uses its own types of data structure types. Separating the external, conceptual and internal levels was a major feature of the relational database model implementations that dominate 21st century databases. Database technology has been an active research topic since the 1960s, both in academia and in the research and development groups of companies (for example IBM Research). Research activity includes theory and development of prototypes. Notable research topics have included models, the atomic transaction concept, and related concurrency control techniques, query languages and query optimization methods, RAID, and more. The database research area has several dedicated academic journals (for example, ACM Transactions on Database Systems-TODS, Data and Knowledge Engineering-DKE) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE).

Advanced technics in rdbms?

1. Introduction One of the goals of benchmarks is to compare several database management systems on the same class of applications in order to show which one is more efficient for this particular class. Benchmarks used in the industry today are always oriented towards uncovering a defined limited set of features that a database should implement efficiently in order to successfully support the class of applications the benchmark was created for. One of the first industry benchmarks, the TP1 benchmark [1], developed by IBM purported to measure the performance of a system handling ATM transactions in a batch mode. Both TPC-C and TPC-D have gained widespread acceptance as the industry's premier benchmarks in their respective fields (OLTP and Decision Support). TPC-E failed to garner enough support because being an enterprise benchmark, it was only relevant to a relatively small number of companies competing in that space [11]. The Simple Database Operations Benchmark [9], Altair Complex-Object Benchmark [14], OO1 [6] and OO7 [3, 2] Benchmarks were created to provide useful insight for end-users evaluating the performance of Object Oriented Database Management Systems (OODBMS). Most object-relational DBMS are build on top of relational database by adding the following four key features [12]: inheritance, complex object support, an extensible type system, and triggers. The appearance of such databases necessitated the creation of the BUCKY benchmark [4], which tests most of these specific 2 features. It emphasizes those features of object-relational databases that are not covered by pure relational databases. 1.1 Semantic Model The semantic database models in general, and the Semantic Binary Model SBM [8, 10] in particular, represent information as a collection of elementary facts categorizing objects or establishing relationships of various kinds between pairs of objects. The facts in the database are of three types: facts stating that an object belongs to a category; facts stating that there is a relationship between objects; and facts relating objects to values. The relationships can be multivalued. The objects are categorized into classes according to their common properties. These classes, called categories, need not be disjoint; that is, one object may belong to several of them. Further, an arbitrary structure of subcategories and supercategories can be defined. 1.2 Why a Different Benchmark Unfortunately, most benchmarks do not provide general problem statement; instead they enforce a specific implementation that can not be efficiently translated to a different data model. For example, TPC benchmarks compares the efficiency of implementation of the same solution for different relational DBMS'es rather then comparing how efficiently DBMS'es are able to solve a given problem. The benchmark proposed does not enforce specific implementation thus allow native, efficient implementation for any DBMS - semantic, relational or any other, which makes it highly portable [7]. Majority of existing benchmarks is designed to evaluate features native to relational DBMS'es and none of them are suitable to evaluate performance of the features characteristic of semantic database applications. The benchmark proposed evaluates the ability of DBMS to efficiently support such features including sparse data, complex inheritances, many-to-many relations and variable field length. The rest of the paper is organized as follows: The benchmark application and the requirements for execution of transactions are defined in general terms in sections 2, 5, 6 and 7. Sections 3 and 4 describe our implementations of this application for semantic and relational database respectively. Section 8 presents the results we obtained for a semantic and a relational database and analyses the results. Conclusions are provided in section 9. 2. The Problem Stated It has recently become a practice for many consumer organizations to conduct consumer surveys. A large number of survey forms are mailed out to people and small companies with questions about shopping preferences in order to determine consumer patterns. Some consumers will fill out the survey and mail it back. The results of the survey should be stored in a database. Our survey collects information about several types of legal persons. It is mailed to physical persons and corporations and we keep information about all the legal persons the survey was mailed to. Those who answer the survey and mail it back are considered consumers and categorized into the corresponding category. We also try to collect referrals from people to their friends and mail our survey to the referred people. We remember the information about referrals so that we can explore correlation between groups of people who know each other. A person may refer another person without filling out the survey and thus without becoming a consumer as we define it. 3 Among the other questions, we will ask a consumer which stores does he prefer to shop at. We will thus keep a catalog of stores with their names and types (pharmacy, mass merchandiser, etc.) A consumer may list several stores he usually shops at, so that many-to-many relationship is established between consumers and stores. We will also ask a consumer to tell us his approximate annual expenditures in thousands of dollars and a list of his hobbies. We will collect information about ten types of products. Consumers will fall into different categories based on their answers about which products they regularly consume, for example "coffee drinker." We will mail them a survey where they will tell us if they regularly consume some product and answer questions about different brands they consume within each product group. Each product group in our survey has 10 brands. A consumer will tell us, which brands of the product he uses and show his preference of different brands in terms of a satisfaction rating of 1 to 4. Some consumers may indicate that they use this type of product, but none of the brands listed by us. This option should also be accommodated. We will also let a consumer write any comment about any brand he is using (from the ten brands we are asking about) if he wishes to do so. In practice, consumers seldom write any comments, so this information will be sparse. However, if they do write a comment it can be of any length and will probably be rather long. 3. A Semantic Schema for the Benchmark Database The benchmark deals with persons and corporations. Person and corporation are represented by two different categories with the appropriate personal attributes. They both inherit from a category Legal Person. A category Consumer, will inherit from Legal Person, since both persons and corporations may be consumers and there is no need to distinguish between them as consumers. Since the same object can not be both a person and a corporation, categories Person and Corporation will be set up as disjoint. A legal person who answered our survey becomes a consumer. The category Consumer will then be used to capture information about such legal persons as consumers. The attribute "expenditure", is thus an attribute of the category Consumer. The category Consumer is not disjoint with either Person or Corporation. In fact, every consumer must be either a person or a corporation. All of this is supported by a semantic database on the level of schema enforced integrity constraints. The relationship between categories is shown in Figure 1. Consumers can further be categorized into inherited categories G0, G1, ... G9, which represent consumers of a particular product. Thus, those consumers who are soap users will be categorized into the category "Soap Users" (one of Gi). The same consumer may be categorized into several Gis if he is a user of several types of products. In our initial database population about 50% of all objects in the database will be categorized into each of the following categories: Legal Person, Person, Consumer and several of the G0, G1, ... G9 categories. Some will be categorized into Figure 1. Relationship between subcategories Consumer Legal Person Person Corporation 4 Legal Person, Corporation, Consumer and several of the G0, G1, ... G9. Some will be just Person and Legal Person or Corporation and Legal Person. In addition to this, there will be some objects in the category Store. Consumers will be related to the stores they usually shop at. This relation is many-tomany, which means that a consumer may be related to several stores at once and a single store will, with high probability, be related to many consumers. NAME: STRING TOTAL ADDRESS: STRING TOTAL A0, A1 … A9: INTEGER C0, C1 … C9: STRING EXPENDITURE: INTEGER A0, A1…A9: INTEGER C0, C1 … A9: STRING A0, A1 … A9: INTEGER C0, C1 … C9: STRING ->KNOWS (M:M)-> NAME: STRING SSN: INTEGER HOBBY: STRING M:M ADDRESS: STRING NAME: STRING TOTAL TYPE: STRING Customer-of (m:m) … PERSON LEGAL PERSON CORPORATION CONSUMER STORE G0 G1 G9 Figure 2. Semantic Schema for the Semantic Benchmark A special relation "knows" is going from category Person into itself. The semantics of this is that a person may know and refer to us several other persons and will be related to each one of them via this relation. Since a person typically refers (and may be referred by) several other persons, this relation will also be many-tomany. A semantic schema of the database appears in Figure 2. 4. Relational Schemas for the Benchmark Database LegalPerson Id Type Name Address SSN Indexes: Name (Name) SSN (SSN) LPersonHobby LegalPersonId Hobby Indexes: Hobby (Hobby,LegalPersonId) Consumer LegalPersonId Expenditure Store Name Type ConsumerCustomerOf LegalPersonId StoreName Indexes: Store (StoreName,LegalPersonId) G0 LegalPersonId a1,a2,…,a9 c1,c2,…,c9 Indexes: a1 (a1,LegalPersonId) … a9 (a9,LegalPersonId) LPersonKnowsLPerson LegalPersonId KnowsId Indexes: Knows (LegalPersonId,KnowsId) G1 LegalPersonId a1,a2,…,a9 c1,c2,…,c9 Indexes: a1 (a1,LegalPersonId) … a9 (a9,LegalPersonId) G9 LegalPersonId a1,a2,…,a9 c1,c2,…,c9 Indexes: a1 (a1,LegalPersonId) … a9 (a9,LegalPersonId) … Figure 3. A relational schema for the Sparse model While the benchmark application has a clear and obvious representation in the semantic schema, creating a relational schema for this applications in not evident. When designing the semantic schema our only concern was the readability of the schema capturing the full semantics of the application. In designing a relational 5 schema, we have to think more about the efficiency of the queries that we expect will be executed on this database and about the tradeoffs between this efficiency, readability, and the size of the database. Among the many choices for the relational schema we considered (for detailed analysis see [13]), we have chosen two that we believe to be the best candidates for efficient implementation. We call them the Sparse model and the Compact model. The corresponding relational schemas for these two models are shown in Figures 3 and 4, respectively. In the Sparse model the creation of ten indexes per group (consisting of LegalPersonId and ai) is reasonable to facilitate efficient access to the data in G0…G9. Even though our benchmark queries do not access every ai in every Gi, the schema should accommodate all similar queries with the same efficiency, so we have to create all of the indexes. However, creating so many indexes will slow down update operations and take up too much disk space and cache memory. LegalPerson Id Type Name Address SSN Indexes: Name (Name) SSN (SSN) LPersonHobby LPId Hobby Indexes: Hobby (Hobby,LPId) Consumer LPId Expenditure Store Name Type ConsumerCustomerOf LPId StoreName Indexes: Store (StoreName,LPId) Answer Type LPId Id Value Indexes: AnswerType (Type,Id,Value) Comment LPId Type Id Value LPersonKnowsLPerson LPId KnowsId Indexes: Knows (LPId,KnowsId) Figure 4. A relational schema for the Compact model For the Compact model we create two tables, one to keep all ai attributes, the other to keep all ci attributes. Each table has the columns: LegalPersonId, group number, attribute number and the value of this attribute. The primary key will consist of LegalPersonId, group number and attribute number. For this model, we need to create just one additional index consisting of group number, attribute number and value. Lower bound Upper bound Mean Variance Name length 5 40 12 5 Address length 15 100 35 20 Comment length 5 255 30 100 Number of hobbies per consumer 0 19 0 10 Number of stores per consumer 1 19 4 10 Expenditure 1 89 20 10 Number of groups a consumer belongs to 1 10 5 4 Number of brands a consumer uses 0 9 1 1 Table 2. Normal distribution of parameters for initial database population 6 5. Initial Database Population The number of Legal Persons in the initial population defines the scalability [7] of the database. The results published in this paper were obtained on a database with 200,000 corporations and 800,000 persons. 500,000 of persons and 100,000 of corporations are consumers. The data was generated randomly according to the table of Normal Distribution (Gaussian) parameters (Table 2). The detailed definition of the initial data set can be found in [13]. A random set of Persons must be pre-chosen for transaction #4. This set consists of 0.1% of all Persons in the database. 6. Database Transactions Our benchmark consists of five transactions performed independently and sequentially. We expect efficient DBMS-specific implementations for each particular database. Transaction 1: The first transaction is simple. The task is to count the number of consumers that belong to every one of the ten product consumer groups in the database. The result is just one number: the cardinality of the intersection of groups G0...G9. The formal definition of the transaction is shown in formula (1). 􀀀9 =0 = i i R G , (1) where Gi are the groups of consumers that consume a particular product. Transaction 2: The second transaction consists of two different physical database transactions. The first one finds all consumers who use brand #1 of product #1 as their first preference among the brands of product #1, and at the same time use brand #2 of product #2 as their second preference among the brands of product #2. Such consumers form a new group in the database Gnew. This new group must be created in the database and all found consumers should be categorized into it. The formal definition of the transaction is shown in formula (2). { | [ :: ] 1} { | [ :: ] 2} 1 1 1 2 2 2 G = g ÎG g G A = g ÎG g G A = new (2) The second physical transaction should delete the newly created group from the database. The sum of execution times of both physical transactions is considered the execution time of Benchmark Transaction #2. Transaction 3: The third transaction is a complex query counting those consumers who regularly shop at store X and have hobby Y, excluding those who use brand #3 of product #3 as their third preference among the brands of product #3, and at the same time use brand #4 of product #4 as their fourth preference among the brands of product #4. The result is just one number - a count of such consumers. The formal definition of the transaction is shown in formula (3). ({ | [ :: ] 3} { | [ :: ] 4}) { | [ :: _ :: ] }) ({ | [ :: ] } 3 3 3 4 4 4 Î = Î = Î = - Î = = g G g G A g G g G A c Consumer c Consumer customer of name Y p Person p Person hobby X R (3) 7 Transaction 4: The fourth transaction can be explained by the following: For each person from a given (randomly chosen) set of 0.1% of all persons, expand the relation "knows" to relate this person to all people he has a chain of acquaintance to. Abort the transaction rather than commit. Print the length of the maximal chain from a person. The formal definition of the transaction is shown in formula (4). NewDatabase OldDatabase K = , where . . , 1.. 1 . . , . . } { . . | , , , , ,... : 1 1 1 2 < > " = - < > < > = < > Î Î $ $ Î s knows a i n a knows a + a knows p K s knows p s S p Person n a a a Person i i n n (4) Transaction 5: The fifth transaction counts the number of consumers in each one of the ten product consumer groups in the database. The result is ten numbers: the cardinality of the each of the groups G0...G9. The formal definition of the transaction is shown in formula (5). R = G ,i = 0..9 i i (5) 7. Execution of Transactions The benchmark is running in single user mode. Only one client is running the benchmark transactions at a time. Thus, we are not testing concurrency control performance by this benchmark. A DBMS is, however, allowed to use any kind of parallelism it can exploit in single user mode. The benchmark transactions are executed in two modes: hot and cold. Both "cold time" and "hot time" are collected for each transaction. Both results are included in the final result. The Cold time is the time required to perform a transaction immediately after starting the DBMS on a system with an empty cache. This is normally achieved by rebooting the system before executing each transaction. The hot time is the time required to perform a transaction immediately after performing an identical transaction without clearing any cache or restarting the DBMS. To collect the hot time we run the same transaction in a loop until the time of execution stabilizes, which typically happens on the third or fourth run. Once the execution time stabilizes we compute the arithmetic mean of the following five transactions and this is considered the final hot execution time for this transaction. 8. Results and Analysis We ran the benchmark for the Sem-ODB Semantic Object-Oriented Database Engine implemented at Florida International University's High Performance Database Research Center (HPDRC) [10]. We also ran the benchmark for one of the leading commercial relational databases. The tests were done on a dual processor Pentium II 400Mhz with 256MB total memory and 9Gb Seagate SCSI disk drive under Windows NT Server 4.0 Enterprise Edition. The version of Sem-ODB used for benchmarking did not utilize the multiprocessor architecture of the underlying hardware. We did, however, observe the relational database using both processors for parallel computations. We have run tests with different memory limitations imposed on the DBMS'es. Sem-ODB was allowed to use 16 Megabytes of memory and never actually used more than 12 Megabytes for the benchmark transactions. For the relational database, two different 8 tests were conducted. In one, the relational DBMS was allowed to use 16 Megabytes, in the other 128 Megabytes. For some transactions, the 16 Megabyte quota was enough for efficient execution, for other transactions the relational DBMS was willing to use up to 128 Megabytes for better performance. Both cold and hot times were collected for each memory limitation and for both relational schemas (sparse and compact). Thus, eight execution times were collected per transaction for the relational DBMS. This was done to make sure that we have done everything we could to allow the relational database to achieve its best performance. We observed that in some cases the sparse model was more efficient, but in other cases the compact model was faster. In order to prevent criticism on the choice of the model, we decided to include all the results in this paper. We have spent a considerable amount of time inventing different implementations and fine tuning the relational database. We tried different combinations of indexes, keys, DBMS options, and schemas in order to achieve greater performance. The semantic database on the other hand, did not require any tweaking to optimize its performance. Its performance was acceptable in the very first version. The semantic DBMS is able to capture the exact semantics of the problem domain and provide a single optimal way to represent it in a semantic schema. All the appropriate indexes are built automatically and follow from the definition of the Semantic Database. By its design, it can efficiently answer arbitrary queries without the need for an experienced person to spend time tuning it for a particular application. DBMS Model Semantic Relational Sparse Relational Compact DB Size (Mb) 406Mb 1046Mb 382Mb RAM 16Mb 16Mb 128Mb 16Mb 128Mb Cold times (seconds) Transaction 1 1.61 11.52 11.55 16.11 15.94 Transaction 2 1.13 0.53 0.56 0.34 0.36 Transaction 3 0.91 5.95 5.91 5.97 5.88 Transaction 4 55.65 55.63 43.02 55.63 43.02 Transaction 5 8.62 11.66 11.53 15.31 15.17 Hot times (seconds) Transaction 1 0.04 11.66 5.39 15.81 12.58 Transaction 2 0.07 0.28 0.28 0.09 0.09 Transaction 3 0.33 2.72 2.72 2.72 2.70 Transaction 4 0.23 35.02 2.87 35.02 2.87 Transaction 5 6.85 11.36 2.17 14.92 10.32 Table 3. The benchmark results One might think that to create enough indexes for the execution of arbitrary queries, the semantic database would have to unnecessarily use too much disk space. The results however prove this not true. In our relational implementation the sparse model contains a similar number of indexes to the semantic database but requires 2.5 times more disk space. The compact model uses about the same amount of disk space as the semantic database, but has worse performance on most transactions and is not universal in the sense that this model would not be possible at all if, for example, the attributes a0..a9 were of different types. The semantic database is outperformed by the relational on a few transactions in

Why relational database approach is better than earlier method?

Add your answer:

Draw class diagram of placement cell?

How does Git work?

Advanced technics in rdbms?

What is the purpose of using joins in database is normalization the only purpose of it?

Which Is Better Hello or Hi?

Is relational database better than manual database?

What are the benefits of relational database?

When would you want to use a relational database?

Why is it better to use a relational database instead of a flat file database?

Why do you learn database model?

What is the difference between object-oriented database and relational database?

What problems associated with storing data in a list is avoided by storing data in a relational database?

What are the typical benefits of relational databases?

Compare contrast relational databases object-relational databases and object-oriented databases Site an example or scenario where each type is best used?

Which database is better when we are going to have millions of records?

Can you group a field twice in excel For example I have the field of states but I want to make multiple 'Region' groups as there are three sets of regions each including all 50 statesin my company?

What is the explanation for the implication of database approach?

Resources

Top Categories

Product

Company