22 March 2010

DISTRIBUTED DATABASE SYSTEM

What is a DDBMS
http://en.wikipedia.org/wiki/Distributed_database
A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites.

In short, a distributed database is a collection of databases that can be stored at different computer network sites. Each database may involve different database management systems and different architectures that distribute the execution of transactions. The objective of a distributed database management system (DDBMS) is to control the management of a distributed database (DDB) in such a way that it appears to the user as a centralized database.

Providing the appearance of a centralized database system is one of the many objectives of a distributed database system. Such an image is accomplished by using the following transparencies: Location Transparency, Performance Transparency, Copy Transparency, Transaction Transparency, Fragment Transparency, Schema Change Transparency, and Local DBMS Transparency. These transparencies are believed to incorporate the desired functions of a distributed database system.
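Location transparency, the first of these, can be sketched in a few lines: a catalog maps each table to the site that stores it, so the user queries by table name alone and never names a site. All names and data below are invented for illustration; a real DDBMS would contact remote nodes instead of an in-memory dictionary.

```python
# Hypothetical sketch of location transparency. The catalog maps each
# table to its physical site; callers never see where data lives.

CATALOG = {
    "accounts": "site_tokyo",
    "orders": "site_berlin",
}

# Stand-in for per-site storage; a real system would issue remote calls.
SITES = {
    "site_tokyo": {"accounts": [{"id": 1, "balance": 100}]},
    "site_berlin": {"orders": [{"id": 7, "item": "book"}]},
}

def query(table):
    """Resolve the table's location via the catalog, then fetch its rows.

    The caller names only the table -- that is location transparency.
    """
    site = CATALOG[table]          # location lookup hidden from the user
    return SITES[site][table]

rows = query("accounts")           # no site mentioned by the caller
```

The other transparencies layer similar indirection on top: copy transparency would make `CATALOG` map a table to a set of replicas and pick one, fragment transparency would reassemble a table from pieces stored at several sites.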

Limitations
Lack of DDBMS Standards
Of C.J. Date's twelve commandments for a distributed database, four refer to open standards issues required for DDBMSs to reach their full potential. These are hardware independence, operating system independence, network independence, and database independence. Today's DDBMS products still do not meet these four standards. Burleson described the causes of the lack of open standards in the DDBMS market and the impact this has had on DDBMS growth (Burleson, 1994, pp. 72-73). Five years later, this is still a fact. DDBMS technology is relatively new, and is still suffering from vendors fighting to develop and hold on to proprietary features. Today the situation is improving, but cross-vendor connectivity is sometimes limited, especially for legacy systems that do not implement newer standards.

How to achieve vendor independence
Ref: FUTURE TRENDS IN DATA BASE SYSTEMS (paper)


Research Issues in Distributed DBMSs:
There has been a mountain of research on algorithms to support distributed data bases in the areas of query processing [SELI80], concurrency control [BERN81], crash recovery [SKEE82] and update of multiple copies [DAVI85]. In this section, I indicate two important problems which require further investigation. First, users are contemplating very large distributed data base systems consisting of hundreds or even thousands of nodes. In a large network, it becomes unreasonable to assume that each relation has a unique name. Moreover, having the query optimizer inspect all possible processing sites as candidate locations to perform a distributed join will result in unreasonably long optimizer running times. In short, the problems of "scale" in distributed data bases merit investigation by the research community.
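The scale problem in the optimizer can be made concrete with a toy example (all site names and fragment locations below are invented): enumerating every node as a candidate join site grows linearly with network size, while a common pruning heuristic considers only the sites that already hold one of the operand fragments.

```python
# Toy illustration of optimizer scale: exhaustive site enumeration
# vs. a heuristic that keeps only sites holding an operand fragment.

ALL_SITES = [f"site_{i}" for i in range(1000)]           # 1000-node network
FRAGMENT_LOCATIONS = {"emp": ["site_3", "site_41"],      # invented placement
                      "dept": ["site_41"]}

def candidate_join_sites(left, right, prune=True):
    """Return the sites an optimizer would consider for a join."""
    if prune:
        # heuristic: only sites already holding an operand are candidates
        return sorted(set(FRAGMENT_LOCATIONS[left])
                      | set(FRAGMENT_LOCATIONS[right]))
    return ALL_SITES   # exhaustive: every node in the network is a candidate

print(len(candidate_join_sites("emp", "dept", prune=False)))  # 1000
print(candidate_join_sites("emp", "dept"))                    # ['site_3', 'site_41']
```

The heuristic can miss the true optimum (shipping both operands to a third site is sometimes cheaper), which is exactly the kind of trade-off the passage above flags as open research.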

Second, current techniques for updating multiple copies of objects require additional investigation. Consider the simple case of a second copy of a person's checking account at a remote location. When that person cashes a check, both copies must be updated to ensure consistency in case of failure. Hence, at least two round trip messages must be paid to the remote location to perform this reliably. If the remote account is in Hong Kong, one can expect to wait an unreasonable amount of time for this message traffic to occur. Hence, there will be no sub-second response times to updates of a replicated object. To a user of DBMS services, this delay is unreasonable, and algorithms that address this issue efficiently must be developed. Either a lesser guarantee than consistency must be considered, or alternatively algorithms that work only on special case updates (e.g., ones guaranteed to be commutative) must be investigated. The work reported in [KUMA88] is a step in this direction.
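The "commutative updates" special case can be sketched directly with the checking-account example: deposits and withdrawals are signed increments, and integer addition commutes, so two replicas can apply the same set of updates asynchronously and in different orders yet still converge, with no synchronous round trip per update. The code below is an illustration of the idea, not of any particular algorithm from the cited work.

```python
# Sketch of asynchronous replication via commutative updates: each
# update is a signed delta on the balance, and addition commutes,
# so replicas converge regardless of delivery order.

def apply_updates(balance, updates):
    """Apply a batch of signed-delta updates to a replica's balance."""
    for delta in updates:
        balance += delta
    return balance

updates = [+50, -20, +5]                  # check cashed, withdrawal, interest
local = apply_updates(100, updates)
# the remote replica receives the same updates later -- and out of order
remote = apply_updates(100, [+5, +50, -20])
assert local == remote == 135             # replicas converge without locking
```

Non-commutative updates (e.g. "set balance to 0") break this property, which is why the passage above restricts the technique to special-case updates.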


An important aspect related to databases is integrity. Preserving data integrity is a much more complicated issue in heterogeneous distributed databases than in homogeneous databases. If the nodes in the distributed database are heterogeneous, problems may arise that threaten the integrity of the distributed data, among which we can mention:
• inconsistencies among local integrity constraints.
• difficulties in specifying global integrity constraints.
• inconsistencies between local and global integrity constraints.
If central control is strong, priority is given to global integrity constraints. If central control is weaker, local integrity constraints are given priority.
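This priority rule can be sketched in a few lines. The constraints below are invented examples of the local/global conflict (a site tolerating overdrafts while the global rule forbids them); the point is only which constraint set decides admission.

```python
# Hypothetical sketch of constraint priority: under strong central
# control the global constraints decide; otherwise the local ones do.
# Constraints are plain predicates over a row (invented examples).

local_constraints = [lambda row: row["balance"] >= -500]  # site allows overdraft
global_constraints = [lambda row: row["balance"] >= 0]    # global: no overdraft

def accept(row, strong_central_control):
    """Decide whether an update is admitted at this node."""
    local_ok = all(check(row) for check in local_constraints)
    global_ok = all(check(row) for check in global_constraints)
    # strong central control: the global rules take priority
    return global_ok if strong_central_control else local_ok

row = {"balance": -100}
print(accept(row, strong_central_control=True))   # False: global rule wins
print(accept(row, strong_central_control=False))  # True: local rule tolerates it
```

The same row being accepted at one node and rejected at another is precisely the inconsistency between local and global constraints listed above.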


To date, little work has been reported on security for data warehouses as well as the inference problem for the warehouse. This is an area that needs much research.

My Plans

I plan to examine further security issues, concurrent transactions, and update-anywhere asynchronous and synchronous replication systems for distributed databases.

It may be good to separate the development environment from the execution environment of a distributed heterogeneous database system when considering integrity constraints. A query would first be verified against the integrity constraints in the development environment, then executed in the execution environment.
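A minimal sketch of this verify-then-execute split, assuming constraints can be expressed as predicates over a query description (the constraint forms and the executor below are invented for illustration):

```python
# Hypothetical two-stage pipeline: queries pass a "development" check
# against integrity constraints before the "execution" stage runs them.

CONSTRAINTS = [
    lambda q: q["table"] in {"accounts", "orders"},   # table must exist
    lambda q: q.get("limit", 0) <= 1000,              # bound the result size
]

def verify(query):
    """Development environment: reject queries violating any constraint."""
    return all(check(query) for check in CONSTRAINTS)

def execute(query):
    """Execution environment: runs only pre-verified queries."""
    return f"rows from {query['table']}"

def run(query):
    if not verify(query):
        raise ValueError("query violates integrity constraints")
    return execute(query)

print(run({"table": "accounts", "limit": 10}))   # rows from accounts
```

Keeping `verify` out of the execution path means the per-query cost of constraint checking is paid once, at development time, rather than at every node of the distributed system.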

Conclusion:
Distributed database systems are a reality. Many organizations are now deploying distributed database systems. Therefore, we have no choice but to ensure that these systems operate in a secure environment. We believe that as more and more technologies emerge, the impact of secure distributed database systems on these technologies will be significant.

16 March 2010

Performance Engineering of Software Systems

Software Performance Engineering (SPE) is a systematic, quantitative approach to the cost-effective development of software systems to meet performance requirements. SPE, a software-oriented approach, focuses on architecture, design, and implementation choices.
SPE gives you the information you need to build software that meets performance requirements on time and within budget.
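One quantitative SPE-style check can be sketched with the textbook M/M/1 queueing formula: estimated mean response time R = S / (1 - U), where S is the service time and utilization U = arrival rate x S. The workload numbers below are invented; the point is that a requirement can be tested against a model before any code exists.

```python
# Sketch of an early SPE check using the M/M/1 mean response time
# formula R = S / (1 - U), with U = arrival_rate * S (invented numbers).

def mm1_response_time(service_time_s, arrival_rate_per_s):
    """Estimate mean response time of a single-server open queue."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("system is saturated: utilization >= 1")
    return service_time_s / (1.0 - utilization)

requirement_s = 0.5                                   # requirement: under 500 ms
r = mm1_response_time(service_time_s=0.05, arrival_rate_per_s=15)
print(f"{r:.3f}s", "meets requirement" if r <= requirement_s else "too slow")
# 0.05 / (1 - 0.75) = 0.200s, so this design meets the requirement
```

Re-running the same model with a doubled arrival rate would show utilization exceeding 1, flagging a saturation risk at design time rather than after deployment.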
Performance Engineering of Software Systems
http://new.cmg.org/proceedings/1989/89INT152.pdf

Performance engineering
http://en.wikipedia.org/wiki/Performance_engineering