Identification of entities and connections between them. Database subject area and its models Determining whether an attribute is optional

3. Data Model Components

Entity, entity definition, sources of information about entities

A data model is a conceptual description of a subject area - the most abstract level of database design. The data model consists of entities, attributes, domains, and relationships. Next - about each of the elements in detail.

3.1 Entities

An entity is something about which information needs to be stored in a database.

When designing databases, it is enough to describe the current situation - and most nouns and some verbs will be candidates for entities. For example: "Customers buy goods. Employees sell goods to customers. Suppliers deliver goods" - customers, goods, employees and suppliers are entities. The verbs “buy” and “sell” are also entities (although they can be one entity, different from the point of view of the buyer and the seller).

When designing a database, the main source of information about entities is a conversation with the customer in order to understand his business processes. In addition, standard documents used in business processes are analyzed: forms, reports, instructions, etc. After receiving such a list, it is necessary to check it for completeness and coherence, and also to identify duplicates - identical entities that are called by different words, and entities that are actually different, but are described by the same term.

Entities can model concrete concepts (customers, products, calls) and abstract ones (an agent is responsible for a client, a student is enrolled in a course).

The concept of the ER model. The concept of entity. Attributes. Types of Attributes

1. What problems may a developer encounter when designing a database?

When designing a database and developing a software product, the most important problem is the problem of interaction between the developer and the customer. The developer’s task is to most accurately recreate the customer’s wishes when developing a database management software product. The main problem that the developer needs to solve is the correct construction of the database, or rather the database schema (structure).

In addition, the developer additionally encounters other difficulties, which include:

  • search for efficient algorithms;
  • selection of appropriate data structures;
  • debugging and testing complex code;
  • design and usability of the application interface.

On development stage software, who manages the database, the developer must learn the customer's requirements in detail. The database should be designed so that it is understandable, most accurately reflects the problem being solved, and does not contain redundant data.

To facilitate the process of developing (designing) a database, so-called semantic models data. For different types The most famous database is the ER data model (Entity-Relationship model).

2. What is the ER model (Entity-relationship model)? Why do you need to develop an ER model?

The ER model (Entity-relationship model or Entity-relationship diagram) is a semantic data model that is intended to simplify the database design process. All types of databases can be generated from the ER model: relational, hierarchical, network, object. The ER model is based on the concepts of “entity”, “relationship” and “attribute”.

For large databases, building an ER model allows you to avoid design errors that are extremely difficult to correct, especially if the database is already in operation or at the testing stage. Errors in the development of the database structure can lead to reworking the software code that manages this database. As a result, time, money and human resources will be used inefficiently.

An ER model is a representation of a database in the form of visual graphical diagrams. The ER model visualizes a process that defines a certain subject area. An entity-relationship diagram is a diagram that graphically represents entities, attributes, and relationships.

The ER model is only a conceptual level of modeling. The ER model does not contain implementation details. For the same ER model, its implementation details may differ.

3. What is an entity in a database? Examples

An entity in a database is any object in a database that can be identified based on the essence of the subject area for which this database is being developed. The database designer must be able to correctly define entities.

Example 1. The following entities can be distinguished in the bookstore database:

  • book;
  • provider;
  • placement in the store.

Example 2. In the educational process accounting database of a certain educational institution, the following entities can be distinguished:

  • students (students);
  • teachers;
  • groups;
  • disciplines that are studied.
4. What types of entity types are there? Designation of entity types in the ER model

In the “entity”-“relationship” model, two types of entity types are distinguished:

  • weak type. This type of entity is dependent on a strong entity;
  • strong type. This is an independent type of entity that does not depend on anyone.

Figure 1 shows the designations of weak and strong entity types in the ER model.

Rice. 1. Designation of strong and weak types of entities

5. What are attributes for? Types of attributes. Designation of attributes on the ER model

Each entity type has a specific set of attributes. Attributes are intended to describe a specific entity.

The following types of attributes are distinguished:

  • simple attributes. These are attributes that can be part of composite attributes. These attributes consist of a single component. For example, simple attributes include: the code of a book in a library or a student’s course of study in an educational institution;
  • composite attributes. These are attributes that consist of several simple attributes. For example, the residential address may contain the name of the country, locality, street, house number;
  • unambiguous attributes. These are attributes that contain only one single value for some entity. For example, the attribute “Grade book number” for the entity type “Student” is unambiguous, since a student can have only one test book number (one value);
  • polysemantic attributes. These are attributes that can contain multiple values. For example, the multi-valued attribute “Phone number” for the “Student” entity, since a student can have several phone numbers (home, mobile, etc.);
  • arbitrary attributes. These are attributes whose value is formed based on the values ​​of other attributes. For example, a student's current course of study can be calculated based on the difference between the current year of study and the year the student entered the educational institution(if the student had no problems with studies and studied the discipline “Organization of Databases and Knowledge”) well.

In an ER diagram, attributes are designated as shown in Figure 2. As can be seen from the figure, any attribute is designated as an ellipse with a name inside the ellipse. If the attribute is the primary key, then its name is underlined.

Figure 2. Representation of attributes in ER model diagrams

6. How are entity types and attributes of the ER model implemented in real databases and the programs they manage?

When developing database management programs, entity types and their attributes can be represented in different ways, while adhering to several approaches:

  • choose a well-known technology as a data source (for example, Microsoft SQL Server, Oracle Database, Microsoft Access, Microsoft ODBC Data Source, etc.), which has already been researched, tested, standardized and has a huge set of database management tools;
  • develop your own database format and implement methods for processing it, and implement interaction with known data sources in the form of special commands like Import/Export. In this case, you will have to personally program all the routine work of maintaining and ensuring reliable operation of the database;
  • implement a combination of the two above approaches. Modern software development tools have a powerful set of libraries for processing complex sets and visualizing data in them (collections, arrays, visualization components, etc.).

If the database is implemented in well-known relational DBMS (for example, Microsoft Access, Microsoft SQL Server, etc.), then entity types are represented by tables. Attributes from the ER model correspond to the fields of the table. One entry in a database table represents one instance of an entity.

Each type of attribute is implemented as follows:

  • simple attribute or a single-valued attribute can be represented by an accessible set of basic types that are found in any programming language. For example, integer attributes are represented by type int, integer, uint, etc.; attributes containing a fractional part can be represented as float, double; string attributes of type string, etc.;
  • compound attribute is an object that includes several nested simple attributes. For example, in the Microsoft Access DBMS, a composite attribute of a table can be formed based on a set of simple types (fields). In programming languages, the union of fields is implemented by structures or classes;
  • multivalued attribute can be implemented by an array or collection of simple or compound attributes;
  • arbitrary attribute implemented by an additional field, which is calculated when accessing the table. Such a field is called a calculated field and is formed based on other fields of the table;
  • attribute that is the primary key may be an integer, string, or other ordinal type. In this case, the value of each table cell that corresponds to the primary key is unique. Most often, the primary key is an integer type (int, integer).

If the database is implemented in a unique format, then entity types are most conveniently represented as classes or structures. Entity attributes are implemented as fields (internal data) of the class. Class methods implement the necessary processing of class fields (attributes). Interaction (communication) between classes is implemented using specially designed interfaces using well-known design patterns.

7. Example of a fragment of an ER model for the “Student” entity type

The above example demonstrates a fragment of the ER model for the “Student” entity type.

Figure 3. Fragment of the ER model for the “Student” entity type

The above figure declares the following attributes, which in the DBMS (program) can have the following types:

  • Primary key attribute is a unique integer value that is generated automatically. In the DBMS this is a counter field;
  • The entry year attribute is a simple attribute that can be implemented as an integer value (int, integer);
  • the Phone number attribute is a multivalued attribute that can be implemented as an array or collection, etc.;
  • attribute Record book number– a simple attribute that can be implemented as a string of characters, since the record book number can contain letters in addition to numbers;
  • attribute Country, City, Street, House number are attributes that form the composite attribute Address. All these attributes can be of string (text) type (string, Text);
  • attribute Last Name, First Name, Patronymic – these are simple attributes that are part of the composite attribute Student Name. All these attributes can be of string (text) type (string, Text);
  • the Birthday attribute is a simple attribute of the Date type (DateTime);
  • attribute Student age– a calculated field, which is defined as the difference between the current (system) date and the value of the Birthday attribute.

Basic concepts of the “entity-relationship” database model (ER-model): entities, connections between them and their attributes (properties).

Essence– any concrete or abstract object in the subject area under consideration. Entities are the basic types of information that are stored in a database (in a relational database, each entity is assigned a table). Entities may include: students, clients, departments, etc. An entity instance and an entity type are different concepts. The concept of entity type refers to a set of homogeneous individuals, objects or events acting as a whole (for example, a student, a client, etc.). An entity instance refers, for example, to a specific person in a set. The entity type can be a student, and the instance can be Petrov, Sidorov, etc.

Attribute is a property of an entity in the subject area. Its name must be unique for a specific entity type. For example, for the student entity the following attributes can be used: last name, first name, patronymic, date and place of birth, passport details, etc. In a relational database, attributes are stored in table fields.

Connection– the relationship between entities in the subject area. Relationships are connections between parts of a database (in a relational database, they are connections between table records).

Entities is data that is classified by type, and relationships show how these data types relate to one another. If we describe a certain subject area in terms of an entity-relationship, we obtain an entity-relationship model for this database.

The arrow is a symbol of a relationship: one - to - many.

The main advantages of ER models: * clarity; * models allow you to design databases with a large number of objects and attributes;

The main elements of ER models: * objects (entities); * attributes of objects; * connections between objects.

The relationship between entities is characterized by: * connection type (1:1, 1:N, N:M); * class of belonging. A class can be required or optional. If each entity instance is involved in a relationship, then the membership class is mandatory, otherwise it is optional.


The concept of data normalization. Functional dependence.

Normalization is a formal method of analyzing relationships based on their primary key and existing relationships. Its task is to replace one schema (or set of relations) of a database with another schema in which the relations have a simpler and more regular structure.

Functional dependence. Let X and Y be two attributes of some relation. Y is said to be functionally dependent on X if at any given time each value of X corresponds to no more than one value of the attribute Y.

Functional dependence is denoted as X -> Y.

Attitude student S (Ns, Fio, Ngr, Addr, Tel). Each of the attributes Fio, Ngr, Addr, Tel functionally depends on the Ns attribute.

So, in a normalized relation, all non-key attributes are functionally dependent on the key of the relation. The key of the relation S is the attribute Ns.

Basic Rule when creating entity tables, this is “a separate table for each entity.”

Entity table fields can be of two types: key And non-key. Entering keys into a table in almost all relational DBMSs allows you to ensure the uniqueness of values ​​in table records by key, speed up the processing of table records, and also automatically sort records by values ​​in key fields.

Definition is usually sufficient simple key, less often - enter composite key. A table with a composite key can be, for example, a table storing a list of employees (last name, first name and patronymic), in which namesakes are found. Some DBMSs offer users the option to define an automatically generated key numbering field (in Access, this is a “counter” type field), which simplifies the solution to the problem of uniqueness of table records.

Sometimes entity tables have fields describing the properties or characteristics of objects. If the table has a significant number of repetitions in these fields and this information has a significant volume, then it is better to separate them into a separate table (adhering to the rule: “each entity has a separate table”). Moreover, an additional table should be created if the properties are interrelated.

When processing entity tables, keep the following in mind. It is easy to add and change a new entity, but when deleting, you must destroy all references to it from the link tables, otherwise the link tables will be incorrect. Many modern DBMSs block incorrect actions in such operations.

Posts link tables are intended to display relationships between entities, information about which is located in the corresponding entity tables.

Typically, one relationship table describes the relationship between two entities. Since entity tables in the simplest case each have one key field, the relationship table of two tables must have two keys to ensure the uniqueness of relationship records. You can create a table of relationships, like a table of objects, without keys, but then the functions of monitoring the uniqueness of records fall on the user.

More complex relationships (non-binary) should be reduced to binary ones. To describe the relationships of N objects, N-1 relationship tables are required. There should be no transitive connections. An excess of connections leads to contradictions (see the example of the EMPLOYEE-DEPARTMENT, EMPLOYEE-PROJECT and DEPARTMENT-PROJECT relationships in the previous subsection).

Characteristics of entities should not be included in relationship tables, otherwise anomalies are inevitable. It is better to store them in separate entity tables.


Using connection tables, you can also describe a somewhat specific type of connection - a linear connection, or a weak connection. An example of a linear connection can be considered the relation of belonging of entities to some other entity of a higher order (systems consisting of nodes; medicines consisting of components; metal alloys, etc.). In this case, one table of connections is sufficient to describe the connections.

When working with link tables, you should keep in mind that any entry from the link table can easily be deleted, since entities can do without links for some time. When adding or changing the contents of table records, you must check the correctness of references to existing objects, since a relationship cannot exist without objects. Most modern DBMSs control the correctness of object references.

Under integrity understand the property of a database, which means that it contains complete, consistent and adequately reflecting the subject area information.

Distinguish physical and logical integrity. Physical integrity means that there is physical access to the data and that the data is not lost. Logical integrity means the absence of logical errors in the database, which include violation of the structure of the database or its objects, deletion or change of established connections between objects, etc. In the future we will talk about logical integrity.

Maintaining the integrity of the database includes checking (monitoring) integrity and restoring it if inconsistencies are detected in the database. The integrity of the database is set using integrity constraints in the form of conditions that must be satisfied by the data stored in the database.

Among integrity constraints, two main types of constraints can be distinguished: value restrictions relationship attributes and structural limitations to tuples of relations.

Example value restrictions relationship attributes is the requirement that empty or duplicate values ​​in attributes are inadmissible, as well as control that attribute values ​​belong to a given range. Thus, in relationship records about personnel, the values ​​of the Birth_Date attribute cannot exceed the values ​​of the Reception_Date attribute.

The most flexible means of implementing control over attribute values ​​are stored procedures And triggers, available in some DBMSs.

Structural limitations define requirements integrity of entities And link integrity. For each entity instance represented in a relation, there is only one tuple corresponding to it. Requirement integrity of entities is that any tuple of a relation must be distinguishable from any other tuple of this relation, i.e., in other words, any relation must have a primary key.

The formulation of the link integrity requirement is closely related to the concept foreign key. Let us recall that foreign keys are used to connect relationships (database tables) with each other. In this case, the attribute of one relationship (parent) is called foreign key of this relationship, if it is primary the key of another relationship (child). A relation in which a foreign key is defined is said to reference a relation in which the same attribute is the primary key.

Referential integrity requires that for every foreign key value in the parent table, there must be a row in the child table with the same primary key value. For example, if relation R1 contains information about department employees, and the attribute of this relation Must is the primary key of relation R2, then in this relation for each position from R1 there should be a row with the corresponding salary.

Many modern DBMSs have tools for monitoring the integrity of the database.

Entity is a real or abstract object that has significant significance for the subject area. The entity must have a name expressed by a singular noun

An informal way to identify entities is to look for abstractions that describe objects, processes, roles, and other concepts. The formal way to identify entities is to analyze text descriptions of the subject area, highlighting nouns and selecting them as abstractions.

An entity instance is a specific representative of a given entity. For example, an instance of the Employee entity can be employee Ivanov.

Each entity must have the following properties:

have a unique name;

have one or more attributes that either belong to the entity or are inherited through a relationship;

have one or more attributes that uniquely identify each instance of an entity.

Attribute is a characteristic of an entity that is significant for the subject area under consideration and is intended to identify, classify, quantify or express the state of the entity.

The following types of attributes exist:

simple - consists of one data element;

composite - consists of several data elements;

unambiguous - contains one value for one entity;

multi-valued - contains several values ​​for one entity;

optional - can have an empty (undefined) value;

derived - a value derived from the value of another attribute.

A unique identifier is a set of attributes whose values ​​together are unique for each instance of an entity. Removing any attribute from an identifier violates its uniqueness. Unique identifiers are shown as underlined in the diagram.

Each entity can have any number of connections with other entities.

Relationships between entities

Relationship is a named association between entities that is significant for the subject area under consideration.

The degree of connection is the number of entities involved in the connection.

Communication power - the number of entity instances participating in the communication.

Depending on the power value, communication can be one of three types:

one-to-one (denoted 1:1).

one-to-many (denoted 1:N).

many-to-many (denoted M:N).

One to one. Means that in such a relationship, entities with one role always correspond to no more than one entity with another role. Since the degree of connection for each entity is 1, they are connected by one line.

One-to-many. An entity with one role can be associated with any number of entities with another role.

Many-to-many. In this case, each of the associated entities can be represented by any number of instances.