Median. Detailed theory with examples

The central tendency of data can be described not only by a value with zero total deviation (the arithmetic mean) or by the value with maximum frequency (the mode), but also by a mark that divides the ranked data (sorted in ascending or descending order) into two equal parts. Half of the original data is less than this mark, and half is greater. That mark is the median.

So, the median in statistics is the level of the indicator that divides the data set into two equal halves. The values in one half are less than the median, and the values in the other half are greater. As an example, let's look at a set of random numbers.

Obviously, with a symmetric distribution, the middle, dividing the population in half, will be located in the very center - in the same place as the arithmetic mean (and mode). This is, so to speak, an ideal situation when the mode, median and arithmetic mean coincide and all their properties fall on one point - maximum frequency, halving, zero sum of deviations - all in one place. However, life is not as symmetrical as a normal distribution.

Let's say we are dealing with technical measurements of deviations from the expected value of something (content of elements, distance, level, mass, etc.). If everything is OK, the deviations will most likely follow a distribution close to normal, approximately as in the figure above. But if there is an important and uncontrollable factor in the process, anomalous values may appear that significantly affect the arithmetic mean but hardly affect the median.

The sample median is an alternative to the arithmetic mean, because it is resistant to abnormal deviations (outliers).
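A small sketch of this resistance to outliers, using Python's standard `statistics` module. The measurement values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical deviations from an expected value (not data from the article).
clean = [-2, -1, 0, 1, 2]
with_outlier = clean + [50]  # one anomalous measurement added

mean_clean, median_clean = mean(clean), median(clean)            # 0 and 0
mean_out, median_out = mean(with_outlier), median(with_outlier)  # about 8.33 and 0.5

print(mean_clean, median_clean)
print(mean_out, median_out)
```

A single anomalous value moves the mean from 0 to about 8.33, while the median only shifts from 0 to 0.5.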

A mathematical property of the median is that the sum of absolute deviations from the median is the minimum possible, compared with deviations from any other value. It is even smaller than the sum of absolute deviations from the arithmetic mean! This fact is applied, for example, in transport problems, when a facility (a stop, gas station, warehouse, etc.) must be sited along a road so that the total length of trips to it from different places is minimal.
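The minimality property can be checked numerically. The sketch below assumes hypothetical positions along a road (in km) and a helper named `total_abs_deviation`, which is introduced here only for illustration.

```python
from statistics import mean, median

def total_abs_deviation(points, c):
    """Sum of absolute distances from every point to the candidate site c."""
    return sum(abs(x - c) for x in points)

# Hypothetical positions of delivery points along a road, in km.
positions = [1, 2, 4, 7, 20]

me = median(positions)  # 4
avg = mean(positions)   # 6.8

# Brute-force search over whole-kilometre sites from 0 to 20.
best = min(range(0, 21), key=lambda c: total_abs_deviation(positions, c))

print(total_abs_deviation(positions, me))   # 24
print(total_abs_deviation(positions, avg))  # larger than at the median
print(best)                                 # 4, the median
```

With an odd number of points the minimum is reached only at the median. With an even number, every site between the two central points gives the same minimal total.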

The median formula for discrete data is somewhat reminiscent of the mode formula: namely, there is no formula as such. The median is selected from the available data, and only if this is not possible is a simple calculation carried out.

First of all, the data is ranked (sorted in ascending or descending order). Next there are two options. If the number of values is odd, the median is the central value of the series, whose position can be determined by the formula:

$N_{Me} = \frac{N + 1}{2}$,

where $N_{Me}$ is the position of the value corresponding to the median,

$N$ is the number of values in the data set.

Then the median is denoted as

$Me = x_{N_{Me}}$.

This is the first option, when there is one central value in the data. The second option occurs when the number of values is even, that is, instead of one there are two central values. The solution is simple: take the arithmetic mean of the two central values:

$Me = \frac{x_{N/2} + x_{N/2 + 1}}{2}$.
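The two options can be sketched in a few lines of Python. The function name `median_discrete` and the sample values are assumptions made for illustration.

```python
def median_discrete(values):
    """Median of ungrouped data: the central value, or the mean of the two central values."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: position (n + 1) / 2 in 1-based numbering is index mid.
        return ranked[mid]
    # Even count: average the two central values.
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_discrete([7, 1, 3]))     # 3
print(median_discrete([4, 1, 3, 2]))  # 2.5
```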

For interval data it is not possible to select a specific value. The median is calculated according to a certain rule.

To begin with (after ranking the data), find the median interval. This is the interval through which the desired median value passes. It is determined using the accumulated share of the ranked intervals: the median interval is the one where the accumulated share first exceeds 50% of all values.

I don't know who came up with the median formula, but they clearly proceeded from the assumption that the distribution of data within the median interval is uniform (i.e. 30% of the interval width holds 30% of the values, 80% of the width holds 80% of the values, etc.). From here, knowing the number of values from the beginning of the median interval to 50% of all values in the population (the difference between half the number of all values and the accumulated frequency of the pre-median interval), you can find what proportion of the median interval they occupy. This share is transferred exactly to the width of the median interval, indicating a specific value, which is then called the median.

Let's look at the visual diagram.

It turned out a little cumbersome, but now, I hope, everything is clear. To avoid drawing such a graph every time, you can use a ready-made formula. The median formula is as follows:

$Me = x_{Me} + i_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$,

where $x_{Me}$ is the lower limit of the median interval;

$i_{Me}$ is the width of the median interval;

$\frac{\sum f}{2}$ is the number of all values divided by 2;

$S_{Me-1}$ is the total number of observations accumulated before the start of the median interval, i.e. the accumulated frequency of the pre-median interval;

$f_{Me}$ is the number of observations in the median interval.

As is easy to see, the median formula consists of two terms: 1 – the value of the beginning of the median interval and 2 – the very part that is proportional to the missing accumulated share of up to 50%.
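A minimal sketch of this rule in Python, assuming the grouped data is given as a list of `(lower, upper, frequency)` tuples. The function name `median_interval` and the price table are hypothetical: the frequencies were chosen for illustration and are not the article's table.

```python
def median_interval(intervals):
    """Median of an interval series given as (lower, upper, frequency) tuples,
    sorted by lower bound, assuming values are spread uniformly inside each interval."""
    total = sum(freq for _, _, freq in intervals)
    half = total / 2
    cumulative = 0  # accumulated frequency before the current interval
    for lower, upper, freq in intervals:
        # The first interval whose accumulated frequency reaches half is the median interval.
        if freq > 0 and cumulative + freq >= half:
            return lower + (upper - lower) * (half - cumulative) / freq
        cumulative += freq
    raise ValueError("the series contains no observations")

# Hypothetical price groups in rubles: (lower, upper, number of goods).
prices = [(100, 200, 10), (200, 300, 20), (300, 400, 40), (400, 500, 30)]
print(median_interval(prices))  # 350.0
```

Here the accumulated frequency before the 300-400 interval is 30 out of 100, so the missing 20 observations take up 20/40 of the interval width, which gives 300 + 50 = 350.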

For example, let's calculate the median using the following data.

You need to find the median price, that is, the price such that half of the goods are cheaper and half are more expensive. To begin with, we make auxiliary calculations of the accumulated frequency, accumulated share, and total number of goods.

Using the last column, “Accumulated share”, we determine the median interval: 300-400 rubles (the accumulated share exceeds 50% for the first time). The interval width is 100 rubles. Now all that remains is to substitute the data into the above formula and calculate the median.

That is, one half of the goods is priced below 350 rubles, and the other half above. It's simple. The arithmetic mean, calculated from the same data, is 355 rubles. The difference is not significant, but it is there.

Calculate median in Excel

It is easy to find the median for numerical data using the Excel function MEDIAN. Interval data is another matter: there is no corresponding function in Excel, so you need to use the above formula. What can you do? But this is not very tragic, since calculating the median from interval data is a rare case. You can do the math once on a calculator.

Finally, I offer a problem. There is a data set: 15, 5, 20, 5, 10. What is the average? Four options:

The mode, median, and sample mean are different ways of determining central tendency in a sample.
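For the data set from the problem, the three measures can be compared directly. This is a short sketch using Python's standard `statistics` module, not part of the original problem.

```python
from statistics import mean, median, mode

data = [15, 5, 20, 5, 10]

sample_mean = mean(data)      # (15 + 5 + 20 + 5 + 10) / 5 = 11
sample_median = median(data)  # middle of the sorted series 5, 5, 10, 15, 20 is 10
sample_mode = mode(data)      # 5 occurs twice, more often than any other value

print(sample_mean, sample_median, sample_mode)
```

All three are legitimate "averages" of the same five numbers, and here they give three different answers.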

  • In addition to power means, statistics uses structural averages to characterize the relative value of a varying characteristic and the internal structure of distribution series. These are mainly represented by the mode and the median.

    Mode - this is the most common variant of the series. The mode is used, for example, in determining the sizes of clothes and shoes that are most in demand among customers. The mode of a discrete series is the variant with the highest frequency. When calculating the mode for an interval variation series, you must first determine the modal interval (by the maximum frequency), and then the modal value of the attribute using the formula:

    Median - this is the value of the attribute that lies in the middle of the ranked series and divides it into two equal parts.

    To determine the median in a discrete series when frequencies are available, first calculate the half-sum of frequencies, and then determine which variant it falls on. If the sorted series contains an odd number of values, the position of the median is calculated using the formula:

    $N_{Me} = \frac{n + 1}{2}$, where $n$ is the total number of values.

    If the number of values is even, the median is equal to the average of the two values in the middle of the series.

    When calculating the median for an interval variation series, first determine the median interval within which the median is located, and then determine the value of the median using the formula:

    Example. Find the mode and median.

    Solution:
    In this example, the modal interval is within the age group of 25-30 years, since this interval has the highest frequency (1054).

    Let's calculate the magnitude of the mode:

    This means that the modal age of students is 27 years.

    Let's calculate the median. The median interval is the age group of 25-30 years, since the variant that divides the population into two equal parts lies within this interval ($\sum f_i / 2 = 3462 / 2 = 1731$). Next, we substitute the necessary numerical data into the formula and get the median value:

    This means that one half of the students are under 27.4 years old, and the other half are over 27.4 years old.

    In addition to the mode and median, indicators such as quartiles (dividing the ranked series into 4 equal parts), deciles (into 10 parts) and percentiles (into 100 parts) can be used.

    Mode and median are a special kind of averages used to study the structure of a variation series. They are sometimes called structural averages, in contrast to the previously discussed power means.

    Mode - this is the value of a characteristic (variant) that occurs most often in a given population, i.e. has the highest frequency.

    The mode has great practical application, and in some cases only the mode can characterize social phenomena.

    Median - this is the variant that is in the middle of an ordered variation series.

    The median shows the quantitative limit of the value of a varying characteristic that has been reached by half of the units in the population. Using the median along with the mean or instead of it is advisable if there are open intervals in the variation series, because calculating the median does not require setting conditional boundaries for open intervals, so the lack of information about them does not affect the accuracy of the calculation.

    The median is also used when the indicators to be used as weights are unknown. The median is used instead of the arithmetic mean in statistical methods of product quality control. The sum of the absolute deviations of the variants from the median is less than from any other number.

    Let's consider the calculation of the mode and median in a discrete variation series:

    Determine the mode and median.

    Mode Mo = 4 years, since this value corresponds to the highest frequency f = 5.

    That is, the largest number of workers have 4 years of experience.

    In order to calculate the median, we first find half the sum of the frequencies. If the sum of frequencies is an odd number, we first add one to this sum and then divide it in half:

    $N_{Me} = \frac{\sum f + 1}{2} = \frac{15 + 1}{2} = 8$.

    The median will be the eighth variant.

    In order to find which variant is the eighth, we accumulate frequencies until we get a sum equal to or greater than half the sum of all frequencies. The corresponding variant will be the median.

    Me = 4 years.

    That is, half of the workers have less than four years of experience, and half have more.

    If the accumulated frequency at some variant is exactly equal to half the sum of frequencies, then the median is defined as the arithmetic mean of this variant and the next one.
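The accumulation procedure can be sketched as follows. The function name `median_from_frequencies` and the frequency table are hypothetical. The table is only made to agree with the figures quoted above (15 workers, highest frequency 5 at 4 years) and is not the original table.

```python
def median_from_frequencies(table):
    """Median of a discrete series given as (value, frequency) pairs sorted by value.
    Frequencies are assumed to be positive integers."""
    total = sum(freq for _, freq in table)
    if total == 0:
        raise ValueError("the series contains no observations")
    half = total / 2
    cumulative = 0
    for i, (value, freq) in enumerate(table):
        cumulative += freq
        if cumulative > half:
            # The middle of the series falls inside this variant.
            return value
        if cumulative == half:
            # The middle falls exactly between this variant and the next one.
            return (value + table[i + 1][0]) / 2

# Hypothetical table: (years of experience, number of workers).
experience = [(2, 3), (3, 4), (4, 5), (5, 3)]
print(median_from_frequencies(experience))  # 4
```

The accumulated frequencies are 3, 7, 12, 15. The eighth worker falls into the group with accumulated frequency 12, so the median is 4 years.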

    Calculation of mode and median in interval variation series

    The mode in an interval variation series is calculated by the formula

    $Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$,

    where $x_{Mo}$ is the lower boundary of the modal interval,

    $h_{Mo}$ is the width of the modal interval,

    $f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal interval, the preceding interval and the following interval, respectively.

    The modal interval is the interval with the highest frequency.

    Example 1

    Groups by experience

    Number of workers, people

    Accumulated frequencies

    Determine the mode and median.

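    The modal interval is 6-8 years, because it corresponds to the highest frequency f = 35. Then:

    $x_{Mo} = 6$, $f_{Mo} = 35$,

    $h_{Mo} = 2$, $f_{Mo-1} = 20$,

    $f_{Mo+1} = 11$,

    $Mo = 6 + 2 \cdot \frac{35 - 20}{(35 - 20) + (35 - 11)} = 6 + 2 \cdot \frac{15}{39} \approx 6.77$.

    Conclusion: The largest number of workers have approximately 6.8 years of experience.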

    For an interval series, Me is calculated using the following formula:

    $Me = x_{Me} + h_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$,

    where $x_{Me}$ is the lower boundary of the median interval,

    $h_{Me}$ is the width of the median interval,

    $\frac{\sum f}{2}$ is half the sum of frequencies,

    $f_{Me}$ is the frequency of the median interval,

    $S_{Me-1}$ is the accumulated frequency of the interval preceding the median interval.

    The median interval is the first interval whose cumulative frequency is equal to or greater than half the sum of the frequencies.

    Let's determine the median for our example.

    Since 82 > 50, the median interval is 6-8 years.

    $x_{Me} = 6$, $f_{Me} = 35$,

    $h_{Me} = 2$, $S_{Me-1} = 47$,

    $Me = 6 + 2 \cdot \frac{50 - 47}{35} \approx 6.17$.

    Conclusion: Half of the workers have less than 6.17 years of experience, and half have more than 6.17 years of experience.

    Brief theory

    The structural means most widely used in statistics are the mode and the median (nonparametric means).

    Mode - the value of a characteristic (variant) that occurs in the distribution series with the highest frequency (weight). The mode (Mo) is used to identify the most widespread value of a characteristic (the market price at which the largest number of sales of a given product were made, the shoe size in greatest demand among buyers, etc.). The mode is used only in large populations. In a discrete series, the mode is the variant with the highest frequency. In an interval series, first the modal interval is found, that is, the interval with the highest frequency, and then the approximate modal value of the attribute according to the formula:

    $Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$

    $x_{Mo}$ - lower limit of the modal interval

    $h_{Mo}$ - width of the modal interval

    $f_{Mo-1}$ - frequency of the interval preceding the modal

    $f_{Mo}$ - modal interval frequency

    $f_{Mo+1}$ - frequency of the interval following the modal

    Quantiles - quantities that divide a set into a certain number of parts with equal numbers of elements. The best-known quantile is the median, which divides the population into two equal parts. In addition to the median, quartiles are often used, dividing the ranked series into 4 equal parts, as well as deciles (10 parts) and percentiles (100 parts).

    Median - the value of the attribute for the unit located in the middle of the ranked (ordered) series. If a distribution series is represented by specific values of a characteristic, then the median (Me) is found as the middle value of the characteristic.

    If the distribution series is discrete, then the median is found as the middle value of the attribute (for example, if the number of values is odd, say 45, the median is the 23rd value in the series arranged in ascending order; if the number of values is even, say 44, the median is half the sum of the 22nd and 23rd values).

    If the distribution series is interval, then first find the median interval, which contains the unit located in the middle of the ranked series. To determine this interval, the sum of frequencies is divided in half and, by sequentially accumulating (summing) the interval frequencies starting from the first, the interval where the median is located is found. The median value in an interval series is calculated using the formula:

    $Me = x_{Me} + h_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$

    $x_{Me}$ - lower limit of the median interval

    $h_{Me}$ - width of the median interval

    $\sum f$ - sum of the frequencies of the series

    $S_{Me-1}$ - the sum of accumulated frequencies in the intervals preceding the median

    $f_{Me}$ - frequency of the median interval

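    Quartiles - these are the values of the characteristic in the ranked series, selected so that 25% of the units in the population are less than $Q_1$, 25% lie between $Q_1$ and $Q_2$, 25% lie between $Q_2$ and $Q_3$, and the remaining 25% exceed $Q_3$. Quartiles are determined using formulas similar to the formula for calculating the median. For an interval series:

    $Q_1 = x_{Q_1} + h_{Q_1} \cdot \frac{\frac{1}{4}\sum f - S_{Q_1-1}}{f_{Q_1}}$, $Q_3 = x_{Q_3} + h_{Q_3} \cdot \frac{\frac{3}{4}\sum f - S_{Q_3-1}}{f_{Q_3}}$,

    where $x_Q$, $h_Q$ and $f_Q$ are the lower limit, width and frequency of the interval containing the quartile, and $S_{Q-1}$ is the accumulated frequency of the intervals preceding it.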

    Decile is a structural variable that divides the distribution into 10 equal parts according to the number of units in the population. There are 9 deciles, and 10 decile groups. Deciles are determined using formulas similar to the formula for calculating the median and quartiles.

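    The general formula for calculating quantiles in an interval series is as follows:

    $Q_i = x_Q + h_Q \cdot \frac{\frac{i}{k}\sum f - S_{Q-1}}{f_Q}$

    $i$ - ordinal number of the quantile

    $k$ - quantile order (how many parts these quantiles divide the population into)

    $x_Q$ - lower limit of the quantile interval

    $h_Q$ - width of the quantile interval

    $S_{Q-1}$ - cumulative frequency of the pre-quantile interval

    $f_Q$ - frequency of the quantile interval ($\sum f$ is the sum of all frequencies)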
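The same interpolation idea extends from the median to any quantile. The sketch below assumes the series is a list of `(lower, upper, frequency)` tuples. The function name `quantile_interval` and the deposit table are hypothetical.

```python
def quantile_interval(intervals, i, k):
    """i-th quantile of order k (k = 4 for quartiles, 10 for deciles, 100 for percentiles)
    for an interval series of (lower, upper, frequency) tuples sorted by lower bound."""
    if not 0 < i < k:
        raise ValueError("the quantile number must satisfy 0 < i < k")
    total = sum(freq for _, _, freq in intervals)
    target = i * total / k  # number of observations lying below the quantile
    cumulative = 0          # accumulated frequency before the current interval
    for lower, upper, freq in intervals:
        if freq > 0 and cumulative + freq >= target:
            return lower + (upper - lower) * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("the series contains no observations")

# Hypothetical grouped data: (lower, upper, frequency).
deposits = [(100, 200, 10), (200, 300, 20), (300, 400, 40), (400, 500, 30)]
q1 = quantile_interval(deposits, 1, 4)  # 275.0
q2 = quantile_interval(deposits, 2, 4)  # 350.0, the median
q3 = quantile_interval(deposits, 3, 4)  # about 416.67
print(q1, q2, round(q3, 2))
```

With k = 2 and i = 1 the function reduces to the median calculation for an interval series.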

    For a discrete series, the quantile number can be found using the formula:

    Example of problem solution

    Condition of task 1 (discrete ranked series)

    As a result of the research, the average monthly income of residents of one entrance was established:

    Define:

    Modal and median income, quantiles and deciles of income.

    The solution of the problem

    We already have a ranked series - the income values ​​of residents are distributed in ascending order.

    The mode is the most common value. In this case we have a series with two modes.

    The median is the value of the attribute that divides the ordered set of data in half.

    Quartiles are the values of a characteristic in a ranked series, selected so that 25% of the units in the population are less than $Q_1$; 25% lie between $Q_1$ and $Q_2$; 25% lie between $Q_2$ and $Q_3$; and the remaining 25% exceed $Q_3$.

    Deciles divide the series into 10 equal parts:


    Problem condition 2 (interval series)

    To determine the average deposit size at a credit institution, the following data were obtained:

    Calculate structural means (mode, median, quartiles).

    The solution of the problem

    Let us calculate the mode of the deposit size:

    The mode is the variant that corresponds to the highest frequency.

    The mode is calculated by the formula:

    Start of modal interval

    Interval size

    Modal interval frequency

    Frequency of the interval preceding the modal

    Frequency of the interval following the modal

    Thus, the largest number of deposits are in the amount of 30.7 thousand rubles.

    The median is the variant located in the middle of the distribution series.

    The median is calculated using the formula:

    Beginning (lower limit) of the median interval

    Interval size

    Sum of all frequencies of the series

    Median interval frequency

    Sum of accumulated frequencies of variants to the median

    Thus, half of the deposits are up to 28 thousand rubles, the other half are more than 28 thousand rubles.

    Let's calculate the quartiles:

    Thus, 25% of deposits are less than 20.8 thousand rubles, 25% lie between 20.8 and 28 thousand rubles, 25% lie between 28 and 33 thousand rubles, and 25% exceed 33 thousand rubles.

    Problem condition 3

    Construct graphs for the variation series. Show the mode, median, mean, and quartiles on the graph.

    Solution to Problem 3

    Let's calculate the average: To do this, sum up the products of the midpoints of the intervals and the corresponding frequencies, and divide the resulting sum by the sum of the frequencies.
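The described calculation of the mean from grouped data can be sketched as follows. The function name `grouped_mean` and the sample table are assumptions made for illustration.

```python
def grouped_mean(intervals):
    """Mean of an interval series: midpoints weighted by frequencies.
    intervals is a list of (lower, upper, frequency) tuples."""
    total = sum(freq for _, _, freq in intervals)
    if total == 0:
        raise ValueError("the series contains no observations")
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in intervals)
    return weighted_sum / total

# Hypothetical grouped data: (lower, upper, frequency).
series = [(100, 200, 10), (200, 300, 20), (300, 400, 40), (400, 500, 30)]
print(grouped_mean(series))  # 340.0
```

The midpoints 150, 250, 350 and 450 are weighted by 10, 20, 40 and 30, giving 34000 / 100 = 340.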

    Median - this is the value of the attribute that divides the ranked distribution series into two equal parts: one with attribute values less than the median and one with attribute values greater than the median. To find the median, you need to find the value of the attribute that is in the middle of the ordered series.


    For ranked, ungrouped data, finding the median reduces to finding its ordinal position in the series. For an interval series, the median can be calculated using the following formula:

    $Me = X_{Me} + i_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$,

    where $X_{Me}$ is the lower limit of the median interval;
    $i_{Me}$ is the width of the median interval;
    $\sum f$ is the sum of all frequencies;
    $S_{Me-1}$ is the sum of observations accumulated before the start of the median interval;
    $f_{Me}$ is the number of observations in the median interval.

    Properties of the median

    1. The median does not depend on those attribute values ​​that are located on either side of it.
    2. Analytical operations with the median are very limited, so when combining two distributions with known medians, it is impossible to predict in advance the value of the median of the new distribution.
    3. The median has the property of minimality: the sum of the absolute deviations of the values of x from the median is smaller than the sum of their absolute deviations from any other value.
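The second property can be illustrated with a small counterexample. The two pairs of samples below are hypothetical and were chosen so that the medians match within each pair.

```python
from statistics import median

# Two pairs of samples with identical medians (2 and 5) in each pair.
a1, b1 = [1, 2, 3], [4, 5, 6]
a2, b2 = [0, 2, 2], [2, 5, 9]

same_first = median(a1) == median(a2) == 2
same_second = median(b1) == median(b2) == 5

combined_1 = median(a1 + b1)  # 3.5
combined_2 = median(a2 + b2)  # 2.0

print(same_first, same_second, combined_1, combined_2)
```

The component medians are the same in both cases, yet the combined medians are 3.5 and 2. Knowing only the two medians is therefore not enough to find the median of the merged distribution.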

    Graphical definition of median

    To determine the median graphically, the accumulated frequencies are used to construct a cumulative curve. The tops of the ordinates corresponding to the accumulated frequencies are connected by straight segments. The last ordinate, which corresponds to the total sum of frequencies, is divided in half, and a perpendicular is drawn from that point to its intersection with the cumulative curve; the abscissa of the intersection point is the desired median value.
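The graphical construction amounts to linear interpolation along the cumulative curve. Below is a minimal sketch, assuming the curve starts at zero at the first interval boundary. The function name `median_from_cumulative_curve` and the data are hypothetical.

```python
def median_from_cumulative_curve(boundaries, cumulative):
    """boundaries: interval boundaries x0 < x1 < ... < xn.
    cumulative: accumulated frequencies at x1 ... xn (the curve is 0 at x0).
    Returns the abscissa where the piecewise-linear curve reaches half of the total."""
    half = cumulative[-1] / 2
    prev_x, prev_c = boundaries[0], 0
    for x, c in zip(boundaries[1:], cumulative):
        if c > prev_c and c >= half:
            # Interpolate along the straight segment that crosses the half level.
            return prev_x + (x - prev_x) * (half - prev_c) / (c - prev_c)
        prev_x, prev_c = x, c
    raise ValueError("the series contains no observations")

# Hypothetical series with boundaries 100..500 and accumulated frequencies 10, 30, 70, 100.
print(median_from_cumulative_curve([100, 200, 300, 400, 500], [10, 30, 70, 100]))  # 350.0
```

The half level 50 is crossed on the segment from (300, 30) to (400, 70), which gives 300 + 100 · 20 / 40 = 350.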

    Definition of mode in statistics

    Mode - the value of the attribute that has the highest frequency in the statistical distribution series.

    The mode is determined in different ways, depending on whether the varying characteristic is presented as a discrete or an interval series.

    In a discrete series, the mode is found by simply looking at the frequency column. In this column, find the largest number, which is the highest frequency. It corresponds to a certain value of the attribute, which is the mode. In an interval variation series, the mode is approximately taken to be the central variant of the interval with the highest frequency. In such a distribution series the mode is calculated by the formula:

    $Mo = X_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$,

    where $X_{Mo}$ is the lower limit of the modal interval;
    $i_{Mo}$ is the width of the modal interval;
    $f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal, preceding and following intervals.

    The modal interval is determined by the highest frequency.

    The mode is widely used in statistical practice when analyzing consumer demand, recording prices, etc.

    Relationships between the arithmetic mean, median and mode

    For a unimodal symmetric distribution series, the arithmetic mean, median and mode coincide. For asymmetric distributions they do not.

    K. Pearson, based on the alignment of various types of curves, determined that for moderately asymmetric distributions the following approximate relationships between the arithmetic mean, median and mode are valid: