Statistics is a branch of mathematical science that deals with the collection, analysis, interpretation or explanation, and presentation of data. Its purpose is to extract meaningful information from data.
Statistics applications
Introduction- This topic gives a brief explanation of the statistical terms used in engineering applications, with reference to concrete test results and their analysis for acceptance. The application of statistics will be discussed in the next blog. The terms described in this blog are:
Data
Probability
Mean-Mode-Median
Normal Distribution
Standard Deviation
Z score
Z score calculation
Standard deviation calculation
Statistics in Engineering – The Standard Deviation (σ)
What is happening in any field can be observed, and what is observed can be recorded. Whatever is recorded is a collection of facts, called DATA (for example numbers, words, measurements, observations or descriptions).
The data is analyzed to get meaningful information.
Data Collection
A survey of the whole population is called a census.
A survey of a group drawn from the population is called a sample.
Data Types
Analog data – Varies continuously, like a sound note that changes smoothly without jerks
Digital data – Varies in discrete steps, like a sound note that changes in jumps (with jerks)
Binary data – Used in computers and phones
A Binary Number is made up of only 0s and 1s.
100110 is binary data; it uses only the two digits 0 and 1.
Bit- The unit for one digit of binary data. The number above has 6 digits, so it is 6 bits long.
Bytes, Megabytes, Gigabytes and Terabytes are the units in which binary data is measured.
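As a small illustration of the binary example above, the sketch below reads the string 100110 as a base-2 number and counts its bits. The variable names are only illustrative.

```python
# Interpret the example from the text as a base-2 number.
bits = "100110"

n_bits = len(bits)     # each binary digit is one bit
value = int(bits, 2)   # convert base 2 to an ordinary decimal integer

print(n_bits)               # 6
print(value)                # 38  (32 + 4 + 2)
print(format(value, "b"))   # 100110, converted back to binary
```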
Data is processed and analysed to get information. The information is used in ‘Monitoring & Control’ processes, which are an important group of ‘Project Management Processes’. The data can be represented diagrammatically as
Bar Charts, Pie Charts, Line Graphs, Scatter Diagrams, Histograms, Frequency Distributions, etc.
Probability- Probability is the branch of mathematics that deals with how likely a random event is to occur. For example,
When a coin is tossed in the air, the possible outcomes are Head and Tail. Hence the probability of each outcome, head or tail, is 1/2.
When a die having six faces numbered 1, 2, 3, 4, 5, 6 is thrown, any number from 1 to 6 can come on top. Hence the probability of any particular number coming on top is 1/6.
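The two examples above follow the same rule: the number of favourable outcomes divided by the number of equally likely outcomes. A minimal sketch of that rule is given here; the helper name probability is hypothetical and the outcomes are assumed to be equally likely.

```python
from fractions import Fraction

def probability(favourable, total):
    """Favourable outcomes divided by the number of equally likely outcomes."""
    return Fraction(favourable, total)

coin = ["Head", "Tail"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability(coin.count("Head"), len(coin))
p_four = probability(die.count(4), len(die))

print(p_head)  # 1/2
print(p_four)  # 1/6
```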
Mean, Median and Mode are the measures of central value of a data set.
Mean
The mean of the data set 14, 18, 12, 17, 12, 13, 11, 10, 9, 8 = sum of all values / number of values
(14+18+12+17+12+13+11+10+9+8) / 10 = 124 / 10 = 12.4 is the mean
Mode is the value that occurs most often. In the data set above, 12 occurs twice and every other value occurs once, so the mode is 12.
Median of the data 6, 8, 9, 10, 11, 12, 12, 14, 14, 17, 18
Arrange the data in ascending order. The central value, 12, is the median. If the data set has an even number of values, take the average of the two central values to get the median.
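The three measures can be checked with Python's standard statistics module, using the same data sets as in the text. The last line uses a short made-up list only to show the even-count rule for the median.

```python
import statistics

mean_data = [14, 18, 12, 17, 12, 13, 11, 10, 9, 8]
median_data = [6, 8, 9, 10, 11, 12, 12, 14, 14, 17, 18]

mean_value = statistics.mean(mean_data)        # sum of values / number of values
mode_value = statistics.mode(mean_data)        # most frequent value
median_value = statistics.median(median_data)  # middle value of the sorted data

# Illustrative list with an even number of values:
# the median is the average of the two central values, 9 and 10.
even_median = statistics.median([8, 9, 10, 11])

print(mean_value)    # 12.4
print(mode_value)    # 12
print(median_value)  # 12
print(even_median)   # 9.5
```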
If an event is random, it means that it does not seem to follow a definite plan or pattern or outcome.
Observed data may be concentrated towards the left, towards the right or around the centre. Data bunched to one side is called skewed.
Data that is distributed symmetrically about the centre is said to follow a normal distribution. Many things closely follow a Normal Distribution, including the outputs of engineering, studies, and research.
As an example, the crushing strength of concrete cubes of the same type is recorded as data for strength analysis, say for 30 cubes (N = 30).
Let the grade of concrete for which the cubes are tested be 30 MPa.
The test strength of 30 MPa concrete will not be exactly 30 MPa every time; it varies. Say this variation is from 25 MPa to 35 MPa.
The test strength of each of the 30 samples is recorded. The data is then tabulated as strength against frequency, where the frequency of a strength is the number of times it is observed. (If a test strength of 28 MPa is observed 5 times in thirty test operations, the frequency of strength 28 MPa is 5.)
The strengths of the concrete, when plotted against frequency, show an approximately normal distribution. The plot shows that the strength of the cubes has a mean value, and that the test results lie close to the mean value on its left and right sides.
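The tabulation described above can be sketched as follows. The 30 strengths are made-up illustrative values, not real test data; they are chosen only so that 28 MPa occurs 5 times, as in the example.

```python
from collections import Counter

# Hypothetical cube strengths in MPa (illustrative only, N = 30).
results = [30, 28, 31, 29, 30, 27, 32, 28, 30, 29,
           25, 31, 28, 30, 33, 29, 26, 30, 32, 28,
           31, 29, 34, 27, 30, 28, 35, 29, 31, 32]

# Frequency = number of times each strength is observed.
frequency = Counter(results)

# Print a strength/frequency table with a simple text histogram.
for strength in sorted(frequency):
    count = frequency[strength]
    print(f"{strength} MPa  {count:2d}  {'#' * count}")
```

With these assumed values the rows pile up around 30 MPa and thin out towards 25 MPa and 35 MPa, which is the bell shape the text describes.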
We say the data is “normally distributed”. The shape of the curve is as shown below (source: Wikipedia).
[Figure: normal distribution curve. X-axis: Strength (MPa); Y-axis: Frequency]
The main features of the ND curve are:
The abscissa (x-axis) represents the compressive strength; the ordinate (y-axis) represents the frequency of occurrence.
The total area under the curve is equal to unity.
The mean is the point on the x-axis having maximum frequency; it divides the area into two exact halves. The curve is symmetrical about the mean.
The dark blue region is less than one standard deviation from the mean. For the normal distribution, this includes 68.27 percent of the values.
The medium blue and dark blue regions together lie within two standard deviations of the mean and include 95.45 percent.
The light blue, medium blue and dark blue regions together lie within three standard deviations of the mean and include 99.73 percent. The remaining 0.27 percent lies beyond 3σ, in the region of 4σ, 5σ and 6σ.
The normal distribution is mathematically defined completely by two statistical parameters:
Population mean- μ and
Standard deviation- σ.
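For reference, the usual way of writing the normal curve in terms of these two parameters is the probability density function below. Here x stands for the measured value (for example the cube strength), a symbol introduced only for this formula.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve peaks at x = μ, and a larger σ gives a wider and flatter curve.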
A mathematical characteristic of the normal distribution is that
(A)- 68.27% of the data lies within 1 standard deviation of the mean.
(B)- 95.45% of the data lies within 2 standard deviations.
(C)- 99.73% of the data lies within 3 standard deviations.
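These three percentages can be reproduced numerically. The sketch below uses the standard identity that, for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2); the function name fraction_within is only illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k) * 100:.2f}%")
# within 1 standard deviation(s): 68.27%
# within 2 standard deviation(s): 95.45%
# within 3 standard deviation(s): 99.73%
```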
Standard deviation is a number used to tell how measurements for a group are spread out from the average (mean) value.
A low standard deviation means that most of the numbers are close to the average.
A high standard deviation means that the numbers are more spread out; the results are therefore not consistent, and the design needs to be reviewed.
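To make the contrast concrete, the sketch below compares two hypothetical sets of cube strengths that have the same mean but different spread. The values are invented for illustration, and the population form of the standard deviation (statistics.pstdev) is assumed.

```python
import statistics

# Hypothetical strengths in MPa; both sets have a mean of 30 MPa.
consistent = [29, 30, 30, 31, 30]
scattered = [24, 36, 30, 27, 33]

# Population standard deviation: root of the mean squared deviation from the mean.
sd_consistent = statistics.pstdev(consistent)
sd_scattered = statistics.pstdev(scattered)

print(statistics.mean(consistent), round(sd_consistent, 2))  # 30 0.63
print(statistics.mean(scattered), round(sd_scattered, 2))    # 30 4.24
```

The mean alone cannot tell the two sets apart; the standard deviation shows that the second set is far less consistent.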
A ‘Standard Normal Distribution’ is a normal distribution with mean (μ) 0 and standard deviation (σ) 1, so that positions on its x-axis are counted in standard deviations from the mean: ±1, ±2, ±3, and so on.
Areas under this curve can be found using a ‘standard normal table’.
About 68% of the observations fall between -1σ and 1σ.
About 95% fall between -2σ and 2σ.
About 99.7% fall between -3σ and 3σ.
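A measured value is placed on the standard normal curve by converting it to a Z score, z = (x - μ) / σ. The sketch below does this for one cube strength and looks up the area to its left with Python's statistics.NormalDist in place of a printed table. The mean of 30 MPa follows the example grade, while the standard deviation of 2 MPa and the test value of 27 MPa are assumptions made only for illustration.

```python
from statistics import NormalDist

# Assumed for illustration: mean 30 MPa, standard deviation 2 MPa.
mu, sigma = 30.0, 2.0

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0.0, sigma=1.0)

z = z_score(27.0, mu, sigma)
below = standard_normal.cdf(z)  # area under the curve to the left of z
within_one = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(z)                     # -1.5
print(round(below, 4))       # 0.0668
print(round(within_one, 4))  # 0.6827
```

Under these assumptions, roughly 6.7% of results would be expected to fall below 27 MPa.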