Dragon Academy

G12 Mathematics of Data Management

Exam 2020

Name: Surname: Date: Mon. May 25, 2020.

Save your completed form as a PDF file named "Exam.pdf" and submit it as an email attachment to msantos@dragonacademy.org.

Note: Make sure to resize the text boxes so that they completely show your answers.

Warning: Do not reload the page. Do not go back. If you reload this page or go back, you will lose all your answers and will have to start over again!

Questions

All questions weigh the same towards your final mark. You may consult your notes; calculators and smartphone apps are also allowed.

Write your answers in the corresponding text boxes or choose the right radio buttons/checkboxes.

(Ktica) Extract as much information as possible from the following data by calculating the statistics listed.
Mean (Mn), Mode (Mo), Median (Md), Range (Rn), Variance (Va), Standard Deviation (Sd), Interquartile Range (IR) and Relative Spread (rS)
```
Data: 47 50 52 52 54 65 67 69 71 82 84 85 86 135
```
List your results here making sure you use the two-letter labels listed above for each statistic.
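One way to check the results is a short script like the sketch below. It rests on several assumptions that the question does not state: the variance is the sample variance (divisor $n-1$), the quartiles are the medians of the lower and upper halves of the sorted data, and the relative spread is Sd divided by Mn. Other conventions give slightly different values for Va, Sd, IR and rS. The function name `describe` is only illustrative.

```python
import math
import statistics

data = [47, 50, 52, 52, 54, 65, 67, 69, 71, 82, 84, 85, 86, 135]


def describe(values):
    """Return the eight statistics under the assumptions stated above."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.mean(xs)
    variance = statistics.variance(xs)  # assumption: sample variance (n - 1)
    sd = math.sqrt(variance)
    # Assumption: quartiles are the medians of the lower and upper halves.
    q1 = statistics.median(xs[: n // 2])
    q3 = statistics.median(xs[(n + 1) // 2:])
    return {
        "Mn": mean,
        "Mo": statistics.mode(xs),
        "Md": statistics.median(xs),
        "Rn": xs[-1] - xs[0],
        "Va": variance,
        "Sd": sd,
        "IR": q3 - q1,
        "rS": sd / mean,  # assumption: relative spread = Sd / Mn
    }


results = describe(data)
for label, value in results.items():
    print(label, round(value, 3))
```

Note how the single large value 135 pulls the mean above the median, which is one piece of information the statistics reveal about the data.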
(KTIca) A sample of the height of G12 students in a school yields an average of $1.73m$ with a standard deviation of $0.15m$ and a relative spread of $0.087$. What are the values of these three statistics if we had measured heights in centimeters?
All would be the same, because they represent the actual heights of students, which doesn't depend on the units used.
Mean/Sd/rS: $173cm$, $15cm$, $0.087$
Mean/Sd/rS: $173cm$, $15cm$, $8.7$
None of the above: since the measurements are more precise, the standard deviation should be smaller, and none of these answers reflects that.
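The effect of a change of units can be explored numerically. The sketch below uses a small hypothetical sample of heights (not the sample from the question) and assumes the relative spread is the standard deviation divided by the mean, which is consistent with $0.15/1.73 \approx 0.087$.

```python
import statistics

# Hypothetical heights in metres, for illustration only.
heights_m = [1.55, 1.62, 1.70, 1.73, 1.81, 1.97]
heights_cm = [h * 100 for h in heights_m]


def summary(values):
    """Return (mean, standard deviation, relative spread)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return mean, sd, sd / mean     # assumption: rS = Sd / Mn


mean_m, sd_m, rs_m = summary(heights_m)
mean_cm, sd_cm, rs_cm = summary(heights_cm)

print("metres:     ", mean_m, sd_m, rs_m)
print("centimetres:", mean_cm, sd_cm, rs_cm)
```

The mean and the standard deviation carry units and scale with them, while the relative spread is a ratio of two quantities in the same units.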
(KTIca) We have a dataset of 100 data points. A friend of yours tells you that he tried to model the data using 2 different models. Model I is a polynomial of degree 5 and Model II is a polynomial of degree 50. The coefficients of determination are $R^2=0.87$ and $R^2=0.97$, respectively. Select all true answers.
Most likely Model I is wrong, because it has a much lower $R^2$.
Most likely Model II is useless, because in real problems we never find such a high value of $R^2$.
Most likely Model I is underfitting the data, because you cannot possibly model 100 data points with just 6 parameters.
Most likely Model II is overfitting the data. Model II only somewhat improves the value of $R^2$ with respect to that of Model I, yet it uses almost 10 times more parameters.
By Occam's razor, Model I seems to be the more reasonable choice for predicting new values.
(KticA) Use Desmos to fit both an exponential model and a polynomial model of degree 4 to the following data. Write down each model and list the values of all its parameters, as well as its value of $R^2$. Which is the best model and why?
```
y:  -1  3/4  11/2  18  53
x:   0   2    4     6   8
```
Write the expression of each model and list the values of their coefficients. Note: Use x^3 to denote $x^3$.

The exponential is the best model because its $R^2=1$
The polynomial is the best model because its $R^2=1$
Both are equally good models because they have the same value of $R^2$
By Occam's razor, the exponential model is better because it achieves a similar $R^2$ with fewer parameters.
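When comparing the two values of $R^2$, it helps to recall that a polynomial of degree 4 has five coefficients, so it can pass exactly through five points with distinct $x$ values. The sketch below illustrates this with exact fractions, reading the entries `3/4` and `11/2` as fractions (an assumption about the data format). It evaluates the interpolating polynomial in Lagrange form rather than reproducing what Desmos does internally, and the helper names are illustrative.

```python
from fractions import Fraction

xs = [Fraction(v) for v in (0, 2, 4, 6, 8)]
ys = [Fraction(-1), Fraction(3, 4), Fraction(11, 2), Fraction(18), Fraction(53)]


def lagrange_eval(x, xs, ys):
    """Evaluate the unique degree-4 polynomial through the five points at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


fitted = [lagrange_eval(x, xs, ys) for x in xs]
r2_poly = r_squared(ys, fitted)
print(r2_poly)
```

A perfect $R^2$ obtained this way says nothing about how well the polynomial would predict new points, which is why the number of parameters matters when judging the models.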
(KtiCa) We know that $P(A \mid I) = 1/4$, $P(B \mid I) = 1/2$ and $P(AB \mid I) = 1/8$. Calculate:
1. $P(A-B | I )$
2. $P(A+B | I )$
3. $P(AB | I (A+B) )$
4. The odds of $AB$ against $A+B$
Note: $Q-R=Q\,(\neg R)$

Write down your answers in this text box. Briefly indicate how you arrived at each value.
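The first three items can be checked with exact fractions. The sketch below assumes the usual reading of the notation: $AB$ means "A and B", $A+B$ means "A or B", and $A-B = A\,(\neg B)$ as in the note. It also uses the fact that $AB$ implies $A+B$, so $P(AB\,(A+B) \mid I) = P(AB \mid I)$. Item 4 is left out because it depends on the convention for odds used in class.

```python
from fractions import Fraction

p_a = Fraction(1, 4)   # P(A | I)
p_b = Fraction(1, 2)   # P(B | I)
p_ab = Fraction(1, 8)  # P(AB | I)

# 1. A - B = A and not B, so remove the overlap from A.
p_a_minus_b = p_a - p_ab

# 2. Sum rule for A or B.
p_a_or_b = p_a + p_b - p_ab

# 3. Conditional probability: P(AB | I (A+B)) = P(AB | I) / P(A+B | I),
#    because AB can only happen when A+B happens.
p_ab_given_a_or_b = p_ab / p_a_or_b

print(p_a_minus_b, p_a_or_b, p_ab_given_a_or_b)
```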
Of 180 workers surveyed in an industrial community, 75 work in the paper mill and 40 in the water-treatment plant. What is the probability that a randomly selected worker works
1. At either the paper mill or the water-treatment plant
2. Somewhere other than the paper mill or the water-treatment plant
3. How does your answer to part 1 change if 15 workers work part time at both the mill and the plant?
Write down your answers in this text box.
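A counting sketch for this question is given below. It assumes that in parts 1 and 2 no worker is employed at both places (the question does not say so explicitly), and that in part 3 the 15 part-time workers are already included in both the 75 and the 40.

```python
from fractions import Fraction

total = 180
mill = 75
plant = 40
both = 15  # only used in part 3

# Part 1: assuming the two groups do not overlap, the counts simply add.
p_either_disjoint = Fraction(mill + plant, total)

# Part 2: complement of part 1.
p_neither = 1 - p_either_disjoint

# Part 3: with an overlap, subtract the workers counted twice.
p_either_overlap = Fraction(mill + plant - both, total)

print(p_either_disjoint, p_neither, p_either_overlap)
```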
(KtiCa) See the figure below. Among A, B and C, which pairs of events are independent? Justify your answer by writing down some calculations.
Obviously, A and B are independent because they do not overlap
Only A and C because $P(C|IA)=P(C|I)$
Only B and C because $P(BC|I)=P(B|I)\,P(C|I)$
No pair is independent because C overlaps with the other two.

Write down your answers in this text box.