Statistics and Simulation

Session 12. Continuous stochastic variables

Continuous stochastic variables

13.1. Continuous stochastic variables

Exercise 13.1 What are the possible outcomes of the following measurements, assuming that we can measure with infinite accuracy:

the height of an arbitrary person?
the weight of an arbitrary person?
the age of an arbitrary person?
time modulo one second?

The previous exercise gives examples of random variables. Unlike the random variables we have seen before, however, these are continuous random variables: they can take on any value in a continuous range.

Exercise 13.2 Enter the following code in MATLAB: x = rand. This returns a uniformly distributed, continuous (pseudo)random number in the interval (0, 1).

Exercise 13.3 Repeat the above simulation a number of times, producing a series of observations, in order to verify that the outcome takes on random values anywhere in the interval between 0 and 1.

Exercise 13.4 Take a stopwatch that can measure at least thousandths of a second. Stop it at an arbitrary moment (wait for some time and do not watch the clock in the meantime) and write down the measured random time modulo one second. (This should give you a number between 0 and 1.) Repeat this ten times and collect your data in a table.
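The stopwatch procedure can also be mimicked in a short simulation. The following is a minimal Python sketch, assuming (hypothetically) that each stopping moment is a uniformly random time between 5 and 60 seconds and that the display resolution is one thousandth of a second; the helper `stopwatch_reading` and its parameters are illustrative names, not part of the course material.

```python
import math
import random

def stopwatch_reading(rng, min_wait=5.0, max_wait=60.0):
    """Simulate one stopwatch measurement, returned modulo one second.

    Assumption (hypothetical): the stopping time is uniform on
    [min_wait, max_wait] seconds and the display shows thousandths of a second.
    """
    t = rng.uniform(min_wait, max_wait)   # elapsed time in seconds
    t_ms = math.floor(t * 1000) / 1000    # truncate to the display resolution
    return t_ms % 1.0                     # keep only the fractional second

rng = random.Random(12)                   # fixed seed for reproducibility
readings = [stopwatch_reading(rng) for _ in range(10)]
for i, r in enumerate(readings, start=1):
    print(f"{i:2d}  {r:.3f}")
```

Every reading lies in [0, 1), just like the values in your table; the simulation only replaces the physical stopwatch, not the idea of taking the time modulo one second.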

Exercise 13.5 What is the probability of each individual outcome (still assuming infinite measurement precision)?

Exercise 13.6 Draw the empirical cumulative distribution function based on your data.

Exercise 13.7 Draw the theoretical cumulative distribution function in the same plot.

Exercise 13.8 Enter and run the following code in MATLAB:

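n = 10;                              % Sample size (any value works, e.g. 10 as in Exercise 13.4)
x = rand(1,n)                        % n uniform random numbers in (0,1)
stairs([0 sort(x)], 0:1/n:1, 'r')    % Plot the empirical c.d.f.: a step of 1/n at each sorted value
hold on
y = 0:.001:1;                        % Fine grid on [0,1] (semicolon suppresses printing)
plot(y, y, 'b')                      % Plot the theoretical c.d.f. F(x) = x
hold off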

Exercise 13.9 Explain the code and compare it with exercises 51 and 52.

Exercise 13.10 Increase the sample size. What do you observe?
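One way to make the comparison between the empirical and the theoretical c.d.f. quantitative is to compute the largest vertical distance between the two curves (known as the Kolmogorov–Smirnov statistic). The Python sketch below is an illustration only: `ecdf` and `max_cdf_deviation` are hypothetical helper names, and Python's `random.random()` plays the role of MATLAB's `rand`.

```python
import bisect
import random

def ecdf(sample, x):
    """Empirical c.d.f.: fraction of observations less than or equal to x."""
    s = sorted(sample)
    return bisect.bisect_right(s, x) / len(s)

def max_cdf_deviation(sample):
    """Largest vertical distance between the empirical c.d.f. and F(x) = x on [0, 1].

    The empirical c.d.f. jumps from (i-1)/n to i/n at the i-th smallest value,
    so it suffices to check the distance just before and just after each jump.
    """
    s = sorted(sample)
    n = len(s)
    return max(max(i / n - s[i - 1], s[i - 1] - (i - 1) / n) for i in range(1, n + 1))

rng = random.Random(1)   # fixed seed for reproducibility
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(max_cdf_deviation(sample), 4))
```

Compare the printed distances for the different sample sizes with what you see in the plots.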

Exercise 13.11 What would the probability distribution look like? (Remember that the sum of all probabilities should be one!)

As we have seen, we can define the cumulative distribution function for a continuous stochastic variable in the same way as for a discrete stochastic variable: the cumulative distribution function $F_X(x)$ is, as always, defined as the probability of getting a result less than or equal to $x$:

$$F_X(x) = P(X \le x) \tag{27}$$

However, the probability distribution (the probability of each individual value) is no longer useful: the probability of every single value equals zero, whereas the total probability must still equal one. We therefore define a new function, the probability density $f_X(x)$, which describes the relative likelihood of the random variable taking on a value near $x$. The area under the density function between two values gives the probability that the random variable falls between these values, and the integral (total area) of the probability density function over the entire sample space equals one.
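Written as formulas in the notation of equation (27), these are the standard relations between the density, the cumulative distribution function and probabilities (the derivative relation holds wherever $F_X$ is differentiable):

```latex
P(a < X \le b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a),
\qquad
\int_{-\infty}^{\infty} f_X(x)\,dx = 1,
\qquad
f_X(x) = \frac{d}{dx} F_X(x).
```

Letting $b$ approach $a$ shows why single values carry no probability for a continuous $F_X$: $P(X = a) = \lim_{b \to a^+} \bigl(F_X(b) - F_X(a)\bigr) = 0$.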

Exercise 13.12 Draw the probability density for a uniformly distributed continuous random variable representing a point in time between zero and one second.

To estimate the probability density function empirically, we cannot use exactly the same method as for the probability distribution of a discrete random variable. The problem is that we will never get the same result twice, since the probability of each individual value is zero. We can solve this by grouping together data that lie close to each other, a method called binning: we divide the range into intervals (bins) and count how many observations fall into each. This gives a histogram of the data representing the relative probabilities of the different intervals. Finally, we normalize the histogram so that its total area equals one, i.e. we divide each count by the sample size times the bin width.
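The binning and normalization steps can be sketched in a few lines of Python. This is a minimal illustration, assuming all observations lie in a known interval [lo, hi]; `density_histogram` is a hypothetical helper, not a library function. Each bar height is count / (n · bin width), so bar heights times bin width sum to one.

```python
import random

def density_histogram(data, lo, hi, n_bins):
    """Bin the data on [lo, hi] and normalize so the total bar area equals one.

    Returns (bin_centres, densities). Values outside [lo, hi] are ignored,
    so the area is exactly one only if all data lie in the interval.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lo <= v <= hi:
            k = min(int((v - lo) / width), n_bins - 1)  # v == hi goes in the last bin
            counts[k] += 1
    n = len(data)
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    densities = [c / (n * width) for c in counts]       # count / (n * bin width)
    return centres, densities

rng = random.Random(7)                                  # fixed seed for reproducibility
data = [rng.random() for _ in range(1000)]              # uniform numbers in [0, 1)
n_bins = 10
centres, dens = density_histogram(data, 0.0, 1.0, n_bins)
area = sum(d * (1.0 / n_bins) for d in dens)            # bar heights times bin width
print([round(d, 2) for d in dens])
print("total area:", round(area, 6))
```

For uniform data all bar heights should scatter around 1, and the total area is one by construction.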

Read [6], §4.2, Continuous stochastic variables.

Exercise 13.13 Frisvold and Moe: E4.7

Exercise 13.14 What will the probability density look like if, instead of one, we each time draw two uniformly distributed random numbers between zero and one (for example with the stopwatch method) and add them together?

Exercise 13.15 Take a sample of 15 measurements and draw an empirical probability density function for the previous exercise. Use a bin width of 0.4.

Exercise 13.16 Enter and run the following code in MATLAB:

n = 15;            % Sample size
nBins = 5;         % Number of bins in the histogram
x = rand(1,n)      % n uniform random numbers in (0,1)
y = rand(1,n)      % and n more, independent of x
z = x + y          % Sum of two uniform random variables
bins = 1/nBins:2/nBins:2-1/nBins   % The array of bin centres on [0,2]
% (indicated by centre of first bin,
% step size = bin width, and centre of last bin)
% hist only returns raw counts, which are not normalized, so we draw
% the histogram ourselves as a bar chart of normalized counts:
[nelements, centers] = hist(z, bins)
bar(centers, nelements*nBins/2/n, 1, 'b')   % Divide the counts by n times the bin
% width (2/nBins) in order to get the probability density!
hold on
t = 0:.001:2;
plot(t, 1-abs(t-1), 'r')   % Theoretical probability density for
% the sum of two uniform random variables
hold off
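The triangular theoretical density plotted in red can be derived by convolving the two uniform densities, assuming the two numbers are independent (as separate calls to rand are intended to be). For $Z = X + Y$ with $X, Y$ uniform on $(0, 1)$:

```latex
f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx
       = \int_{\max(0,\,z-1)}^{\min(1,\,z)} 1\,dx
       = \begin{cases}
           z,     & 0 \le z \le 1,\\
           2 - z, & 1 < z \le 2,\\
           0,     & \text{otherwise,}
         \end{cases}
```

which equals $1 - |z - 1|$ on $[0, 2]$. The normalized bar heights should approach this triangle as the sample size grows and the bins become narrower.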

Exercise 13.17 Increase the sample size and the number of bins in order to obtain a more detailed estimate of the probability density function.

Exercise 13.18 Adapt the above code and sample the probability density for the sum of 3, 4 and 5 uniformly distributed continuous random variables. What do you observe?

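As in exercise 20, we are witnessing the central limit theorem: the sum of many independent, identically distributed random variables approaches a normal distribution, and for uniformly distributed variables this happens very quickly. The central limit theorem will be covered in more detail later.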
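A rough numerical check of this behaviour is sketched below in Python (an illustration, not the course's MATLAB solution to Exercise 13.18; `sum_of_uniforms` and `normal_pdf` are hypothetical helper names). It uses the fact that a uniform (0, 1) variable has mean 1/2 and variance 1/12, so a sum of k independent ones has mean k/2 and variance k/12, and it compares the estimated density at the centre with the normal density of the same mean and variance.

```python
import math
import random

def sum_of_uniforms(k, rng):
    """Sum of k independent uniform (0, 1) random numbers."""
    return sum(rng.random() for _ in range(k))

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

rng = random.Random(2024)   # fixed seed for reproducibility
n = 20000                   # number of simulated sums for each k
h = 0.05                    # half-width of the window used to estimate the density
for k in (1, 2, 3, 5):
    z = [sum_of_uniforms(k, rng) for _ in range(n)]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)
    # Theory: mean k/2 and variance k/12 for a sum of k uniform (0, 1) variables
    mu, sigma = k / 2, math.sqrt(k / 12)
    # Empirical density at the centre: fraction of sums within h of mu, divided by 2h
    centre = sum(1 for v in z if abs(v - mu) <= h) / (n * 2 * h)
    print(f"k={k}: mean {mean:.3f} (theory {mu:.3f}), "
          f"variance {var:.4f} (theory {k / 12:.4f}), "
          f"density at centre {centre:.3f} (normal {normal_pdf(mu, mu, sigma):.3f})")
```

For k = 1 the estimated density at the centre should be close to 1, clearly below the normal value of about 1.38; as k grows the two numbers should move closer together.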