Hello Peers, today we are sharing the assessment and quiz answers for every week of the Practical Machine Learning course on Coursera, which can be taken free of charge. This is a certification course open to every interested student.

If you cannot find this course for free, you can apply for financial aid to take it at no cost.

Coursera is a large online learning platform that offers many free courses. These courses come from recognized universities, where industry experts and professors teach in a clear and accessible way.

Here you will find the Practical Machine Learning exam answers, given below.

These answers were updated recently and are 100% correct for all weekly quizzes, assessments, and the final exam of the Practical Machine Learning free certification course from Coursera.

Use Ctrl+F to find the answer to any question. On mobile, tap the three dots in your browser and choose the “Find” option to search for any question.

## About Practical Machine Learning Course

This course will cover the fundamentals of constructing and employing prediction functions, with a focus on practical applications. The course will teach a foundational understanding of such concepts as training and test sets, overfitting, and error rates.

Course Apply Link – Practical Machine Learning

## Practical Machine Learning Quiz Answers

#### Quiz 1

Q1. Which of the following are components in building a machine learning algorithm?

• Training and test sets
• Statistical inference
• Machine learning
• Deciding on an algorithm.
• Artificial intelligence

Q2. Suppose we build a prediction algorithm on a data set and it is 100% accurate on that data set. Why might the algorithm not work well if we collect a new data set?

• Our algorithm may be overfitting the training data, predicting both the signal and the noise.
• We may be using a bad algorithm that doesn’t predict well on this kind of data.
• We have used neural networks, which have notoriously bad performance.
• We have too few predictors to get good out-of-sample accuracy.

Q3. What are typical sizes for the training and test sets?

• 100% training set, 0% test set.
• 20% training set, 80% test set.
• 60% in the training set, 40% in the testing set.
• 90% training set, 10% test set

Q4. What are some common error rates for predicting binary variables (i.e. variables with two possible values like yes/no, disease/normal, clicked/didn’t click)? Check the correct answer(s).

• Correlation
• Accuracy
• R^2
• Root mean squared error
• Median absolute deviation

Q5. Suppose that we have created a machine learning algorithm that predicts whether a link will be clicked with 99% sensitivity and 99% specificity. The rate the link is clicked is 1/1000 of visits to a website. If we predict the link will be clicked on a specific visit, what is the probability it will actually be clicked?

• 9%
• 99%
• 0.009%
• 89.9%
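
The probability asked for in Q5 is the positive predictive value, which follows from Bayes' rule. A minimal base-R sketch of the arithmetic, with variable names chosen here for illustration and the numbers taken from the question:

```r
# Positive predictive value via Bayes' rule.
# Variable names are illustrative; the numbers come from the question.
sensitivity <- 0.99      # P(predict click | click)
specificity <- 0.99      # P(predict no click | no click)
prevalence  <- 1 / 1000  # P(click)

true_pos  <- sensitivity * prevalence
false_pos <- (1 - specificity) * (1 - prevalence)

ppv <- true_pos / (true_pos + false_pos)
round(100 * ppv, 1)  # about 9 (percent)
```

Because clicks are so rare, false positives outnumber true positives roughly ten to one even with 99% specificity.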

#### Quiz 2

Q1. Load the Alzheimer’s disease data using the commands:

```
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
```

Which of the following commands will create non-overlapping training and test sets with about 50% of the observations assigned to each?

```
adData = data.frame(diagnosis,predictors)
trainIndex = createDataPartition(diagnosis,p=0.5,list=FALSE)
training = adData[trainIndex,]
testing = adData[trainIndex,]
```

```
adData = data.frame(diagnosis,predictors)
train = createDataPartition(diagnosis, p = 0.50,list=FALSE)
test = createDataPartition(diagnosis, p = 0.50,list=FALSE)
```

```
adData = data.frame(diagnosis,predictors)
testIndex = createDataPartition(diagnosis, p = 0.50,list=FALSE)
training = adData[-testIndex,]
testing = adData[testIndex,]
```

```
adData = data.frame(diagnosis,predictors)
trainIndex = createDataPartition(diagnosis,p=0.5,list=FALSE)
training = adData[-trainIndex,]
testing = adData[-trainIndex,]
```
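
A base-R sketch of the idea behind this question: indexing a data frame with an index vector and with its negation yields two disjoint sets. Here `sample()` stands in for caret's `createDataPartition()` (which additionally stratifies on the outcome), and the toy data frame is hypothetical.

```r
# Base-R sketch of a non-overlapping 50/50 split.
# sample() is a stand-in for createDataPartition(); toy data is made up.
set.seed(1)
toy <- data.frame(id = 1:10, y = rep(c("a", "b"), 5))

test_index <- sample(nrow(toy), size = nrow(toy) / 2)
training <- toy[-test_index, ]  # rows NOT selected
testing  <- toy[test_index, ]   # rows selected

# No row appears in both sets, and together they cover the data.
length(intersect(training$id, testing$id)) == 0
nrow(training) + nrow(testing) == nrow(toy)
```

Using the same index (or the same negated index) for both sets would make them identical rather than disjoint.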

Q2. Load the cement data using the commands:

```
library(AppliedPredictiveModeling)
data(concrete)
library(caret)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[ inTrain,]
testing = mixtures[-inTrain,]
```

Make a plot of the outcome (CompressiveStrength) versus the index of the samples. Color by each of the variables in the data set (you may find the cut2() function in the Hmisc package useful for turning continuous covariates into factors). What do you notice in these plots?

• The outcome variable is highly correlated with FlyAsh.
• There is a non-random pattern in the plot of the outcome versus index that is perfectly explained by the Age variable so there may be a variable missing.
• There is a non-random pattern in the plot of the outcome versus index that does not appear to be perfectly explained by any predictor suggesting a variable may be missing.
• There is a non-random pattern in the plot of the outcome versus index that is perfectly explained by the FlyAsh variable.

Q3. Load the cement data using the commands:

```
library(AppliedPredictiveModeling)
data(concrete)
library(caret)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[ inTrain,]
testing = mixtures[-inTrain,]
```

Make a histogram and confirm the SuperPlasticizer variable is skewed. Normally you might use the log transform to try to make the data more symmetric. Why would that be a poor choice for this variable?

• The SuperPlasticizer data include negative values so the log transform can not be performed.
• The log transform does not reduce the skewness of the non-zero values of SuperPlasticizer
• The log transform is not a monotone transformation of the data.
• There are a large number of values that are the same and even if you took the log(SuperPlasticizer + 1) they would still all be identical so the distribution would not be symmetric.
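
The effect of tied values on a log transform can be seen on a small example. This is a sketch on a hypothetical toy vector with many exact zeros, not the real SuperPlasticizer data:

```r
# Hypothetical toy vector: many exact zeros plus a few positive values,
# mimicking the shape described in the question (not the real data).
x <- c(rep(0, 6), 2.5, 5, 11, 28)

log_x <- log(x + 1)

# The zeros are still tied after the transform: log(0 + 1) = 0 for each.
sum(x == 0)      # 6
sum(log_x == 0)  # 6
```

A monotone transform can stretch or compress the spacing between distinct values, but it cannot separate values that are identical, so the spike stays in place.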

Q4. Load the Alzheimer’s disease data using the commands:

```
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis,predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[ inTrain,]
testing = adData[-inTrain,]
```

Find all the predictor variables in the training set that begin with IL. Perform principal components on these variables with the preProcess() function from the caret package. Calculate the number of principal components needed to capture 80% of the variance. How many are there?

• 9
• 10
• 11
• 7
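
Counting the components needed to reach a variance threshold can be sketched in base R. Here `prcomp()` on simulated data stands in for caret's `preProcess()` with `method = "pca"` and `thresh = 0.8`; the data and variable names are hypothetical, so the count it produces says nothing about the Alzheimer's data.

```r
# Count principal components needed to reach 80% of the variance.
# prcomp() on simulated data is a stand-in for caret's preProcess().
set.seed(42)
n <- 200
base <- matrix(rnorm(n * 3), ncol = 3)
# Six correlated columns built from three latent factors plus noise.
x <- cbind(base, base + matrix(rnorm(n * 3, sd = 0.3), ncol = 3))

pca <- prcomp(x, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

n_comp <- which(cum_var >= 0.8)[1]  # first component count crossing 80%
n_comp
```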

Q5. Load the Alzheimer’s disease data using the commands:

```
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis,predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[ inTrain,]
testing = adData[-inTrain,]
```

Create a training data set consisting of only the predictors with variable names beginning with IL and the diagnosis. Build two predictive models, one using the predictors as they are and one using PCA with principal components explaining 80% of the variance in the predictors. Use method="glm" in the train function.

What is the accuracy of each method in the test set? Which is more accurate?

• Non-PCA Accuracy: 0.91; PCA Accuracy: 0.93
• Non-PCA Accuracy: 0.72; PCA Accuracy: 0.71
• Non-PCA Accuracy: 0.72; PCA Accuracy: 0.65
• Non-PCA Accuracy: 0.65; PCA Accuracy: 0.72

#### Quiz 3

Q1. For this quiz we will be using several R packages. R package versions change over time; the right answers have been checked using the following versions of the packages.

AppliedPredictiveModeling: v1.1.6

caret: v6.0.47

ElemStatLearn: v2012.04-0

pgmm: v1.1

rpart: v4.1.8

If you aren’t using these versions of the packages, your answers may not exactly match the right answer, but hopefully should be close.

Load the cell segmentation data from the AppliedPredictiveModeling package using the commands:

```
library(AppliedPredictiveModeling)
data(segmentationOriginal)
library(caret)
```

1. Subset the data to a training set and testing set based on the Case variable in the data set.
2. Set the seed to 125 and fit a CART model with the rpart method using all predictor variables and default caret settings.
3. In the final model what would be the final model prediction for cases with the following variable values:

a. TotalIntenCh2 = 23,000; FiberWidthCh1 = 10; PerimStatusCh1 = 2

b. TotalIntenCh2 = 50,000; FiberWidthCh1 = 10; VarIntenCh4 = 100

c. TotalIntenCh2 = 57,000; FiberWidthCh1 = 8; VarIntenCh4 = 100

d. FiberWidthCh1 = 8; VarIntenCh4 = 100; PerimStatusCh1 = 2

• a. WS; b. WS; c. PS; d. Not possible to predict
• a. PS; b. WS; c. PS; d. WS
• a. PS; b. Not possible to predict; c. PS; d. Not possible to predict
• a. PS; b. WS; c. PS; d. Not possible to predict

Q2. If K is small in a K-fold cross validation, is the bias in the estimate of out-of-sample (test set) accuracy smaller or bigger? If K is small, is the variance in the estimate of out-of-sample (test set) accuracy smaller or bigger? Is K large or small in leave one out cross validation?

• The bias is smaller and the variance is bigger. Under leave one out cross validation K is equal to one.
• The bias is larger and the variance is smaller. Under leave one out cross validation K is equal to the sample size.
• The bias is smaller and the variance is smaller. Under leave one out cross validation K is equal to one.
• The bias is larger and the variance is smaller. Under leave one out cross validation K is equal to two.
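
How K relates to fold size can be sketched in base R. `make_folds()` is a hypothetical helper written for this illustration, not a caret function:

```r
# Assign each of n observations to one of K folds (base R only).
# make_folds() is a hypothetical helper, not part of any package.
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

set.seed(7)
n <- 12
folds_small <- make_folds(n, k = 3)  # each held-out fold has 4 observations
folds_loo   <- make_folds(n, k = n)  # leave-one-out: every fold has 1

table(folds_small)
max(table(folds_loo))  # 1
```

With a small K each model is trained on a smaller share of the data, while with K equal to the sample size every model is trained on all but one observation.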

Q3. Load the olive oil data using the commands:

```
library(pgmm)
data(olive)
olive = olive[,-1]
```

(NOTE: If you have trouble installing the pgmm package, you can download the `olive` dataset here: olive_data.zip. After unzipping the archive, you can load the file using the `load()` function in R.)

These data contain information on 572 different Italian olive oils from multiple regions in Italy. Fit a classification tree where Area is the outcome variable. Then predict the value of area for the following data frame using the tree command with all defaults

`newdata = as.data.frame(t(colMeans(olive)))`

What is the resulting prediction? Is the resulting prediction strange? Why or why not?

• 2.783. There is no reason why this result is strange.
• 2.783. It is strange because Area should be a qualitative variable – but tree is reporting the average value of Area as a numeric variable in the leaf predicted for newdata
• 0.005291005 0 0.994709 0 0 0 0 0 0. The result is strange because Area is a numeric variable and we should get the average within each leaf.
• 4.59965. There is no reason why the result is strange.

Q4. Load the South Africa Heart Disease Data and create training and test sets with the following code:

```
library(ElemStatLearn)
data(SAheart)
set.seed(8484)
train = sample(1:dim(SAheart)[1],size=dim(SAheart)[1]/2,replace=F)
trainSA = SAheart[train,]
testSA = SAheart[-train,]
```

Then set the seed to 13234 and fit a logistic regression model (method="glm", be sure to specify family="binomial") with Coronary Heart Disease (chd) as the outcome and age at onset, current alcohol consumption, obesity levels, cumulative tobacco, type-A behavior, and low density lipoprotein cholesterol as predictors. Calculate the misclassification rate for your model using this function and a prediction on the "response" scale:

`missClass = function(values,prediction){sum(((prediction > 0.5)*1) != values)/length(values)}`

What is the misclassification rate on the training set? What is the misclassification rate on the test set?

• Test Set Misclassification: 0.31; Training Set: 0.27
• Test Set Misclassification: 0.35; Training Set: 0.31
• Test Set Misclassification: 0.32; Training Set: 0.30
• Test Set Misclassification: 0.38; Training Set: 0.25
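
To see what the `missClass` function from the question computes, here is a small usage sketch. The function body is copied from the question; the input vectors are made-up toy values, not SAheart predictions.

```r
# missClass() is copied from the question; the inputs are made-up toy values.
missClass <- function(values, prediction) {
  sum(((prediction > 0.5) * 1) != values) / length(values)
}

truth <- c(0, 1, 1, 0)
prob  <- c(0.2, 0.7, 0.4, 0.6)  # predicted probabilities ("response" scale)

# Thresholding at 0.5 gives classes 0, 1, 0, 1, so two of four are wrong.
missClass(truth, prob)  # 0.5
```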

Q5. Load the vowel.train and vowel.test data sets:

```
library(ElemStatLearn)
data(vowel.train)
data(vowel.test)
```

Set the variable y to be a factor variable in both the training and test set. Then set the seed to 33833. Fit a random forest predictor relating the factor variable y to the remaining variables. Read about variable importance in random forests here: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr The caret package uses by default the Gini importance.

Calculate the variable importance using the varImp function in the caret package. What is the order of variable importance?

[NOTE: Use randomForest() specifically, not caret, as there have been some issues reported with that approach. 11/6/2016]

• The order of the variables is: x.10, x.7, x.9, x.5, x.8, x.4, x.6, x.3, x.1, x.2
• The order of the variables is: x.2, x.1, x.5, x.6, x.8, x.4, x.9, x.3, x.7, x.10
• The order of the variables is: x.2, x.1, x.5, x.8, x.6, x.4, x.3, x.9, x.7, x.10
• The order of the variables is: x.1, x.2, x.3, x.8, x.6, x.4, x.5, x.9, x.7, x.10

#### Quiz 4

Q1. For this quiz we will be using several R packages. R package versions change over time; the right answers have been checked using the following versions of the packages.

AppliedPredictiveModeling: v1.1.6

caret: v6.0.47

ElemStatLearn: v2012.04-0

pgmm: v1.1

rpart: v4.1.8

gbm: v2.1

lubridate: v1.3.3

forecast: v5.6

e1071: v1.6.4

If you aren’t using these versions of the packages, your answers may not exactly match the right answer, but hopefully should be close.

```
library(ElemStatLearn)
data(vowel.train)
data(vowel.test)
```

Set the variable y to be a factor variable in both the training and test set. Then set the seed to 33833. Fit (1) a random forest predictor relating the factor variable y to the remaining variables and (2) a boosted predictor using the “gbm” method. Fit these both with the train() command in the caret package.

What are the accuracies for the two approaches on the test data set? What is the accuracy among the test set samples where the two methods agree?

• RF Accuracy = 0.3233; GBM Accuracy = 0.8371; Agreement Accuracy = 0.9983
• RF Accuracy = 0.9987; GBM Accuracy = 0.5152; Agreement Accuracy = 0.9985
• RF Accuracy = 0.6082; GBM Accuracy = 0.5152; Agreement Accuracy = 0.5325
• RF Accuracy = 0.6082; GBM Accuracy = 0.5152; Agreement Accuracy = 0.6361
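
"Agreement accuracy" here means accuracy computed only on the test samples where both models give the same prediction. A base-R sketch with hypothetical toy labels (not the vowel data or real model output):

```r
# Toy labels and predictions (hypothetical, not the vowel data).
truth    <- c("a", "b", "c", "a", "b", "c")
pred_rf  <- c("a", "b", "c", "b", "b", "a")
pred_gbm <- c("a", "b", "a", "b", "c", "a")

acc_rf  <- mean(pred_rf == truth)   # 4/6
acc_gbm <- mean(pred_gbm == truth)  # 2/6

# Restrict to samples where both models make the same prediction.
agree <- pred_rf == pred_gbm
acc_agree <- mean(pred_rf[agree] == truth[agree])  # 2/4
```

With real caret models, the prediction vectors would come from `predict()` on each fitted model and the test set.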

Q2. Load the Alzheimer’s data using the following commands

```
library(caret)
library(gbm)
set.seed(3433)
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
adData = data.frame(diagnosis,predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[ inTrain,]
testing = adData[-inTrain,]
```

Set the seed to 62433 and predict diagnosis with all the other variables using a random forest (“rf”), boosted trees (“gbm”) and linear discriminant analysis (“lda”) model. Stack the predictions together using random forests (“rf”). What is the resulting accuracy on the test set? Is it better or worse than each of the individual predictions?

• Stacked Accuracy: 0.88 is better than all three other methods
• Stacked Accuracy: 0.76 is better than random forests and boosting, but not lda.
• Stacked Accuracy: 0.93 is better than all three other methods
• Stacked Accuracy: 0.80 is better than random forests and lda and the same as boosting.

Q3. Load the concrete data with the commands:

```
set.seed(3523)
library(AppliedPredictiveModeling)
library(caret) # needed for createDataPartition()
data(concrete)
inTrain = createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training = concrete[ inTrain,]
testing = concrete[-inTrain,]
```

Set the seed to 233 and fit a lasso model to predict Compressive Strength. Which variable is the last coefficient to be set to zero as the penalty increases? (Hint: it may be useful to look up ?plot.enet).

• Cement
• CoarseAggregate
• BlastFurnaceSlag
• FineAggregate

Q4. Load the data on the number of visitors to the instructor’s blog from here:

Using the commands:

```
library(lubridate) # For year() function below
dat = read.csv("~/Desktop/gaData.csv")
training = dat[year(dat$date) < 2012,]
testing = dat[(year(dat$date)) > 2011,]
tstrain = ts(training$visitsTumblr)
```

Fit a model using the bats() function in the forecast package to the training time series. Then forecast this model for the remaining time points. For how many of the testing points is the true value within the 95% prediction interval bounds?

• 93%
• 95%
• 94%
• 96%
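
The quantity asked for is the share of test points that fall between the lower and upper prediction bounds. A base-R sketch with hypothetical numbers (not the blog data); with the forecast package, the bounds would typically be read from the `lower` and `upper` components of the forecast object:

```r
# Hypothetical forecast bounds and observed values (not the blog data).
lower  <- c(10, 12, 11, 13, 12)
upper  <- c(20, 22, 21, 23, 22)
actual <- c(15, 25, 18, 13, 9)

inside <- actual >= lower & actual <= upper
coverage <- mean(inside)  # 3 of 5 points fall inside
coverage
```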

Q5. Load the concrete data with the commands:

```
set.seed(3523)
library(AppliedPredictiveModeling)
library(caret) # needed for createDataPartition()
data(concrete)
inTrain = createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training = concrete[ inTrain,]
testing = concrete[-inTrain,]
```

Set the seed to 325 and fit a support vector machine using the e1071 package to predict Compressive Strength using the default settings. Predict on the testing set. What is the RMSE?

• 45.09
• 6.72
• 11543.39
• 6.93
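
RMSE is the square root of the mean squared difference between predictions and observed values. A base-R sketch, where `rmse()` is a small helper defined here for illustration and the vectors are toy values rather than concrete-strength predictions:

```r
# rmse() is a small helper defined here for illustration.
rmse <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2))
}

observed  <- c(3, 5, 7)
predicted <- c(2, 5, 9)

# Errors are -1, 0, 2, so the mean squared error is 5/3.
rmse(predicted, observed)  # about 1.29
```

For the quiz, `predicted` would be the output of `predict()` on the fitted SVM and `observed` the CompressiveStrength column of the testing set.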

Prediction and machine learning are common activities undertaken by data scientists and data analysts. This course will cover the fundamentals of constructing and employing prediction functions, with a focus on practical applications. The course will teach a foundational understanding of such concepts as training and test sets, overfitting, and error rates. The course will also cover a variety of algorithmic and model-based machine learning techniques, including regression, classification trees, Naive Bayes, and random forests. The course will cover the entire procedure for constructing prediction functions, including data gathering, feature generation, algorithm development, and evaluation.

This course is part of multiple programs and can be applied toward several Specialization and Professional Certificate programs. Completing it counts toward your learning in any of the following programs:

• Data Science Specialization
• Data Science: Statistics and Machine Learning Specialization

WHAT YOU WILL LEARN

• Utilize the fundamental elements of constructing and implementing prediction functions
• Understand training and testing sets, overfitting, and error rates.
• Describe machine learning approaches including regression and classification trees
• Describe the entire procedure for developing prediction functions.

SKILLS YOU WILL GAIN

• Random Forest
• Machine Learning (ML) Algorithms
• Machine Learning
• R Programming
