Hello Peers, today we are going to share **all the weekly assessment and quiz answers** for the **Introduction to Machine Learning** course offered on Coursera, free of cost ✅✅✅. This is a certification course open to every interested student.

In case you don’t find this course for free, you can **apply for financial aid** to get it at no cost.

*Check out this article: “How to Apply for Financial Aid?”*

**Coursera** is a **large online learning platform** that offers a wide range of courses, many of them available for free. These courses come from various recognized universities, where industry experts and professors teach in a clear and understandable way.

Here, you will find the **Introduction to Machine Learning Exam Answers** in **bold**, given below.

**These answers were updated recently and are 100% correct ✅ answers for all weeks, assessments, and the final exam of the Introduction to Machine Learning free certification course from Coursera.**

Use “Ctrl+F” to find the answer to any question. Mobile users can tap the three dots in the browser to get a “Find” option, then use it to look up any question.

Apply Link – Introduction to Machine Learning Quiz

**Introduction to Machine Learning Quiz Answer**

**Week-1**

Week 1 Comprehensive

1.

Question 1

Which of the following are necessary for supervised machine learning? (Choose all that are correct)

1 point

- **A model**
- **Learning from data**
- **Labeled training data**
- Human to teach the machine

2.

Question 2

What decision boundary can logistic regression provide?

1 point

- Arbitrarily complex functions
- Jagged edges
- Smooth curves
- **Linear**
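
To see why the boundary is linear, here is a minimal sketch in plain Python. Logistic regression passes a linear score `w.x + b` through a sigmoid, so the probability is exactly 0.5 wherever that score is zero, which is a straight line (or a flat plane in higher dimensions). The weights `w`, the bias `b`, and the three test points are hypothetical values chosen only for illustration.

```python
import math

def predict_proba(x, w, b):
    """Logistic regression: sigmoid of the linear score w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights and bias, chosen only for illustration.
w, b = [2.0, -1.0], 0.5

# Any point with w.x + b == 0 sits exactly on the decision boundary.
on_boundary = [0.0, 0.5]    # score = 2*0.0 - 1*0.5 + 0.5 = 0
positive_side = [1.0, 0.0]  # score = 2.5
negative_side = [0.0, 2.0]  # score = -1.5

for point in (on_boundary, positive_side, negative_side):
    print(point, predict_proba(point, w, b))
```

The set of points with score zero does not bend, no matter how the weights are chosen, which is why a multilayer perceptron is needed for curved boundaries.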

3.

Question 3

What is the primary advantage of using multiple filters?

1 point

- More complexity is always better.
- This requires less compute power.
- **This allows the model to look for subtypes of the classification.**
- This is simpler to implement.

4.

Question 4

Which one of the following best describes transfer learning in the context of document analysis?

1 point

- All parameters of the model are different between individuals.
- **Parameters at the bottom of the model are transferable across all people and documents, while the parameters at the top are different between individuals.**
- All parameters of the model are transferable across all people and documents.
- Parameters at the top of the model are transferable across all people and documents, while the parameters at the bottom are different between individuals.

5.

Question 5

Given the following image of data classifications, which of the following models would you choose?

1 point

- **Logistic regression**
- Multilayer perceptron

6.

Question 6

What new feature did neural networks acquire in 2010?

1 point

- A new computational platform: the GPU
- A new application: image search
- A new operation: convolution
- **A new name: Deep Learning**

7.

Question 7

Which of the following is convolved with layer 2 features, or sub-motifs?

1 point

- Layer 2 feature map
- **Layer 1 feature map**
- Layer 3 feature map

8.

Question 8

Which of the following gives the best conceptual meaning of convolution?

1 point

- Surveying a feature map for high-level motif.
- Selecting an atomic element from an image.
- Stacking a collection of feature maps.
- **Shifting a filter to every location in an image.**

9.

Question 9

What does transfer learning mean in the context of medical imaging?

1 point

- Just as assigning categories to images in ImageNet required millions of images, so too does analyzing medical images require millions of labeled medical images.
- Sufficient labeled radiological images can be used to learn all of the model parameters, so they can be used for ophthalmological or dermatological images.
- Once the convolutional layers are learned from labeled medical images, the top layers can be inferred from the parameters found with data from ImageNet.
- **Weights of convolutional layers learned from ImageNet transfer to medical images, so we only need learn new parameters at the top of the network.**

10.

Question 10

What is the primary advantage of having a deep architecture?

1 point

- There is a higher probability that each motif is used in the classifier.
- **The model shares knowledge between motifs through their shared substructures.**
- A model can learn each top-level motif in isolation.
- The parameters of a deep architecture are less expensive to compute.

Week 2 Comprehensive

1.

Question 1

What does the equation for the loss function do conceptually?

1 point

- Mathematically define network outputs
- **Penalize overconfidence**
- Ignore historical statistical developments
- Reward indecision

2.

Question 2

What is overfitting?

1 point

- Overfitting refers to the fact that more complexity is always better, which is why deep learning works.
- **Model complexity fits too well to training data and will not generalize in the real-world.**
- Model complexity is perfectly matched to the data.
- Model complexity is not enough to capture the nuance of the data and will under-perform in the real-world.

3.

Question 3

Why should the test set only be used once?

1 point

- **More than one use can lead to bias.**
- More than one use can lead to overfitting.
- The model cannot learn anything new from subsequent uses.
- It is expensive to use more than once.

4.

Question 4

Which two of the following describe the purpose of a validation set?

1 point

- To estimate the performance of a model.
- **To pick the best performing model.**
- To test the performance in lieu of real-world data.
- To learn the model parameters.

5.

Question 5

How do we learn our network?

1 point

- **Gradient descent**
- Downhill skiing
- Monte Carlo simulation
- Analytically determine global minimum

6.

Question 6

What technique is used to minimize loss for a large data set?

1 point

- Newton’s method
- Taylor series expansion
- **Stochastic gradient descent**
- Gradient descent

7.

Question 7

Which of the following are benefits of stochastic gradient descent?

1 point

- **With stochastic gradient descent, the update time does not scale with data size.**
- Stochastic gradient descent finds the solution more accurately.
- **Stochastic gradient descent can update many more times than gradient descent.**
- Stochastic gradient descent gets near the solution quickly.
- Stochastic gradient descent finds a more exact gradient than gradient descent.

8.

Question 8

Why is gradient descent computationally expensive for large data sets?

1 point

- Large data sets do not permit computing the loss function, so a more expensive measure is used.
- **Calculating the gradient requires looking at every single data point.**
- Large data sets require deeper models, which have more parameters.
- There are too many local minima for an algorithm to find.
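
The difference between the two methods can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: a one-parameter model `y ~ w*x`, a squared-error loss, toy data generated from `y = 3x`, and a learning rate of 0.1. None of these values come from the course.

```python
import random

def full_gradient(w, data):
    """Gradient of the mean squared error; loops over every data point."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def sgd_gradient(w, data, rng):
    """Noisy gradient estimate computed from one randomly chosen point."""
    x, y = rng.choice(data)
    return 2 * (w * x - y) * x

rng = random.Random(0)  # fixed seed so the run is repeatable
# Toy data generated from y = 3x (an assumption made for this sketch).
data = [(x / 100, 3 * x / 100) for x in range(1, 101)]

w = 0.0
for _ in range(2000):
    # Each step costs the same no matter how large the data set is.
    w -= 0.1 * sgd_gradient(w, data, rng)

print(w)                       # ends up close to 3
print(full_gradient(w, data))  # close to 0 at the solution
```

One call to `full_gradient` reads all 100 points, while one call to `sgd_gradient` reads a single point, so stochastic gradient descent can make many more updates for the same amount of computation.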

9.

Question 9

What are the two main benefits of early stopping?

1 point

- **It helps save computation cost.**
- **It performs better in the real world.**
- It improves the training loss.
- There is rigorous statistical theory on it.

10.

Question 10

Why are optimization and validation at odds?

1 point

- **Optimization seeks to do as well as possible on a training set, while validation seeks to generalize to the real world.**
- Optimization seeks to generalize to the real world, while validation seeks to do as well as possible on a validation set.
- Optimization seeks to do as well as possible on a training set, while validation seeks to do as well as possible on a validation set.
- They are not at odds: they have the same goal.

Week 3 Comprehensive

1.

Question 1

Which of the following indicates whether a doctor or machine is doing well at finding positive examples in a data set?

1 point

- Positive Predictive Value
- Likelihood Ratio
- **Sensitivity**
- Specificity

2.

Question 2

Which of the following is used to distinguish the false positive rate from the false negative rate?

1 point

- Sensitivity
- False Negative
- Negative Predictive Value
- **Specificity**
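
For reference, sensitivity and specificity can be computed directly from the counts in a confusion matrix. The sketch below uses made-up counts purely for illustration. Sensitivity is the fraction of actual positives that are found, and specificity is the fraction of actual negatives that are correctly rejected, so the false negative rate is 1 minus sensitivity and the false positive rate is 1 minus specificity.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives that were found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual negatives correctly rejected."""
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts, made up for illustration.
tp, fn, tn, fp = 80, 20, 90, 10

print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```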

3.

Question 3

Which of the following is the best conceptual definition of one dimensional convolution?

1 point

- “Inverting” of a shape, where the inversion matches a feature.
- **“Sliding” of two signals, where a matched feature gives a high value of convolution.**
- “Intertwining” of two signals, where one wraps around the other to form a feature.
- “Distortion” of one signal, according to the feature shape
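
The “sliding” idea can be sketched in plain Python. Note one assumption: as in most deep learning libraries, the filter here is not flipped, so strictly speaking this is cross-correlation rather than textbook convolution. The `signal` and `kernel` values are made up for illustration.

```python
def slide_filter(signal, kernel):
    """Slide kernel across signal, taking a dot product at each position."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(s * k for s, k in zip(signal[i:i + len(kernel)], kernel))
        for i in range(n)
    ]

signal = [0, 0, 1, 2, 1, 0, 0]
kernel = [1, 2, 1]  # the feature we are looking for

response = slide_filter(signal, kernel)
print(response)  # [1, 4, 6, 4, 1]
```

The response peaks at the position where the signal matches the filter shape, which is the “high value of convolution” mentioned in the answer.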

4.

Question 4

Which of the following can a user choose when designing a convolutional layer? (Choose all that are correct.)

1 point

- **Filter depth**
- **Filter size**
- **Filter number**
- **Filter stride**
- Filter weights

5.

Question 5

What is a fully connected readout?

1 point

- A layer with ten classifications.
- A layer with connections to all feature maps.
- The vectorization of a pooling layer.
- **A layer with a single neuron for each output class.**

6.

Question 6

Why are nonlinear activation functions preferable?

1 point

- Nonlinear activation functions are preferable because they are used in generalized linear models in statistics.
- **Nonlinear activation functions increase the functional capacity of the neural network by allowing the representation of nonlinear relationships between features in input.**
- Nonlinear activation functions are preferable because they have been used historically.
- Nonlinear activation functions are NOT preferable to linear ones, as they lose information in systems with high variance.

7.

Question 7

Which of the following are benefits of pooling? (Choose all that are correct.)

1 point

- **Decreases bias.**
- **Combats overfitting.**
- **Vectorizes the data.**
- **Encourages translational invariance.**
- Reduces computational complexity.

8.

Question 8

How are parameters that minimize the loss function found in practice?

1 point

- Fractal geometry
- Gradient descent
- Simplex algorithm
- **Stochastic gradient descent**

9.

Question 9

Which of the following is an advantage of hierarchical representation of image features?

1 point

- Eliminating bias.
- Decreasing the computational complexity.
- **Better leveraging all training data.**
- Decreasing variance in the model.

10.

Question 10

Why does transfer learning work?

1 point

- **Top-level features are specialized for a particular task, while low-level features are universal to all images.**
- All layers of filters can be learned by studying the mammalian receptive fields.
- Low-level features are specialized for a particular task, while top-level features are universal to all images.
- All images are composed of pixels with three color channels.

Week 4 Comprehensive

1.

Question 1

What is meant by “word vector”?

1 point

- The latitude and longitude of the place a word originated.
- **A vector of numbers associated with a word.**
- Assigning a corresponding number to each word.
- A vector consisting of all words in a vocabulary.

2.

Question 2

Which word is a synonym for “word vector”?

1 point

- Norm
- Array
- **Embedding**
- Stack

3.

Question 3

What is the term for a set of vectors, with one vector for each word in the vocabulary?

1 point

- Space
- Array
- **Codebook**
- Embedding

4.

Question 4

What is natural language processing?

1 point

- Making natural text conform to formal language standards.
- Translating natural text characters to unicode representations.
- Translating human-readable code to machine-readable instructions.
- **Taking natural text and making inferences and predictions.**

5.

Question 5

What is the goal of learning word vectors?

1 point

- Find the hidden or latent features in a text.
- Labelling a text corpus, so a human doesn’t have to do it.
- Determine the vocabulary in the codebook.
- **Given a word, predict which words are in its vicinity.**

6.

Question 6

What function is the generalization of the logistic function to multiple dimensions?

1 point

- Hyperbolic tangent function
- Exponential log likelihood
- Squash function
- **Softmax function**
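
A quick way to check this relationship is to compare the two functions numerically. The sketch below is plain Python, and the score `z = 1.3` is an arbitrary value picked for illustration. With two classes and the second score fixed at zero, the softmax probability of the first class equals the logistic function of `z`.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(z):
    """The one-dimensional logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

z = 1.3  # arbitrary score, for illustration only
two_class = softmax([z, 0.0])

print(two_class[0], logistic(z))  # the two values agree
print(sum(softmax([1.0, 2.0, 3.0])))
```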

7.

Question 7

What is the continuous bag of words (CBOW) approach?

1 point

- **Vectors for the neighborhood of words are averaged and used to predict word n.**
- Word n is used to predict the words in the neighborhood of word n.
- Word n is learned from a large corpus of words, which a human has labeled.
- The code for word n is fed through a CNN and categorized with a softmax.

8.

Question 8

What is the Skip-Gram approach?

1 point

- **Word n is used to predict the words in the neighborhood of word n.**
- The code for word n is fed through a CNN and categorized with a softmax.
- Word n is learned from a large corpus of words, which a human has labeled.
- Vectors for the neighborhood of words are averaged and used to predict word n.
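
The two approaches differ mainly in which direction the prediction goes. Here is a rough sketch in plain Python that only builds the training pairs for each approach. It does not train any vectors, and in real CBOW the neighbor vectors would be averaged before predicting. The helper names, the three-word sentence, and the window size of 1 are all assumptions made for illustration.

```python
def skipgram_pairs(words, window=1):
    """(center, neighbor) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs

def cbow_pairs(words, window=1):
    """(neighbors, center) pairs: the neighbors together predict the center."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        context = [words[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_pairs(sentence))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```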

9.

Question 9

What is the goal of the recurrent neural network?

1 point

- Learn a series of images that form a video.
- Predict words more efficiently than Skip-Gram.
- **Synthesize a sequence of words.**
- Classify an unlabeled image.

10.

Question 10

Which model is the state-of-the-art for text synthesis?

1 point

- **Long short-term memory**
- CNN
- Multilayer perceptron
- CBOW

**Conclusion**

Hopefully, this article helps you find all the **weekly, final assessment, and peer-graded assessment answers for the Introduction to Machine Learning quizzes on Coursera** and pick up some useful knowledge with less effort. If it helped you in any way, share it with your friends on social media so they also know about this training. You can also check out our answers for other courses. Stay with us, as we will share many more free courses along with their exam and quiz solutions, and follow our Techno-RJ **Blog** for more updates.