Hello Learners! Today we are sharing the **Data Science with Scala Cognitive Class Course Exam Answers**. The course was launched by **IBM**, is completely **free of cost** ✅, and is available on the **Cognitive Class** platform.

Below you will find the **Data Science with Scala Exam Answers**, with the correct option for each question marked in **bold**.

**These answers were updated recently and are 100% correct ✅ for all modules and the final exam of the Data Science with Scala certification course from Cognitive Class.**

| Course Name | Data Science with Scala |
| --- | --- |
| Organization | IBM |
| Skill | Online Education |
| Level | Beginner |
| Language | English |
| Price | Free |
| Certificate | Yes |

To take the quizzes and the exam, first **enroll** in the course through the link below and work through **Data Science with Scala** by IBM. The course is a good opportunity for interested students to strengthen their technical skills.

**Link for Course Enrollment:** `Enroll Now`

**Use “Ctrl+F” to find the answer to any question. On mobile, tap the three dots in your browser and choose the “Find” option to search for a question.**

**Data Science with Scala Cognitive Class Course Exam Answer**

**Module 1: Basic Statistics and Data Types**

**Question 1: You import MLlib’s vectors from?**

- org.apache.spark.mllib.TF
- org.apache.spark.mllib.numpy
- **org.apache.spark.mllib.linalg**
- org.apache.spark.mllib.pandas

**Question 2: Select the types of distributed matrices:**

- **RowMatrix**
- **IndexedRowMatrix**
- **CoordinateMatrix**

**Question 3: How would you calculate the mean of the following?**

**val observations: RDD[Vector] = sc.parallelize(Array(**

**Vectors.dense(1.0, 2.0),**

**Vectors.dense(4.0, 5.0),**

**Vectors.dense(7.0, 8.0)))**

**val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)**

- summary.normL1
- summary.numNonzeros
- **summary.mean**
- summary.normL2
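To see what the mean of those observations looks like, here is a minimal sketch in plain Scala, without Spark. It only mirrors the column-wise averaging that the question is about; the object name `ColumnMeans` and the helper `columnMeans` are made up for this illustration and are not part of MLlib.

```scala
object ColumnMeans {
  // The three observations from the question, written as plain Scala vectors.
  val observations: Vector[Vector[Double]] = Vector(
    Vector(1.0, 2.0),
    Vector(4.0, 5.0),
    Vector(7.0, 8.0)
  )

  // Column-wise mean: turn rows into columns, then average each column.
  def columnMeans(rows: Vector[Vector[Double]]): Vector[Double] =
    rows.transpose.map(col => col.sum / col.size)

  val mean: Vector[Double] = columnMeans(observations)

  def main(args: Array[String]): Unit =
    println(mean) // prints Vector(4.0, 5.0)
}
```

The first column averages (1 + 4 + 7) / 3 = 4.0 and the second averages (2 + 5 + 8) / 3 = 5.0.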

**Question 4: What task do the following lines of code perform?**

**import org.apache.spark.mllib.random.RandomRDDs._**

**val million = poissonRDD(sc, mean=1.0, size=1000000L, numPartitions=10)**

- Calculate the variance
- Calculate the mean
- **Generate random samples**
- Calculate the variance

**Question 5: MLlib uses the compressed sparse column format for sparse matrices; as such, it keeps only the non-zero entries?**

- **True**
- False
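The compressed sparse column (CSC) idea can be sketched in plain Scala. This is only an illustration under my own assumptions: the `Csc` case class and the `toCsc` helper are hypothetical names, not MLlib classes. It stores the non-zero values column by column, the row index of each value, and one pointer per column start.

```scala
object CscSketch {
  // Hypothetical container for the three CSC arrays (not an MLlib class).
  final case class Csc(
    numRows: Int,
    numCols: Int,
    colPtrs: Vector[Int],
    rowIndices: Vector[Int],
    values: Vector[Double]
  )

  // Build the CSC arrays from a dense matrix given as a vector of rows.
  def toCsc(rows: Vector[Vector[Double]]): Csc = {
    val numRows = rows.size
    val numCols = if (rows.isEmpty) 0 else rows.head.size
    // Walk the matrix column by column and keep only the non-zero entries.
    val perColumn: Vector[Vector[(Int, Double)]] =
      Vector.tabulate(numCols) { j =>
        Vector.tabulate(numRows)(i => (i, rows(i)(j))).filter(_._2 != 0.0)
      }
    // colPtrs(j) is the offset in `values` where column j starts.
    val colPtrs = perColumn.scanLeft(0)((acc, col) => acc + col.size)
    val entries = perColumn.flatten
    Csc(numRows, numCols, colPtrs, entries.map(_._1), entries.map(_._2))
  }

  val dense: Vector[Vector[Double]] = Vector(
    Vector(1.0, 0.0, 4.0),
    Vector(0.0, 3.0, 5.0),
    Vector(2.0, 0.0, 6.0)
  )

  val csc: Csc = toCsc(dense)
}
```

For this 3 x 3 matrix the sketch yields `colPtrs = Vector(0, 2, 3, 6)`, `rowIndices = Vector(0, 2, 1, 0, 1, 2)` and `values = Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)`. The three zeros are not stored, and there is one more column pointer than there are columns.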

**Module 2: Preparing Data**

**Question 1: For a DataFrame object, the method describe calculates the?**

- count
- mean
- standard deviation
- max
- min
- **all of the above**

**Question 2: Which line of code drops the rows that contain null values? Select the best answer.**

- val dfnan = df.withColumn("nanUniform", halfTonNaN(df("uniform")))
- dfnan.na.replace("uniform", Map(Double.NaN -> 0.0))
- **dfnan.na.drop(minNonNulls = 3)**
- dfnan.na.fill(0.0)

**Question 3: What task do the following lines of code perform?**

**val lr = new LogisticRegression()**

**lr.setMaxIter(10).setRegParam(0.01)**

**val model1 = lr.fit(training)**

- Perform one-hot encoding
- Train a linear regression model
- **Train a logistic regression model**
- Perform PCA on the data

**Question 4: The StandardScalerModel transforms the data such that?**

- each feature has a max value of 1
- each feature is orthogonal
- **each feature has unit standard deviation and zero mean**
- each feature has a min value of -1
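The effect described in the answer can be shown on a single feature column in plain Scala. This is a rough sketch, not Spark's implementation: the `standardize` helper is a made-up name, and dividing by the sample (n - 1) standard deviation is an assumption of this sketch.

```scala
object StandardizeSketch {
  // Standardize one feature column: subtract the mean, then divide by the
  // sample standard deviation (n - 1 in the denominator, assumed here).
  def standardize(xs: Vector[Double]): Vector[Double] = {
    require(xs.size >= 2, "need at least two values to estimate a standard deviation")
    val n = xs.size
    val mean = xs.sum / n
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
    // A constant column has no spread; map it to zeros instead of dividing by zero.
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }

  val scaled: Vector[Double] = standardize(Vector(1.0, 4.0, 7.0))

  def main(args: Array[String]): Unit =
    println(scaled) // prints Vector(-1.0, 0.0, 1.0)
}
```

The input has mean 4.0 and sample standard deviation 3.0, so the scaled column has zero mean and unit standard deviation.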

**Module 3: Feature Engineering**

**Question 1: Spark ML works with?**

- tensors
- vectors
- **dataframes**
- lists

**Question 2: The function IndexToString() performs one-hot encoding?**

- True
- **False**

**Question 3: Principal Component Analysis is primarily used for?**

- converting categorical variables to integers
- predicting discrete values
- **dimensionality reduction**

**Question 4: One important step prior to using PCA is?**

- **normalizing your data**
- making sure every feature is not correlated
- taking the log of your data
- subtracting the mean

**Module 4: Fitting a Model**

**Question 1: You can use decision trees for?**

- regression
- classification
- **classification and regression**
- data normalization

**Question 2: The following line of code: val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))**

- **splits the data into training and testing data**
- trains the model
- uses 70% of the data for testing
- uses 30% of the data for training
- makes a prediction
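A weighted random split like the one in this question can be sketched in plain Scala. This is an illustration under my own assumptions, not Spark's code: the `randomSplit` helper below is a hypothetical stand-in that draws one uniform number per item and sends roughly 70% of the items to the training set and the rest to the test set.

```scala
import scala.util.Random

object SplitSketch {
  // Split items into (training, test) by drawing one uniform number per item.
  def randomSplit[A](items: Vector[A], trainFraction: Double, seed: Long): (Vector[A], Vector[A]) = {
    val rng = new Random(seed)
    // Pair each item with its own draw, then partition on the threshold.
    val (train, test) =
      items.map(item => (item, rng.nextDouble())).partition(_._2 < trainFraction)
    (train.map(_._1), test.map(_._1))
  }

  val data: Vector[Int] = (1 to 1000).toVector

  // Every item lands in exactly one of the two sets.
  val (trainingData, testData) = randomSplit(data, trainFraction = 0.7, seed = 42L)
}
```

The split sizes are only approximately 70/30 because each item is assigned independently, but the two sets never overlap and together contain every item.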

**Question 3: In the Random Forest Classifier constructor, .setNumTrees()?**

- sets the max depth of trees
- sets the minimum number of classes before a split
- **sets the number of trees**

**Question 4: Elastic net regularization uses?**

- L0-norm
- L1-norm
- L2-norm
- **a convex combination of the L1 norm and L2 norm**
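The convex combination in the answer can be written out as a small penalty function. This is a sketch under stated assumptions: the names `penalty`, `regParam` and `alpha` are chosen for this illustration, and the form used is regParam * (alpha * L1 + (1 - alpha) / 2 * squared L2), where alpha = 1 gives a pure L1 penalty and alpha = 0 a pure L2 penalty.

```scala
object ElasticNetSketch {
  // penalty = regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
  def penalty(weights: Vector[Double], regParam: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
    val l1 = weights.map(w => math.abs(w)).sum // sum of absolute values
    val l2Squared = weights.map(w => w * w).sum // sum of squares
    regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  val weights: Vector[Double] = Vector(3.0, -4.0)

  val pureL1: Double = penalty(weights, regParam = 1.0, alpha = 1.0) // 7.0
  val pureL2: Double = penalty(weights, regParam = 1.0, alpha = 0.0) // 12.5
  val mixed: Double = penalty(weights, regParam = 1.0, alpha = 0.5) // 9.75
}
```

With weights (3, -4) the L1 norm is 7 and the squared L2 norm is 25, so the mixed penalty at alpha = 0.5 is 0.5 * 7 + 0.25 * 25 = 9.75.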

**Module 5: Pipeline and Grid Search**

**Question 1: What task does the following code perform: withColumn("paperscore", data("A2") * 4 + data("A") * 3)?**

- add 4 columns to A2
- add 3 columns to A1
- add 4 to each element in column A2
- **assign a higher weight to A2 and A journals**

**Question 2: In an estimator?**

- there is no need to call the method fit
- **the fit function is called**
- only the transform function is called

**Question 3: Which is not a valid type of Evaluator in MLlib?**

- RegressionEvaluator
- MultiClassClassificationEvaluator
- **MultiLabelClassificationEvaluator**
- BinaryClassificationEvaluator
- All are valid

**Question 4: In the following lines of code, the last transform in the pipeline is a:**

**val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)**

**import org.apache.spark.ml.Pipeline**

**val pipeline = new Pipeline().setStages(Array(value_band_indexer,category_indexer,label_indexer,assembler,rf))**

- principal component analysis
- Vector Assembler
- String Indexer
- Vector Assembler
- **Random Forest Classifier**

**Final Exam Answers**

**Question 1**: **What is not true about labeled points?**

- They associate sparse vectors with a corresponding label/response
- They associate dense vectors with a corresponding label/response
- **They are used in unsupervised machine learning algorithms**
- All are true
- None are true

**Question 2**: **Which is true about column pointers in sparse matrices?**

- **By themselves, they do not represent the specific physical location of a value in the matrix**
- They never repeat values
- They have the same number of values as the number of columns
- All are true
- None are true

**Question 3**: **What is the name of the most basic type of distributed matrix?**

- CoordinateMatrix
- IndexedRowMatrix
- SparseMatrix
- SimpleMatrix
- **RowMatrix**

**Question 4**: **A perfect correlation is represented by what value?**

- 3
- **1**
- -1
- 100
- 0
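A quick way to see where the value 1 comes from is to compute the Pearson correlation coefficient by hand. The sketch below is plain Scala with a hypothetical `pearson` helper; it is not the Spark `Statistics.corr` call. A perfectly increasing linear relationship gives 1, and for comparison a perfectly decreasing one gives -1.

```scala
object PearsonSketch {
  // Pearson correlation: covariance divided by the product of the standard deviations.
  def pearson(xs: Vector[Double], ys: Vector[Double]): Double = {
    require(xs.size == ys.size && xs.size >= 2, "need two equally sized samples")
    val mx = xs.sum / xs.size
    val my = ys.sum / ys.size
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val vx = xs.map(x => (x - mx) * (x - mx)).sum
    val vy = ys.map(y => (y - my) * (y - my)).sum
    cov / math.sqrt(vx * vy)
  }

  val xs: Vector[Double] = Vector(1.0, 2.0, 3.0, 4.0)

  // y = 2x + 1 rises in lockstep with x.
  val increasing: Double = pearson(xs, Vector(3.0, 5.0, 7.0, 9.0)) // 1.0

  // y = -2x falls in lockstep with x.
  val decreasing: Double = pearson(xs, Vector(-2.0, -4.0, -6.0, -8.0)) // -1.0
}
```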

**Question 5**: **A MinMaxScaler is a transformer which:**

- **Rescales each feature to a specific range**
- Takes no parameters
- Makes zero values remain untransformed
- All are true
- None are true
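Rescaling a feature to a specific range can be sketched in a few lines of plain Scala. The `rescale` helper and its `min`/`max` parameters are names chosen for this illustration rather than the Spark API; the formula is (x - colMin) / (colMax - colMin) * (max - min) + min, with a constant column mapped to the middle of the target range.

```scala
object MinMaxSketch {
  // Linearly map one feature column onto the target range [min, max].
  def rescale(xs: Vector[Double], min: Double = 0.0, max: Double = 1.0): Vector[Double] = {
    require(xs.nonEmpty, "need at least one value")
    val lo = xs.min
    val hi = xs.max
    // A constant column has no range; place it in the middle of the target range.
    if (hi == lo) xs.map(_ => 0.5 * (max + min))
    else xs.map(x => (x - lo) / (hi - lo) * (max - min) + min)
  }

  val column: Vector[Double] = Vector(1.0, 4.0, 7.0)

  val unitRange: Vector[Double] = rescale(column) // Vector(0.0, 0.5, 1.0)
  val symmetric: Vector[Double] = rescale(column, min = -1.0, max = 1.0) // Vector(-1.0, 0.0, 1.0)
}
```

The target range is a parameter, which is why "Takes no parameters" is not the right option.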

**Question 6**: **Which is not a supported Random Data Generation distribution?**

- Poisson
- Uniform
- Exponential
- **Delta**
- Normal

**Question 7**: **Sampling without replacement means:**

- The expected number of times each element is chosen is randomized
- **The expected size of the sample is a fraction of the RDD's size**
- The expected number of times each element is chosen
- The expected size of the sample is unknown
- The expected size of the sample is the same as the RDD's size

**Question 8**: **What are the supported types of hypothesis testing?**

- Pearson’s Chi-Squared Test for goodness of fit
- Pearson’s Chi-Squared Test for independence
- Kolmogorov-Smirnov test for equality of distribution
- **All are supported**
- None are supported

**Question 9**: **For Kernel Density Estimation, which kernel is supported by Spark?**

- KDEMultivariate
- KDEUnivariate
- **Gaussian**
- KernelDensity
- All are supported

**Question 10**: **Which DataFrames statistics method computes the pairwise frequency table of the given columns?**

- freqItems()
- cov()
- **crosstab()**
- pairwiseFreq()
- corr()

**Question 11**: **Which is not true about the fill method for DataFrame NA functions?**

- It is used for replacing NaN values
- **It is used for replacing nil values**
- It is used for replacing null values
- All are true
- None are true

**Question 12**: **Which transformer listed below is used for Natural Language processing?**

- StandardScaler
- OneHotEncoder
- ElementwiseProduct
- Normalizer
- **None are used for Natural Language processing**

**Question 13**: **Which is true about the Mahalanobis Distance?**

- It is a scale-variant distance
- It does not take into account the correlations of the dataset
- **It is measured along each Principal Component axis**
- It is a multi-dimensional generalization of measuring how many standard deviations a point is away from the median
- It has units of distance

**Question 14**: **Which is true about OneHotEncoder?**

- It must be told which column to create for its output
- It creates a Sparse Vector
- It must be told which column is its input
- **All are true**
- None are true

**Question 15**: **Principal Component Analysis is:**

- Never used for feature engineering
- Used for supervised machine learning
- **A dimension reduction technique**
- All are true
- None are true

**Question 16**: **MLlib’s implementation of decision trees:**

- Supports only multiclass classification
- Does not support regressions
- **Partitions data by rows, allowing distributed training**
- Supports only continuous features
- None are true

**Question 17**: **Which is not a tunable of SparkML decision trees?**

- maxBins
- maxMemoryInMB
- minInstancesPerNode
- **minDepth**
- minInfoGain

**Question 18**: **Which is true about Random Forests?**

- They support non-categorical features
- **They combine many decision trees in order to reduce the risk of overfitting**
- They do not support regression
- They only support binary classification
- None are true

**Question 19**: **When comparing Random Forest versus Gradient-Boosted Trees, what must you consider?**

- How the number of trees affects the outcome
- Depth of Trees
- Parallelization abilities
- **All of these**
- None of these

**Question 20**: **Which is not a valid type of Evaluator in MLlib?**

- **MultiLabelClassificationEvaluator**
- RegressionEvaluator
- BinaryClassificationEvaluator
- MultiClassClassificationEvaluator
- All are valid

**Conclusion**

Hopefully this article helps you find all the **module and final quiz answers for Data Science with Scala on Cognitive Class**. If it helped you, share it with your friends on social media so they can learn about this training too. You can also check out our answers for other courses. Stay with us: we will share many more free courses and their exam/quiz solutions, so follow our Techno-RJ **Blog** for more updates.

**FAQs**

**Can I get a Printable Certificate?**

**Yes. After successfully completing the course you receive a Certificate of Learning. You can download and print the certificate, share it with others, and add it to your LinkedIn profile.**

**Why should you choose online courses?**

**Is this course free?**

**Yes, the course is completely free. All you need is dedication to learning the material.**
