Hello Peers! Today we are sharing the answers to all of the weekly assessments and quizzes for the Statistics for Genomic Data Science course offered by Coursera, totally free of cost ✅. This is a certification course for every interested student.
In case you didn’t find this course for free, you can apply for financial aid to get it at no cost.
Check out this article – “How to Apply for Financial Aid?”
About Coursera
Coursera, one of the world’s largest online learning platforms, offers thousands of free courses for students. These courses come from various recognized universities, where industry experts and professors teach in a clear and understandable way.
Here you will find the Statistics for Genomic Data Science Exam Answers, given below in bold.
These answers are recently updated and are 100% correct ✅ answers for all weekly quizzes, assessments, and the final exam of the Statistics for Genomic Data Science course from Coursera.
Use “Ctrl+F” to find any question’s answer. On mobile, tap the three dots in your browser and choose the “Find” option to search for any question.
About Statistics for Genomic Data Science Course
An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Data Science Specialization from Johns Hopkins University.
SKILLS YOU WILL GAIN
- Statistics
- Data Analysis
- R Programming
- Biostatistics
Course Apply Link – Statistics for Genomic Data Science
Statistics for Genomic Data Science Quiz Answers
Week 1 Quiz Answers
Quiz 1: Module 1 Quiz
Q1. Reproducibility is defined informally as the ability to recompute data analytic results given an observed data set and knowledge of the statistical pipeline used to calculate them (Peng 2011, Science). Replicability of a study is the chance that a new experiment targeting the same scientific question will produce a consistent result (Asendorpf 2013, European Journal of Personality).
Susan asks Joe for his data, shared according to the data sharing plan discussed in the lectures. Which of the following are reasons the study may be reproducible, but not replicable?
- Susan only uses Python and Joe uses R so when she runs a new experiment, she will definitely get different results.
- Joe doesn’t make the raw data accessible so Susan can’t re-run his code.
- All the data and code are available but the codebook does not fully explain the experimental design and all protocols for patient recruitment.
- The code and data are available so the study is always replicable if it is reproducible.
Q2. Put the following code chunk at the top of an R Markdown document called test.Rmd, but set eval=TRUE:

```r
{r setup, eval=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```

Then create the following code chunks:

```r
{r}
x = rnorm(10)
plot(x, pch=19, col="dodgerblue")
```

```r
{r}
y = rbinom(20, size=1, prob=0.5)
table(y)
```
- The plot is random the first time you knit the document. It is identical to the first time the second time you knit the document. After removing the folders test_cache and test_files, they generate new random versions.
- The plot and table are random the first time you knit the document. They are identical the second time you knit the document. After removing the folders test_cache and test_files, they are still identical.
- The table is random each time you knit the document, but the plot is always the same after you knit it the first time.
- The plot and table are random every time you knit the document, except for the last time.
Q3. Create a SummarizedExperiment object with the following code:

```r
library(Biobase)
library(GenomicRanges)
library(SummarizedExperiment)  # provides makeSummarizedExperimentFromExpressionSet
data(sample.ExpressionSet, package = "Biobase")
se = makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet)
```

Look up the help files for SummarizedExperiment with the code ?SummarizedExperiment. How do you access the genomic data for this object? How do you access the phenotype table? How do you access the feature data? What is the unique additional information provided by rowRanges(se)?
- Get the genomic table with assay(se), get the phenotype table with colData(se), get the feature data with rowData(se). rowRanges(se) gives information on the genomic location and structure of the measured features.
- Get the genomic table with assay(se), get the phenotype table with colData(se), get the feature data with rowRanges(se). rowRanges(se) gives the range of possible values for the expression data.
- Get the genomic table with assay(se), get the phenotype table with pData(se), get the feature data with rowData(se). rowRanges(se) gives information on the genomic location and structure of the measured features.
- Get the genomic table with assay(se), get the phenotype table with colData(se), get the feature data with rowData(se). rowRanges(se) gives the range of possible values for the expression data.
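For reference, here is a minimal sketch (using the se object created above) of how each accessor named in these options is actually called; all four are standard SummarizedExperiment accessors:

```r
assay(se)      # the genomic data: a features-by-samples matrix
colData(se)    # the phenotype table: one row per sample
rowData(se)    # the feature data: one row per measured feature
rowRanges(se)  # a GRanges object with the genomic location/structure of each feature
```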
Q4. Suppose that you have measured ChIP-Seq data from 10 healthy individuals and 10 metastatic cancer patients. For each individual you split the sample into two identical sub-samples and perform the ChIP-Seq experiment on each sub-sample. How can you measure (a) biological variability, (b) technical variability and (c) phenotype variability?
- (a) By looking at variation across samples from 10 different healthy individuals; (b) by looking at variability between the measurements on the two sub-samples from the same sample; and (c) by comparing the average measurements on the healthy individuals to the measurements on the individuals with cancer.
- (a) By looking at variation across samples from 10 different individuals with cancer; (b) by comparing the average variability in the cancer and normal individuals; and (c) by comparing the average measurements on the healthy individuals to the measurements on the individuals with cancer.
- (a) By looking at variation across replicate sub-samples within the normal individuals; (b) by looking at variation across samples from 10 different healthy individuals; and (c) by comparing the average measurements on the healthy individuals to the measurements on the individuals with cancer.
- (a) & (b) By looking at variation across samples from 10 different healthy individuals; (c) by comparing the average measurements on the healthy individuals to the measurements on the individuals with cancer.
Q5. Load the Bottomly and the Bodymap data sets with the following code:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bottomly_eset.RData")
load(file=con)
close(con)
bot = bottomly.eset
pdata_bot = pData(bot)

con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
```

Just considering the phenotype data, what are some reasons that the Bottomly data set is likely a better experimental design than the Bodymap data? Imagine the question of interest in the Bottomly data is to compare strains and in the Bodymap data it is to compare tissues.
- The Bottomly data has biological replicates for each group but the Bodymap data does not.
- The Bodymap data has measured more levels of the outcome of interest (tissues) than the Bottomly data has measured (strains).
- The Bottomly data set does not measure the age of the mice.
- Most of the tissues in the Bodymap data have a consistent number of technical replicates (2).
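To see this from the phenotype tables themselves, a quick sketch (assuming bot and bm are loaded as above):

```r
pdata_bot = pData(bot)
pdata_bm = pData(bm)
table(pdata_bot$strain)      # several biological replicates per strain
table(pdata_bm$tissue.type)  # only one sample per tissue
```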
Q6. What are some reasons why this plot is not useful for comparing the number of technical replicates by tissue (you may need to install the plotrix package)?

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
pdata_bm = pData(bm)

library(plotrix)
pie3D(pdata_bm$num.tech.reps, labels=pdata_bm$tissue.type)
```
- The plot is in 3-d so it makes it hard to compare the angles.
- There are a large number of data points underlying each wedge and you can’t see them.
- The plot would be much easier to see if the pie chart were rotated by 90 degrees from its current position.
- There is nothing wrong with the plot, it accurately shows how many replicates of each type there are.
Q7. Load the Bodymap data:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
```

Which of the following code chunks will make a heatmap of the 500 most highly expressed genes (as defined by total count), without re-ordering due to clustering? Are the highly expressed samples next to each other in sample order?
Option 1 (the highly expressed samples are next to each other):

```r
row_sums = rowSums(edata)
index = which(rank(-row_sums) < 500)
heatmap(edata[index,], Rowv=NA, Colv=NA)
```

Option 2 (the highly expressed samples are next to each other):

```r
row_sums = rowSums(edata)
index = which(rank(-row_sums) < 500)
heatmap(edata[index,], Rowv=NA)
```

Option 3 (the highly expressed samples are not next to each other):

```r
row_sums = rowSums(edata)
index = which(rank(-row_sums) < 500)
heatmap(edata[index,], Rowv=NA)
```

Option 4 (no, they are not next to each other):

```r
row_sums = rowSums(edata)
edata = edata[order(row_sums),]
index = which(rank(-row_sums) < 500)
heatmap(edata[index,], Rowv=NA, Colv=NA)
```
Q8. Load the Bodymap data using the following code:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
pdata = pData(bm)
edata = exprs(bm)
```

Make an MA-plot of the first sample versus the second sample using the log2 transform (hint: you may have to add 1 first) and the rlog transform from the DESeq2 package. How are the two MA-plots different? Which kind of genes appear most different in each plot?
- The plots are very different; there are two strong diagonal stripes (corresponding to the zero count genes) in the log2 plot and the high abundance genes are most different, but the low abundance genes seem to show smaller differences with the rlog transform.
- The plots look pretty similar, but the rlog transform seems to shrink the low abundance genes more. In both cases, the genes in the middle of the expression distribution show the biggest differences.
- The plots look pretty similar, but there are two strong diagonal stripes (corresponding to the zero count genes) in the rlog plot. In both cases, the genes in the middle of the expression distribution show the biggest differences, but the low abundance genes seem to show smaller differences with the log2 transform.
- The plots look pretty similar, but the log2 plot seems to do a better job of shrinking the low abundance genes toward each other. In both cases, the genes in the middle of the expression distribution show the biggest differences.
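A minimal sketch of the two MA-plots, assuming edata is loaded as above and DESeq2 is installed from Bioconductor (the rlog call can be slow on the full count matrix):

```r
library(DESeq2)

# MA-plot on the log2(x + 1) scale
m_log = log2(edata[,1] + 1) - log2(edata[,2] + 1)  # M: log difference
a_log = log2(edata[,1] + 1) + log2(edata[,2] + 1)  # A: log total
plot(a_log, m_log, pch=19, col="dodgerblue")

# MA-plot on the rlog scale
rld = rlog(edata)
m_rlog = rld[,1] - rld[,2]
a_rlog = rld[,1] + rld[,2]
plot(a_rlog, m_rlog, pch=19, col="dodgerblue")
```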
Q9. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Cluster the data in three ways:

- With no changes to the data
- After filtering all genes with rowMeans less than 100
- After taking the log2 transform of the data without filtering

Color the samples by which study they came from (hint: consider using the function myplclust in the rafalib package available from CRAN, and look at the argument lab.col).
How do the methods compare in terms of how well they cluster the data by study? Why do you think that is?
- Clustering with or without the log2 transform is about the same. Clustering after filtering shows better clustering with respect to the study variable. The reason is that the lowly expressed genes have some extreme outliers that skew the calculation.
- Clustering is identical with all three approaches and they show equal clustering. The log2 transform is a monotone transformation so it doesn’t affect the clustering.
- Clustering is identical with all three approaches and they show equal clustering. The distance is an average over all the dimensions so it doesn’t change.
- Clustering with or without filtering is about the same. Clustering after the log2 transform shows better clustering with respect to the study variable. The likely reason is that the highly skewed distribution doesn’t match the Euclidean distance metric being used in the clustering example.
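One possible sketch of the three clusterings, assuming edata and pdata are loaded as above and rafalib is installed from CRAN:

```r
library(rafalib)
study_col = as.numeric(as.factor(pdata$study))

# 1. No changes to the data
hc1 = hclust(dist(t(edata)))
myplclust(hc1, lab.col = study_col)

# 2. After filtering genes with rowMeans less than 100
edata_f = edata[rowMeans(edata) >= 100, ]
hc2 = hclust(dist(t(edata_f)))
myplclust(hc2, lab.col = study_col)

# 3. After the log2 transform, without filtering
hc3 = hclust(dist(t(log2(edata + 1))))
myplclust(hc3, lab.col = study_col)
```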
Q10. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Cluster the samples using k-means clustering after applying the log2 transform (be sure to add 1). Set a seed for reproducible results (use set.seed(1235)). If you choose two clusters, do you get the same two clusters as you get if you use the cutree function to cluster the samples into two groups? Which cluster matches most closely to the study labels?
- They produce different answers. The k-means clustering matches study better. Hierarchical clustering would look better if we went farther down the tree but the top split doesn’t perfectly describe the study variable.
- They produce the same answers and match the study variable equally well.
- They produce the same answers except for three samples that hierarchical clustering correctly assigns to the right study but k-means does not.
- They produce different answers, with k-means clustering giving a much more unbalanced clustering. The hierarchical clustering matches study better.
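A sketch of the comparison, assuming edata and pdata are loaded as above:

```r
edata_l = log2(edata + 1)
set.seed(1235)
km = kmeans(t(edata_l), centers = 2)   # cluster the samples
hc = hclust(dist(t(edata_l)))
ct = cutree(hc, k = 2)
table(km$cluster, ct)                  # do the two clusterings agree?
table(km$cluster, pdata$study)         # which one matches study better?
table(ct, pdata$study)
```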
Week 2 Quiz Answers
Quiz 1: Module 2 Quiz
Q1. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

What percentage of variation is explained by the 1st principal component in the data set if you:

- a. Do no transformations?
- b. log2(data + 1) transform?
- c. log2(data + 1) transform and subtract row means?
- a. 0.97 b. 0.97 c. 0.97
- a. 0.89 b. 0.97 c. 0.35
- a. 0.97 b. 0.97 c. 0.35
- a. 0.35 b. 0.35 c. 0.35
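A sketch of the three computations, assuming edata is loaded as above; the fraction of variance explained by the first component is the first squared singular value over the sum of all squared singular values:

```r
pc1_var = function(mat) {
  s = svd(as.matrix(mat))
  (s$d^2 / sum(s$d^2))[1]   # variance explained by the 1st component
}
edata_l = log2(edata + 1)
pc1_var(edata)                         # a. no transformation
pc1_var(edata_l)                       # b. log2 transform
pc1_var(edata_l - rowMeans(edata_l))   # c. log2 transform, row means removed
```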
Q2. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Perform the log2(data + 1) transform and subtract row means from the samples. Set the seed to 333 and use k-means to cluster the samples into two clusters. Use svd to calculate the singular vectors. What is the correlation between the first singular vector and the sample clustering indicator?
- 0.33
- 0.85
- 0.87
- -0.52
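A sketch of this question, assuming edata is loaded as above:

```r
edata_l = log2(edata + 1)
edata_c = edata_l - rowMeans(edata_l)   # subtract row means
set.seed(333)
km = kmeans(t(edata_c), centers = 2)
s = svd(as.matrix(edata_c))
cor(s$v[, 1], km$cluster)   # first right singular vector vs. cluster labels
```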
Q3. Load the Bodymap data with the following command:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
pdata_bm = pData(bm)
```
Fit a linear model relating the first gene’s counts to the number of technical replicates, treating the number of replicates as a factor. Plot the data for this gene versus the covariate. Can you think of why this model might not fit well?
- The difference between 2 and 5 technical replicates is not the same as the difference between 5 and 6 technical replicates.
- There is only one data point with a value of 6 so it is likely that the estimated value for that number of technical replicates is highly variable.
- There may be different numbers of counts for different numbers of technical replicates.
- The data are right skewed.
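A sketch of the factor model, assuming edata and pdata_bm are loaded as above:

```r
tech_reps = as.factor(pdata_bm$num.tech.reps)
lm1 = lm(edata[1, ] ~ tech_reps)   # first gene's counts vs. replicates as a factor
plot(pdata_bm$num.tech.reps, edata[1, ], pch = 19, col = "darkgrey")
summary(lm1)
```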
Q4. Load the Bodymap data with the following command:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
pdata_bm = pData(bm)
```

Fit a linear model relating the first gene’s counts to the age of the person and the sex of the samples. What is the value and interpretation of the coefficient for age?
- -23.91. This coefficient means that for each additional year of age, the count goes down by an average of 23.91 for a fixed sex.
- -207.26. This coefficient means that for each additional year of age, the count goes down by an average of 207.26 for a fixed sex.
- -23.25. This coefficient means that there is an average decrease of 23.91 in the count variable per year within each gender.
- -22.26. This coefficient means that for each additional year of age, the count goes down by an average of 207.26 for a fixed sex.
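A sketch of the two-covariate model, assuming edata and pdata_bm are loaded as above (the Bodymap phenotype table stores sex in the gender column):

```r
lm2 = lm(edata[1, ] ~ pdata_bm$age + pdata_bm$gender)
summary(lm2)   # the age row holds the coefficient asked about
```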
Q5. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Perform the log2(data + 1) transform. Then fit a regression model to each sample using population as the outcome. Do this using the lm.fit function (hint: don’t forget the intercept). What is the dimension of the residual matrix, the effects matrix and the coefficients matrix?
- Residual matrix: 52580 x 129; effects matrix: 52580 x 129; coefficients matrix: 2 x 52580
- Residual matrix: 129 x 52580; effects matrix: 129 x 52580; coefficients matrix: 2 x 52580
- Residual matrix: 52580 x 129; effects matrix: 129 x 52580; coefficients matrix: 2 x 52580
- Residual matrix: 129 x 52580; effects matrix: 129 x 52580; coefficients matrix: 1 x 52580
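A sketch of the lm.fit call, assuming edata and pdata are loaded as above; samples go in the rows of the response so that one model is fit per gene:

```r
edata_l = log2(edata + 1)
mod = model.matrix(~ pdata$population)    # design matrix, intercept included
fit = lm.fit(mod, t(as.matrix(edata_l)))  # one fit per gene
dim(fit$residuals)      # one residual per sample per gene
dim(fit$effects)        # same shape as the response
dim(fit$coefficients)   # one row per model coefficient
```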
Q6. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Perform the log2(data + 1) transform. Then fit a regression model to each sample using population as the outcome. Do this using the lm.fit function (hint: don’t forget the intercept). What is the effects matrix?
- The estimated fitted values for all samples for each gene, with the values for each gene stored in the rows of the matrix.
- The model coefficients for all samples for each gene, with the values for each gene stored in the columns of the matrix.
- The estimated fitted values for all samples for each gene, with the values for each gene stored in the columns of the matrix.
- The model residuals for all samples for each gene, with the values for each gene stored in the columns of the matrix.
Q7. Load the Bodymap data with the following command:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
pdata_bm = pData(bm)
```

Fit many regression models to the expression data where age is the outcome variable using the lmFit function from the limma package (hint: you may have to subset the expression data to the samples without missing values of age to get the model to fit). What is the coefficient for age for the 1,000th gene? Make a plot of the data and fitted values for this gene. Does the model fit well?
- -27.61. The model fits well since there seems to be a flat trend in the counts.
- 2469.87. The model doesn’t fit well since there appears to be a non-linear trend in the data.
- -27.61. The model doesn’t fit well since there are two large outlying values and the rest of the values are near zero.
- -27.61. The model doesn’t fit well since there appears to be a non-linear trend in the data.
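A sketch of the limma fit, assuming edata and pdata_bm are loaded as above and limma is installed from Bioconductor:

```r
library(limma)
keep = !is.na(pdata_bm$age)              # drop samples with missing age
mod = model.matrix(~ pdata_bm$age[keep])
fit = lmFit(edata[, keep], mod)
fit$coefficients[1000, ]                 # coefficient for the 1,000th gene
plot(pdata_bm$age[keep], edata[1000, keep], pch = 19, col = "darkgrey")
```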
Q8. Load the Bodymap data with the following command:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
pdata_bm = pData(bm)
```

Fit many regression models to the expression data where age is the outcome variable and tissue.type is an adjustment variable using the lmFit function from the limma package (hint: you may have to subset the expression data to the samples without missing values of age to get the model to fit). What is wrong with this model?
- Since tissue.type is a factor variable with many levels, this model has more coefficients to estimate per gene (18) than data points per gene (16).
- Normally this model wouldn’t fit well since we have more coefficients (18) than data points per gene (16). But since we have so many genes to estimate with, the model fits well.
- The model doesn’t fit well since age should be treated as a factor variable.
- The model doesn’t fit well because there are a large number of outliers for the white blood cell tissue.
Q9. Why is it difficult to distinguish the study effect from the population effect in the Montgomery Pickrell dataset from ReCount?
- The study effects and population effects are difficult to distinguish because the population effect is not adjusted for study.
- The study effects and population effects are difficult to distinguish because the study effects are stronger.
- The effects are difficult to distinguish because each study only measured one population.
- The study effects and population effects are not difficult to distinguish since they are the same effect.
Q10. Load the Bodymap data with the following command:

```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file=con)
close(con)
bm = bodymap.eset
edata = exprs(bm)
pdata_bm = pData(bm)
```

Set the seed using the command set.seed(33353), then estimate a single surrogate variable using the sva function after log2(data + 1) transforming the expression data, removing rows with rowMeans less than 1, and treating age as the outcome (hint: you may have to subset the expression data to the samples without missing values of age to get the model to fit). What is the correlation between the estimated surrogate for batch and age? Is the surrogate more highly correlated with race or gender?
- Correlation with age: 0.33. More highly correlated with gender.
- Correlation with age: 0.99. More highly correlated with race.
- Correlation with age: 0.99. More highly correlated with gender.
- Correlation with age: 0.20. More highly correlated with gender.
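A sketch of the surrogate variable step, assuming edata and pdata_bm are loaded as above and sva is installed from Bioconductor:

```r
library(sva)
keep = !is.na(pdata_bm$age)
edata_s = log2(edata[, keep] + 1)
edata_s = edata_s[rowMeans(edata_s) >= 1, ]   # remove rows with rowMeans below 1
mod = model.matrix(~ age, data = pdata_bm[keep, ])
mod0 = model.matrix(~ 1, data = pdata_bm[keep, ])
set.seed(33353)
sv = sva(edata_s, mod, mod0, n.sv = 1)
cor(sv$sv, pdata_bm$age[keep])                            # correlation with age
cor(sv$sv, as.numeric(as.factor(pdata_bm$gender[keep])))  # vs. gender
cor(sv$sv, as.numeric(as.factor(pdata_bm$race[keep])))    # vs. race
```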
Week 3 Quiz Answers
Quiz 1: Module 3 Quiz
Q1. Load the example SNP data with the following code:
```r
library(snpStats)
library(broom)
data(for.exercise)
use <- seq(1, ncol(snps.10), 10)
sub.10 <- snps.10[,use]
snpdata = sub.10@.Data
status = subject.support$cc
```
Fit a linear model and a logistic regression model to the data for the 3rd SNP. What are the coefficients for the SNP variable? How are they interpreted? (Hint: Don’t forget to recode the 0 values to NA for the SNP data)
- Linear model = 0.54; logistic model = 0.18. Both models are fit on the additive scale, so in both cases the coefficient is the decrease in probability associated with each additional copy of the minor allele.
- Linear model = 0.54; logistic model = 0.18. Both models are fit on the additive scale, so in the linear model case the coefficient is the decrease in probability associated with each additional copy of the minor allele, and in the logistic regression case it is the decrease in the log odds ratio associated with each additional copy of the minor allele.
- Linear model = -0.16; logistic model = -0.04. Both models are fit on the additive scale, so in the linear model case the coefficient is the decrease in probability associated with each additional copy of the minor allele, and in the logistic regression case it is the decrease in the log odds ratio associated with each additional copy of the minor allele.
- Linear model = -0.04; logistic model = -0.16. Both models are fit on the additive scale, so in the linear model case the coefficient is the decrease in probability associated with each additional copy of the minor allele, and in the logistic regression case it is the decrease in the log odds ratio associated with each additional copy of the minor allele.
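A sketch of the two models for the 3rd SNP, assuming snpdata and status are loaded as above (genotypes are coded 0 to 3, with 0 marking a missing value):

```r
snp3 = as.numeric(snpdata[, 3])
snp3[snp3 == 0] = NA                            # recode missing genotypes to NA
lm3 = lm(status ~ snp3)                         # linear model
glm3 = glm(status ~ snp3, family = "binomial")  # logistic model
tidy(lm3)   # broom was loaded above
tidy(glm3)
```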
Q2. In the previous question why might the choice of logistic regression be better than the choice of linear regression?
- If you included more variables it would be possible to get negative estimates for the probability of being a case from the linear model, but this would be prevented with the logistic regression model.
- The linear model only allows modeling relationships on the additive scale but we might want to consider a dominant or recessive model.
- It is customary to use logistic regression for case-control data like those obtained from genome-wide association studies.
- The log odds is always more interpretable than a change in probability on the additive scale.
Q3. Load the example SNP data with the following code:
```r
library(snpStats)
library(broom)
data(for.exercise)
use <- seq(1, ncol(snps.10), 10)
sub.10 <- snps.10[,use]
snpdata = sub.10@.Data
status = subject.support$cc
```
Fit a logistic regression model on a recessive (need 2 copies of minor allele to confer risk) and additive scale for the 10th SNP. Make a table of the fitted values versus the case/control status. Does one model fit better than the other?
- No, in all cases, the fitted values are near 0.5 and there are about an equal number of cases and controls in each group. This is true regardless of whether you fit a recessive or additive model.
- The additive model fits much better since there are fewer parameters to fit and the effect size is so large.
- The recessive model shows a strong effect, but the additive model shows no difference so the recessive model is better.
- The recessive model fits much better since it appears that once you aggregate the heterozygotes and homozygous minor alleles, there is a bigger difference in the proportion of cases and controls.
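A sketch of the two codings for the 10th SNP, assuming snpdata and status are loaded as above and that genotype code 3 marks the homozygous minor genotype:

```r
snp10 = as.numeric(snpdata[, 10])
snp10[snp10 == 0] = NA
snp10_rec = (snp10 == 3)                               # recessive coding
glm_add = glm(status ~ snp10, family = "binomial")     # additive model
glm_rec = glm(status ~ snp10_rec, family = "binomial") # recessive model
# glm drops the missing genotypes, so subset status to match the fitted values
table(round(fitted(glm_add), 2), status[!is.na(snp10)])
table(round(fitted(glm_rec), 2), status[!is.na(snp10)])
```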
Q4. Load the example SNP data with the following code:
```r
library(snpStats)
library(broom)
data(for.exercise)
use <- seq(1, ncol(snps.10), 10)
sub.10 <- snps.10[,use]
snpdata = sub.10@.Data
status = subject.support$cc
```
Fit an additive logistic regression model to each SNP. What is the average effect size? What is the max? What is the minimum?
- Average effect size = 0.02, minimum = -0.88, maximum = 0.88
- Average effect size = -0.02, minimum =-3.59 , maximum = 4.16
- Average effect size = 1.35, minimum =-6.26 , maximum = 6.26
- Average effect size = 0.007, minimum = -4.25, maximum = 3.90
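A sketch of looping over every SNP, assuming snpdata and status are loaded as above:

```r
n_snps = ncol(snpdata)
coefs = rep(NA, n_snps)
for (i in 1:n_snps) {
  snp_i = as.numeric(snpdata[, i])
  snp_i[snp_i == 0] = NA
  glm_i = glm(status ~ snp_i, family = "binomial")
  coefs[i] = coef(glm_i)[2]   # effect size for the SNP term
}
mean(coefs); min(coefs); max(coefs)
```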
Q5. Load the example SNP data with the following code:
```r
library(snpStats)
library(broom)
data(for.exercise)
use <- seq(1, ncol(snps.10), 10)
sub.10 <- snps.10[,use]
snpdata = sub.10@.Data
status = subject.support$cc
```

Fit an additive logistic regression model to each SNP and square the coefficients. What is the correlation with the results from using snp.rhs.tests and chi.squared? Why does this make sense?
- 0.81. They are both testing for the same association using the same additive regression model on the logistic scale. But it doesn’t make sense since they should be perfectly correlated.
- 0.99. They are both testing for the same association using the same additive regression model on the logistic scale. But it doesn’t make sense since they should be perfectly correlated.
- 0.99. It doesn’t make sense since they are both testing for the same association using the same additive regression model on the logistic scale but using slightly different tests.
- 0.99. They are both testing for the same association using the same additive regression model on the logistic scale but using slightly different tests.
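A sketch of the comparison, assuming the coefs vector from the previous sketch and the snpStats objects loaded above:

```r
rhs = snp.rhs.tests(status ~ 1, snp.data = sub.10)  # score tests for every SNP
cor(coefs^2, chi.squared(rhs))   # squared coefficients vs. chi-squared statistics
```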
Q6. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
fdata = fData(mp)
```

Do the log2(data + 1) transform, then calculate F-statistics for the difference between studies/populations using genefilter::rowFtests and t-statistics using genefilter::rowttests. Do you get the same statistic? Do you get the same p-value?
- You get different p-values and statistics. The F-statistic and t-statistic are testing the same thing but do it totally differently.
- You get different p-values and statistics. The F-statistic and t-statistic are testing totally different things.
- You get the same p-values and statistics. This is because the F-statistic and t-statistic are the exact same in this case.
- You get the same p-value but different statistics. This is because the F-statistic and t-statistic test the same thing when doing a two group test and one is a transform of the other.
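A sketch of the two tests, assuming edata and pdata are loaded as above and genefilter is installed from Bioconductor:

```r
library(genefilter)
edata_l = as.matrix(log2(edata + 1))
fstats = rowFtests(edata_l, as.factor(pdata$study))
tstats = rowttests(edata_l, as.factor(pdata$study))
head(fstats)   # compare the statistic and p.value columns
head(tstats)
```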
Q7. Load the Montgomery and Pickrell eSet:
```r
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata = pData(mp)
edata = as.data.frame(exprs(mp))
edata = edata[rowMeans(edata) > 100,]
fdata = fData(mp)
```

First test for differences between the studies using the DESeq2 package with the DESeq function. Then do the log2(data + 1) transform and test for differences between studies using the limma package with the lmFit, ebayes and topTable functions. What is the correlation in the statistics between the two analyses? Are there more differences for the large statistics or the small statistics (hint: make an MA-plot)?
- 0.63. There are more differences for the large statistics.
- 0.93. There are more differences for the small statistics.
- 0.93. There are more differences for the large statistics.
- 0.85. There are more differences for the large statistics.
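A sketch of the two pipelines, assuming edata and pdata are loaded and filtered as above and DESeq2/limma are installed from Bioconductor. The question names the ebayes function; eBayes is the current spelling in limma and is what topTable expects:

```r
library(DESeq2)
library(limma)

# DESeq2 on the raw counts
de = DESeqDataSetFromMatrix(countData = as.matrix(edata),
                            colData = pdata, design = ~ study)
de = DESeq(de)
result_de = results(de)

# limma on the log2-transformed counts
edata_l = log2(as.matrix(edata) + 1)
mod = model.matrix(~ pdata$study)
fit_limma = eBayes(lmFit(edata_l, mod))
top = topTable(fit_limma, number = nrow(edata_l), sort.by = "none")

cor(result_de$stat, top$t)   # correlation between the two sets of statistics
```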
Q8. Apply the Benjamini-Hochberg correction to the p-values from the two previous analyses. How many results are statistically significant at an FDR of 0.05 in each analysis?
- DESeq = 1119 significant; limma = 2328 significant
- DESeq = 0 significant; limma = 0 significant
- DESeq = 12 significant; limma = 3 significant
- DESeq = 1995 significant; limma = 2807 significant
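A sketch of the correction, assuming result_de and top from the previous sketch:

```r
de_bh = p.adjust(result_de$pvalue, method = "BH")
limma_bh = p.adjust(top$P.Value, method = "BH")
sum(de_bh < 0.05, na.rm = TRUE)   # significant in the DESeq2 analysis
sum(limma_bh < 0.05)              # significant in the limma analysis
```

(DESeq2 can report NA p-values for outlier genes, hence na.rm = TRUE.)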
Q9. Is the number of significant differences surprising for the analysis comparing studies from Question 8? Why or why not?
- Yes and no. It is surprising because there is a large fraction of the genes that are significantly different, but it isn’t that surprising because we would expect that when comparing measurements from very different batches.
- Yes. This is a very large number of genes different between studies and we don’t have a good explanation.
- Yes and no. It is surprising because there very few genes that are significantly different, but it isn’t that surprising because we would expect that when comparing measurements from very different batches.
- No. There are very few genes different between studies and that is what we would expect.
Q10. Suppose you observed the following P-values from the comparison of differences between studies. Why might you be suspicious of the analysis?
- There are too many small p-values, so there are too many statistically significant results.
- This p-value histogram appears correct in the case where there is very little signal in the data.
- The p-values should have a spike near zero (the significant results) and be flat to the right hand side (the null results), so the distribution pushed toward one suggests something went wrong.
- This p-value histogram appears correct in the case where there is a large number of statistically significant results.
Week 4 Quiz Answers
Quiz 1: Module 4 Quiz
Q1. When performing gene set analysis it is critical to use the same annotation as was used in the pre-processing steps. Read the paper behind the Bottomly data set on the ReCount database: http://www.ncbi.nlm.nih.gov/pubmed?term=21455293
Using the paper and the function supportedGenomes() in the goseq package, can you figure out which of the mouse genome builds they aligned the reads to?
- UCSC mm9
- UCSC hg18
- UCSC hg19
- NCBI Build 35
Q2. Load the Bottomly data with the following code and perform a differential expression analysis using limma with only the strain variable as an outcome. How many genes are differentially expressed at the 5% FDR level using Benjamini-Hochberg correction? What is the gene identifier of the first gene differentially expressed at this level (just in order, not the smallest FDR)? (hint: the featureNames function may be useful)

```r
library(Biobase)
library(limma)
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bottomly_eset.RData")
load(file=con)
close(con)
bot = bottomly.eset
pdata_bot = pData(bot)
fdata_bot = featureData(bot)
edata = exprs(bot)
fdata_bot = fdata_bot[rowMeans(edata) > 5]
```
- 90 at FDR 5%; ENSMUSG00000000001 first DE gene
- 9431 at FDR 5%; ENSMUSG00000027855 first DE gene
- 223 at FDR 5%; ENSMUSG00000027855 first DE gene
- 223 at FDR 5%; ENSMUSG00000000402 first DE gene
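One way to sketch this analysis, assuming the objects from the code above; the filtering step and the log2 transform mirror how the count data is treated elsewhere in the course, so treat this as an assumption rather than the official solution:

```r
library(limma)
keep = rowMeans(edata) > 5
edata_f = log2(edata[keep, ] + 1)
mod = model.matrix(~ pdata_bot$strain)
fit = eBayes(lmFit(edata_f, mod))
top = topTable(fit, number = nrow(edata_f), adjust.method = "BH", sort.by = "none")
sum(top$adj.P.Val < 0.05)                          # genes DE at 5% FDR
featureNames(bot)[keep][top$adj.P.Val < 0.05][1]   # first DE gene in order
```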
Q3. Use the nullp and goseq functions in the goseq package to perform a gene ontology analysis. What is the top category that comes up as over-represented? (hint: you will need to use the genome information from question 1 and the differential expression analysis from question 2)
- GO:0008528
- GO:0038023
- GO:0004888
- GO:0001653
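A sketch of the goseq step, assuming a named 0/1 vector de_genes (gene IDs as names, 1 for the genes found differentially expressed in question 2) and the mm9 build from question 1; goseq is installed from Bioconductor:

```r
library(goseq)
pwf = nullp(de_genes, "mm9", "ensGene")  # probability weighting function
go = goseq(pwf, "mm9", "ensGene")        # GO category enrichment
head(go)                                 # top over-represented categories
```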
Q4. Look up the GO category that was the top category from the previous question. What is the name of the category?
- peptide receptor activity
- G-protein coupled peptide receptor activity
- transmembrane signaling receptor activity
- signaling receptor activity
Q5. Load the Bottomly data with the following code and perform a differential expression analysis using limma, treating strain as the outcome but adjusting for lane as a factor. Then find genes significant at the 5% FDR rate using the Benjamini-Hochberg correction and perform the gene set analysis with goseq following the protocol from the first 4 questions. How many of the top 10 over-represented categories are the same for the adjusted and unadjusted analysis?

```r
library(Biobase)
library(limma)
con = url("https://bowtie-bio.sourceforge.net/recount/ExpressionSets/bottomly_eset.RData")
load(file=con)
close(con)
bot = bottomly.eset
pdata_bot = pData(bot)
fdata_bot = featureData(bot)
edata = exprs(bot)
fdata_bot = fdata_bot[rowMeans(edata) > 5]
```
- 10
- 0
- 3
- 2
Conclusion
Hopefully, this article will help you find all the weekly quiz, final assessment, and peer-graded assessment answers for the Statistics for Genomic Data Science course on Coursera, and pick up some premium knowledge with less effort. If this article helped you in any way, make sure to share it with your friends on social media and let them know about this training. You can also check out our other course answers. Stay with us; we will share many more free courses and their exam/quiz solutions. Follow our Techno-RJ Blog for more updates.