Try to answer these 100+ Data Analytics MCQs and check your understanding of the Data Analytics subject.
Scroll down and let's begin!
A.
A
B.
B
C.
c
D.
D
A. Explanative analysis
B. Spectral analysis
C. Forecasting
D. Descriptive analysis
A. PredictCaseLikelihood(
B. PredictCaseLikelihood([NORMALIZEDINONNORMALIZEDD
C. PredictCaseLikelihood(
D. PredictCaseLikelihood(
A. PredictVariance(
B. PredictVariance(
C. PredictVariance(
D. PredictVariance(
A. Catalog design
B. Basket data analysis
C. Cross-marketing
D. Loss-leader analysis
E. All of the above
F. None of the above
A. It can automatically process messages and emails.
B. It can investigate competitors by crawling their web sites.
C. It can analyze open-ended survey responses.
D. It can analyze warranty or insurance claims.
E. All of the above.
A.
A
B.
B
C.
c
D.
0
A.
A
B.
B
C.
C
D.
D
For what purpose is the following R function run?
print(getwd)
A.
To get and print the current working directory.
B.
To get and print all working directories.
C.
To count and print all working directories.
D.
To print the location of a working directory.
A. Input neuron
B. Hidden neuron
C. Output neuron
D. None of the above
A. It is used for calculating the conditional probability between input and predictable columns and it assumes that the columns are independent.
B. It is used for performing automatic feature selection to limit the number of values that are considered when building a model.
C. It is provided by Microsoft SQL Server analysis services for use in predictive modeling.
D. It is used for considering each pair of input attribute values and output attribute values.
E. All of the above.
A. It is used for encouraging group effect in case of highly correlated variables.
B. It is used for finding the probability of event=Success and event=Failure.
C. It is used for adding and removing predictors as needed for each step.
D. It is used for penalizing the absolute size of the regression coefficients.
A. It is used for predicting one or more continuous numeric variables; for example. profit or loss that is based on other attributes in a dataset.
B. It is used for finding correlations between different attributes in a dataset.
C. It is used for dividing data into groups or clusters of items that have similar properties.
D. It is used for summarizing frequent sequences or episodes in data; for example. a series of log events preceding machine maintenance.
A. ltemsets
B. Dependency Network
C. Rules
D. None of the above
A. It is used for finding whether an event can lead to a change in a time series.
B. It is used for finding a trend or pattern in a time series through the use of graphs or other tools.
C. It is used extensively in budgeting. which is based on historical trends.
D. It is used for studying the cross correlation between two time series and their dependence on another.
A.
PredictAssociation(<NodelD>)
B.
PredictAssociation(<cluster column reference>, [<predicted state>))
C.
PredictAssociation(<scalar column reference>)
D.
PredictAssociation(<tabIe column reference>, optionl, option2, n ...)
A. 10
B. 3
C. 1
D. 0.4
A. glm(formula, family=familytype(link=linkfunction), data=)
B. glm(formula, data=, method=,control=)
C. glm(vector, start=. end=, frequency=)
D. glm(bootobject. conf=, type=)
Find the output of the following R programming language code.
z1 <- c(7,5,8,4,4,16)
z2 <- c(9,6)
add.result <- 21+22
print(add.result)
sub.result <- 21-22
print(sub.result)
A.
[1]161184416
[1}2-184416
B.
[1]1511171313 25
[1] -2 -1 2 -2 -210
C.
[1]1611171013 22
[1] -2 -1 -1 -2 -5 10
D.
[1]151074814
[1]-1-216 -4 -3
Find the output of the following code of the R programming language.
z1 <- c(4,3,TRUE,2+6i)
z2 <- c(4,7,TRUE.2+7i)
print(z1&22)
A.
[1] TRUE TRUE TRUE TRUE
B.
[1] TRUE FALSE TRUE FALSE
C.
[1] FALSE TRUE FALSE TRUE
D.
[1] FALSE FALSE FALSE FALSE
What will be the output of the following R code?
c(4,7,TRUE,3+7i) -> v1
c(9,6,FALSE,3+7i) ->> v2
print(v1)
print(v2)
A.
[114+01 4+1i 7+01 3+7i
[1] 9+0i 9+1i 6+0i 3+7i
B.
[1]4+0i7+0i1+0i3+7i
[1] 9101 6+0i 0+01 3+7i
C.
[1) 4+0i 7+7l1+1i 3+7i
[1] 9+Oi 9+1i 6+6i 3+7i
D.
[1]4+4i7+7i1+1i3+7i
[119+9i 6+6i1+1i 3+7i
A. grepl.any(installed.packages("xlsx")) library("xlsx")
B. any(grepl("xlsx“,installed.package())) library("xlsx")
C. any.grepl(xlsx,installed.package50) |ibrary(xlsx)
D. grepl(any(installed.packages(xlsx))) |ibrary(xlsx)
A. Cluster(
B. Cluster([
C. Cluster()
D. Cluster([
A.
A
B.
B
C.
C
D.
0
A. Clustering
B. Categorization
C. Visualization
D. Information extraction
What will be the output of the following code of the R programming language?
a <- c(9,0.FALSE,2+9i)
b <- c(8,0,TRUE,2+7i)
print(alb)
A.
[1] FALSE TRUE FALSE FALSE
B.
[1] TRUE TRUE TRUE FALSE
C.
[1] TRUE FALSE TRUE TRUE
D.
[1] FALSE FALSE FALSE TRUE
Find the output of the following R programming language code.
a <- c(7.5.FALSE.4+4i)
b <- c(6,0,TRUE,4+7i)
print(a&&b)
A.
[1] FALSE
B.
[1] FALSE TRUE
C.
(1] FALSE FALSE
D.
[1] TRUE
A. Segmentation algorithm
B. Classification algorithm
C. Sequence analysis algorithm
D. Association algorithm
What will the following R code do?
mydata$v2 <- mydata$v4 <- NULL
A.
It will replace the value of variable v2 with v4 and will delete the variable v4.
B.
It will replace the value of variable v4 with v2 and will delete the variable v2.
C.
It will delete the variables v2 and v4.
D.
None of the above.
A. match associations [as pattern_name] analyze {measure(s) }
B. mine associations [as pattern_name] analyze classifying_attribute_or_dimension
C. mine associations [as [pattern_name]] {matching {metapattern}}
D. mine associations [as pattern_name] analyze prediction_attribute_or_dimension {set [attribute_or_dimension_i= value_i}]
Choose True or False.
Text mining is used in spam filtering. content enrichment and contextual advertising.
A.
True
B.
False
A user wants to read and print the contents of a CSV file named myexample-csv that is present in his current working directory. Which of the following is the correct syntax of the command that should be executed by him to accomplish this task?
A.
data <- read(myexample.csv)
print(data)
B.
data <- read.file(”myexample.csv")
print(data)
C.
data <- read.csv("myexample.csv")
print(data)
D.
data <~ read.data(myexample.csv)
print(data)
A. Stepwise regression
B. Polynomial regression
C. Linear regression
D. Logistic regression
A. PredictSupport(
B. PredictSupport(
C. PredictSupport(
D. PredictSupport(
A. It supports the cyclical, key and table content types.
B. It supports the key, table and ordered content types.
C. It supports the continuous, key and table content types.
D. It supports the continuous, cyclical and ordered content types.
Using the following information, find the correct syntax of the R function used for creating binary files.
Assume object as the binary file to be written. n as the number Of bytes and con as the connection object.
A.
writeBin(object, n, con)
B.
writeBin(object)
C.
writeBin(object, n)
D.
writeBin(object, con)
A. It specifies how a model should be mixed for optimizing forecasting.
B. It specifies which algorithm to use for analysis and prediction.
C. It specifies a numeric value between 0 and 1 that detects periodicity.
D. It specifies the minimum number of time slices that are required to generate a split in each time series tree.
Find the output of the following code of the R programming language.
Iista <- Iist(5:7)
print(lista)
Iistb <-Iist(12:14)
print(listb)
x1 <- unlist(lista)
x2 <- unlist(listb)
print(xl)
print(x2)
r <- x1+x2
print(r)
A.
[[1]]
[1] 5 6 7
[[1]]
[1] 12 13 14
[1] 5 6 7
[1] 12 13 14
[1] 17 19 21
B.
[[1]]
[1] 5 6 7
[[1]]
[1] 12 13 14
[1] 5 6 7
[1] 12 13 14
[1] 15 16 17
C.
[[1]]
[1] 5 6 7
[[1]]
[1] 12 13 14
[1] 5 6 7
[1] 12 13 14
[1] 24 25 26
D.
[[1]]
[1] 5 6 7
[[1]]
[1] 12 13 14
[1] 5 6 7
[1] 12 13 14
[1] 11 12 13
A. 0.6
B. 0.1
C. 10
D. 1
A. total <~ merge(data myFrame1 with myFrame2, by=c(lD,Country))
B. total <- merge(data myFrame1,data myFrame2,by=c("lD","Country"))
C. total <- merge(data by=c("lD","Country") for myFrame1, myFrame2)
D. total <- merge(data for myFrame1, myFrame2,by=C(lD,Country))
A.
A
B.
B
C.
c
D.
D
Which of the following options represent correct application of the time series analysis?
i) Yield Projections
ii) Workload Projections
iii) Census Analysis
iv) Inventory Studies
A.
Only options i) and ii)
B.
Only options ii) and iv)
C.
Only Options i). ii) and iv)
D.
Only options ii). iii) and iv)
E.
All options i), ii). iii) and iv)
A. PredictAdjustedProbability(
B. PredictAdjustedProbability(
C. PredictAdjustedProbability(
D. PredictAdjustedProbabilityo
A. It can be used to produce an unrotated principal component analysis.
B. It can be used to produce maximum likelihood factor analysis.
C. It can be used to bootstrap the structural equation model.
D. It can be used to fit an autoregressive integrated moving average model.
A. F-score = recall - precision + (recall x precision) / 9
B. F-score = recall + precision - (recall x precision) I 7
C. F-score = recall x precision / (recall + precision) / 2
D. F-score = recall I precision x (recall - precision) / 5
A. 10
B. 1
C. 0
D. 5
A. Regression analysis
B. ANOVA
C. Factor analysis
D. Logistic regression
A. precision: l[Relevant] n [Retrieved]l / l[Retrieved]l
B. Precision= l[Retrieved} U [F-score]l + l[F-score}l
C. Precision= l[Recall] / [F-scorejl x l[RecalI]l
D. Precision= l[F-score] x [Recalljl - l[F—score)l
Which of the given options will be the output of the following code when it is executed in R?
var <— c(8.4.NA.12)
mean(var, na.rm=TRUE)
A.
[1) 2
B.
[114
C.
[1) 8
D.
[1110
E.
The code will throw an error.
A. Precision
B. Recall
C. F-score
D. None of the above
A. 200
B. 30
C. 255
D. 100
A. It is used to model binary variables.
B. It is used to model compositional data.
C. It is used to model rank variables.
D. It is used to model count variables.
A.
A
B.
B
C.
C
D.
0
A.
0
B.
3
C.
1
D.
2
A.
A
B.
B
C.
c
D.
D
What will be the output of the following code of the R programming language?
b1 <- 17
b2 <- 13
z <— 5:7
print(b1 96in96 z)
print(b2 %in% z)
A.
[1] TRUE
[1] FALSE
B.
[1] FALSE
[1] TRUE
C.
[1] TRUE
[1] TRUE
D.
[1] FALSE
[1] FALSE
A. Phrase-Based Method (PBM)
B. Term-Based Method (TBM)
C. Pattern Taxonomy Method (PTM)
D. Concept-Based Method (CBM)
A. Ridge regression
B. Beta regression
C. Loess regression
D. Isotonic regression
A. MINIMUM_SUPPORT
B. MINIMUM_PROBABILITY
C. MINIMUM_ITEMSET_SIZE
D. MINIMUM_ITEMSET_COUNT
A. lsDescendant(
B. lsDescendant(
C. lsDescendant(
D. lsDescendant(
A. l'_l Predict(
B. LI Predict( C. [El Predict( D. Ll Predict( A. (link = '’identity") B. (link = '’Iogit") C. (link = ‘'Iog") D. (link = ”inverse") Consider the following parameters: control - Optional parameters for controlling boot data. frequency - Specifies the number of observations per unit time. data - Specifies the data frame. bootobject - The Object returned by the boot function. conf- The desired confidence interval. type - The type of confidence interval returned. According to bootstrapping in advanced statistics. which of the following options is the correct syntax of the boot.cio function? A. boot.ci(type. data=. control=) B. boot.ci(frequency=.data=) C. boot.ci(data=,control=) D. boot.ci(bootobject. conf=. type=) A. lslnNode( B. lsDescendant( C. PredictNodeld (DMX) D. Both a and b E. None of the above A. lsInNode (DMX) B. PredictAssociation(DMX) C. PredictAdjustedProbability(DMX) D. PredictHistogram(DMX) A. Non-Scalable EM B. Scalable EM C. Scalable K-Means D. Non-Scalable K—Means A. Boolean type B. Cluster value C. Table D. Scalar value A. PREDICTION_SMOOTHING B. FORECAST_METHOD C. INSTABILITY_SENSITIVITY D. COMPLEXITY_PENALTY A. It applies to mining model columns. B. It applies to mining structure columns. C. It applies to both mining model columns and mining structure columns. D. It applies neither to mining model columns nor to mining structure columns. Consider the following parameters: Vector input = x Total number of digits displayed = digits Minimum number of digits to the right of the decimal point = nsmall Minimum width to be displayed by the padding blanks in the beginning = width Term to denote the option used to display scientific notation = scientific Term to denote the option used to display the string left. right or center =justify Option used for eliminating the space in between two strings = collapse Separator between the arguments = sep As per string manipulation in R programming language, which of the following options is the correct syntax Of the format() function for formatting numbers and strings? A. format(digits, nsmall, sep = ' ", width = NULL) B. format(x. nsmall, height. scientific = NULL. ,collapse = NULL. sep = " ") C. format( x, nsmall, collapse, digits, justify = C("Ieft", "right", "centre", "none"), width = NULL) D. format(x. digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) A. Simple random sampling B. Stratified random sampling C. Extensive sampling f D. Quota sampling A. Data can be collected faster in a sampling method. B. A sampling method provides the facility to organize and execute the research work conveniently. C. It is less expensive. D. No specialized knowledge is required to use a sampling method. Consider the following list: squares_list = [2, 3. S. 2. 8. 9. 7. 6} In which of the following IR models of text mining, a document is represented by a set of key terms that is either chosen from a fixed set of key terms or automatically from the documents? A. Vector model B. Boolean model C. Connectionist model D. probabilistic model A. It is well suited for tabular data with heterogeneously—typed columns. B. Only labelled data can be placed into a pandas data structure. C. It is suitable for arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels. D. Ordered and unordered (not necessarily fixed-frequency) time series data can also be analyzed with pandas. A. Recall B. F-score C. Precision D. Both a and c A. K-means B. C45 C. EM D. Apriori Consider the following list: squares_list = [2. 3. 5. 2. 8. 9. 7. 6} What will be the output of the following Python command? squares_list[-2] A. -2 B. 3 C. 7 D. It is an invalid command. A. matplotlib B. pandas C. numpy D. Both a and c Consider the following data: Average cost of wafers = Rs. 35 Average cost of chocolates = Rs. 37 Standard deviation of cost of wafers = 2.0 Standard deviation of cost of chocolates = 3.0 Correlation coefficient between the costs of chocolates and wafers = 0.7 What will be the expected cost of chocolates when the cost of wafers is Rs. 40? A. Rs. 42.25 B. Rs. 45 C. Rs. 39.85 D. Rs. 41.75 A. When all of its immediate supersets have the same support as the itemset. B. When none of its immediate subsets has the same support as the itemset. C. When all of its immediate subsets have the same support as the itemset. D. When none of its immediate supersets has the same support as the itemset. A. 1/1024 B. 1023/1024 C. 11512 D. 511/512 A. Features selection B. Text preprocessing C. Features generation D. Both a and b A. 0.60 B. 0.79 C. 0.45 D. 0.82 A. 0.88 B. 0.82 C. 0.95 D. 0.90 A. BIRCH B. K-means C. STING D. FCM A. 2.5 B. 2.8 C. 3.2 D. 3.4 A. 11/14 B. 13/14 C. 1/14 D. 3/14 A. Judgement sampling B. Stratified random sampling C. Cluster sampling D. Multistage random sampling A. L1 In a belief network, class conditional independencies can be defined between the subsets of variables. B. VJ Joint conditional probability distribution cannot be specified by Bayesian belief networks. C. VJ A trained Bayesian network cannot be used for classification. D. VJ A graphical model of casual relationship for performing learning is provided by Bayesian belief network. A. There is no possibility of personal prejudice in this method. B. It is more accurate and reliable. C. It is mostly used in those fields where almost similar units exist or some units are tOO important' to be left out of the sample. D. It is very expensive. A. n(xl0)p(x) B. n(0)p(x) C. n(0)p(xl0) D. nl(x)p(0lx) Which of the following commands is used to observe the way an R object is structured? It is given that mydata is a variable where a user's data is stored. A. library(mydata) B. describe(mydata) C. str(mydata) D. summary(mydata) A. Support for Hadoop B. ln-memory analytics C. Grid computing D. ln-database processing Which of the following challenges are faced in text mining? (i) No publication is in electronic form. (ii) Large textual database. (iii) Complex relationships between concepts in text. (iv) Limited number Of possible dimensions. A. Only (i) and (ii) B. Only (iii) and (iv) C. Only (ii) and (iii) D. Only (i) and (iv) A. ipython —pylab=in|ine B. ipython —pylab=inline -notebook C. ipython=notebook —pylab.in|ine D. ipython notebook —pylab=inline A. P(X/H) = P(H/X)P(H)/P(X) B. P(H/X) = P(X/H)P(H)/P(X) C. P(H/X) = P(X/H)P(X)/P(H) D. P(XIH) = P(H/X)/P(H)P(X) A. It allows only one outcome. B. A single-pass algorithm derived from binomial confidence limits is used by C45. C. It uses information-based criteria. A. 7 B. 8 C. 13 D. 11 A. Referrals traffic B. Organic traffic C. Direct traffic D. Social traffic A. Acquisition analysis B. Audience analysis C. Behavior analysis D. Conversion analysis], [
, [option1], [option2], [option n], [INCLUDE_NODE_ID], n)
62: According to advanced statistics generalized linear model, which of the following is the default link function for the gaussian family?
64: As per Microsoft association rules algorithm, which of the following prediction functions has/have a Boolean return type?
65: As per Microsoft association rules algorithm, which of the following Options is the prediction function with scalar value as the return type?
66: Which of the following options is the default CLUSTERING_METHOD used by the Microsoft clustering algorithm?
67: Which of the following options is the correct return type of the PredictHistogram (DMX) prediction function used by the Microsoft logistic regression algorithm?
68: Which of the following options is the parameter of the Microsoft time series algorithm, which is used for controlling the growth of a decision tree?
69: Which of the following statements is correct about the NOT NULL modeling flag used in the Microsoft time series algorithm?
71: Which of the following sampling methods is used for heterogeneous units of universe rather than the homogeneous units and can be adopted only when its population is known?
72: Which of the following statements is incorrect about sampling methods?
74: Which of the following statements is NOT correct about pandas?
75: Which of the following fundamental measures used for assessing the quality of text retrieval represent(s) the percentage of retrieved documents relevant to a query?
76: Which of the following data mining algorithms is applied to a database containing a large number of transactions and also learns association rules?
78: While working in a Pylab environment, which of the following options do NOT need to be imported?
80: In association rule mining, an itemset is considered to be closed in which of the following situations?
81: It is given that a and b are two independent binomial variables having parameters 3,114 and 2,1/4, respectively. Find P (a + b 21).
82: The bag-of-words model is used in which of the following text mining processes?
83: For a group of 12 students, the sum of squares of differences in their ranks for science and math is given as 60. On the basis of the given information. find the value of rank correlation coefficient.
84: While calculating rank correlation coefficient between sales and expenditure for a time period of12 years. the difference in rank for a year was mistakenly taken as 9 instead of 7 and as a result, the value Of rank correlation coefficient was calculated as 0.79. If the mistake is rectified, then what will be the approximate correct value of rank correlation coefficient?
85: Which of the following clustering algorithms is used for grid-based partitioning?
86: It is given that there are 15 pairs of readings on X and Y such that the coefficient of correlation is 0.87. It is also given that the standard deviation on is 5.60. What will be the approximate standard error of estimate of Y on X?
87: Sam is popular for hitting a target in 6 out of 12 shots, whereas John can hit the same target in 8 out of 14 shots. What will be the probability that the target will be hit when they both try?
88: Which of the following is a non-probability sampling method?
89: Which of the following statements are NOT correct about the Bayesian belief network?
90: Which of the following statements is correct about the judgement sampling method?
91: In the Baysian model, which of the following is the correct representation of the joint density of (6, X), if it is known that for a given 0, the observed data x are a realization of pa?
93: In which of the following Big Data technologies, moving relevant data management, analytics and reporting tasks to where the data resides, improves speed to insight, reduces data movement and promotes better data governance?
95: Which of the following commands is used for starting iPython interface in inline Pylab mode and opening iPython notebook in pylab environment?
96: ln data mining, according to Bayes‘ theorem, which of the following formulae represents posterior probability in terms of prior probability?
97: In data mining, which of the following statements is NOT correct about C45 algorithm?
99: If a user wants to learn about the top keywords that send traffic to his/her website, then which of the following acquisition segmentations should be preferred?
100: In Google Analytics tool, which of the following analysis should be performed in order to identify the origin of a user's web traffic?
List of Data Analytics MCQs Mu...