Data Analytics MCQs

Data Analytics MCQs

Try to answer these 100+ Data Analytics MCQs and check your understanding of the Data Analytics subject.
Scroll down and let's begin!

1:

The values of X and Y are given in f‌igure-1 Of the image. Choose the correct value of 2X — 5Y from figure-2.

A.  

A

B.  

B

C.  

c

D.  

D

2: Which of the following types of time series analysis aims at separating periodic or cyclical components in a time series?

A.   Explanative analysis

B.   Spectral analysis

C.   Forecasting

D.   Descriptive analysis

3: With respect to the Microsoft sequence clustering algorithm, which of the following options is the correct syntax of the PredictCaseLikelihood (DMX) function?

A.   PredictCaseLikelihood()

B.   PredictCaseLikelihood([NORMALIZEDINONNORMALIZEDD

C.   PredictCaseLikelihood(, [])

D.   PredictCaseLikelihood(, [))

4: Which of the following is the correct syntax of the PredictVariance (DMX) prediction function used in Microsoft logistic regression algorithm?

A.   PredictVariance( I )

B.   PredictVariance()

C.   PredictVariance()

D.   PredictVariance()

5: Which of the following options represent(s) the correct application of association rule mining?

A.   Catalog design

B.   Basket data analysis

C.   Cross-marketing

D.   Loss-leader analysis

E.   All of the above

F.   None of the above

6: Which of the following options is/are the correct application(s) of text mining?

A.   It can automatically process messages and emails.

B.   It can investigate competitors by crawling their web sites.

C.   It can analyze open-ended survey responses.

D.   It can analyze warranty or insurance claims.

E.   All of the above.

7:

Select the value of X given in f‌igure-1, from the options given in figure-2.

A.  

A

B.  

B

C.  

c

D.  

0

8:


Consider the matrix Z given in f‌igure-1 of the image. Using the matrix methods. find the 1x3 vector. 9

A.  

A

B.  

B

C.  

C

D.  

D

9:

For what purpose is the following R function run?

print(getwd)


A.  

To get and print the current working directory. 

B.  

To get and print all working directories.

C.  

To count and print all working directories.

D.  

To print the location of a working directory.

10: With respect to Microsoft neural network algorithm. which of the following options is the neuron type that represents predictable attribute values for a data mining model?

A.   Input neuron

B.   Hidden neuron

C.   Output neuron

D.   None of the above

11: Which of the following options is/are correct about the Microsoft naive bayes algorithm?

A.   It is used for calculating the conditional probability between input and predictable columns and it assumes that the columns are independent.

B.   It is used for performing automatic feature selection to limit the number of values that are considered when building a model.

C.   It is provided by Microsoft SQL Server analysis services for use in predictive modeling.

D.   It is used for considering each pair of input attribute values and output attribute values.

E.   All of the above.

12: Which of the following options is correct about the logistic regression technique?

A.   It is used for encouraging group effect in case of highly correlated variables.

B.   It is used for finding the probability of event=Success and event=Failure.

C.   It is used for adding and removing predictors as needed for each step.

D.   It is used for penalizing the absolute size of the regression coeff‌icients.

13: In data mining, which of the following options is correct about the regression algorithm?

A.   It is used for predicting one or more continuous numeric variables; for example. profit or loss that is based on other attributes in a dataset.

B.   It is used for finding correlations between different attributes in a dataset.

C.   It is used for dividing data into groups or clusters of items that have similar properties.

D.   It is used for summarizing frequent sequences or episodes in data; for example. a series of log events preceding machine maintenance.

14: As per the Microsoft association rules model. which of the following options is the correct viewer tab that combines information about itemsets and their relative value?

A.   ltemsets

B.   Dependency Network

C.   Rules

D.   None of the above

15: Which of the following statements is correct about the intervention analysis type of the time series analysis?

A.   It is used for f‌inding whether an event can lead to a change in a time series.

B.   It is used for f‌inding a trend or pattern in a time series through the use of graphs or other tools.

C.   It is used extensively in budgeting. which is based on historical trends.

D.   It is used for studying the cross correlation between two time series and their dependence on another.

16: Which of the following is the correct syntax of the PredictAssociation prediction function used in the Microsoft association rule algorithm?

A.  

PredictAssociation(<NodelD>)   

B.  

PredictAssociation(<cluster column reference>, [<predicted state>)) 

C.  

PredictAssociation(<scalar column reference>)

D.  

PredictAssociation(<tabIe column reference>, optionl, option2, n ...)       

17: Which of the following is the correct default value of the MAXIMUM_ITEMSET_SIZE parameter, which is used with the Microsoft association rules algorithm?

A.   10

B.   3

C.   1

D.   0.4

18: With respect to advanced statistics, which of the following options is the correct syntax Of the glm() function?

A.   glm(formula, family=familytype(link=linkfunction), data=)

B.   glm(formula, data=, method=,control=)

C.   glm(vector, start=. end=, frequency=)

D.   glm(bootobject. conf=, type=)

19:

Find the output of the following R programming language code.

z1 <- c(7,5,8,4,4,16)

z2 <- c(9,6)

add.result <- 21+22

print(add.result)

sub.result <- 21-22

print(sub.result)


A.  

[1]161184416

[1}2-184416


B.  

[1]1511171313 25

[1] -2 -1 2 -2 -210


C.  

[1]1611171013 22

[1] -2 -1 -1 -2 -5 10 


D.  

[1]151074814

[1]-1-216 -4 -3


20:

Find the output of the following code of the R programming language.

z1 <- c(4,3,TRUE,2+6i)

z2 <- c(4,7,TRUE.2+7i)

print(z1&22)


A.  

[1] TRUE TRUE TRUE TRUE

B.  

[1] TRUE FALSE TRUE FALSE

C.  

[1] FALSE TRUE FALSE TRUE

D.  

[1] FALSE FALSE FALSE FALSE

21:

What will be the output of the following R code?

c(4,7,TRUE,3+7i) -> v1

c(9,6,FALSE,3+7i) ->> v2

print(v1)

print(v2)


A.  

[114+01 4+1i 7+01 3+7i

[1] 9+0i 9+1i 6+0i 3+7i 


B.  

[1]4+0i7+0i1+0i3+7i

[1] 9101 6+0i 0+01 3+7i  


C.  

[1) 4+0i 7+7l1+1i 3+7i

[1] 9+Oi 9+1i 6+6i 3+7i


D.  

[1]4+4i7+7i1+1i3+7i

[119+9i 6+6i1+1i 3+7i


22: Which of the following is the correct syntax of the command that will verify the installation of the xlsx package and load the library into R workspace?

A.   grepl.any(installed.packages("xlsx")) library("xlsx")

B.   any(grepl("xlsx“,installed.package())) library("xlsx")

C.   any.grepl(xlsx,installed.package50) |ibrary(xlsx)

D.   grepl(any(installed.packages(xlsx))) |ibrary(xlsx)

23: As per the Microsoft sequence clustering algorithm, which of the following options is the correct syntax of the Cluster (DMX) prediction function?

A.   Cluster()

B.   Cluster([])

C.   Cluster()

D.   Cluster([])

24:

In the given image, which set of vectors is linearly independent?


A.  

A

B.  

B

C.  

C

D.  

0

25: Which of the following text mining techniques can be used for f‌inding groups of documents with similar content?

A.   Clustering

B.   Categorization

C.   Visualization

D.   Information extraction

26:

What will be the output of the following code of the R programming language?

a <- c(9,0.FALSE,2+9i)

b <- c(8,0,TRUE,2+7i)

print(alb)


A.  

[1] FALSE TRUE FALSE FALSE

B.  

[1] TRUE TRUE TRUE FALSE 

C.  

[1] TRUE FALSE TRUE TRUE 

D.  

[1] FALSE FALSE FALSE TRUE

27:

Find the output of the following R programming language code.

a <- c(7.5.FALSE.4+4i)

b <- c(6,0,TRUE,4+7i)

print(a&&b)


A.  

[1] FALSE

B.  

[1] FALSE TRUE

C.  

(1] FALSE FALSE

D.  

[1] TRUE

28: IN SOL Server data mining, which of the following algorithm types predicts one or more discrete variables that are based on other attributes in a dataset?

A.   Segmentation algorithm

B.   Classif‌ication algorithm

C.   Sequence analysis algorithm

D.   Association algorithm

29:

What will the following R code do?

mydata$v2 <- mydata$v4 <- NULL


A.  

It will replace the value of variable v2 with v4 and will delete the variable v4.

B.  

It will replace the value of variable v4 with v2 and will delete the variable v2.

C.  

It will delete the variables v2 and v4.

D.  

None of the above.

30: In data mining, which of the following options is the correct syntax for association?

A.   match associations [as pattern_name] analyze {measure(s) }

B.   mine associations [as pattern_name] analyze classifying_attribute_or_dimension

C.   mine associations [as [pattern_name]] {matching {metapattern}}

D.   mine associations [as pattern_name] analyze prediction_attribute_or_dimension {set [attribute_or_dimension_i= value_i}]

31:

Choose True or False.

Text mining is used in spam filtering. content enrichment and contextual advertising.


A.  

True 

B.  

False

32:

A user wants to read and print the contents of a CSV file named myexample-csv that is present in his current working directory. Which of the following is the correct syntax of the command that should be executed by him to accomplish this task?


A.  

data <- read(myexample.csv)

print(data) 


B.  

data <- read.f‌ile(”myexample.csv")
print(data)  

C.  

data <- read.csv("myexample.csv")
print(data)   

D.  

data <~ read.data(myexample.csv)
print(data)

33: Which of the following regression techniques attempts maximizing the prediction power with minimum number of predictor variables?

A.   Stepwise regression

B.   Polynomial regression

C.   Linear regression

D.   Logistic regression

34: Which of the following is the correct syntax of the PredictSupport (DMX) prediction function used with Microsoft linear regression algorithm?

A.   PredictSupport(, [])

B.   PredictSupport( l )

C.   PredictSupport( I )

D.   PredictSupport( l )

35: Which of the following statements is correct about the Predictable column supported by the Microsoft linear regression algorithm?

A.   It supports the cyclical, key and table content types.

B.   It supports the key, table and ordered content types.

C.   It supports the continuous, key and table content types.

D.   It supports the continuous, cyclical and ordered content types.

36:

Using the following information, find the correct syntax of the R function used for creating binary f‌iles.

Assume object as the binary file to be written. n as the number Of bytes and con as the connection object.


A.  

writeBin(object, n, con)  

B.  

writeBin(object)  

C.  

writeBin(object, n) 

D.  

writeBin(object, con)

37: Which of the following statements is correct about the PREDICTION_SMOOTHING parameter used in the Microsoft time series algorithm?

A.   It specif‌ies how a model should be mixed for optimizing forecasting.

B.   It specifies which algorithm to use for analysis and prediction.

C.   It specif‌ies a numeric value between 0 and 1 that detects periodicity.

D.   It specif‌ies the minimum number of time slices that are required to generate a split in each time series tree.

38:

Find the output of the following code of the R programming language.

Iista <- Iist(5:7)

print(lista)

Iistb <-Iist(12:14)

print(listb)

x1 <- unlist(lista)

x2 <- unlist(listb)

print(xl)

print(x2)

r <- x1+x2

print(r)


A.  

[[1]]

[1] 5 6 7

[[1]]

[1] 12 13 14

[1] 5 6 7

[1]  12 13 14

[1]  17 19 21 


B.  

[[1]]

[1] 5 6 7

[[1]]

[1] 12 13 14

[1] 5 6 7

[1]  12 13 14

[1]  15 16 17


C.  

[[1]]

[1] 5 6 7

[[1]]

[1] 12 13 14

[1] 5 6 7

[1]  12 13 14

[1] 24 25 26


D.  

[[1]]

[1] 5 6 7

[[1]]

[1] 12 13 14

[1] 5 6 7

[1]  12 13 14

[1]  11 12 13


39: Which of the following is the correct default value for the INSTABILITY_SENSITIVITY parameter used with the Microsoft time series algorithm?

A.   0.6

B.   0.1

C.   10

D.   1

40: Which of the following is the correct syntax of the command used for merging two data frames, myFrame1 and myFrame2, by ID and Country?

A.   total <~ merge(data myFrame1 with myFrame2, by=c(lD,Country))

B.   total <- merge(data myFrame1,data myFrame2,by=c("lD","Country"))

C.   total <- merge(data by=c("lD","Country") for myFrame1, myFrame2)

D.   total <- merge(data for myFrame1, myFrame2,by=C(lD,Country))

41:

From f‌igure-2 Of the given image, select the Option representing the inverse of the matrix given in f‌igure-1.


A.  

A

B.  

B

C.  

c

D.  

D

42:

Which of the following options represent correct application of the time series analysis?

i) Yield Projections

ii) Workload Projections

iii) Census Analysis

iv) Inventory Studies


A.  

Only options i) and ii)

B.  

Only options ii) and iv)

C.  

Only Options i). ii) and iv)

D.  

Only options ii). iii) and iv)

E.  

All options i), ii). iii) and iv)

43: Which of the following is the correct syntax for the PredictAdjustedProbability (DMX) prediction function used with the Microsoft association rules algorithm?

A.   PredictAdjustedProbability(. [))

B.   PredictAdjustedProbability()

C.   PredictAdjustedProbability()

D.   PredictAdjustedProbabilityo

44: With respect to advanced statistics, which of the following options is correct about the arimaO function?

A.   It can be used to produce an unrotated principal component analysis.

B.   It can be used to produce maximum likelihood factor analysis.

C.   It can be used to bootstrap the structural equation model.

D.   It can be used to f‌it an autoregressive integrated moving average model.

45: In data mining, which of the following options is correct about the F-score measure for text retrieval?

A.   F-score = recall - precision + (recall x precision) / 9

B.   F-score = recall + precision - (recall x precision) I 7

C.   F-score = recall x precision / (recall + precision) / 2

D.   F-score = recall I precision x (recall - precision) / 5

46: Which of the following is the default value of the parameter HISTORICAL_MODEL_GAP used in Microsoft time series algorithm?

A.   10

B.   1

C.   0

D.   5

47: Which of the following advanced statistics techniques is used for identifying latent variables that form groups?

A.   Regression analysis

B.   ANOVA

C.   Factor analysis

D.   Logistic regression

48: In data mining, which of the following options correctly def‌ines Precision, which is used for assessing the quality of text retrieval?

A.   precision: l[Relevant] n [Retrieved]l / l[Retrieved]l

B.   Precision= l[Retrieved} U [F-score]l + l[F-score}l

C.   Precision= l[Recall] / [F-scorejl x l[RecalI]l

D.   Precision= l[F-score] x [Recalljl - l[F—score)l

49:

Which of the given options will be the output of the following code when it is executed in R?

var <— c(8.4.NA.12)

mean(var, na.rm=TRUE)


A.  

[1) 2 

B.  

[114

C.  

[1) 8

D.  

[1110

E.  

The code will throw an error.

50: Which of the following text retrieval measures is the percentage of documents, which are relevant to the query and were actually retrieved?

A.   Precision

B.   Recall

C.   F-score

D.   None of the above

51: Which of the following is the correct default value of the HOLDOUT_PERCENTAGE parameter of the Microsoft logistic regression algorithm, which is used for specifying the percentage of cases within the training data used to calculate a holdout error?

A.   200

B.   30

C.   255

D.   100

52: In advanced statistics, which of the following statements is correct about the Dirichlet Regression method?

A.   It is used to model binary variables.

B.   It is used to model compositional data.

C.   It is used to model rank variables.

D.   It is used to model count variables.

53:

In the given image, which matrix is in the row echelon form?

A.  

A

B.  

B

C.  

C

D.  

0

54:

What is the rank of the matrix shown in the image?

A.  

0

B.  

3

C.  

1

D.  

2

55:

Which of the matrices given in figure-2 is the reduced row echelon form of the matrix given in figure-1 of the image?


A.  

A

B.  

B

C.  

c

D.  

D

56:

What will be the output of the following code of the R programming language?

b1 <- 17

b2 <- 13

z <— 5:7

print(b1 96in96 z)

print(b2 %in% z)


A.  

[1] TRUE

[1] FALSE


B.  

[1] FALSE

[1] TRUE 


C.  

[1] TRUE

[1] TRUE


D.  

[1] FALSE

[1] FALSE   


57: In which of the following text mining methods, terms are analyzed on the sentence and document level?

A.   Phrase-Based Method (PBM)

B.   Term-Based Method (TBM)

C.   Pattern Taxonomy Method (PTM)

D.   Concept-Based Method (CBM)

58: In advanced statistics. which of the following regression methods is used to model variables within the (0, 1) range?

A.   Ridge regression

B.   Beta regression

C.   Loess regression

D.   Isotonic regression

59: As per the Microsoft association rules algorithm, which of the following parameters specif‌ies the minimum number of cases that must contain an itemset before the algorithm generates a rule?

A.   MINIMUM_SUPPORT

B.   MINIMUM_PROBABILITY

C.   MINIMUM_ITEMSET_SIZE

D.   MINIMUM_ITEMSET_COUNT

60: Which of the following is the correct syntax of the lsDescendant (DMX) prediction function used in data mining?

A.   lsDescendant()

B.   lsDescendant(, [])

C.   lsDescendant()

D.   lsDescendant(, [))

61: As per the Microsoft naive bayes algorithm, which two of the following options are the correct syntax of the Predict (DMX) prediction function?

A.   l'_l Predict(, [option1], [option2], [option n], [INCLUDE_NODE_ID], IT)

B.   LI Predict(, [

], [])

C.   [El Predict(

, [option1], [option2], [option n], [INCLUDE_NODE_ID], n)

D.   Ll Predict(, [])

A.   (link = '’identity")

B.   (link = '’Iogit")

C.   (link = ‘'Iog")

D.   (link = ”inverse")

63:

Consider the following parameters:

control - Optional parameters for controlling boot data.

frequency - Specif‌ies the number of observations per unit time.

data - Specifies the data frame.

bootobject - The Object returned by the boot function.

conf- The desired conf‌idence interval.

type - The type of confidence interval returned.

According to bootstrapping in advanced statistics. which of the following options is the correct syntax of the boot.cio function?


A.  

boot.ci(type. data=. control=) 

B.  

boot.ci(frequency=.data=)

C.  

boot.ci(data=,control=)

D.  

boot.ci(bootobject. conf=. type=)

64: As per Microsoft association rules algorithm, which of the following prediction functions has/have a Boolean return type?

A.   lslnNode()

B.   lsDescendant()

C.   PredictNodeld (DMX)

D.   Both a and b

E.   None of the above

65: As per Microsoft association rules algorithm, which of the following Options is the prediction function with scalar value as the return type?

A.   lsInNode (DMX)

B.   PredictAssociation(DMX)

C.   PredictAdjustedProbability(DMX)

D.   PredictHistogram(DMX)

66: Which of the following options is the default CLUSTERING_METHOD used by the Microsoft clustering algorithm?

A.   Non-Scalable EM

B.   Scalable EM

C.   Scalable K-Means

D.   Non-Scalable K—Means

67: Which of the following options is the correct return type of the PredictHistogram (DMX) prediction function used by the Microsoft logistic regression algorithm?

A.   Boolean type

B.   Cluster value

C.   Table

D.   Scalar value

68: Which of the following options is the parameter of the Microsoft time series algorithm, which is used for controlling the growth of a decision tree?

A.   PREDICTION_SMOOTHING

B.   FORECAST_METHOD

C.   INSTABILITY_SENSITIVITY

D.   COMPLEXITY_PENALTY

69: Which of the following statements is correct about the NOT NULL modeling flag used in the Microsoft time series algorithm?

A.   It applies to mining model columns.

B.   It applies to mining structure columns.

C.   It applies to both mining model columns and mining structure columns.

D.   It applies neither to mining model columns nor to mining structure columns.

70:

Consider the following parameters:

Vector input = x

Total number of digits displayed = digits

Minimum number of digits to the right of the decimal point = nsmall

Minimum width to be displayed by the padding blanks in the beginning = width

Term to denote the option used to display scientif‌ic notation = scientific

Term to denote the option used to display the string left. right or center =justify

Option used for eliminating the space in between two strings = collapse

Separator between the arguments = sep

As per string manipulation in R programming language, which of the following options is the correct syntax Of the format() function for formatting numbers and strings?


A.  

format(digits, nsmall, sep = ' ", width = NULL)   

B.  

format(x. nsmall, height. scientif‌ic = NULL. ,collapse = NULL. sep = " ")

C.  

format( x, nsmall, collapse, digits, justify = C("Ieft", "right", "centre", "none"), width = NULL)

D.  

format(x. digits, nsmall, scientif‌ic, width, justify = c("left", "right", "centre", "none")) 

71: Which of the following sampling methods is used for heterogeneous units of universe rather than the homogeneous units and can be adopted only when its population is known?

A.   Simple random sampling

B.   Stratified random sampling

C.   Extensive sampling f

D.   Quota sampling

72: Which of the following statements is incorrect about sampling methods?

A.   Data can be collected faster in a sampling method.

B.   A sampling method provides the facility to organize and execute the research work conveniently.

C.   It is less expensive.

D.   No specialized knowledge is required to use a sampling method.

73:

Consider the following list:

squares_list = [2, 3. S. 2. 8. 9. 7. 6}

In which of the following IR models of text mining, a document is represented by a set of key terms that is either chosen from a f‌ixed set of key terms or automatically from the documents?


A.  

Vector model

B.  

Boolean model

C.  

Connectionist model

D.  

probabilistic model

74: Which of the following statements is NOT correct about pandas?

A.   It is well suited for tabular data with heterogeneously—typed columns.

B.   Only labelled data can be placed into a pandas data structure.

C.   It is suitable for arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.

D.   Ordered and unordered (not necessarily f‌ixed-frequency) time series data can also be analyzed with pandas.

75: Which of the following fundamental measures used for assessing the quality of text retrieval represent(s) the percentage of retrieved documents relevant to a query?

A.   Recall

B.   F-score

C.   Precision

D.   Both a and c

76: Which of the following data mining algorithms is applied to a database containing a large number of transactions and also learns association rules?

A.   K-means

B.   C45

C.   EM

D.   Apriori

77:

Consider the following list:

squares_list = [2. 3. 5. 2. 8. 9. 7. 6}

What will be the output of the following Python command?

squares_list[-2]


A.  

-2

B.  

3

C.  

7

D.  

It is an invalid command.

78: While working in a Pylab environment, which of the following options do NOT need to be imported?

A.   matplotlib

B.   pandas

C.   numpy

D.   Both a and c

79:

Consider the following data:

Average cost of wafers = Rs. 35

Average cost of chocolates = Rs. 37

Standard deviation of cost of wafers = 2.0

Standard deviation of cost of chocolates = 3.0

Correlation coeff‌icient between the costs of chocolates and wafers = 0.7

What will be the expected cost of chocolates when the cost of wafers is Rs. 40?


A.  

Rs. 42.25 

B.  

Rs. 45

C.  

Rs. 39.85

D.  

Rs. 41.75

80: In association rule mining, an itemset is considered to be closed in which of the following situations?

A.   When all of its immediate supersets have the same support as the itemset.

B.   When none of its immediate subsets has the same support as the itemset.

C.   When all of its immediate subsets have the same support as the itemset.

D.   When none of its immediate supersets has the same support as the itemset.

81: It is given that a and b are two independent binomial variables having parameters 3,114 and 2,1/4, respectively. Find P (a + b 21).

A.   1/1024

B.   1023/1024

C.   11512

D.   511/512

82: The bag-of-words model is used in which of the following text mining processes?

A.   Features selection

B.   Text preprocessing

C.   Features generation

D.   Both a and b

83: For a group of 12 students, the sum of squares of differences in their ranks for science and math is given as 60. On the basis of the given information. find the value of rank correlation coefficient.

A.   0.60

B.   0.79

C.   0.45

D.   0.82

84: While calculating rank correlation coeff‌icient between sales and expenditure for a time period of12 years. the difference in rank for a year was mistakenly taken as 9 instead of 7 and as a result, the value Of rank correlation coefficient was calculated as 0.79. If the mistake is rectified, then what will be the approximate correct value of rank correlation coeff‌icient?

A.   0.88

B.   0.82

C.   0.95

D.   0.90

85: Which of the following clustering algorithms is used for grid-based partitioning?

A.   BIRCH

B.   K-means

C.   STING

D.   FCM

86: It is given that there are 15 pairs of readings on X and Y such that the coeff‌icient of correlation is 0.87. It is also given that the standard deviation on is 5.60. What will be the approximate standard error of estimate of Y on X?

A.   2.5

B.   2.8

C.   3.2

D.   3.4

A.   11/14

B.   13/14

C.   1/14

D.   3/14

88: Which of the following is a non-probability sampling method?

A.   Judgement sampling

B.   Stratified random sampling

C.   Cluster sampling

D.   Multistage random sampling

89: Which of the following statements are NOT correct about the Bayesian belief network?

A.   L1 In a belief network, class conditional independencies can be defined between the subsets of variables.

B.   VJ Joint conditional probability distribution cannot be specif‌ied by Bayesian belief networks.

C.   VJ A trained Bayesian network cannot be used for classification.

D.   VJ A graphical model of casual relationship for performing learning is provided by Bayesian belief network.

90: Which of the following statements is correct about the judgement sampling method?

A.   There is no possibility of personal prejudice in this method.

B.   It is more accurate and reliable.

C.   It is mostly used in those fields where almost similar units exist or some units are tOO important' to be left out of the sample.

D.   It is very expensive.

91: In the Baysian model, which of the following is the correct representation of the joint density of (6, X), if it is known that for a given 0, the observed data x are a realization of pa?

A.   n(xl0)p(x)

B.   n(0)p(x)

C.   n(0)p(xl0)

D.   nl(x)p(0lx)

92:

Which of the following commands is used to observe the way an R object is structured? It is given that mydata is a variable where a user's data is stored.

A.   library(mydata)

B.   describe(mydata)

C.   str(mydata)

D.   summary(mydata)

93: In which of the following Big Data technologies, moving relevant data management, analytics and reporting tasks to where the data resides, improves speed to insight, reduces data movement and promotes better data governance?

A.   Support for Hadoop

B.   ln-memory analytics

C.   Grid computing

D.   ln-database processing

94:

Which of the following challenges are faced in text mining?

(i) No publication is in electronic form.

(ii) Large textual database.

(iii) Complex relationships between concepts in text.

(iv) Limited number Of possible dimensions.


A.  

Only (i) and (ii)

B.  

Only (iii) and (iv) 

C.  

Only (ii) and (iii)

D.  

Only (i) and (iv)

95: Which of the following commands is used for starting iPython interface in inline Pylab mode and opening iPython notebook in pylab environment?

A.   ipython —pylab=in|ine

B.   ipython —pylab=inline -notebook

C.   ipython=notebook —pylab.in|ine

D.   ipython notebook —pylab=inline

96: ln data mining, according to Bayes‘ theorem, which of the following formulae represents posterior probability in terms of prior probability?

A.   P(X/H) = P(H/X)P(H)/P(X)

B.   P(H/X) = P(X/H)P(H)/P(X)

C.   P(H/X) = P(X/H)P(X)/P(H)

D.   P(XIH) = P(H/X)/P(H)P(X)

97: In data mining, which of the following statements is NOT correct about C45 algorithm?

A.   It allows only one outcome.

B.   A single-pass algorithm derived from binomial conf‌idence limits is used by C45.

C.   It uses information-based criteria.

98:

Consider the given information.


What should be the expenses budget (in Rs. thousands). if the salary of an individual is increased to Rs. 70 thousand?


A.  

7

B.  

8

C.  

13

D.  

11

99: If a user wants to learn about the top keywords that send traff‌ic to his/her website, then which of the following acquisition segmentations should be preferred?

A.   Referrals traff‌ic

B.   Organic traff‌ic

C.   Direct traff‌ic

D.   Social traff‌ic

100: In Google Analytics tool, which of the following analysis should be performed in order to identify the origin of a user's web traff‌ic?

A.   Acquisition analysis

B.   Audience analysis

C.   Behavior analysis

D.   Conversion analysis