These Data Mining multiple-choice questions and their answers will help you strengthen your grip on the subject of Data Mining. You can prepare for an upcoming exam or job interview with these 100+ Data Mining MCQs.
So scroll down and start answering.
A. All of these
B. Retail
C. Manufacturing
D. Finance/Banking
A. Output Layer
B. Hidden Layer
C. Transparent layer
D. Input layer
A. inconsistent
B. dirty
C. nonintegrated
D. granular
A. The range of variables in a set
B. The number of nodes utilized
C. The graphical visualization of the data
D. The number of layers and the number of nodes in each layer
A. Single-Link
B. DBSCAN
C. Both of these
D. None of these
A. False
B. True
A. CHAID
B. artificial
C. pruning
D. associative
A. <body answer="valid">This One</body>
B. <valid>This One</valid>
C. <valid>"This One"</valid>
D. All are valid
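The three candidate snippets above can be checked for well-formedness with any standard XML parser. The following is a minimal sketch, assuming Python's built-in `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet: str) -> bool:
    """Return True if the snippet parses as a single well-formed XML element."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# The candidate strings are copied from options A to C above.
candidates = {
    "A": '<body answer="valid">This One</body>',
    "B": "<valid>This One</valid>",
    "C": '<valid>"This One"</valid>',
}

results = {label: is_well_formed(text) for label, text in candidates.items()}
print(results)
```

Note that this only tests well-formedness (matching tags, quoted attribute values); it does not validate against a schema or DTD.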
A. All of the above
B. Apache Cassandra
C. Google Bigtable
D. MongoDB
A. The technical term for the act of data being stored in a server
B. A structured and developed prediction of data results
C. The visual interpretation of complex relationships in multidimensional data
A. Differential Decryption
B. Knoop-hardness measured through high-impact dimension
C. Knowledge Discovery in Databases
D. K-mean Data Discovery
A. All are valid types
B. Neural network
C. Statistical
D. Machine learning
A. False
B. True
A. All of the above
B. Artificial Intelligence
C. Statistics
D. Linguistics
A. Dependent
B. All of these
C. Response
D. Target variables
A. Classification
B. Regression
C. Segmentation
A. Predictable Sets
B. Functional Organizations
C. Degrees of Fit
D. Clusters
A. Complex reports generated by a qualified data scientist
B. Hierarchical dimensions that can be created with a hyper cube browser
C. Data not collected by the organization, such as data available from a reference book
D. Structures that generate rules for the classification of a dataset
A. Relational Learning Models
B. Decision Trees and Rules
C. All of these
D. Probabilistic Graphical Dependency Models
A. False
B. True
A. A decision tree developed in the 1980s but almost entirely replaced by the CART method today
B. A six-phase method for predicting e-commerce buying habits
C. Microsoft's linear regression algorithm
D. A cross-industry standard process for data mining
A. Antecedent
B. Activation Function
C. Confusion matrix
D. Chi-square
A. True
B. False
A. binary standard deviation
B. covariance
C. polyconvergence
D. stochastic inertia
A. Using business experience and gut instinct to design a new floorplan in a grocery store
B. Reorganizing your basketball team's starting lineup based on an analysis of performance
C. Placing two frequently purchased items next to each other on the shelf
D. Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes
A. Segmentation
B. Classification
C. Regression
A. An intuitive and user-friendly interface
B. Firewalls established to protect data from malicious sources
C. The hardware designed specifically for storage of massive amounts of data
D. The team of programmers who designed the software utilized in a particular mining project
A. decision boundary separating classes of data
B. variant of the C4.5 algorithm
C. collection of linked hypertext files
D. non-terminating error condition
A. Overlay
B. Overfitting
C. Noise
D. Non-applicable data
A. Price
B. Economic downturns
C. Staff Skills
D. Product Positioning
A. Sequential Patterning
B. Clustering
C. Classification
D. Gamification
A. Structural Level
B. Qualitative Level
C. Primary Level
D. Quantitative Level
A. Decrease the size of the training dataset
B. Increase the size of the training dataset
C. Increase the size of the test dataset
D. Decrease the size of the test dataset
A. AdaBoost
B. The Brin-Page Method
C. GoogleCrawler
D. PageRank
A. The antecedent is always a very complex variable
B. Nothing, they are interchangeable
C. The antecedent is on the right, the consequent is on the left.
D. The antecedent is on the left, the consequent on the right
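In the usual notation for an association rule, `antecedent -> consequent`, the antecedent is the "if" part and the consequent is the "then" part. The sketch below computes support and confidence for one such rule. The transactions are made-up illustration data, and the function names `support` and `confidence` are hypothetical helpers, not part of any library.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Made-up purchase data, for illustration only.
transactions = [
    frozenset({"sleeping bag", "hiking shoes", "backpack"}),
    frozenset({"sleeping bag", "hiking shoes"}),
    frozenset({"sleeping bag", "backpack"}),
    frozenset({"sleeping bag", "hiking shoes", "backpack"}),
]

antecedent = {"sleeping bag", "hiking shoes"}  # left-hand side of the rule
consequent = {"backpack"}                      # right-hand side of the rule

rule_support = support(antecedent | consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Swapping the two sides generally changes the confidence, which is why the direction of the rule matters.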
A. partial average
B. unbiased mean
C. compounded mean
D. moving average
A. Learning a function that maps a data item into one of several predefined groups.
B. An expression E in a language L describing facts in a subset FE of F.
C. A descriptive task where one seeks to identify a finite set of categories to describe the data.
D. Learning a function that maps a data item to a real-valued prediction variable.
A. A multi-step process involving data preparation, pattern searching, knowledge evaluation, and refinement with iteration after modification.
B. Learning a function that maps a data item into one of several predefined groups or clusters.
C. The process of finding a model which describes significant dependencies between variables
D. A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.
A. Hidden
B. Input
C. Output
D. Functional
A. a measure of the noise in a database's contents
B. partitioning a database for distribution across different servers
C. simultaneously accessing multiple object databases over SSH
D. none of the above
A. A task focusing on discovering the most significant changes in the data from previously measured or normative values
B. Methods for finding a compact description for a subset of data.
C. The process of finding a model which describes significant dependencies between variables
D. A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.
A. Fuzzy Logic
B. Association Learning
C. Anomaly Detection
D. Clustering Algorithms
A. Restricted Boltzmann machine
B. info-fuzzy networks
C. k-nearest neighbor
D. k-means algorithm
A. MongoDB
B. SQLite
C. MySQL
D. MariaDB
A. (None of these)
B. Disjoint training
C. Test Datasets
D. disjoint training and test datasets
A. Overfit
B. Parametric analysis
C. Underfit
D. Poorly defined Chernoff Bound
A. Heuristic algorithms
B. Bayesian inference algorithms
C. Genetic algorithms
D. Clustering algorithms
A. none of the above
B. easier to train via online learning
C. more resistant to local minima convergence
D. parametric
A. Node
B. SAP source
C. UDC
D. DB Connect
A. Nearest Neighbor
B. Logistic Regression
C. Association Model Query
D. Decision Treeing
A. Preliminary Method Mapping
B. Rule Induction
C. Fuzzy Logic Application
D. Dynamic Information Inference
A. Methods for finding a compact description for a subset of data.
B. Learning a function that maps a data item into one of several predefined groups.
C. A discovered pattern that is true on new data with some degree of certainty, and generalizes to other data.
D. A descriptive task where one seeks to identify a finite set of categories to describe the data.
A. Cleaning dirty data
B. Extracting data
C. Cleaning data
D. Storing purchased data
A. True
B. False
A. k-means algorithm
B. Markov chains
C. Dijkstra's algorithm
D. Neural Networks
A. Description
B. Performance
C. Prediction
A. A search algorithm that enables us to locate an optimal binary string by processing an initial random population of binary strings with operations such as artificial mutation, crossover, and selection.
B. An algorithm that estimates how well a particular pattern (a model and its parameters) meets the criteria of the KDD process. Evaluation of predictive accuracy (validity) is based on cross validation. Evaluation of descriptive quality involves predictive a
C. A classic algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item s
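The level-wise idea described in option C (start from frequent single items, then extend them to larger item sets) can be sketched as follows. This is a simplified illustration, not a reference implementation: the function name `apriori`, the absolute `min_support` count, and the sample baskets are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search tractable: an item set cannot be frequent if any of its subsets is infrequent, so such candidates are discarded before counting.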
A. An overall measure of pattern value, combining validity, novelty, usefulness, and simplicity.
B. An expression E in a language L describing facts in a subset FE of F.
C. A multi-step process involving data preparation, pattern searching, knowledge evaluation, and refinement with iteration after modification.
D. A discovered pattern that is true on new data with some degree of certainty, and generalizes to other data.
A. MySQL matrices
B. linked lists
C. relational databases
D. key-value pair
A. checks the validity of a token
B. splits the stream of input characters into tokens
C. generates a context-free grammar
D. processes the parse tree for semantic meaning
A. A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.
B. A descriptive task where one seeks to identify a finite set of categories to describe the data.
C. Learning a function that maps a data item into one of several predefined groups or clusters.
D. The process of finding a model which describes significant dependencies between variables
A. Utilizing a data dictionary
B. uncoupling program and data
C. Minimizing isolated files with repeated data
D. Enforcing referential integrity
A. Descriptive modeling analysis
B. Cluster analysis
C. Exploratory data analysis
D. Predictive analysis
A. DBSCAN and Single-Link
B. k-means and CLARANS
C. k-means only
D. Subspace Clustering Algorithms
A. Linear regression
B. Clustering
C. Knowledge
D. Meta-data
A. backpropagation
B. random initialization of weights
C. continuous output
D. able to learn non-linear separations
A. Voting
B. Stacking
C. Averaging
D. Bootstrapping
A. A task focusing on discovering the most significant changes in the data from previously measured or normative values
B. A descriptive task where one seeks to identify a finite set of categories to describe the data.
C. The process of finding a model which describes significant dependencies between variables
D. Methods for finding a compact description for a subset of data.
A. Validation
B. Support
C. Supervised learning
D. Topology
A. logistic function
B. multi-layered NN cannot compute continuous output
C. hyperbolic function
D. logarithmic function
A. A programming language that enables Hadoop to operate as a data warehouse.
B. None of these
C. A programming language that simplifies the common tasks of working with Hadoop.
A. Fuzzy Sampling
B. Binning
C. Boosting
D. Clustering
A. //a[contains(@href, "profile")]
B. //a[contains(@href, "profile")]/@href
C. //href/profile
D. //a/profile
A. DBSCAN
B. ID3
C. none of the above
D. logistic regression
A. stateless
B. linearly separable
C. returns JSON output
D. stateful
A. Datanode
B. FS Shell
C. DFSAdmin
D. Namenode
A. Multi-faceted
B. Multi-leafed
C. Multivariate
D. Multi-modal
A. Firmly grasp business objectives and needs
B. Assess the current situation by finding out the resources, assumptions, constraints etc.
C. Create data mining goals to achieve the business objectives
D. Create a list of all relevant algorithms to be applied to the task
A. A command-line tool for retrieving files
B. A methodology for classifying hidden features of data
C. The part of HTTP that specifies access permission
D. Combinatorial Unsupervised Recursive Learning algorithm
A. Numeric Level
B. Primary Level
C. Dependency Level
D. Quantitative Level
A. Normal mixture models
B. Candidate generation
C. Overfitting methods
D. None of these
A. HTTPS
B. PGP
C. OAuth
D. SSL
A. Data Integration
B. Data Mining
C. Data Cleaning
D. Data Quantification
A. Cluster analysis
B. If...then... analysis
C. Regression analysis
D. Market-basket analysis
A. 1/n^2
B. 1/n
C. 1-1/n^2
D. 1/2n
A. All of the above
B. Logistic Regression
C. ARIMA
D. Non-Linear Regression
E. Regression
A. Sort
B. Reduce
C. Map
D. Shuffle
A. No-coupling
B. Magnetic coupling
C. Transitive coupling
D. Quickstart coupling
A. True
B. False
A. Noise
B. Outliers
C. Range
D. Non-applicable data
A. Techniques to improve the efficiency of an Apriori algorithm
B. Method to repeatedly scan the database and check a large set of candidates by pattern matching.
C. Methods of generating frequent item sets without candidate generation.
D. Methods for finding a compact description for a subset of data.
A. customer testimonials
B. holiday sale
C. money-back guarantee
D. loyalty cards
A. ID3 (Iterative Dichotomiser 3)
B. C4.5 algorithm
C. CART (Classification and Regression Trees)
D. CHAID (Chi Square Automatic Interaction Detection)
A. uses iterative refinement
B. more resistant to outliers
C. all of the above
D. represents clusters by center
A. Processing and management
B. Source and results
C. Management and delivery
D. Application and delivery
A. All of the above are appropriate
B. Selenium
C. PhantomJS
D. wget
A. Predictive analysis
B. Function activation
C. Link analysis
D. Clustering
A. measure variance
B. measure relevance
C. measure accuracy
D. measure lift
A. {"answer": "this one"}
B. {"answer": ["this one"]}
C. {["answer": "this one"]}
D. All are valid
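The candidate strings above can be tested by handing them to a JSON parser. Below is a minimal sketch, assuming Python's built-in `json` module; the helper name `is_valid_json` is hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The candidate strings are copied from options A to C above.
candidates = {
    "A": '{"answer": "this one"}',
    "B": '{"answer": ["this one"]}',
    "C": '{["answer": "this one"]}',
}

json_results = {label: is_valid_json(text) for label, text in candidates.items()}
print(json_results)
```

The parser rejects any object whose first member does not begin with a quoted string key, which is the rule to keep in mind when comparing the options.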
A. HTTP request headers
B. cookies
C. server logfiles
D. all of the above