Hadoop MCQs

These Hadoop multiple-choice questions and their answers will help you strengthen your grip on the subject of Hadoop. You can prepare for an upcoming exam or job interview with these 100+ Hadoop MCQs.
So scroll down and start answering.

1: Which of the following is the correct syntax for the command used for setting replication for an existing file in Hadoop WebHDFS REST API?

A.   setReplication(string replication, Path p, FsPermission permission)

B.   setReplication(FsPermission permission, short replication)

C.   setReplication(string replication, Path p)

D.   setReplication(Path src, short replication) 

2: In Pig, which of the following types of join operations can be performed using the Replicated join?
i) Inner join
ii) Outer join
iii) Right-outer join
iv) Left-outer join

A.   Only i) and ii)

B.   Only ii) and iii)

C.   Only i) and iv)

D.   Only iii) and iv)

E.   Only i), ii), and iii) 

3: Which of the following is the master process that accepts job submissions from the clients and schedules the tasks to run on worker nodes?

A.   Tasktracker

B.   Jobtracker

C.   YARN

D.   Node Manager

4: Which of the following environment variables is used for determining the Hadoop cluster on which Pig runs its MapReduce jobs?

A.   YARN_CONF_DIR

B.   HADOOP_PREFIX

C.   HADOOP_CONF_DIR

D.   HADOOP_HOME

5: Which of the following HDFS commands is used for checking inconsistencies and reporting problems with various files?

A.   fetchdt

B.   dfs

C.   oiv

D.   fsck 

6: Which of the following commands is used to view the content of a file named /newexample/example1.txt?

A.   bin/hdfs dfs -cat /newexample/example1.txt

B.   bin/hadoop dfsadmin -ddir /newexample/example1.txt

C.   bin/hadoop dfs -cat /newexample/example1.txt

D.   bin/hdfs dfsadmin -ddir /newexample/example1.txt 

7: In the Hadoop architecture, which of the following components is responsible for the planning and execution of a single job?

A.   Resource Manager

B.   Node Manager

C.   Application Master

D.   Container

8: Which of the following interfaces is used for accessing the Hive metastore?

A.   Transform

B.   Command Line

C.   ObjectInspector

D.   Thrift

9: What is the function of the following Hadoop command?
du

A.   It displays the summary of file lengths.

B.   In case of a file, it displays the length of the file, while in case of a directory, it displays the sizes of the files and directories present in that directory.

C.   It displays the number of files in a particular directory.

D.   It displays the numbers and names of files present in a particular directory. 
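
For context, the HDFS command in this question is named after the POSIX `du` utility, which behaves analogously on a local file system. A minimal local sketch (the directory and file names are illustrative):

```shell
# Create a sample directory containing one file (illustrative names).
mkdir -p demo_dir
printf 'hello hadoop\n' > demo_dir/sample.txt

# Summarize the total size of the directory, much like `hadoop fs -du -s`.
du -s demo_dir

# List sizes per entry inside the directory, like `hadoop fs -du`.
du -a demo_dir
```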

10: In Hadoop, HDFS snapshots are taken for which of the following reasons?
i) For providing protection against user error.
ii) For providing backup.
iii) For disaster recovery.
iv) For copying data from data nodes.

A.   Only i) and iii)

B.   Only ii) and iv)

C.   Only i), ii), and iii)

D.   Only i), iii), and iv)

E.   All i), ii), iii), and iv) 

11: Before being written to disk, the intermediate output records emitted by the Map task are buffered in local memory using a circular buffer. Which of the following properties is used to configure the size of this circular buffer?

A.   mapreduce.task.io.sort.mb

B.   io.record.sort.percent

C.   mapreduce.partitioner.class

D.   mapreduce.task.io.mb 

12: Which of the following interfaces can decrease the amount of memory required by UDFs, by accepting the input in chunks?

A.   PigReducerEstimator interface

B.   StoreFunc interface

C.   FilterFunc interface

D.   Accumulator interface 

13: Which of the following Hive clauses should be used for imposing a total order on the query results?

A.   ORDER BY

B.   SORT BY

C.   Either A or B

D.   None of the above 

14: Which of the following Hive commands is used for creating a database named MyData?

A.   CREATE MyData DATABASE

B.   CREATE DATABASE MyData

C.   CREATE NEW MyData DATABASE

D.   CREATE NEW DATABASE MyData 

15: Which of the given options is the function of the following Hadoop command?
-a

A.   It is used for checking whether all the libraries are available.

B.   It is used for expanding the wildcards.

C.   It is used for specifying a resource manager.

D.   It is used for assigning a value to a property

16: In which of the following Pig execution modes can a Java program invoke Pig commands by importing the Pig libraries?

A.   Interactive mode

B.   Batch mode

C.   Embedded mode

D.   Either Interactive or Batch mode 

17: Which of the given Pig data types has the following characteristics?
i) It is a collection of data values.
ii) These data values are ordered and have a fixed length.

A.   bytearray

B.   Bag

C.   Map

D.   Tuple 

18: Which of the following functions is performed by the Resource Manager's Scheduler in the YARN architecture?

A.   It provides insights on the status of the application.

B.   It guarantees restarts after application and hardware failures.

C.   It allocates resources to the applications running in the cluster.

D.   It handles the applications submitted by the clients. 

19: Which of the following are the characteristics of the UNION operator of Pig?

A.   It does not impose any restrictions on the schema of the two datasets that are being concatenated.

B.   It removes the duplicate tuples while concatenating the datasets.

C.   It preserves the ordering of the tuples while concatenating the datasets.

D.   It uses the ONSCHEMA qualifier for giving a schema to the result.

20: Which of the following operators must be used to perform a theta-join?

A.   Cogroup

B.   Foreach

C.   Cross

D.   Union 

21: Which of the following is the correct line command syntax of the Hadoop streaming command?

A.   hadoop command [streamingOptions]

B.   command ∼ Hadoop [genericOptions] [streamingOptions]

C.   hadoop command [genericOptions] [streamingOptions]

D.   command ∼ Hadoop [streamingOptions] [genericOptions] 

22: Which of the following statements is correct about the Hive joins?

A.   The joins in Hive are commutative.

B.   In Hive, more than two tables can be joined.

C.   The first table participating in the join is streamed to the reduce task by default.

D.   All are correct.

23: Which of the following is used for changing the group of a file?

A.   hdfs chgrp [owner] [:[group] ] [-R] <filepath><newgroup>

B.   hdfs chgrp [-R] <group> <filepath>

C.   hdfs chgrp [-R] <[group[: [owner]> <filepath>

D.   hdfs chgrp <group> <filepath>[-R] <newgroup>

E.   hdfs chgrp <group>[-R] <newgroup> 
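
For context, the HDFS chgrp command mirrors the POSIX `chgrp` it is named after. A minimal local sketch, using the caller's own primary group so the change is always permitted (GNU coreutils `stat` assumed; the file name is illustrative):

```shell
# Create a scratch file (name is illustrative).
touch group_demo

# Change its group to the caller's primary group, the one case
# that never requires extra privileges.
chgrp "$(id -gn)" group_demo

# Show the resulting group name.
stat -c '%G' group_demo
```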

24: Suppose you need to select a storage system that supports Resource Manager High Availability (RM HA). Which of the following types of storage should be selected in this case?

A.   LevelDB based state-store

B.   FileSystem based state-store

C.   Zookeeper based state-store

D.   Either option A or B could be used

25: Which of the following statements is correct about the Hadoop file system namespace?

A.   User access permission is not implemented in HDFS.

B.   In HDFS, a user is not allowed to create directories.

C.   HDFS supports hard links.

D.   HDFS implements user quotas. 

26: Which of the following Hadoop commands is used for creating a file of zero length?

A.   touchz

B.   tail

C.   text

D.   test
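
As an aside, creating a zero-length file has a direct local analogue; in HDFS the form would be `hadoop fs -touchz <path>`, while the sketch below uses only plain shell:

```shell
# Local analogue of the HDFS zero-length-file command.
: > empty_file.txt

# Confirm the file exists and is empty.
test -f empty_file.txt && test ! -s empty_file.txt && echo "zero length"
```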

27: Consider an input file named abc.dat.txt with a default block size of 128 MB. Which of the following is the correct command for uploading this file into HDFS with a block size of 512 MB?

A.   hadoop fs ∼ D blocksize=536870912 -put abc.dat.txt to abc.dat.newblock.txt

B.   hadoop fs.blocksize=536870912 -put abc.dat.txt abc.dat.newblock.txt

C.   hadoop fs -D dfs.blocksize=536870912 -put abc.dat.txt abc.dat.newblock.txt

D.   hadoop fs.blocksize −D=536870912 -put abc.dat.txt to abc.dat.newblock.txt 
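
As a quick sanity check on the byte count that appears in every option above, 512 MB works out as follows (plain shell arithmetic, independent of Hadoop):

```shell
# 512 MB in bytes: 512 * 1024 * 1024 = 536870912,
# the value passed as the block size in the options above.
bytes=$((512 * 1024 * 1024))
echo "$bytes"   # 536870912
```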

28: What is the function of the following configuration property of YARN's Resource Manager?
yarn.resourcemanager.ha.id

A.   It is used for identifying the class to be used by the client.

B.   It is used for listing the logical IDs used by Resource Managers.

C.   It is used for specifying the corresponding host name for Resource Manager.

D.   It is used for identifying the Resource Manager in the ensemble.

29: Which two of the following are the correct differences between MapReduce and traditional RDBMS?

A.   The scaling in MapReduce is non-linear, whereas in RDBMS it is linear.

B.   In MapReduce, the read operation can be performed many times but the write operation can be performed only once. In traditional RDBMS, both read and write operations can be performed many times.

C.   The integrity of MapReduce is higher as compared to RDBMS.

D.   The access pattern of MapReduce is Batch, whereas the access pattern of RDBMS is Interactive and Batch 

30: Which of the following statements is correct about YARN's Web Application Proxy?

A.   It prevents the Application Manager from providing links to malicious external sites.

B.   It prevents the execution of the malicious JavaScript code.

C.   It strips the cookies from the user and replaces them with a single cookie providing the user name of the logged-in user.

D.   It runs as a part of the Resource Manager but cannot be configured to run in stand-alone mode.

31: Which two of the following parameters of the Hadoop streaming command are optional?

A.   -output directoryname

B.   -cmdenv name=value

C.   -combiner streamingCommand

D.   -reducer JavaClassName

32: Which of the following commands is used for setting an environment variable in a streaming command?

A.   -file ABC =/home/example/

B.   -mapper ABC = /home/inputReader/example/dictionaries/

C.   -input ABC = /home/directories/example

D.   -cmdenv ABC = /home/example/dictionaries/ 

33: Which of the following are the advantages of the disk-level encryption in HDFS?

A.   It provides high performance.

B.   It can be deployed easily.

C.   It is highly flexible.

D.   It can protect against software as well as physical threats.

34: For a file named abc, which of the following Hadoop commands is used for granting all permissions to the owner, read permission to the group, and no permissions to other users in the system?

A.   hadoop fs -chmod abc 310

B.   hadoop fs -chmod 740 abc

C.   hadoop fs ∼chmod 420 abc

D.   hadoop fs -chmod abc ∼ 860
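
The octal modes in these options use the standard POSIX scheme: each digit is the sum of read (4), write (2), and execute (1), for owner, group, and others in that order, and HDFS follows the same convention. A local sketch (GNU coreutils `stat` assumed; the file name is illustrative):

```shell
# Each octal digit is read(4) + write(2) + execute(1),
# applied to owner, group, and others respectively.
touch perms_demo
chmod 740 perms_demo          # owner: rwx (7), group: r-- (4), others: --- (0)
stat -c '%a %A' perms_demo    # 740 -rwxr-----
```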

35: Which of the following permission levels is NOT allowed in HDFS authorization?

A.   Read

B.   Write

C.   Execute

D.   All three permission levels are allowed

36: Which of the following join operations are NOT supported by Hive?

A.   Left semi-join

B.   Inner join

C.   Theta join

D.   Fuzzy join 

37: Which of the following commands is used for creating a keytab file used in Kerberos authentication?

A.   kinit

B.   klist

C.   ktutil

D.   mradmin

E.   dfsadmin 

38: Which of the following is the advanced server-side configuration property of YARN that enables the deletion of aged data in the timeline store?

A.   yarn.timeline-service.ttl-enable

B.   yarn.timeline-service.enabled

C.   yarn.timeline-service.generic-application-history.enabled

D.   yarn.timeline-service.recovery.enabled 

39: In order to execute a custom, user-built JAR file, the jar command is used. Which of the following is the correct syntax of this command?

A.   yarn node -jar [main class name] <jar file path>[arguments…]

B.   yarn jar <jar file path> [main class name] [arguments…]

C.   yarn application -jar [main class name] <jar file path> [arguments…]

D.   yarn logs jar <jar file path> [main class name] [arguments…] 

40: Which of the following YARN commands is used for overwriting the default configuration directory ${HADOOP_PREFIX}/conf?

A.   --config confdir

B.   --config. YarnConfiguration

C.   daemonlog -getlevel

D.   daemonlog confdir 

41: Which of the following HiveQL commands is used for printing the list of configuration variables overridden by Hive or the user?

A.   set

B.   set -v

C.   dfs

D.   reset 

42: Which of the given options is the correct function of the following HiveQL command?
!

A.   It is used for executing a dfs command from the Hive shell.

B.   It is used for executing a shell command inside the CLI.

C.   It is used for executing a shell command from the Hive shell.

D.   It is used for executing a dfs command inside the CLI. 

43: Which of the following are the functions of Hadoop?
i) Data Search
ii) Data Retention
iii) Recommendation systems
iv) Analytics

A.   Only i) and iii)

B.   Only i) and ii)

C.   Only i), ii), and iv)

D.   All i), ii), iii), and iv)

44: Which of the following configuration properties of YARN's Resource Manager is used for specifying the host:port for clients to submit jobs?

A.   yarn.resourcemanager.ha.rm-ids

B.   yarn.resourcemanager.address.rm-id

C.   yarn.resourcemanager.hostname.rm-id

D.   yarn.resourcemanager.scheduler.address.rm-id

45: Which of the following HDFS commands is used for setting an extended attribute name and value for a file or a directory?

A.   setgid

B.   setFile

C.   setfattr

D.   setQuota

E.   setConf 

46: Which of the following Pig commands is/are used for sampling data and applying a query to it?

A.   DESCRIBE

B.   ILLUSTRATE

C.   EXPLAIN

D.   Both A and B

47: Which of the following is the correct syntax for the docs Maven profile that is used for creating documentation in Hadoop Auth?

A.   $ mvn package -Pdocs

B.   $ mvn Pdocs

C.   $ curl - mvn Pdocs

D.   $ curl - mvn Pdocs - package

48: In the case of service-level authorization in Hadoop, which of the following properties is used for determining the ACLs that grant permission for the DataNodes to communicate with and access the NameNode?

A.   security.client.datanode.protocol.acl

B.   security.namenode.protocol.acl

C.   security.client.protocol.acl

D.   security.datanode.protocol.acl

49: What is the default value of the following security configuration property of the YARN architecture?
yarn.timeline-service.delegation.token.renew-interval

A.   1 day

B.   3 days

C.   5 days

D.   7 days 

50: While configuring HTTP authentication in Hadoop, which of the following is set as the value of the "hadoop.http.filter.initializers" property?

A.   org.apache.hadoop.security.AuthenticationInitializer class name

B.   org.apache.hadoop.security.ShellBasedUnixGroupsMapping class name

C.   org.apache.hadoop.security.LdapGroupsMapping class

D.   org.apache.hadoop.security.ssl class name