Cloudera Hadoop Developer Certification (CCD-410)

I cleared the Cloudera certification in May 2015. Here is how I would suggest preparing for it:

  • Hadoop: The Definitive Guide (3rd or 4th edition)
  • Good training: way2learnonline.com
  • The HadoopExam “Simulator” from http://www.hadoopexam.com will guide you very well
  • https://developer.yahoo.com/hadoop/tutorial/
  • There will be some questions on Sqoop as well. Check the various import/export options.
  • 1-2 questions on Hive/Pig/Flume/Oozie.
  • Go through http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
  • Know what the classic MapReduce (MRv1) and YARN daemons do
  • Most of the questions will be MapReduce related, and some of them will be MapReduce exercises.
  • Have a look at regular expressions, as some questions may include them
  • Check Streaming, JVM reuse, speculative execution and skipping bad records
  • In MapReduce, have a look at the following:
    • Map-side join (replicated join)
    • Composite join
    • Reduce-side join (repartition join)
    • Secondary sort, inverted index, TotalOrderPartitioner, n-grams
    • Various InputFormats (SequenceFileInputFormat, TextInputFormat, NLineInputFormat etc.) and various OutputFormats
    • MultipleInputs
    • MultipleOutputs: http://www.lichun.cc/blog/2013/11/how-to-use-hadoop-multipleoutputs/
    • Custom InputFormat, RecordReader etc.
    • Combiner
    • Custom Writable, WritableComparable, RawComparator
    • ControlledJob
    • Custom Partitioner
    • MRUnit
  • Do some basic exercises in MapReduce for practice
  • Write MR code equivalent to queries like:
    • select field1,field2 from table
    • select distinct field1,field2 from table
    • select field1,count(*) from table group by field1
    • select field1,field2 from table order by field1,field2
    • …you can think of many more
  • All the best!
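
As a warm-up for the query exercises in the list above, here is a minimal sketch of how two of those queries map onto the map, shuffle and reduce phases. It is plain Python that simulates the phases in memory, not Hadoop code; the exam itself deals with the Java MapReduce API. The helper `run_job` and the sample `TABLE` are made up for illustration.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Simulate map -> shuffle/sort -> reduce in memory (illustrative helper, not a Hadoop API)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect all values for the same key
    output = []
    for key in sorted(groups):             # each reducer sees its keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical table: each row is (field1, field2)
TABLE = [("a", "x"), ("b", "y"), ("a", "x"), ("a", "z")]

# select field1, count(*) from table group by field1
def count_mapper(row):
    yield row[0], 1                        # key = grouping column, value = 1

def count_reducer(key, values):
    yield key, sum(values)                 # add up the 1s for each key

# select distinct field1, field2 from table
def distinct_mapper(row):
    yield (row[0], row[1]), None           # the whole row is the key; the value is unused

def distinct_reducer(key, values):
    yield key                              # emit each key once, however many duplicates arrived

group_by_result = run_job(TABLE, count_mapper, count_reducer)
distinct_result = run_job(TABLE, distinct_mapper, distinct_reducer)
print(group_by_result)   # [('a', 3), ('b', 1)]
print(distinct_result)   # [('a', 'x'), ('a', 'z'), ('b', 'y')]
```

The same pattern carries over to the Java API: the grouping or distinct columns become the map output key, and the reducer either aggregates the values or simply emits the key.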

84 thoughts on “Cloudera Hadoop Developer Certification (CCD-410)”

  1. Hi Ruchi,
    Congrats on clearing the certification.
    Could you tell me what kind of questions are asked in the exam? Are any of them coding related? Are there only 1-2 questions in total on the ecosystem? In that case, can I just get a basic understanding of Pig, Hive, Flume and Oozie and appear for the exam? Is any other ecosystem knowledge required?
    Thanks for your post.

  2. Hi Ruchi,

    Did you take the test at a test center or on your personal laptop? Cloudera no longer has a tie-up with any third party for delivering the exam.

  3. Thank you so much for your post, Ruchi.
    Could you please clarify whether there will be questions on Hadoop Streaming, Python MapReduce coding and Avro?

  4. Thanks again for the response, Ruchi. Could you please clarify which ecosystem components the questions cover?

  5. Thanks a lot, Madam. This will be a great help to me. If someone can write MapReduce programs for joins, sorting and other concepts, how much time might it take them to answer the code-related questions? And did you get MR code snippets as options, equivalent to SQL queries like the ones you mentioned above?

    • Yes, you can write such MR programs. How much time it takes depends on your practice and knowledge. And yes, based on a particular Hive query or SQL query, they may give you 4 options and ask you to select one.

  6. Hi Ruchi,
    Thanks for giving us details of the CDH exam. I have a query regarding regular expressions. What type of regular expression questions are asked? Are they like questions on Hive’s LIKE and RLIKE keywords?

  7. Hi Ruchi,
    I found your exam pointers extremely useful. Thank you so much!
    I have a doubt: when you mentioned “Write MR code equivalent to queries like – select field1,field2 from table”, did you mean writing a Hive query?

  8. I agree with Ruchi’s blog (indeed a good one!). I took the exam in the first week of August and was happy and lucky to end up (just) on the passing side. Predicting the outcome of code formed a reasonable chunk of the exam and ate up most of my 90 minutes. It requires an unshakeable, hands-on understanding of how MR works, not just theoretical know-how, as it challenges the fundamentals you have built up over the preceding months at every turn. The shell, Hive and Sqoop questions also seemed beyond basic to me. Lastly, Tom White rocks! Hope this helps, and all the best!

  9. Hi Ruchi,
    What does “controlled jobs” mean? I would also like to know how the questions were asked on Hive, Sqoop, Pig and HBase.

    • ControlledJob: you can define many jobs and, with the help of ControlledJob, specify that job1 runs first, then job2, and so on. There were no questions on HBase (check whether HBase has since been included in the certification syllabus). For Sqoop, know the various import/export options; for Pig, basic level; for Hive, there are only 2-3 questions, but they may ask anything, so I cannot tell you anything specific.
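
To make the ordering idea above concrete, here is a small sketch of dependency-ordered execution. It is plain Python that only simulates the behaviour; in Hadoop the equivalent wiring is done with the Java classes `ControlledJob` and `JobControl`. The function `run_in_dependency_order` and the job names are made up for illustration.

```python
def run_in_dependency_order(jobs, depends_on):
    """Run each job only after all the jobs it depends on have finished.

    jobs:       dict of job name -> callable (hypothetical stand-in for a real MR job)
    depends_on: dict of job name -> list of job names that must finish first
    """
    finished = []
    pending = dict(jobs)
    while pending:
        # A job is ready when every one of its dependencies has already finished.
        ready = [name for name in pending
                 if all(dep in finished for dep in depends_on.get(name, []))]
        if not ready:
            raise ValueError("circular or unknown dependency")
        for name in ready:
            pending.pop(name)()            # run the job
            finished.append(name)
    return finished

log = []
jobs = {
    # Deliberately listed out of order to show that the dependencies decide the order.
    "job3": lambda: log.append("job3 ran"),
    "job2": lambda: log.append("job2 ran"),
    "job1": lambda: log.append("job1 ran"),
}
depends_on = {"job2": ["job1"], "job3": ["job2"]}

execution_order = run_in_dependency_order(jobs, depends_on)
print(execution_order)   # ['job1', 'job2', 'job3']
```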

  10. Hi Ruchi, congrats on clearing the exam. To take the Cloudera certification, do we need to take their classroom course, which costs $3,000 in the US? Somebody said it is a must. Did you take the Cloudera classroom course before attempting the certification exam?

    • No, the classroom course is not a must, and I did not take it. I took the course mentioned in my blog, and that is not mandatory either; I took it to improve my knowledge.

  11. Hi Ruchi,

    I have a question regarding the SQL queries that are to be performed using MR code. In this case, should we assume the table exists as a simple text file, or do we have to connect to a database and extract these columns?

    • When we write MapReduce code, the target is files stored in HDFS. The file may be a simple text file, a sequence file, an Avro file and so on; it depends. Just as you query a database, you can query files stored in HDFS using MapReduce, Hive, Pig etc.
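
As an illustration of treating a text file as a table, the sketch below does the equivalent of “select field1,field2 from table” over tab-delimited lines, the way a map-only job would. It is plain Python run against an in-memory sample, not Hadoop code; the function name `project_fields`, the tab delimiter and the sample rows are assumptions made for the example.

```python
import io

def project_fields(lines, field_indexes, delimiter="\t"):
    """Map-only projection: keep the chosen columns of each delimited line."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if max(field_indexes) >= len(fields):
            continue                       # skip malformed rows with too few columns
        yield delimiter.join(fields[i] for i in field_indexes)

# Hypothetical file contents: id, name, city (tab separated), plus one bad row
sample_file = io.StringIO("1\talice\tdelhi\n2\tbob\tpune\nbad-row\n")

projected = list(project_fields(sample_file, [0, 1]))
print(projected)   # ['1\talice', '2\tbob']
```

No reducer is needed for a plain projection, because each output row depends on a single input row.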

  12. Hi Ruchi,
    You mentioned using the simulator from hadoopexam.com for practice. Were the questions in it similar to the ones in the actual test? Can we rely on it for correct answers?

  13. Hello Ruchi,

    I am planning to take a Hadoop developer course and the certification. I have very little programming experience; I currently work in the networking domain and have a little manual software testing experience. I am now planning to move into Hadoop and am wondering how tough the programming and the certification will be for a beginner like me.

  14. Hi Ruchi,
    Thanks for your blog; it is very helpful.
    Going through MR, I see a number of properties. Do we need to learn all of them (compression, max tasks etc.)?
    Is it OK to have an overview of the ecosystem projects and in-depth knowledge of MR and MR2?
    Apart from the usual word count and weather programs, can you suggest any link with more scenarios and project-related data (if necessary for the certification)?

    Thanks in advance

    • Hi Saurabh,

      You will find many links here:
      https://sites.google.com/site/continuelearning/Home/hadoop-class
      I didn’t go through it for the certification, but it seems useful (not all of it relates to the certification syllabus).
      It is better to do some training.
      Regarding the ecosystem, as I said earlier: for Sqoop, the various import/export options; for Pig, basic level; for Hive, only 2-3 questions, but they may ask anything, so I cannot tell you anything specific. You cannot rely on very basic reading alone.
      For MR there is no specific course content, but everything you have asked about is basic material.

  15. Hi Ruchi,
    Thanks for sharing your test experience with everyone. I am also preparing for the exam and have the hadoopexam dump on my machine. What score on hadoopexam is a good confidence builder before going to the real exam? I got between 72% and 82% on the first 3-4 papers. Is that a good score? Exam no. 7 there is really tough. Do you think exam 7 is closer to the real exam, or is it too hard?

    Thanks,
    Ravi

    • Hi Ravi, I think 71 or 72% is just the passing mark, and if I remember correctly the first 2-3 papers don’t have MR exercises. You need to see how you handle MR exercises as well. I agree exam 7 was difficult; the real exam should not be that difficult. They have probably taken questions from different exams and combined them into a single one, but you never know. Why don’t you ask the HadoopExam support people?

  16. Hi Ruchi,

    What you have provided above is a great help for everyone.
    I have one question on MR code.
    When you say to write MR code for HiveQL, what do you mean? Do they give one Hive query and we need to write equivalent MR code, or do they give MR code in the options and we need to select the one equivalent to the HiveQL?

    Thanks

  17. Hi Ruchi,
    Were there any questions related to Hadoop I/O, on topics like compression, how to use compression in MR, serialization and file-based data structures?

  18. Hi Ruchi,
    I am preparing for the certification. Can you tell me which chapters of the Definitive Guide I need to refer to? I have also seen that the guide contains a lot of examples. How many questions on Java come up?

    Thanks

    • Hi Vineet, I read the complete book. If you want to read only specific chapters, you can compare them with the exam course contents. The examples are helpful, as coding questions come up in the exam. As for Java, if you mean Java language-specific questions, then there are no such questions; if you mean MapReduce questions, then yes, they do come up, but I don’t have a count as such.

    • Yes Sheetal, it is required. I have already mentioned all the sources in my blog: I took training, and I have given the site addresses, the simulator name and the book names. Do you want to know anything else?

  19. Hi Ruchi, I am preparing for CCD-410 and just wanted to clarify one point: are all the questions on Hadoop 2.0, or might we get questions on Hadoop 1.0 as well (i.e. JobTracker, TaskTracker etc.)?
    Thanks in advance for your help.

  20. Hello Ruchi,

    I have gone through many Hadoop tutorials. My concern is that I don’t have a Java background or experience. Since you mentioned there are coding-related questions, would they be difficult for me, or are they basic enough to follow?

  21. Hi Ruchi,
    If possible, can you share the study material from the HadoopExam site? The syllabus is going to change this year, so I need to refer to the old study material, and I plan to buy the new one in January 2016.
    If possible, mail me at anand_vin@hotmail.com.

    Thanks
    Kind regards

  22. Hi Ruchi and other members,

    Seeing that so many people here are interested in Hadoop and some are certified, I would like to take advantage of this forum (thanks to Ruchi) to clear up a few queries.

    Query 1: I have a cluster of 50 nodes. I would like to deploy Pig and Hive scripts so that they run on the multi-node cluster. What changes are required, and where should the scripts be deployed?

    Query 2: I have written a function in C++ / Python / Perl / PHP and would like to use it in Pig and Hive. What is required for that, or can functions be written only in Java?

    Query 3: To process unstructured data we normally use the Java APIs. How do we use the Java APIs to process unstructured data?

    Query 4: I have written a script in Pig / Hive. In the script I refer to a function written in Java. To use the function in Pig I need to Define it, and for Hive it is Register. But how do I include the Java code, I mean the JAR file?

    I would appreciate it if any one of you could reply by mail to anand_vin@hotmail.com.

    Thanks
    Kind regards
