I cleared the Cloudera certification in May 2015. I would like to share how to prepare yourself for the Cloudera certification:
- Hadoop: The Definitive Guide (3rd/4th edition)
- Good training: way2learnonline.com
- The HadoopExam simulator from http://www.hadoopexam.com will guide you very well
- https://developer.yahoo.com/hadoop/tutorial/
- There will be some questions on Sqoop as well. Check the various import/export options.
- Expect 1-2 questions on Hive/Pig/Flume/Oozie.
- Go through http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
- You should know the functionality of the classic MapReduce and YARN daemons.
- Most of the questions will be MapReduce-related, and some will be MapReduce exercises.
- Have a look at regular expressions, as some questions may include them.
- Check Streaming, JVM reuse, speculative execution, and skipping bad records.
- In MapReduce, have a look at the following:
  - Map-side join (replicated join)
  - Composite join
  - Reduce-side join (repartition join)
- Check secondary sort, inverted index, TotalOrderPartitioner, and n-grams
- Use various InputFormats (SequenceFileInputFormat, TextInputFormat, NLineInputFormat, etc.) and various OutputFormats
- MultipleInputs
- MultipleOutputs: http://www.lichun.cc/blog/2013/11/how-to-use-hadoop-multipleoutputs/
- Custom InputFormat, RecordReader, etc.
- Combiner
- Custom Writable, WritableComparable, RawComparator
- ControlledJob
- Custom Partitioner
- MRUnit
- Do some basic MapReduce exercises for practice
- Write MR code equivalent to queries like:
  - select field1, field2 from table
  - select distinct field1, field2 from table
  - select field1, count(*) from table group by field1
  - select field1, field2 from table order by field1, field2
  - ...you can think of many more
ALL THE BEST!
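To make the query exercises concrete, here is a minimal plain-Java sketch of how "select field1, count(*) from table group by field1" splits into map, shuffle/sort, and reduce phases. It deliberately avoids the Hadoop API so it runs without a cluster; the class and method names are hypothetical, and rows are assumed to be comma-separated text lines. In a real job the mapper would emit (field1, 1), the framework would do the grouping, and the reducer would sum the values; because summing is associative and commutative, the same reducer could also be set as the Combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation (no Hadoop classes) of
//   select field1, count(*) from table group by field1
// Rows are assumed to be comma-separated lines: field1,field2
final class GroupByCount {

    // Map phase: each input line becomes one (field1, 1) pair.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            String field1 = line.split(",")[0];
            out.add(Map.entry(field1, 1));
        }
        return out;
    }

    // Shuffle/sort phase: the framework groups values by key and sorts the keys.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: summing the 1s for each key gives count(*) per group.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a,x", "b,y", "a,z")));
    }
}
```

Running main prints {a=2, b=1}: two rows have field1 = a and one has field1 = b.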
Great
Thanks, this is a great post on the topic.
Hi Ruchi,
Congrats on clearing the certification.
Could you tell me what kind of questions are asked in the exam? Are any of them coding-related? Are there only 1-2 questions in total on the ecosystem? In that case, can I just get an understanding of Pig, Hive, Flume, and Oozie and appear for the exam? Is an understanding of any other ecosystem component required?
Thanks for your post.
Hi Saurabh, yes, coding questions are asked: MapReduce coding questions and Sqoop commands. There are a few questions on the ecosystem.
Hi Ruchi, could you please share the Hadoop simulators at sarvesh.sood@gmail.com?
Sarvesh, you need to buy the simulator. It is machine-specific, and my simulator will not work on your machine.
Thanks.
I want to know whether the HadoopExam simulator was really helpful or not.
Yes, it was. The training I mentioned and the simulator were both useful; that is why I mentioned them.
Hi,
Can you please share your email ID?
From my About page, you can contact me through LinkedIn and email me.
Are there many questions based on code examples?
Yes.
How many querying questions (MapReduce and HiveQL) come up in the exam?
1-2 questions on Hive, and more (around 30%, though it can vary) on MapReduce.
Hi Ruchi,
Did you take the test at a test center or from your personal laptop, given that Cloudera no longer has a tie-up with any third party for delivering the exam?
Personal laptop.
Thanks Ruchi.
Thank you so much for your post, Ruchi.
Could you please clarify whether there will be questions on Hadoop Streaming, Python MapReduce coding, and Avro?
None in my paper.
Thanks again for the response, Ruchi. Could you please clarify which ecosystem components the questions cover?
Apart from MR: Pig, Hive, Oozie, Streaming, Flume, and Sqoop. Please check whether they have added anything else; if you have doubts, you can contact their support.
Thank you so much for all the information :)
Thanks a lot, Madam. This will be a great help for me. Could you please clarify: if one can write MapReduce programs on joins, sorting, and other concepts, how much time might it take to answer the code-related questions? And did you get MR code snippets as options that are equivalent to SQL queries, as you mentioned above?
Yes, you can write such MR programs. How much time it takes depends on your practice and knowledge. And yes, based on a particular Hive or SQL query, they may give you 4 options and ask you to select one.
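As an illustration of what such an option might implement, here is a minimal plain-Java simulation (not Hadoop API code) of "select distinct field1, field2 from table". The mapper uses the two fields together as its output key; since the shuffle delivers all values for a given key to a single reduce call, the reducer only has to write each key once. Rows are assumed to be comma-separated lines, and the class name is hypothetical. In actual Hadoop code the unused map output value would typically be NullWritable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical plain-Java simulation (no Hadoop classes) of
//   select distinct field1, field2 from table
final class SelectDistinct {

    // Map: the output key is "field1,field2"; the value is not needed.
    static List<String> map(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            keys.add(f[0] + "," + f[1]);
        }
        return keys;
    }

    // Shuffle: identical keys end up in the same group (here we only count group sizes).
    static TreeMap<String, Integer> shuffle(List<String> keys) {
        TreeMap<String, Integer> groups = new TreeMap<>();
        for (String k : keys) {
            groups.merge(k, 1, Integer::sum);
        }
        return groups;
    }

    // Reduce: called once per distinct key, it writes the key and ignores the values.
    static List<String> reduce(TreeMap<String, Integer> groups) {
        return new ArrayList<>(groups.keySet());
    }

    static List<String> run(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a,x,1", "a,x,2", "b,y,3")));
    }
}
```

The two rows that share "a,x" collapse into a single output line, regardless of their third column.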
Hi Ruchi,
Thanks for giving us details of the CDH exam. I have a query regarding regular expressions. What type of regular expression questions are asked? Questions like Hive's LIKE and RLIKE keywords?
An example regular expression:
(green|blue)?+.+
I studied regular expressions from the following sites:
https://docs.oracle.com/javase/tutorial/essential/regex/
http://www.regular-expressions.info/tutorial.html
Questions on regular expressions did not come up in my paper, but they may come, as I have seen them in Cloudera test papers.
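In the example (green|blue)?+.+, the "?+" is a possessive quantifier in Java regex syntax: once (green|blue) has matched, the engine will not backtrack into it. The sketch below (class and method names are hypothetical) compares it with the ordinary greedy form (green|blue)?.+ using java.util.regex, the syntax covered by the Oracle tutorial linked above.

```java
import java.util.regex.Pattern;

final class RegexDemo {
    // Possessive: once (green|blue) has matched, the engine never gives those characters back.
    static final Pattern POSSESSIVE = Pattern.compile("(green|blue)?+.+");
    // Greedy: the optional group can backtrack to "no match" so that .+ can still succeed.
    static final Pattern GREEDY = Pattern.compile("(green|blue)?.+");

    // matches() requires the entire input to match the pattern.
    static boolean possessive(String s) {
        return POSSESSIVE.matcher(s).matches();
    }

    static boolean greedy(String s) {
        return GREEDY.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"green", "greenery", "blue", "red", ""}) {
            System.out.println("\"" + s + "\" possessive=" + possessive(s) + " greedy=" + greedy(s));
        }
    }
}
```

For "green" (and likewise "blue") the possessive pattern fails: the group consumes the whole input and .+ has nothing left, while the greedy pattern succeeds by letting the group match nothing. For "greenery" and "red" both patterns match, and neither matches the empty string because .+ needs at least one character.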
Hi Ruchi,
Thanks for the guidance. I also just cleared the certification on 15 Aug 2015. Your blog helped me a lot, thanks.
Great! Congratulations. Nice to know that the blog could help.
Hi Ruchi,
I found your exam pointers to be extremely useful. Thank you so much!
I have a doubt: when you mentioned “Write MR code equivalent to queries like – select field1,field2 from table”, did you mean writing a Hive query?
MR code means MapReduce code. You can get a question on MapReduce code equivalent to a Hive/SQL query; Hive and Pig are both built on top of MapReduce.
Ok. Thanks again!
How much time in total did you spend reading and practicing for this exam? One month or so?
Well, it took me around 1.5 months to understand Hadoop concepts and then 2-3 weeks to prepare for the certification.
Hi Ruchi, your post was really helpful. Thanks.
Nice to know that, Ritam :)
I agree with Ruchi's blog here (indeed a good one!). I took the exam in the first week of August and am happy (and lucky) to have ended up on the passing side, just. Code-outcome prediction, which obviously ate up most of my 90 minutes, formed a reasonable chunk. It certainly requires an unshakeable understanding of how MR functions (object-oriented, not just theoretical know-how), as every moment it challenges the fundamentals you have built up over the last couple of months or so. Shell, Hive, and Sqoop questions also seemed beyond basic to me. Lastly, Tom White rocks! Hope this helps, and all the best!
Hi Ruchi,
What does ControlledJob mean? I would also like to know how the questions on Hive, Sqoop, Pig, and HBase were asked.
ControlledJob: you can define many jobs, and with the help of ControlledJob you can specify that job1 runs first, then job2, and so on. There were no questions on HBase (check whether they have added HBase to the certification syllabus now). For Sqoop, study the various import/export options; for Pig, basic level; for Hive, although there are only 2-3 questions, they may ask anything, so I cannot point to anything specific.
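The dependency idea behind ControlledJob can be sketched without Hadoop. Below is a toy, hypothetical scheduler (not the Hadoop API) in which each job names the jobs it depends on and runs only after all of them have finished. In Hadoop's org.apache.hadoop.mapreduce.lib.jobcontrol package, the corresponding roles are played by ControlledJob (with addDependingJob) and JobControl (with addJob); the job names here are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for JobControl/ControlledJob: no Hadoop classes are used,
// and "running" a job just means recording that it finished.
final class ToyJobControl {
    // job name -> names of the jobs it depends on
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    void addJob(String name, String... dependencies) {
        dependsOn.put(name, List.of(dependencies));
    }

    // Repeatedly runs every job whose dependencies have all finished.
    List<String> runAll() {
        List<String> finished = new ArrayList<>();
        while (finished.size() < dependsOn.size()) {
            boolean progress = false;
            for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                String job = e.getKey();
                if (!finished.contains(job) && finished.containsAll(e.getValue())) {
                    finished.add(job);
                    progress = true;
                }
            }
            if (!progress) {
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        ToyJobControl control = new ToyJobControl();
        // Added in reverse order on purpose: the dependencies, not the insertion order, decide.
        control.addJob("job3", "job2");
        control.addJob("job2", "job1");
        control.addJob("job1");
        System.out.println(control.runAll());
    }
}
```

Even though job3 is registered first, the run order printed is [job1, job2, job3], which is the "job1 runs, then job2, and so on" behavior described above.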
Hi Ruchi, congrats on clearing the exam. To take the Cloudera certification, do we need to take their classroom course, which costs $3,000 in the US? Somebody said it is a must. Did you take the Cloudera classroom course before attempting the certification exam?
No, the classroom course is not a must, and I did not take it. I took the course I mentioned in my blog, and that is not mandatory either; I took it to enhance my knowledge.
Hi Ruchi,
I have a question about the SQL queries that are to be performed using MR code. Should we assume the table exists as a simple text file, or do we have to connect to some database and extract these columns?
When we write MapReduce code, the target is files stored in HDFS. The file may be a simple text file, a SequenceFile, an Avro file, etc.; it depends. In the same way that you query a database, you can query files stored in HDFS using MapReduce, Hive, Pig, etc.
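For the simplest query, "select field1, field2 from table", no aggregation is needed, so the job can be map-only (in Hadoop, job.setNumReduceTasks(0)), and each mapper just projects columns out of the lines of its input split. A minimal plain-Java sketch, assuming the table is a tab-separated text file in HDFS; the class name and sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (no Hadoop classes) of a map-only job for
//   select field1, field2 from table
// assuming the "table" is a tab-separated text file stored in HDFS.
final class SelectProjection {

    // Mapper logic: called once per input line, keeps only the first two columns.
    static String mapLine(String line) {
        String[] fields = line.split("\t");
        return fields[0] + "\t" + fields[1];
    }

    // With zero reduce tasks there is no shuffle; mapper output goes straight to the output files.
    static List<String> run(List<String> lines) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            output.add(mapLine(line));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("r1\tx1\ty1", "r2\tx2\ty2")));
    }
}
```

Whether the input is a text file or a SequenceFile only changes which InputFormat feeds the mapper; the projection logic stays the same.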
Got it. Thanks a lot !
Can you please suggest where I can find more details on JVM reuse and skipping bad records?
Kisan, I read the books I mentioned above and did some internet searching; I don't have any specific link with me.
Ok. Thanks.
Can you please suggest any materials you referred to for Sqoop and Hive specifically? Thanks in advance.
I took the training I mentioned in my blog; it was really good. Apart from that, for Sqoop, I referred to the command options on the Apache site for more practice.
https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html
Hi Ruchi,
You mentioned using the simulator from hadoopexam.com for practice. Were its questions similar to the ones in the actual test? Can we rely on them for correct answers?
Yes, they are similar. They have a lot of questions, and the answers are correct except for a few (around 5%) that I had doubts about, though I could be wrong.
Ok. Thank you.
Hi Ruchi,
Are we allowed to carry any scratch sheets during the exam? If not, how do we do the rough work?
No, we are not allowed to carry any sheets; we just have to think it through.
Oh, OK. Thanks!
Hello Ruchi,
I am planning to take a Hadoop developer course and certification. I have very little programming experience: I currently work in the networking domain and have a little manual software testing experience. Now I am planning to move into Hadoop, and I am wondering how tough the programming and the certification will be for beginners like me.
Well, certainly it will be tough, and it will take time. But nothing is impossible if you are sure this is what you want to do.
Hi Ruchi,
Thanks for your blog; it is very helpful.
Going through MR, I see a number of properties. Do we need to learn all of these (like compression, max tasks, etc.)?
Is it OK to have an overview of the ecosystem projects and in-depth knowledge of MR and MR2?
Apart from the usual word count and weather programs, can you suggest any link with more scenarios and project-related data (if necessary for the certification)?
Thanks in advance
Hi Saurabh,
You will find many links here:
https://sites.google.com/site/continuelearning/Home/hadoop-class
I didn't go through it for the certification, but it seems useful (not all of it is related to the certification syllabus).
Well, it's better that you do some training.
Regarding the ecosystem, I have already said earlier: for Sqoop, the various import/export options; for Pig, basic level; for Hive, although there are only 2-3 questions, they may ask anything, so I cannot point to anything specific. So you cannot rely on reading at a very basic level.
Again, for MR there is no specific course content, but whatever you have asked about are basic things.
Hi Ruchi,
Thanks for sharing your test experience with everyone. I am also preparing for the exam and have the HadoopExam dump on my machine. What score on HadoopExam gives good confidence before going to the real exam? I got between 72% and 82% on the first 3-4 papers on HadoopExam. Is that a good score? Exam 7 there is really tough; do you think Exam 7 is closer to the real exam, or is it too hard?
Thanks,
Ravi
Hi Ravi, yes. I think 71 or 72% is just the passing mark, and if I remember correctly, the first 2-3 papers don't have MR exercises, so you need to see how you solve MR exercises as well. I agree Exam 7 was difficult; it should not be that difficult. Probably they have taken questions from different exams and put them into a single exam, but you never know. Why don't you ask the HadoopExam support people?
Hi Ruchi, thanks for your guidance. I passed the Hadoop certification today. Appreciate it.
That's great! All the best! Nice to know that the blog was helpful to you.
Hi Ruchi,
What you provided above is a great help for everyone.
I have one question on MR code.
When you say to write MR code for HiveQL, what do you mean? Do they give a Hive query and we need to write MR code equivalent to it, or do they give MR code in the options and we need to select the one that is equivalent to the HiveQL?
Thanks
Select the right one from the options.
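For an option implementing "select field1, field2 from table order by field1, field2", the key thing to look for is a composite map output key that compares by field1 first and field2 second. The framework sorts keys during the shuffle, so with a single reducer the output is globally sorted (with many reducers, a TotalOrderPartitioner would be needed). The plain-Java simulation below is a sketch under those assumptions, not Hadoop API code; the class names are hypothetical, and the Comparator stands in for a WritableComparable's compareTo.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java simulation (no Hadoop classes) of
//   select field1, field2 from table order by field1, field2
final class OrderByTwoFields {

    // Composite map output key: (field1, field2).
    static final class CompositeKey {
        final String field1;
        final String field2;

        CompositeKey(String field1, String field2) {
            this.field1 = field1;
            this.field2 = field2;
        }

        @Override
        public String toString() {
            return field1 + "," + field2;
        }
    }

    // Stands in for WritableComparable.compareTo: order by field1, then by field2.
    static final Comparator<CompositeKey> KEY_ORDER =
            Comparator.comparing((CompositeKey k) -> k.field1).thenComparing(k -> k.field2);

    static List<String> run(List<String> lines) {
        // Map phase: turn each comma-separated row into a composite key.
        List<CompositeKey> keys = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            keys.add(new CompositeKey(f[0], f[1]));
        }
        // Shuffle/sort phase: the framework sorts the map output keys.
        keys.sort(KEY_ORDER);
        // Single identity reducer: write the keys in the order they arrive.
        List<String> out = new ArrayList<>();
        for (CompositeKey k : keys) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("b,1", "a,2", "a,1")));
    }
}
```

Here the values are compared as strings, so "a,1" sorts before "a,2", which sorts before "b,1".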
I passed the exam and your blog helped a lot. Thanks.
Hi Ruchi,
Were there any questions related to Hadoop I/O, on topics like compression, how to use compression in MR, serialization, and file-based data structures?
Souvik, you should go through these topics.
Hi Ruchi,
I am preparing for the certification. Can you tell me which chapters of the Definitive Guide I need to refer to for the certification? Also, I have seen that the guide has many examples. How many questions on Java come up?
Thanks
Hi Vineet, I read the complete book. In case you want to read only specific chapters, you can compare them with the exam course contents. The examples are helpful, as coding questions come up in the exam. As for Java: if you mean Java-language-specific questions, there are none; if you mean MapReduce questions, then yes, they do come up, but I don't have a count as such.
Thanks, Ruchi, for the input.
I want to practice some programs. Is that required? Where did you get the programs you practiced?
Yes, Sheetal, it is required. I have already mentioned all the sources in my blog: I took training, and I have given the site addresses, the simulator name, and the book names. Do you want to know something else?
Hi Ruchi, I am preparing for CCD-410 and just wanted to clarify one point: are all the questions on Hadoop 2.0, or might we get some on Hadoop 1.0 as well (e.g. JobTracker, TaskTracker)?
Thanks for your help in advance.
Both.
Thanks for this!
Hello Ruchi,
I have gone through many Hadoop tutorials. My concern is that I don't have a Java background or experience. Since you mentioned there are coding-related questions, would it be difficult for me, or are they basic enough to understand?
Learn core Java; it's required for MapReduce. The questions are not very basic, but not very difficult either. I hope you have programming experience in another language, at least.
Hi Ruchi,
If possible, can you share the study material from the HadoopExam site? The syllabus is going to change this year, so I need to refer to the old study material, and I am planning to buy the new material in January 2016.
If possible, mail me at anand_vin@hotmail.com.
Thanks
Kind regards
My material is machine-specific, so you need to buy it from the respective sites.
Has anyone here taken the new CCA 175 exam? Please share your experiences.
Hi Ruchi and other members,
Seeing so many people interested in Hadoop, some of them already certified, I would like to take advantage of this forum (thanks to Ruchi) to clarify my doubts. I have a few queries and request you to clear them up.
Query 1: I have a 50-node cluster. I would like to deploy Pig and Hive scripts so that they run on the multi-node cluster. What changes are required, and where should the scripts be deployed?
Query 2: I have written a function in C++ / Python / Perl / PHP and would like to use the same function in Pig and Hive. What is required for that, or can functions be written only in Java?
Query 3: To process unstructured data, we normally use Java APIs. How do we use Java APIs to process unstructured data?
Query 4: I have written a script in Pig / Hive. In the script I am referring to a function written in Java. To use the function in Pig I need to DEFINE it, and for Hive it's REGISTER. But how do I include the Java code, I mean the JAR file?
I would appreciate it if any one of you could reply and mail me at: anand_vin@hotmail.com.
Thanks
Kind regards
Hi Vineet, this is a certification blog. In case you don't get an answer, you can subscribe to one of the Hadoop mailing lists listed at http://hadoop.apache.org/mailing_lists.html; many people discuss their problems there.
Excellent issues altogether; you just won a new reader. What would you recommend about the post you made a few days ago? Anything certain?
May 2015. I have updated the blog; check for any changes in the exam pattern now.