Hadoop Tutorial: Intro to HDFS

In this presentation, Sameer Farooqui introduces the Hadoop Distributed File System (HDFS), an open source Apache distributed file system designed to run on commodity hardware.

He’ll cover:

  • Origins of HDFS and Google File System / GFS
  • How a file is split into blocks before being distributed across a cluster
  • NameNode and DataNode basics
  • Technical architecture of HDFS
  • Sample HDFS commands
  • Rack awareness
  • Synchronous write pipeline
  • How a client reads a file