Wednesday, April 12, 2017

Snakebite: A Python client library for accessing HDFS

Snakebite is a Python package, created by Spotify, that provides a client library for accessing HDFS programmatically from Python applications. The package also comes with a command-line interface for HDFS that is built on the client library. The client library uses Protobuf messages to communicate directly with the NameNode.

Snakebite requires Python 2 and python-protobuf 2.4.1 or higher; Python 3 is not currently supported. Snakebite can be installed with pip. The following command installs snakebite, python-protobuf and the other packages it depends on:

$ pip install snakebite

Snakebite is distributed through PyPI. The Snakebite CLI accesses HDFS much like hdfs dfs does. Here are some of its commands:

$ snakebite mkdir /user/root/snakebite
OK: /user/root/snakebite

$ snakebite ls snakebite
Found 1 items
-rw-r--r--   3 root       root               46 2017-04-12 23:58 /user/root/snakebite/sample.txt

Snakebite, however, doesn't support copying a file from the local filesystem to HDFS yet. Copying from HDFS to the local filesystem does work:

$ snakebite get snakebite/sample.txt ./sample_bkp.txt
OK: /root/sample_bkp.txt

The major difference between Snakebite and hdfs dfs is that Snakebite is a pure Python client: it doesn't need to load any Java libraries, or start a JVM, to communicate with HDFS.

Now, let's see how Snakebite can access HDFS programmatically with a simple example:

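from snakebite.client import Client
# host and port must match fs.defaultFS in core-site.xml
client = Client('localhost', 9000)
# ls() takes a list of paths and yields one dict per entry;
# 'snakebite/' is resolved relative to the user's HDFS home directory
for x in client.ls(['snakebite/']):
    print(x)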

This prints:

{'group': u'root', 'permission': 420, 'file_type': 'f', 'access_time': 1492041536789L, 'block_replication': 3, 'modification_time': 1492041537040L, 'length': 46L, 'blocksize': 134217728L, 'owner': u'root', 'path': '/user/root/snakebite/sample.txt'}
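The dictionary printed above holds raw values: permission is the file mode as a decimal integer (420 is octal 0644, i.e. rw-r--r--), and access_time and modification_time are milliseconds since the Unix epoch. The sketch below uses hypothetical helpers, not part of Snakebite, to turn one such entry into a line resembling the CLI listing. It assumes a file_type of 'd' marks a directory, ignores the setuid/setgid/sticky bits, and formats times in UTC; for the sample entry, UTC gives the same 2017-04-12 23:58 that the CLI listing shows. It uses only the standard library, so it runs on Python 2 as well as Python 3.

```python
import time

# One entry as printed by the ls() example (Python 2 "L" suffixes dropped).
SAMPLE_ENTRY = {
    'group': u'root', 'permission': 420, 'file_type': 'f',
    'access_time': 1492041536789, 'block_replication': 3,
    'modification_time': 1492041537040, 'length': 46,
    'blocksize': 134217728, 'owner': u'root',
    'path': '/user/root/snakebite/sample.txt',
}


def permission_string(mode, file_type):
    """Render a numeric mode such as 420 (octal 0644) as '-rw-r--r--'.

    Assumption: file_type 'd' marks a directory. Special bits are ignored.
    """
    chars = ['d' if file_type == 'd' else '-']
    for shift in (6, 3, 0):  # owner, group, other
        triple = (mode >> shift) & 0o7
        chars.append('r' if triple & 4 else '-')
        chars.append('w' if triple & 2 else '-')
        chars.append('x' if triple & 1 else '-')
    return ''.join(chars)


def format_millis(millis):
    """Format an HDFS timestamp (milliseconds since the epoch) in UTC."""
    return time.strftime('%Y-%m-%d %H:%M', time.gmtime(millis / 1000.0))


def format_entry(entry):
    """Build a one-line summary of an ls() entry, loosely mimicking the CLI."""
    return '%s %3d %s %s %d %s %s' % (
        permission_string(entry['permission'], entry['file_type']),
        entry['block_replication'],
        entry['owner'],
        entry['group'],
        entry['length'],
        format_millis(entry['modification_time']),
        entry['path'],
    )


if __name__ == '__main__':
    print(format_entry(SAMPLE_ENTRY))
```

Inside the loop of the example above, print(format_entry(x)) would print one such line per entry instead of the raw dictionary.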

The ls method takes a list of paths and returns a generator that yields one dictionary of file information per entry. Every program that uses the client library must connect to the HDFS NameNode, and the hostname and port number passed to Client() depend on the HDFS configuration. They can be found in the hadoop/conf/core-site.xml file (etc/hadoop/core-site.xml in Hadoop 2.x) under the fs.defaultFS property:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
      <final>true</final>
    </property>
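Rather than hard-coding localhost and 9000, a script can read fs.defaultFS itself. The following is a minimal sketch using only the standard library; namenode_from_core_site is a hypothetical helper name, and falling back to port 8020 (the usual NameNode RPC port) when the URL carries no port is an assumption. Snakebite also ships an AutoConfigClient class that reads the Hadoop configuration on its own, so check its documentation before rolling your own.

```python
import xml.etree.ElementTree as ET

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, which Snakebite itself needs

DEFAULT_NAMENODE_PORT = 8020  # assumption: the usual NameNode RPC port


def namenode_from_core_site(xml_text):
    """Return (host, port) from the fs.defaultFS property of core-site.xml."""
    root = ET.fromstring(xml_text)
    for prop in root.findall('property'):
        if (prop.findtext('name') or '').strip() == 'fs.defaultFS':
            url = urlparse((prop.findtext('value') or '').strip())
            if url.scheme != 'hdfs':
                raise ValueError('fs.defaultFS is not an hdfs:// URL: %r'
                                 % url.geturl())
            return url.hostname, url.port or DEFAULT_NAMENODE_PORT
    raise KeyError('fs.defaultFS not found in core-site.xml')


if __name__ == '__main__':
    sample = """<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
      <final>true</final>
    </property>
</configuration>"""
    print(namenode_from_core_site(sample))  # ('localhost', 9000)
```

The returned pair can then be passed straight to Snakebite, e.g. host, port = namenode_from_core_site(open('core-site.xml').read()) followed by client = Client(host, port).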
 
For more information on Snakebite, refer to the official documentation page.


