Overview

The HttpFS-Proxy service is an authenticating proxy in front of an HttpFS/HDFS service. It offers a REST API for manipulating files on an HDFS filesystem, simplifying the underlying webhdfs API exposed by the HttpFS/HDFS service.

Like HttpFS itself, it acts as a gateway service, i.e. clients do not need network access to the Hadoop cluster.

Authentication

The API authenticates incoming requests using the HTTP Basic authentication scheme, so every request must carry an Authorization header.

A simple example using curl, authenticating as user someone (because of the -u flag, curl prompts for the password and adds the proper Authorization header):

curl -XGET -u someone -i 'https://httpfsproxy.example.net:8443/f/file/status?path=temp/hello.txt'
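The -u flag makes curl send an Authorization header of the form Basic <base64 of user:password>. The following minimal Python sketch builds the same header value with the standard library; the function name basic_auth_header and the password secret are placeholders for illustration, not part of the proxy's API.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value for the given credentials."""
    # Basic credentials are "username:password", UTF-8 encoded, then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# "secret" is a placeholder password used only for illustration.
print(basic_auth_header("someone", "secret"))
```

Note that Basic credentials are only encoded, not encrypted, which is why the examples in this document use https.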

Each proxy-level user maps to a Hadoop-level user, not necessarily of the same name. The proxy maintains this mapping so that it can form proper requests to the underlying webhdfs API (e.g. when translating user-relative paths into absolute ones).

From now on, we assume that all example requests are performed by the proxy user someone, who maps to the Hadoop-level user named user.
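The path translation mentioned above can be pictured with the following Python sketch. It assumes, as the /f/home-directory example later in this document suggests, that a Hadoop user's home directory is /user/<hadoop-user>. The USER_MAP dictionary and the resolve_path function are hypothetical stand-ins for the proxy's internal mapping, whose actual format is not documented here.

```python
import posixpath

# Hypothetical proxy-user -> Hadoop-user mapping (the real proxy keeps its own).
USER_MAP = {"someone": "user"}


def resolve_path(proxy_user: str, path: str) -> str:
    """Translate a user-relative path into an absolute HDFS path.

    Absolute paths are only normalized; relative paths are resolved against
    the assumed home directory /user/<hadoop-user>.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    home = "/user/" + USER_MAP[proxy_user]
    return posixpath.normpath(posixpath.join(home, path))


print(resolve_path("someone", "temp/hello.txt"))  # /user/user/temp/hello.txt
```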

Resources

This section describes some common resources exchanged by the API.

File Status

An object that represents the status of a file or directory in an HDFS filesystem. It is roughly equivalent to the output of the stat command on a POSIX filesystem.

Path Type Description

type

String

The type of this entry. One of: FILE, DIRECTORY, SYMLINK

length

Number

The length of a file (in bytes), or zero if a directory

pathSuffix

String

The path relative to the requested target (empty for the target itself, the child's name in a directory listing)

permission

String

The octal permission, e.g. 644

blockSize

Number

The block size (in bytes)

replication

Number

The replication factor for a file, or zero if a directory

accessTime

Number

The timestamp of last access (in Epoch milliseconds)

modificationTime

Number

The timestamp of last modification (in Epoch milliseconds)

owner

String

The owning user

group

String

The owning group
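For client code, the File Status object maps naturally onto a small record type. The following Python sketch is one possible shape, assuming the JSON field names exactly as in the table above; the FileStatus class, its helper methods, and the use of stat.filemode to render an ls-style mode string are illustrative choices, not part of the API.

```python
import json
import stat
from dataclasses import dataclass
from datetime import datetime, timezone

# Map the File Status "type" values onto POSIX file-type bits.
_TYPE_BITS = {"FILE": stat.S_IFREG, "DIRECTORY": stat.S_IFDIR, "SYMLINK": stat.S_IFLNK}


@dataclass
class FileStatus:
    type: str
    path_suffix: str
    permission: str          # octal string, e.g. "644"
    length: int              # bytes
    block_size: int          # bytes
    replication: int
    access_time: int         # epoch milliseconds
    modification_time: int   # epoch milliseconds
    owner: str
    group: str

    @classmethod
    def from_json(cls, obj: dict) -> "FileStatus":
        return cls(
            type=obj["type"],
            path_suffix=obj["pathSuffix"],
            permission=obj["permission"],
            length=obj["length"],
            block_size=obj["blockSize"],
            replication=obj["replication"],
            access_time=obj["accessTime"],
            modification_time=obj["modificationTime"],
            owner=obj["owner"],
            group=obj["group"],
        )

    def mode_string(self) -> str:
        """Render type and permission the way `ls -l` does, e.g. '-rw-r--r--'."""
        return stat.filemode(_TYPE_BITS[self.type] | int(self.permission, 8))

    def modified(self) -> datetime:
        """Convert modificationTime (epoch milliseconds) to an aware UTC datetime."""
        return datetime.fromtimestamp(self.modification_time / 1000, tz=timezone.utc)


sample = json.loads("""{"type": "FILE", "pathSuffix": "", "permission": "644",
  "length": 13, "blockSize": 67108864, "accessTime": 1570383375309,
  "modificationTime": 1570383375330, "owner": "user", "group": "user",
  "replication": 2}""")
st = FileStatus.from_json(sample)
print(st.mode_string(), st.length, st.modified().isoformat())
```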

Files API

This is an overview of the files API. All API endpoints, except /f/home-directory, require a path request parameter.

If a path does not correspond to an existing file (or when a regular file is expected but a directory is given), the server responds with 404 Not Found along with a detailed error message (as a JSON payload).

Operation Request Details/Comments

Get home directory

GET /f/home-directory

Get status of file/directory

GET /f/file/status

Get checksum of a file

GET /f/file/checksum

List a directory

GET /f/listing

List children of a directory (no recursion)

Get content summary of a file/directory

GET /f/file/summary

Download a file

GET /f/file/content

Stream file content to the client

Create a directory

PUT /f/directory

Rename a file/directory

PUT /f/name

Upload a file

PUT /f/file/content

Upload to create (or replace) content of target path

Append to a file

POST /f/file/content

Upload to append content to a target path

Concatenate sources into a file

POST /f/file/content

No upload; concatenate existing source files into the target path

Truncate a file

DELETE /f/file/content

Delete a file/directory

DELETE /f/file

Set permission for a file/directory

PUT /f/file/permission

Set replication factor for a file

PUT /f/file/replication

Get home directory

This is a GET request to /f/home-directory.

The request parameters

No parameters

The response

The overall response is as follows:

Path Type Description

status

String

An overall status (SUCCESS or FAILURE)

error

Varies

An array of error messages, or null on success

result

Object

The request-specific result (on success)

The result part in detail:

Path Type Description

path

String

The absolute path to the home directory

A request/response example

GET /f/home-directory HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Length: 90
Content-Type: application/json;charset=UTF-8

{
  "status" : "SUCCESS",
  "result" : {
    "path" : "/user/user"
  },
  "error" : null
}

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/home-directory' -i -X GET
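Every JSON response of the files API uses the same status/result/error envelope, so a client typically unwraps it once. Below is a minimal Python sketch of such a helper; ProxyError and unwrap are hypothetical names, and the sketch assumes that on failure error holds an array of messages, as the response tables above describe.

```python
import json


class ProxyError(Exception):
    """Raised when the proxy reports an overall status other than SUCCESS."""


def unwrap(body: str) -> dict:
    """Return the `result` object of a response envelope, or raise ProxyError."""
    envelope = json.loads(body)
    if envelope.get("status") != "SUCCESS":
        messages = envelope.get("error") or ["unknown error"]
        raise ProxyError("; ".join(str(m) for m in messages))
    return envelope["result"]


body = '{"status": "SUCCESS", "result": {"path": "/user/user"}, "error": null}'
print(unwrap(body)["path"])  # /user/user
```

With urllib, an error response such as 404 Not Found surfaces as urllib.error.HTTPError; its read() method still returns the JSON payload, which can be passed to unwrap to obtain the detailed message.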

Get status of a file/directory

This is a GET request to /f/file/status.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a file or a directory

The response

The overall response is as follows:

Path Type Description

status

String

An overall status (SUCCESS or FAILURE)

error

Varies

An array of error messages, or null on success

result

Object

The request-specific result (on success)

The result part in detail:

Path Type Description

status

Object

An object holding the file status of the target path

A request/response example

A typical request/response exchange is the following (note that, for this kind of request, the pathSuffix of the result is always empty):

GET /f/file/status?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 364

{
  "status" : "SUCCESS",
  "result" : {
    "status" : {
      "type" : "FILE",
      "pathSuffix" : "",
      "permission" : "644",
      "length" : 13,
      "blockSize" : 67108864,
      "accessTime" : 1570383375309,
      "modificationTime" : 1570383375330,
      "owner" : "user",
      "group" : "user",
      "replication" : 2
    }
  },
  "error" : null
}

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello.txt' -i -X GET
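Outside curl, the same request can be prepared with Python's standard library. This sketch only builds the urllib Request object and does not send it; BASE_URL, status_request and fetch_status are illustrative names, and passing safe="/" to urlencode merely keeps the slashes readable, matching the URLs shown in this document.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://httpfsproxy.example.net:8443"  # host from the examples above


def status_request(path: str, authorization: str) -> Request:
    """Build (without sending) a GET /f/file/status request."""
    query = urlencode({"path": path}, safe="/")
    return Request(f"{BASE_URL}/f/file/status?{query}",
                   headers={"Authorization": authorization}, method="GET")


def fetch_status(req: Request) -> dict:
    """Send the request and return the file status object from the envelope."""
    with urlopen(req) as resp:
        return json.load(resp)["result"]["status"]


req = status_request("temp/1570383373547/hello.txt", "Basic <credentials>")
print(req.get_method(), req.full_url)
```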

Get checksum of a file

This is a GET request to /f/file/checksum. It returns a variation of an MD5 checksum as computed on the HDFS filesystem.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a (regular) file

The response

The overall response is as follows:

Path Type Description

status

String

An overall status (SUCCESS or FAILURE)

error

Varies

An array of error messages, or null on success

result

Object

The request-specific result (on success)

The result part in detail:

Path Type Description

checksum

Object

An object holding the details of the computed checksum

checksum.algorithm

String

The name of the checksum algorithm (an MD5 variation)

checksum.bytes

String

The checksum as a hex-encoded string

checksum.length

Number

The length (in bytes) of the checksum

A request/response example

A typical request/response exchange is as follows:

GET /f/file/checksum?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 233

{
  "status" : "SUCCESS",
  "result" : {
    "checksum" : {
      "algorithm" : "MD5-of-0MD5-of-512CRC32C",
      "bytes" : "0000020000000000000000002cc8ec24690acd32cd03e9439ba0760d",
      "length" : 28
    }
  },
  "error" : null
}

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/checksum?path=temp/1570383373547/hello.txt' -i -X GET
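The algorithm name MD5-of-0MD5-of-512CRC32C hints at how the 28 checksum bytes are laid out. Assuming they follow the serialization of Hadoop's MD5MD5CRC32FileChecksum class (a big-endian 4-byte bytesPerCRC, an 8-byte crcPerBlock, then a 16-byte MD5 digest), the following Python sketch splits them apart. For the example above it yields bytesPerCRC 512 and crcPerBlock 0, which agrees with the algorithm name; decode_file_checksum is a hypothetical name.

```python
import struct


def decode_file_checksum(hex_bytes: str) -> dict:
    """Split the hex-encoded `bytes` field into its assumed components."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 28:
        raise ValueError(f"expected 28 bytes, got {len(raw)}")
    # ">iq16s": big-endian int32, int64, then 16 raw bytes (no padding).
    bytes_per_crc, crc_per_block, md5 = struct.unpack(">iq16s", raw)
    return {"bytesPerCRC": bytes_per_crc, "crcPerBlock": crc_per_block, "md5": md5.hex()}


print(decode_file_checksum("0000020000000000000000002cc8ec24690acd32cd03e9439ba0760d"))
```

Checksums of this kind are generally only comparable between files written with the same bytesPerCRC and block size, so comparing the decoded fields first helps avoid false mismatches.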

List a directory

This is a GET request to /f/listing.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a directory

The response

The overall response is as follows:

Path Type Description

status

String

An overall status (SUCCESS or FAILURE)

error

Varies

An array of error messages, or null on success

result

Object

The request-specific result (on success)

The result part in detail:

Path Type Description

statuses

Array

An array of file status objects

A request/response example

A typical request/response exchange is the following (note that, for this kind of request, pathSuffix is always a non-empty file or directory name):

GET /f/listing?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Length: 1240
Content-Type: application/json;charset=UTF-8

{
  "status" : "SUCCESS",
  "result" : {
    "statuses" : [ {
      "type" : "DIRECTORY",
      "pathSuffix" : "baz",
      "permission" : "755",
      "length" : 0,
      "blockSize" : 0,
      "accessTime" : 0,
      "modificationTime" : 1570383374702,
      "owner" : "user",
      "group" : "user",
      "replication" : 0
    }, {
      "type" : "FILE",
      "pathSuffix" : "hello-and-then-goodbye.txt",
      "permission" : "644",
      "length" : 28,
      "blockSize" : 67108864,
      "accessTime" : 1570383375487,
      "modificationTime" : 1570383375617,
      "owner" : "user",
      "group" : "user",
      "replication" : 2
    }, {
      "type" : "FILE",
      "pathSuffix" : "hello.txt",
      "permission" : "644",
      "length" : 12,
      "blockSize" : 67108864,
      "accessTime" : 1570383375683,
      "modificationTime" : 1570383375703,
      "owner" : "user",
      "group" : "user",
      "replication" : 3
    }, {
      "type" : "DIRECTORY",
      "pathSuffix" : "sub1",
      "permission" : "755",
      "length" : 0,
      "blockSize" : 0,
      "accessTime" : 0,
      "modificationTime" : 1570383375857,
      "owner" : "user",
      "group" : "user",
      "replication" : 0
    } ]
  },
  "error" : null
}

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/listing?path=temp/1570383373547/' -i -X GET
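Since /f/listing does not recurse, walking a whole subtree means issuing one listing request per directory and joining each pathSuffix onto its parent path. The Python sketch below shows that traversal; list_dir is a hypothetical callback standing in for a GET /f/listing call that returns result.statuses, and the FAKE listings (including the file name notes.txt) are made up so the example runs without a server.

```python
import posixpath
from typing import Callable, Iterator, List


def walk_files(path: str, list_dir: Callable[[str], List[dict]]) -> Iterator[str]:
    """Yield the paths of all non-directory entries under `path`, depth first."""
    for status in list_dir(path):
        child = posixpath.join(path, status["pathSuffix"])
        if status["type"] == "DIRECTORY":
            yield from walk_files(child, list_dir)
        else:
            yield child


# Hypothetical listings (trimmed File Status objects) used in place of real requests.
FAKE = {
    "temp/1570383373547": [
        {"type": "DIRECTORY", "pathSuffix": "baz"},
        {"type": "FILE", "pathSuffix": "hello.txt"},
    ],
    "temp/1570383373547/baz": [{"type": "FILE", "pathSuffix": "notes.txt"}],
}

for p in walk_files("temp/1570383373547", FAKE.__getitem__):
    print(p)
```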

Get content summary

This is a GET request to /f/file/summary.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a file or directory

The response

The response is a summary of space/quota utilization for a given subtree (path) of the HDFS filesystem.

The overall response is as follows:

Path Type Description

status

String

An overall status (SUCCESS or FAILURE)

error

Varies

An array of error messages, or null on success

result

Object

The request-specific result (on success)

The result part in detail:

Path Type Description

summary

Object

An object representing a usage summary of a subtree

summary.directoryCount

Number

The number of directories

summary.fileCount

Number

The number of regular files

summary.length

Number

The number of bytes used by the content

summary.quota

Number

The quota on the number of entries (files and directories) under this directory, or -1 if no quota is set

summary.spaceConsumed

Number

The disk space (in bytes) consumed by the content, counting all replicas

summary.spaceQuota

Number

The quota on the total disk space (in bytes), or -1 if no quota is set

A request/response example

A typical request/response exchange is the following:

GET /f/file/summary?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 233

{
  "status" : "SUCCESS",
  "result" : {
    "summary" : {
      "directoryCount" : 3,
      "fileCount" : 3,
      "length" : 95,
      "quota" : -1,
      "spaceConsumed" : 202,
      "spaceQuota" : -1
    }
  },
  "error" : null
}

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/summary?path=temp/1570383373547/' -i -X GET
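The raw summary numbers are easier to read once the quota sentinels are handled. The sketch below assumes that -1 means "no quota set" (which is how HDFS reports an unset quota, and matches the example above) and derives the ratio of consumed space to content length, which roughly reflects the average replication; describe_summary is a hypothetical helper.

```python
from typing import Optional


def _quota(value: int) -> Optional[int]:
    # A negative quota (-1 in practice) is taken to mean "no quota set".
    return None if value < 0 else value


def describe_summary(summary: dict) -> dict:
    """Derive readable figures from the `summary` object of GET /f/file/summary."""
    length = summary["length"]
    return {
        "nameQuota": _quota(summary["quota"]),
        "spaceQuota": _quota(summary["spaceQuota"]),
        # Stored bytes per content byte; roughly the average replication factor.
        "spaceRatio": summary["spaceConsumed"] / length if length else 0.0,
    }


summary = {"directoryCount": 3, "fileCount": 3, "length": 95,
           "quota": -1, "spaceConsumed": 202, "spaceQuota": -1}
print(describe_summary(summary))
```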

Download a file

This is a GET request to /f/file/content.

Because content can be quite large, this method supports compressed responses: it examines the Accept-Encoding request header and compresses the output if the client asks for it. Currently, the only supported compression algorithm is gzip.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a regular file

length

No

The number of bytes to be returned

offset

No

The starting byte position

The response

The response always comes as generic binary content (i.e. content of MIME type application/octet-stream).

A request/response example

A typical request/response exchange is the following (file content not shown):

GET /f/file/content?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/octet-stream

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt' -i -X GET

To reduce the size of the transfer, add the --compressed flag: curl then sends an Accept-Encoding request header that includes gzip, which lets the server compress its output, and transparently decompresses the response.
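A Python client has to handle compression itself: unlike curl --compressed, urllib does not ask for compression by default and does not decompress responses. The sketch below builds a ranged download request and decodes a gzip body; download_request and decode_body are hypothetical names, and the offset/length handling simply mirrors the optional parameters listed above.

```python
import gzip
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def download_request(path: str, offset: Optional[int] = None,
                     length: Optional[int] = None, compressed: bool = True) -> Request:
    """Build (without sending) a GET /f/file/content request."""
    params: dict = {"path": path}
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    req = Request(f"{BASE_URL}/f/file/content?{urlencode(params, safe='/')}", method="GET")
    if compressed:
        req.add_header("Accept-Encoding", "gzip")
    return req


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip content encoding, if the server applied it."""
    return gzip.decompress(body) if content_encoding == "gzip" else body


req = download_request("temp/1570383373547/hello.txt", offset=0, length=5)
print(req.full_url)
# Sending it: with urlopen(req) as resp:
#     data = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```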

Create a directory

This is a PUT request to /f/directory.

This method also creates any missing parent directories. If the target directory already exists, it does nothing and reports success.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a directory (to be created)

permission

No

The octal permission for this directory (default is 775)

The response

On success, the server responds with 201 Created, an empty body, and the Location header set to the status URI of the newly created directory.

A request/response example

A typical request/response exchange is the following:

PUT /f/directory?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/directory?path=temp/1570383373547/' -i -X PUT
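Since the 201 Created response carries no body, a client learns the resulting path from the Location header. The Python sketch below builds the PUT request (sending permission only when given, so that the server default applies otherwise) and extracts the path parameter back out of a Location value; mkdir_request and path_from_location are hypothetical names.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def mkdir_request(path: str, permission: Optional[str] = None) -> Request:
    """Build (without sending) a PUT /f/directory request."""
    params: dict = {"path": path}
    if permission is not None:
        params["permission"] = permission  # octal string such as "750"
    return Request(f"{BASE_URL}/f/directory?{urlencode(params, safe='/')}", method="PUT")


def path_from_location(location: str) -> str:
    """Return the `path` query parameter of a Location header value."""
    return parse_qs(urlsplit(location).query)["path"][0]


req = mkdir_request("temp/1570383373547/", permission="750")
print(req.get_method(), req.full_url)
print(path_from_location(
    "https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/"))
```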

Rename a file/directory

This is a PUT request to /f/name.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing file or directory

destination

Yes

The user-relative or absolute path to move to

The response

On success, the server responds with 201 Created, an empty body, and the Location header set to the status URI of the new path.

A request/response example

A typical request/response exchange is the following:

PUT /f/name?path=temp/1570383373547/foo&destination=temp/1570383373547/baz HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/baz

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/name?path=temp/1570383373547/foo&destination=temp/1570383373547/baz' -i -X PUT
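Paths passed as query parameters must be percent-encoded when they contain characters such as spaces or &. The following Python sketch builds the rename request using urllib.parse.quote, so spaces become %20 rather than +; rename_request is a hypothetical name, and the sketch assumes the proxy decodes query parameters in the usual way.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def rename_request(path: str, destination: str) -> Request:
    """Build (without sending) a PUT /f/name request."""
    # quote_via=quote encodes spaces as %20; safe="/" leaves path separators as-is.
    query = urlencode({"path": path, "destination": destination},
                      safe="/", quote_via=quote)
    return Request(f"{BASE_URL}/f/name?{query}", method="PUT")


print(rename_request("temp/1570383373547/foo", "temp/1570383373547/baz").full_url)
print(rename_request("temp/my notes&drafts.txt", "temp/notes.txt").full_url)
```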

Upload a file

This is a PUT request to /f/file/content.

Because content can be quite large, this method supports request body compression: it examines the Content-Encoding request header and, if the body is compressed, decompresses it on the server side. Currently, the only supported compression algorithm is gzip.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of a regular file (to be created or replaced)

overwrite

No

A flag that indicates whether an existing file should be replaced (default is false)

replication

No

The replication factor

permission

No

The octal permission for this file (default is 664)

The response

On success, the server responds with 201 Created, an empty body, and the Location header set to the status URI of the newly created (or replaced) file.

A request/response example

A typical request/response exchange is the following:

PUT /f/file/content?path=temp/1570383373547/hello.txt&overwrite=true HTTP/1.1
Host: httpfsproxy.example.net:8443
Content-Type: application/octet-stream
Content-Length: 13

Hello Hadoop!
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello.txt

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt&overwrite=true' -i -X PUT \
    -H 'Content-Type: application/octet-stream' \
    -d 'Hello Hadoop!'

To upload an actual file, the recommended way is the --upload-file <FILE> (-T) switch rather than --data 'DATA'.

To reduce the amount of transferred data, compress the request body: pipe the output of gzip to curl and add the Content-Encoding: gzip request header. For example:

gzip -c ~/data/road-network.csv | \
   curl -T "-" -H "Content-Encoding: gzip" -H "Content-Type: application/octet-stream" -X PUT ...
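The same gzip-compressed upload can be prepared in Python. This sketch compresses the body in memory, which is fine for small payloads; upload_request is a hypothetical name, and the sketch sends overwrite explicitly as true or false, mirroring the overwrite=true example above.

```python
import gzip
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def upload_request(path: str, data: bytes, overwrite: bool = False,
                   compress: bool = False) -> Request:
    """Build (without sending) a PUT /f/file/content request carrying `data`."""
    headers = {"Content-Type": "application/octet-stream"}
    if compress:
        data = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"  # server decompresses before writing
    query = urlencode({"path": path, "overwrite": str(overwrite).lower()}, safe="/")
    return Request(f"{BASE_URL}/f/file/content?{query}", data=data,
                   headers=headers, method="PUT")


req = upload_request("temp/1570383373547/hello.txt", b"Hello Hadoop!",
                     overwrite=True, compress=True)
print(req.full_url, len(req.data))
```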

Append to a file

This is a POST request to /f/file/content.

Note that the request must carry a Content-Type: application/octet-stream header: it tells the server that the data to be appended comes from the request body, rather than from source files already present in the HDFS filesystem (see Concatenate sources into a file).

This method also supports request body compression (same as in the upload case).

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing regular file (to append content to)

The response

On success, the server responds with 201 Created, an empty body, and the Location header set to the status URI of the modified file.

A request/response example

A typical request/response exchange is the following:

POST /f/file/content?path=temp/1570383373547/hello-and-then-goodbye.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
Content-Length: 16
Content-Type: application/octet-stream

You say goodbye!
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello-and-then-goodbye.txt

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello-and-then-goodbye.txt' -i -X POST \
    -H 'Content-Type: application/octet-stream' \
    -d 'You say goodbye!'

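For appending large amounts of data, the recommended way is the --upload-file <FILE> switch rather than --data 'DATA'. Keep the -X POST option in that case, because --upload-file on its own issues a PUT, i.e. an upload rather than an append.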
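For large appends, a Python client can stream a file object instead of loading it into memory. In the sketch below, Content-Length is set explicitly because, without it, urllib falls back to chunked transfer encoding for file-like bodies, and this document does not say whether the proxy accepts chunked uploads. append_request is a hypothetical name.

```python
import io
from typing import BinaryIO
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def append_request(path: str, body: BinaryIO, size: int) -> Request:
    """Build (without sending) a POST /f/file/content request that appends `body`."""
    headers = {
        # Required: tells the proxy the data comes from the request body
        # (urllib would otherwise default to application/x-www-form-urlencoded).
        "Content-Type": "application/octet-stream",
        # Explicit length avoids urllib's chunked fallback for file objects.
        "Content-Length": str(size),
    }
    query = urlencode({"path": path}, safe="/")
    return Request(f"{BASE_URL}/f/file/content?{query}", data=body,
                   headers=headers, method="POST")


payload = b"You say goodbye!"
req = append_request("temp/1570383373547/hello-and-then-goodbye.txt",
                     io.BytesIO(payload), len(payload))
print(req.get_method(), req.get_header("Content-length"))
```

For a real local file, the body would be open(local_path, "rb") and the size os.path.getsize(local_path).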

Concatenate sources into a file

This is a POST request to /f/file/content.

Note that, unlike the append case, the request must have no body and no Content-* headers (in particular, no Content-Type).

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing regular file (the target file)

sources

Yes

A comma-separated list of names of source files to be concatenated into the target file. These must be plain file names; they are resolved relative to the parent directory of the target file

The response

On success, the server responds with 201 Created, an empty body, and the Location header set to the status URI of the target file.

A request/response example

Let hello-part.txt and goodbye-part.txt be the source files to be concatenated into an existing target file hello-goodbye.txt. The source files are located in the same directory as the target file (a limitation of the underlying HttpFS operation).

A typical request/response exchange is the following:

POST /f/file/content?path=temp/1570383373547/hello-goodbye.txt&sources=hello-part.txt,goodbye-part.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello-goodbye.txt

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello-goodbye.txt&sources=hello-part.txt,goodbye-part.txt' -i -X POST
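A Python sketch of the concatenate request follows. It checks that each source is a plain file name, as the parameter table above requires, and deliberately sends no body and no Content-Type. One caveat, stated as an assumption to verify: Python's http.client still adds Content-Length: 0 to a body-less POST, and this document does not say whether the proxy tolerates that header here. concat_request is a hypothetical name.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def concat_request(target: str, sources: List[str]) -> Request:
    """Build (without sending) a body-less POST /f/file/content request."""
    for name in sources:
        # Sources are resolved against the target's parent, so no path separators.
        if not name or "/" in name or "," in name:
            raise ValueError(f"not a plain file name: {name!r}")
    query = urlencode({"path": target, "sources": ",".join(sources)}, safe="/,")
    return Request(f"{BASE_URL}/f/file/content?{query}", method="POST")


req = concat_request("temp/1570383373547/hello-goodbye.txt",
                     ["hello-part.txt", "goodbye-part.txt"])
print(req.full_url)
```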

Truncate a file

This is a DELETE request to /f/file/content.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing regular file (to be truncated to zero length)

The response

On success, the server responds with 204 No Content and an empty body.

A request/response example

A typical request/response exchange is the following:

DELETE /f/file/content?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt' -i -X DELETE

Delete a file/directory

This is a DELETE request to /f/file.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing file or directory

recursive

No

A flag that indicates whether a directory should be deleted recursively (default is false)

The response

On success, the server responds with 204 No Content and an empty body.

A request/response example

A typical request/response exchange is the following:

DELETE /f/file?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file?path=temp/1570383373547/hello.txt' -i -X DELETE
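Truncating and deleting are both DELETE requests and differ only in the resource (/f/file/content versus /f/file), which makes them easy to mix up. The Python sketch below keeps them as two separate, explicitly named builders; the function names are hypothetical, and the recursive flag is sent as the string true or false, by analogy with the overwrite=true example used for uploads.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"


def truncate_request(path: str) -> Request:
    """DELETE /f/file/content: keep the file but cut it to zero length."""
    query = urlencode({"path": path}, safe="/")
    return Request(f"{BASE_URL}/f/file/content?{query}", method="DELETE")


def delete_request(path: str, recursive: bool = False) -> Request:
    """DELETE /f/file: remove a file or directory (recursively if requested)."""
    query = urlencode({"path": path, "recursive": str(recursive).lower()}, safe="/")
    return Request(f"{BASE_URL}/f/file?{query}", method="DELETE")


print(truncate_request("temp/1570383373547/hello.txt").full_url)
print(delete_request("temp/1570383373547/", recursive=True).full_url)
```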

Set permission for a file/directory

This is a PUT request to /f/file/permission.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing file or directory

permission

Yes

An octal permission to set

The response

On success, the server responds with 204 No Content and an empty body.

A request/response example

A typical request/response exchange is the following:

PUT /f/file/permission?path=temp/1570383373547/hello.txt&permission=640 HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/permission?path=temp/1570383373547/hello.txt&permission=640' -i -X PUT
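Permissions travel as octal strings such as 640. The sketch below converts a numeric mode to that form and rejects malformed values before any request is sent. Accepting three or four octal digits (the fourth covering the sticky bit, e.g. 1777) is an assumption based on HDFS conventions, and permission_request is a hypothetical name.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"
_OCTAL_PERMISSION = re.compile(r"[0-7]{3,4}")  # assumed format, e.g. "640" or "1777"


def permission_request(path: str, mode: int) -> Request:
    """Build (without sending) a PUT /f/file/permission request from a numeric mode."""
    permission = format(mode, "03o")  # 0o640 -> "640"
    if not _OCTAL_PERMISSION.fullmatch(permission):
        raise ValueError(f"unsupported permission: {oct(mode)}")
    query = urlencode({"path": path, "permission": permission}, safe="/")
    return Request(f"{BASE_URL}/f/file/permission?{query}", method="PUT")


print(permission_request("temp/1570383373547/hello.txt", 0o640).full_url)
```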

Set replication factor for a file

This is a PUT request to /f/file/replication.

The request parameters

It accepts the following request parameters:

Parameter

Required

Description

path

Yes

A user-relative or absolute path of an existing regular file

replication

Yes

The replication factor (>= 2) to apply

The response

On success, the server responds with 204 No Content and an empty body.

A request/response example

A typical request/response exchange is the following:

PUT /f/file/replication?path=temp/1570383373547/hello.txt&replication=3 HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content

A request example with CURL

$ curl 'https://httpfsproxy.example.net:8443/f/file/replication?path=temp/1570383373547/hello.txt&replication=3' -i -X PUT
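Finally, a sketch for the replication request that enforces the lower bound stated in the parameter table before contacting the server. The upper bound is cluster-specific (HDFS enforces its own configured maximum), so it is left to the server here; replication_request is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://httpfsproxy.example.net:8443"
MIN_REPLICATION = 2  # lower bound stated in the parameter table above


def replication_request(path: str, replication: int) -> Request:
    """Build (without sending) a PUT /f/file/replication request."""
    if replication < MIN_REPLICATION:
        raise ValueError(f"replication must be >= {MIN_REPLICATION}, got {replication}")
    query = urlencode({"path": path, "replication": replication}, safe="/")
    return Request(f"{BASE_URL}/f/file/replication?{query}", method="PUT")


print(replication_request("temp/1570383373547/hello.txt", 3).full_url)
```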