Overview
The HttpFS-Proxy service is an authenticating proxy in front of an HttpFS/HDFS service. It offers a REST API for manipulating files on an HDFS filesystem, simplifying the underlying webhdfs API exposed by the HttpFS/HDFS service.
It acts as a gateway service (similar to what HttpFS does), i.e. it does not require that the client has network access to the Hadoop cluster.
Authentication
The API authenticates incoming requests using the HTTP Basic Authentication scheme, so it expects an `Authorization` request header to be present in each request.
A simple example using `curl`, authenticating as user `someone` (`curl` adds the proper headers because of the `-u` flag):
curl -XGET -u someone -i 'https://httpfsproxy.example.net/f/file/status?path=temp/hello.txt'
Each proxy-level user maps to a Hadoop-level user, not necessarily with the same name. The proxy maintains this mapping so that it can form proper requests to the underlying webhdfs API (e.g. for translating relative paths).
From now on, we will assume that all example requests are performed by a user `someone` mapping to a Hadoop-level user `user`.
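As an illustration of what `curl -u` does behind the scenes, the following minimal sketch builds the `Authorization` header value for HTTP Basic Authentication by hand. The password is made up for the example, and `basic_auth_header` is a hypothetical helper name, not part of the proxy.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# "someone" is the example proxy user from the text; the password is invented.
headers = {"Authorization": basic_auth_header("someone", "s3cret")}
```

The resulting dictionary can be passed as request headers to any HTTP client.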
Resources
This section describes some common resources exchanged by the API.
File Status
An object that represents the status of a file/directory in an HDFS filesystem. It is roughly equivalent to the output of the `stat` command on a POSIX filesystem.
Path | Type | Description |
---|---|---|
`type` | String | The type of this entry, such as `FILE` or `DIRECTORY` |
`length` | Number | The length of a file (in bytes), or zero if a directory |
`pathSuffix` | String | A target-relative path |
`permission` | String | The octal permission, e.g. `644` |
`blockSize` | Number | The block size (in bytes) |
`replication` | Number | The replication factor for a file, or zero if a directory |
`accessTime` | Number | The timestamp of last access (in Epoch milliseconds) |
`modificationTime` | Number | The timestamp of last modification (in Epoch milliseconds) |
`owner` | String | The owning user |
`group` | String | The owning group |
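On the client side, a file status object can be mapped to a small typed record. The sketch below assumes the JSON keys shown in the examples of this document (`type`, `pathSuffix`, `permission`, and so on); `FileStatus` and `parse_file_status` are hypothetical names, and the parser will reject objects that carry extra keys.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileStatus:
    type: str
    pathSuffix: str
    permission: str
    length: int
    blockSize: int
    accessTime: int
    modificationTime: int
    owner: str
    group: str
    replication: int

    @property
    def is_directory(self) -> bool:
        return self.type == "DIRECTORY"

    @property
    def modified_at(self) -> datetime:
        # Timestamps are given in Epoch milliseconds.
        return datetime.fromtimestamp(self.modificationTime / 1000, tz=timezone.utc)


def parse_file_status(obj: dict) -> FileStatus:
    """Build a FileStatus from a decoded JSON object."""
    return FileStatus(**obj)
```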
Files API
This is an overview of the files API. All API endpoints, except `/f/home-directory`, require a `path` request parameter.
If a `path` does not correspond to an existing file (or when a regular file is expected but a directory is given), the server responds with `404 Not Found` along with a detailed error message (as a JSON payload).
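Since almost every endpoint takes a `path` query parameter, a client typically needs one helper that percent-encodes it correctly. The following is a minimal sketch; `BASE_URL` reuses the host from the examples in this document and `api_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://httpfsproxy.example.net:8443"  # host used in the examples


def api_url(endpoint: str, **params: str) -> str:
    """Build a full request URL, percent-encoding the query parameters."""
    query = urlencode(params, safe="/")  # keep "/" readable inside paths
    return f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")
```

For example, `api_url("/f/file/status", path="temp/hello.txt")` yields the status URL for `temp/hello.txt`, while characters such as `&` inside a file name are escaped.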
Operation | Request | Details/Comments |
---|---|---|
Get home directory | `GET /f/home-directory` | |
Get status of file/directory | `GET /f/file/status` | |
Get checksum of a file | `GET /f/file/checksum` | |
List a directory | `GET /f/listing` | List children of a directory (no recursion) |
Get content summary of a file/directory | `GET /f/file/summary` | |
Download a file | `GET /f/file/content` | Stream file content to the client |
Create a directory | `PUT /f/directory` | |
Rename a file/directory | `PUT /f/name` | |
Upload a file | `PUT /f/file/content` | Upload to create (or replace) content of target path |
Append to a file | `POST /f/file/content` | Upload to append content to a target path |
Concatenate sources into a file | `POST /f/file/content` | Do not upload; use existing file sources to be concatenated |
Truncate a file | `DELETE /f/file/content` | |
Delete a file/directory | `DELETE /f/file` | |
Set permission for a file/directory | `PUT /f/file/permission` | |
Set replication factor for a file | `PUT /f/file/replication` | |
Get home directory
This is a `GET` request to `/f/home-directory`.
The request parameters
No parameters
The response
The overall response is as follows:
Path | Type | Description |
---|---|---|
`status` | String | An overall status (`SUCCESS` on success) |
`error` | Array or null | An array of error messages, or `null` |
`result` | Object | The request-specific result (on success) |
The `result` part in detail:
Path | Type | Description |
---|---|---|
`path` | String | The absolute path to the home directory |
A request/response example
GET /f/home-directory HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Length: 90
Content-Type: application/json;charset=UTF-8
{
"status" : "SUCCESS",
"result" : {
"path" : "/user/user"
},
"error" : null
}
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/home-directory' -i -X GET
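All JSON responses share the same envelope (`status`, `result`, `error`), so a client can unwrap it in one place. The sketch below treats any status other than `SUCCESS` as a failure; `ApiError` and `unwrap` are hypothetical names, and the exact failure status values are not specified in this document.

```python
import json


class ApiError(Exception):
    """Raised when the response envelope does not report SUCCESS."""


def unwrap(body: str) -> dict:
    """Return the `result` part of a response, or raise with the error messages."""
    envelope = json.loads(body)
    if envelope.get("status") != "SUCCESS":
        raise ApiError(envelope.get("error"))
    return envelope["result"]


body = '{"status": "SUCCESS", "result": {"path": "/user/user"}, "error": null}'
home = unwrap(body)["path"]
```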
Get status of a file/directory
This is a `GET` request to `/f/file/status`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a file or a directory |
The response
The overall response is as follows:
Path | Type | Description |
---|---|---|
`status` | String | An overall status (`SUCCESS` on success) |
`error` | Array or null | An array of error messages, or `null` |
`result` | Object | The request-specific result (on success) |
The `result` part in detail:
Path | Type | Description |
---|---|---|
`status` | Object | An object holding the file status of target path |
A request/response example
A typical request/response exchange is the following (note that, for this kind of request, the result `pathSuffix` will always be empty):
GET /f/file/status?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 364
{
"status" : "SUCCESS",
"result" : {
"status" : {
"type" : "FILE",
"pathSuffix" : "",
"permission" : "644",
"length" : 13,
"blockSize" : 67108864,
"accessTime" : 1570383375309,
"modificationTime" : 1570383375330,
"owner" : "user",
"group" : "user",
"replication" : 2
}
},
"error" : null
}
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello.txt' -i -X GET
Get checksum of a file
This is a `GET` request to `/f/file/checksum`. It returns a variation of an MD5 checksum as computed on the HDFS filesystem.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a (regular) file |
The response
The overall response is as follows:
Path | Type | Description |
---|---|---|
`status` | String | An overall status (`SUCCESS` on success) |
`error` | Array or null | An array of error messages, or `null` |
`result` | Object | The request-specific result (on success) |
The `result` part in detail:
Path | Type | Description |
---|---|---|
`checksum` | Object | An object holding the details of a computed checksum |
`checksum.algorithm` | String | The name of the checksum algorithm (an MD5 variation) |
`checksum.bytes` | String | The checksum as a hex-encoded string |
`checksum.length` | Number | The length (in bytes) of the checksum |
A request/response example
A typical request/response exchange is as follows:
GET /f/file/checksum?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 233
{
"status" : "SUCCESS",
"result" : {
"checksum" : {
"algorithm" : "MD5-of-0MD5-of-512CRC32C",
"bytes" : "0000020000000000000000002cc8ec24690acd32cd03e9439ba0760d",
"length" : 28
}
},
"error" : null
}
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/checksum?path=temp/1570383373547/hello.txt' -i -X GET
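Because `bytes` is hex-encoded and `length` counts raw bytes, a client can sanity-check a checksum object before comparing it with another one. This is a minimal sketch; `checksum_bytes` is a hypothetical helper name.

```python
def checksum_bytes(checksum: dict) -> bytes:
    """Decode the hex-encoded checksum and verify the declared length."""
    raw = bytes.fromhex(checksum["bytes"])
    if len(raw) != checksum["length"]:
        raise ValueError(
            f"declared length {checksum['length']} does not match {len(raw)} decoded bytes"
        )
    return raw
```

Two files on the same HDFS filesystem can then be compared by checking that both `algorithm` and the decoded bytes are equal.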
List a directory
This is a `GET` request to `/f/listing`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a directory |
The response
The overall response is as follows:
Path | Type | Description |
---|---|---|
`status` | String | An overall status (`SUCCESS` on success) |
`error` | Array or null | An array of error messages, or `null` |
`result` | Object | The request-specific result (on success) |
The `result` part in detail:
Path | Type | Description |
---|---|---|
`statuses` | Array | An array of file status objects |
A request/response example
A typical request/response exchange is the following (note that, for this kind of request, `pathSuffix` will always be a non-empty file/directory name):
GET /f/listing?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Length: 1240
Content-Type: application/json;charset=UTF-8
{
"status" : "SUCCESS",
"result" : {
"statuses" : [ {
"type" : "DIRECTORY",
"pathSuffix" : "baz",
"permission" : "755",
"length" : 0,
"blockSize" : 0,
"accessTime" : 0,
"modificationTime" : 1570383374702,
"owner" : "user",
"group" : "user",
"replication" : 0
}, {
"type" : "FILE",
"pathSuffix" : "hello-and-then-goodbye.txt",
"permission" : "644",
"length" : 28,
"blockSize" : 67108864,
"accessTime" : 1570383375487,
"modificationTime" : 1570383375617,
"owner" : "user",
"group" : "user",
"replication" : 2
}, {
"type" : "FILE",
"pathSuffix" : "hello.txt",
"permission" : "644",
"length" : 12,
"blockSize" : 67108864,
"accessTime" : 1570383375683,
"modificationTime" : 1570383375703,
"owner" : "user",
"group" : "user",
"replication" : 3
}, {
"type" : "DIRECTORY",
"pathSuffix" : "sub1",
"permission" : "755",
"length" : 0,
"blockSize" : 0,
"accessTime" : 0,
"modificationTime" : 1570383375857,
"owner" : "user",
"group" : "user",
"replication" : 0
} ]
},
"error" : null
}
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/listing?path=temp/1570383373547/' -i -X GET
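Since each entry of a listing carries only its name in `pathSuffix`, a client has to join it with the listed directory to obtain a path usable in further requests. A minimal sketch, with `child_paths` as a hypothetical helper name:

```python
import posixpath


def child_paths(directory: str, statuses: list, only_files: bool = False) -> list:
    """Join each entry's pathSuffix with the directory that was listed."""
    return [
        posixpath.join(directory, s["pathSuffix"])
        for s in statuses
        if not only_files or s["type"] == "FILE"
    ]
```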
Get content summary
This is a `GET` request to `/f/file/summary`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a file or directory |
The response
The response is a summary of space/quota utilization for a given subtree (path) of the HDFS filesystem.
The overall response is as follows:
Path | Type | Description |
---|---|---|
`status` | String | An overall status (`SUCCESS` on success) |
`error` | Array or null | An array of error messages, or `null` |
`result` | Object | The request-specific result (on success) |
The `result` part in detail:
Path | Type | Description |
---|---|---|
`summary` | Object | An object representing a usage summary of a subtree |
`summary.directoryCount` | Number | The number of directories |
`summary.fileCount` | Number | The number of regular files |
`summary.length` | Number | The number of bytes used by the content |
`summary.quota` | Number | The quota on the number of entries under this directory |
`summary.spaceConsumed` | Number | The disk space consumed by the content |
`summary.spaceQuota` | Number | The quota on the total disk space |
A request/response example
A typical request/response exchange is the following:
GET /f/file/summary?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Content-Length: 233
{
"status" : "SUCCESS",
"result" : {
"summary" : {
"directoryCount" : 3,
"fileCount" : 3,
"length" : 95,
"quota" : -1,
"spaceConsumed" : 202,
"spaceQuota" : -1
}
},
"error" : null
}
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/summary?path=temp/1570383373547/' -i -X GET
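A common use of the content summary is to check how much of a space quota is left. The sketch below assumes the HDFS convention that a quota of `-1` means that no quota is set, as in the example response above; `remaining_space` is a hypothetical helper name.

```python
from typing import Optional


def remaining_space(summary: dict) -> Optional[int]:
    """Return the bytes left under the space quota, or None if no quota is set."""
    if summary["spaceQuota"] < 0:  # assumption: negative means "no quota"
        return None
    return summary["spaceQuota"] - summary["spaceConsumed"]
```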
Download a file
This is a `GET` request to `/f/file/content`.
Because content can be quite large, this method supports compressed responses: it examines the `Accept-Encoding` request header and outputs compressed content if the client requests it. Currently, the only supported compression algorithm is `gzip`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a regular file |
| | No | The number of bytes to be returned |
| | No | The starting byte position |
The response
The response always comes as generic binary content (i.e. content of MIME type `application/octet-stream`).
A request/response example
A typical request/response exchange is the following (file content not shown):
GET /f/file/content?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 200 OK
Content-Type: application/octet-stream
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt' -i -X GET
To reduce the size of the transfer, you can use the `--compressed` flag, which sends an `Accept-Encoding` request header that includes `gzip` and lets the server switch to output compression.
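A client that sends `Accept-Encoding: gzip` itself (rather than relying on `curl --compressed`) must also decompress the body when the server answers with `Content-Encoding: gzip`. A minimal sketch of that step, with `decode_body` as a hypothetical helper name:

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress a response body if the server gzip-compressed it."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```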
Create a directory
This is a `PUT` request to `/f/directory`.
This method recursively creates all parent directories if needed. If the target directory already exists, it does nothing and reports success.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a directory (to be created) |
| | No | The octal permission for this directory (default is |
The response
On success, the server responds with `201 Created`, an empty body, and the `Location` header set to the status URI of the newly created directory.
A request/response example
A typical request/response exchange is the following:
PUT /f/directory?path=temp/1570383373547/ HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/directory?path=temp/1570383373547/' -i -X PUT
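The `Location` header of a `201 Created` response points at the status URI of the affected path, so a client can recover that path from it. A minimal sketch, assuming the `path` query parameter shown in the example above; `path_from_location` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def path_from_location(location: str) -> str:
    """Extract the `path` query parameter from a Location header value."""
    query = parse_qs(urlsplit(location).query)
    return query["path"][0]
```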
Rename a file/directory
This is a `PUT` request to `/f/name`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing file or directory |
`destination` | Yes | The new path to move to |
The response
On success, the server responds with `201 Created`, an empty body, and the `Location` header set to the status URI of the new path.
A request/response example
A typical request/response exchange is the following:
PUT /f/name?path=temp/1570383373547/foo&destination=temp/1570383373547/baz HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/baz
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/name?path=temp/1570383373547/foo&destination=temp/1570383373547/baz' -i -X PUT
Upload a file
This is a `PUT` request to `/f/file/content`.
Because content can be quite large, this method supports request body compression: it examines the `Content-Encoding` request header and acts accordingly (by decompressing content on the server side). Currently, the only supported compression algorithm is `gzip`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of a regular file (to be created or replaced) |
`overwrite` | No | A flag that indicates if an existing file should be replaced (default is |
| | No | The replication factor |
| | No | The octal permission for this file (default is |
The response
On success, the server responds with `201 Created`, an empty body, and the `Location` header set to the status URI of the newly created (or replaced) file.
A request/response example
A typical request/response exchange is the following:
PUT /f/file/content?path=temp/1570383373547/hello.txt&overwrite=true HTTP/1.1
Host: httpfsproxy.example.net:8443
Content-Type: application/octet-stream
Content-Length: 13
Hello Hadoop!
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello.txt
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt&overwrite=true' -i -X PUT \
-H 'Content-Type: application/octet-stream' \
-d 'Hello Hadoop!'
For uploading real-world files, the recommended way is to use the `--upload-file <FILE>` switch (instead of supplying a `--data 'DATA'` switch).
To reduce the amount of transferred data, compress the request body: pipe the output of `gzip` to `curl` and add the `Content-Encoding: gzip` request header. For example:
gzip -c ~/data/road-network.csv | \
curl -T "-" -H "Content-Encoding: gzip" -H "Content-Type: application/octet-stream" -X PUT ...
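The same compressed upload can be prepared from Python. The sketch below only builds the request object (it does not send it, and it omits the `Authorization` header); `upload_request` is a hypothetical helper name and the `overwrite` parameter is taken from the example above.

```python
import gzip
import urllib.request
from urllib.parse import urlencode


def upload_request(base_url: str, path: str, data: bytes,
                   overwrite: bool = True) -> urllib.request.Request:
    """Build a PUT request whose body is the gzip-compressed file content."""
    query = urlencode({"path": path, "overwrite": str(overwrite).lower()}, safe="/")
    return urllib.request.Request(
        f"{base_url}/f/file/content?{query}",
        data=gzip.compress(data),
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            "Content-Encoding": "gzip",  # tells the server to decompress the body
        },
    )
```

The returned object could then be passed to `urllib.request.urlopen` once authentication headers have been added.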
Append to a file
This is a `POST` request to `/f/file/content`.
Note that a request header of `Content-Type: application/octet-stream` must be present to let the server know that the data to be appended comes from the request body (instead of concatenating sources already present in the HDFS filesystem).
This method also supports request body compression (same as in the upload case).
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing regular file (to append content to) |
The response
On success, the server responds with `201 Created`, an empty body, and the `Location` header set to the status URI of the modified file.
A request/response example
A typical request/response exchange is the following:
POST /f/file/content?path=temp/1570383373547/hello-and-then-goodbye.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
Content-Length: 16
Content-Type: application/octet-stream
You say goodbye!
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello-and-then-goodbye.txt
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello-and-then-goodbye.txt' -i -X POST \
-H 'Content-Type: application/octet-stream' \
-d 'You say goodbye!'
For appending large amounts of data, the recommended way is to use the `--upload-file <FILE>` switch (instead of supplying a `--data 'DATA'` switch).
Concatenate sources into a file
This is a `POST` request to `/f/file/content`.
Note that (unlike the append scenario) the request body must be absent (and, of course, any relevant `Content-*` header must also be absent).
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing regular file (the target file) |
`sources` | Yes | The list of comma-separated names of source files to be concatenated into the target file. These names must be plain file names and will be resolved relative to the parent of the target file |
The response
On success, the server responds with `201 Created`, an empty body, and the `Location` header set to the status URI of the target file.
A request/response example
Let `hello-part.txt` and `goodbye-part.txt` be the source files to be concatenated into an existing target file `hello-goodbye.txt`. The source files are located in the same directory as the target file (this is a limitation of the underlying HttpFS operation).
A typical request/response exchange is the following:
POST /f/file/content?path=temp/1570383373547/hello-goodbye.txt&sources=hello-part.txt,goodbye-part.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 201 Created
Location: https://httpfsproxy.example.net:8443/f/file/status?path=temp/1570383373547/hello-goodbye.txt
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello-goodbye.txt&sources=hello-part.txt,goodbye-part.txt' -i -X POST
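Append and concatenate share the same `POST /f/file/content` endpoint and differ only in whether a body (with its `Content-Type` header) or a `sources` parameter is supplied. The sketch below makes that choice explicit; it only builds the request, and `post_request` is a hypothetical helper name.

```python
import urllib.request
from typing import Optional, Sequence
from urllib.parse import urlencode


def post_request(base_url: str, path: str, body: Optional[bytes] = None,
                 sources: Optional[Sequence[str]] = None) -> urllib.request.Request:
    """Build an append request (body given) or a concatenate request (sources given)."""
    if (body is None) == (sources is None):
        raise ValueError("give exactly one of body or sources")
    params = {"path": path}
    headers = {}
    if sources is not None:
        params["sources"] = ",".join(sources)  # concatenate: no body, no Content-* header
    else:
        headers["Content-Type"] = "application/octet-stream"  # append: data is in the body
    url = f"{base_url}/f/file/content?{urlencode(params, safe='/,')}"
    return urllib.request.Request(url, data=body, method="POST", headers=headers)
```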
Truncate a file
This is a `DELETE` request to `/f/file/content`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing regular file (to be truncated to zero length) |
The response
On success, the server responds with `204 No Content` and an empty body.
A request/response example
A typical request/response exchange is the following:
DELETE /f/file/content?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/content?path=temp/1570383373547/hello.txt' -i -X DELETE
Delete a file/directory
This is a `DELETE` request to `/f/file`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing file or directory |
| | No | A flag that indicates if a directory should be deleted recursively (default is |
The response
On success, the server responds with `204 No Content` and an empty body.
A request/response example
A typical request/response exchange is the following:
DELETE /f/file?path=temp/1570383373547/hello.txt HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file?path=temp/1570383373547/hello.txt' -i -X DELETE
Set permission for a file/directory
This is a `PUT` request to `/f/file/permission`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing file or directory |
`permission` | Yes | An octal permission to set |
The response
On success, the server responds with `204 No Content` and an empty body.
A request/response example
A typical request/response exchange is the following:
PUT /f/file/permission?path=temp/1570383373547/hello.txt&permission=640 HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/permission?path=temp/1570383373547/hello.txt&permission=640' -i -X PUT
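Since the `permission` parameter is an octal string, it can help to validate it on the client before sending the request. The sketch below assumes that three or four octal digits are acceptable, which this document does not state explicitly; `normalize_permission` is a hypothetical helper name.

```python
import re

_OCTAL = re.compile(r"[0-7]{3,4}")  # assumption: 3 or 4 octal digits


def normalize_permission(mode) -> str:
    """Return the permission as an octal string, accepting an int or a string."""
    text = format(mode, "o") if isinstance(mode, int) else str(mode)
    if not _OCTAL.fullmatch(text):
        raise ValueError(f"not an octal permission: {mode!r}")
    return text
```

For instance, both `normalize_permission(0o640)` and `normalize_permission("640")` produce the value used in the example request above.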
Set replication factor for a file
This is a `PUT` request to `/f/file/replication`.
The request parameters
It accepts the following request parameters:
Parameter | Required | Description |
---|---|---|
`path` | Yes | A user-relative or absolute path of an existing file or directory |
`replication` | Yes | The replication factor (>= 2) to apply |
The response
On success, the server responds with `204 No Content` and an empty body.
A request/response example
A typical request/response exchange is the following:
PUT /f/file/replication?path=temp/1570383373547/hello.txt&replication=3 HTTP/1.1
Host: httpfsproxy.example.net:8443
HTTP/1.1 204 No Content
A request example with CURL
$ curl 'https://httpfsproxy.example.net:8443/f/file/replication?path=temp/1570383373547/hello.txt&replication=3' -i -X PUT