Deletion URLs on POST uploads #255

Closed
opened 2026-01-19 18:29:37 +00:00 by michael · 7 comments
Owner

Originally created by @JustAnotherArchivist on GitHub.

I realised that the postHandler doesn't return any deletion URLs (X-Url-Delete header, cf. #44).

This might be tricky to solve since POST uploads can contain multiple files, and obviously headers (i.e. deletion URLs) have to be sent before the body (i.e. download URLs). The only solution really would be to collect the download URLs while the files are being processed, returning the deletion URL headers immediately and the download URLs only at the end. But that would be problematic if a large number of files is uploaded at once (= huge list of download URLs).
Perhaps a compromise could be to only return deletion URLs up to a certain number of files to limit the download URL list size. So for example, once the list reaches 100 entries, output a final header signifying that the deletion URL list is truncated, then start outputting the download URLs (and don't use the list anymore from that point).
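The compromise above can be sketched as follows. This is only an illustration, not transfer.sh code: send_header, end_headers, send_line, and cap are hypothetical hooks standing in for whatever the real handler would use.

```python
def respond(files, send_header, end_headers, send_line, cap=100):
    """Emit deletion-URL headers for at most `cap` files, then stream.

    `files` yields (download_url, deletion_url) pairs. Download URLs are
    buffered only while the header section is still open, so memory use
    is bounded by `cap` regardless of how many files are uploaded.
    """
    buffered = []        # download URLs held back while headers are open
    streaming = False    # once True, download URLs go straight to the body
    for i, (download_url, deletion_url) in enumerate(files):
        if streaming:
            send_line(download_url)
            continue
        if i < cap:
            send_header(f'X-Url-Delete-{i}', deletion_url)
            buffered.append(download_url)
        else:
            # Cap reached: flag truncation, close the header section,
            # flush the buffer, and stop collecting from here on.
            send_header('X-Url-Delete-Truncated', 'true')
            end_headers()
            for url in buffered:
                send_line(url)
            buffered = []
            streaming = True
            send_line(download_url)
    if not streaming:
        end_headers()
        for url in buffered:
            send_line(url)
```

With cap=100 and 101 files, the client would see headers X-Url-Delete-0 through X-Url-Delete-99 plus the truncation flag, and all 101 download URLs in the body.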

I'm also not entirely sure about how the deletion URLs should be returned.
First, I was thinking multiple X-Url-Delete header fields. That is permitted by the HTTP standard, but only if there is no semantic difference between multiple header fields and a single field with all values merged together as a comma-separated list (RFC 2616, section 4.2, last paragraph). That works fine with URLs if there is a space after the comma (cf. srcset), but as far as I can see, that's not required, and since URLs can contain commas, it wouldn't be possible to distinguish reliably between one URL containing a comma and two URLs separated by a comma. So that's probably not a good idea.
Now I think it would be better to use X-Url-Delete-0, X-Url-Delete-1, etc. It's not as pretty and slightly more complicated to parse for the client, but at least it is reliable.


Suggestion:

On uploading multiple files over POST, the server should return the following:

< X-Url-Delete-0: deletion-url-for-the-first-file
< X-Url-Delete-1: deletion-url-for-the-second-file
...
< X-Url-Delete-<n>: deletion-url-for-the-last-file
download-url-for-the-first-file
download-url-for-the-second-file
...
download-url-for-the-last-file

If there are too many files uploaded within one request:

< X-Url-Delete-0: deletion-url-for-the-first-file
< X-Url-Delete-1: deletion-url-for-the-second-file
...
< X-Url-Delete-<m>: deletion-url-for-the-mth-file
< X-Url-Delete-Truncated: true
download-url-for-the-first-file
download-url-for-the-second-file
...
download-url-for-the-mth-file
download-url-for-the-m+1th-file
...
download-url-for-the-last-file

Here, m would be some safe number that doesn't cause significant memory usage due to the download URL list. I think it can be quite large, but it has to be there to prevent a DoS vector (crashing the server by uploading many files and making it run out of memory). I don't know what a good number would be though. It should be configurable as well, ideally.

(If only a single file is uploaded over POST, the deletion header could be either X-Url-Delete or X-Url-Delete-0. The latter should be easier to implement.)
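On the client side, collecting the numbered headers back into an ordered list could look roughly like this: a sketch against the proposed scheme, taking (name, value) pairs as returned by, e.g., Python's http.client getheaders().

```python
import re

def collect_deletion_urls(headers):
    """Gather X-Url-Delete-<i> values from response header pairs.

    Returns (urls, truncated): the deletion URLs in index order and a
    flag set when the server reported X-Url-Delete-Truncated: true.
    """
    urls = {}
    truncated = False
    for name, value in headers:
        lowered = name.lower()
        if lowered == 'x-url-delete-truncated':
            truncated = value.lower() == 'true'
        else:
            m = re.fullmatch(r'x-url-delete-(\d+)', lowered)
            if m:
                urls[int(m.group(1))] = value
    return [urls[i] for i in sorted(urls)], truncated

urls, truncated = collect_deletion_urls([
    ('X-Url-Delete-1', 'https://example.org/del/b'),
    ('X-Url-Delete-0', 'https://example.org/del/a'),
    ('Content-Type', 'text/plain'),
])
```

Since the index is part of the header name, ordering survives even if a proxy reorders header fields, which is exactly the reliability argument for the numbered scheme over a comma-separated list.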

Author
Owner

@paolafrancesca commented on GitHub:

@JustAnotherArchivist any feedback?


@paolafrancesca commented on GitHub:

@JustAnotherArchivist I really never use postHandler, and I don't know if adding the X-Url-Delete header to it is a feature that people care about.

Is this actually something you need, or more of a generic issue?

I also have some concerns about implementing it with a truncated header list to avoid the DoS vector: that is how it would have to be done, but then I'd rather not have the feature at all than have it defective by design.

My stance at the moment is that if you need a deletion URL header, use putHandler.


@JustAnotherArchivist commented on GitHub:

Sorry for the delay.

My main problem is uploading over high-latency links; so far, I use PUT because I do need the deletion URLs, but I create a new connection for every upload, which is very slow due to the high latency.

The nice thing about postHandler is that, unlike putHandler, it can be used to upload multiple files at once. However, if the server also supports HTTP pipelining (haven't tested that yet), I suppose multiple PUT requests over the same connection would work as well, albeit with slightly larger header overhead and more complex uploading code.


@paolafrancesca commented on GitHub:

@JustAnotherArchivist any feedback? otherwise I'm going to close the issue next week


@paolafrancesca commented on GitHub:

I'd rather add HTTP pipelining to PUT. Please, can you run some tests on your side for this?


@JustAnotherArchivist commented on GitHub:

I finally got around to testing this. In our setup, transfer.sh is behind nginx. At least in this case, multiple PUT requests over one connection work fine. So in that sense, consider my issue solved.

However, I did not test the plain transfer.sh server; I don't know how nginx plays into this exactly. And I also haven't tested actual pipelining, i.e. sending the second request before reading the response to the first one, since Python's http.client doesn't let you do that (raises http.client.CannotSendRequest: Request-sent).
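True pipelining can be approximated with a raw TLS socket instead of http.client. The following is only a sketch: example.org is a placeholder host, the request building is minimal HTTP/1.1 by hand, and response handling is a crude read-until-close rather than real parsing.

```python
import socket
import ssl

def put_request(host, path, body, close=False):
    # Build a minimal HTTP/1.1 PUT request by hand so that two of them
    # can be sent back-to-back before reading either response.
    headers = [
        f'PUT {path} HTTP/1.1',
        f'Host: {host}',
        f'Content-Length: {len(body)}',
    ]
    if close:
        headers.append('Connection: close')
    return ('\r\n'.join(headers) + '\r\n\r\n').encode('ascii') + body

if __name__ == '__main__':
    host = 'example.org'  # placeholder, as in the test script below
    raw = socket.create_connection((host, 443))
    conn = ssl.create_default_context().wrap_socket(raw, server_hostname=host)
    # Pipelining proper: both requests go out before any response is read.
    conn.sendall(put_request(host, '/test-a', b'foo'))
    conn.sendall(put_request(host, '/test-b', b'bar', close=True))
    data = b''
    while chunk := conn.recv(4096):
        data += chunk
    conn.close()
    print(data.decode('latin-1'))
```

Connection: close on the second request makes the server close the socket after the final response, so reading until EOF collects both responses without parsing Content-Length.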


In case someone wants to test further, here's the little test script I used:

import http.client

# Reuse a single connection for two sequential PUT uploads
# (not pipelined: each response is read before the next request).
con = http.client.HTTPSConnection('example.org')

con.request('PUT', '/test-a', body=b'foo')
result = con.getresponse()
result.read()  # drain the body so the connection can be reused
print('test-a:', result.reason, result.getheaders())

con.request('PUT', '/test-b', body=b'bar')
result = con.getresponse()
result.read()
print('test-b:', result.reason, result.getheaders())

con.close()

@JustAnotherArchivist commented on GitHub:

For future reference, this has since been implemented via #419 and #435.

Reference: dutchcoders/transfer.sh#255