Race condition crash in CheckMetadata locking: fatal error: concurrent map read and map write #170

Closed
opened 2026-01-19 18:29:16 +00:00 by michael · 1 comment

Originally created by @JustAnotherArchivist on GitHub.

Our transfer.sh server crashed today with the following traceback (only the relevant lines, paths redacted):

```
fatal error: concurrent map read and map write

goroutine 130672 [running]:
runtime.throw(0x104a276, 0x21)
        .../go1.15.11/src/runtime/panic.go:1116 +0x72 fp=0xc0007150a0 sp=0xc000715070 pc=0x4391b2
runtime.mapaccess2_faststr(0xed3c20, 0xc000449d40, 0xc000b148c0, 0x1d, 0x1d, 0xc000a8602a)
        .../go1.15.11/src/runtime/map_faststr.go:116 +0x4a5 fp=0xc000715110 sp=0xc0007150a0 pc=0x415fc5
github.com/dutchcoders/transfer.sh/server.(*Server).Lock(0xc000001680, 0xc000f2cab5, 0x5, 0xc000f2cabb, 0x17, 0x311d, 0x7231)
        .../transfer.sh/server/handlers.go:631 +0xc5 fp=0xc000715188 sp=0xc000715110 pc=0xda2665
github.com/dutchcoders/transfer.sh/server.(*Server).CheckMetadata(0xc000001680, 0xc000f2cab5, 0x5, 0xc000f2cabb, 0x17, 0xc000f2cb01, 0x0, 0x0, 0x0, 0x0, ...)
        .../transfer.sh/server/handlers.go:648 +0xd6 fp=0xc000715308 sp=0xc000715188 pc=0xda2936
github.com/dutchcoders/transfer.sh/server.(*Server).getHandler(0xc000001680, 0x11b5ca0, 0xc0009be720, 0xc000b0cb00)
        .../transfer.sh/server/handlers.go:991 +0x19c fp=0xc0007156d0 sp=0xc000715308 pc=0xda7efc
[...]

goroutine 130540 [runnable]:
github.com/dutchcoders/transfer.sh/server.(*Server).Lock(0xc000001680, 0xc000b263c5, 0x5, 0xc000b263cb, 0x17, 0x4edd97, 0xc000a96410)
        .../transfer.sh/server/handlers.go:632 +0x187
github.com/dutchcoders/transfer.sh/server.(*Server).CheckMetadata(0xc000001680, 0xc000b263c5, 0x5, 0xc000b263cb, 0x17, 0xc000b26401, 0x0, 0x0, 0x0, 0x0, ...)
        .../transfer.sh/server/handlers.go:648 +0xd6
github.com/dutchcoders/transfer.sh/server.(*Server).getHandler(0xc000001680, 0x11b5ca0, 0xc0001ca2d0, 0xc000580900)
        .../transfer.sh/server/handlers.go:991 +0x19c
[...]
```

The line numbers differ from the code in this repository because I'm running modified code, but I'm all but certain that those modifications are unrelated to the crash. From the bottom up, the corresponding lines in the current master code are:

https://github.com/dutchcoders/transfer.sh/blob/1594f95abf11fb9bb8e6e4581f6cc7e202370c1b/server/handlers.go#L986

https://github.com/dutchcoders/transfer.sh/blob/1594f95abf11fb9bb8e6e4581f6cc7e202370c1b/server/handlers.go#L660

https://github.com/dutchcoders/transfer.sh/blob/1594f95abf11fb9bb8e6e4581f6cc7e202370c1b/server/handlers.go#L643-L644

Note that the traceback is the same except for the one-line difference in `Lock`, with one goroutine trying to create a mutex under `s.locks[key]` (line 632 in my code, 644 in master) and the other trying to access it (line 631 in my code, 643 in master).
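For context, this is roughly the pattern I believe those stack frames point at; a minimal sketch, not the actual upstream source, so the struct and signature are my reconstruction:

```go
package server

import (
	"path"
	"sync"
)

// Server is a minimal stand-in for the real server struct; only the lock map
// matters here.
type Server struct {
	locks map[string]*sync.Mutex
}

// Lock mirrors the pattern implied by the stack frames: an unguarded read of
// s.locks followed by an unguarded write.
func (s *Server) Lock(token, filename string) {
	key := path.Join(token, filename)

	if _, ok := s.locks[key]; !ok { // map read (line 643 in master)
		s.locks[key] = &sync.Mutex{} // map write (line 644 in master)
	}

	s.locks[key].Lock()
}
```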

This happened on a `GET` shortly after the upload of a new file. The upload was successful, and the URL was automatically shared and then accessed around a dozen times in parallel and in very quick succession (5 accesses within 100 ms, then another 5 in the following 200 ms, plus a few more after that, according to the Caddy logs). However, this kind of request happens all the time on our instance and has never led to crashes before. As far as I can tell, transfer.sh crashed before returning a response to any of the requests.

I tried to reproduce this by running parallel requests against transfer (with a tool built in Python around aiohttp) but did not manage to trigger it that way. Perhaps the requests weren't quite quick enough. I haven't had time to set up and try other methods.
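For what it's worth, Go's race detector should catch the unguarded map access directly, without having to win the timing race over HTTP. A hypothetical test against the sketch above (each goroutine uses its own key so nothing blocks on the per-file mutexes):

```go
// In the same package as the sketch above; run with: go test -race
// (imports: fmt, sync, testing).
func TestLockRace(t *testing.T) {
	s := &Server{locks: map[string]*sync.Mutex{}}

	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Distinct keys, so no goroutine blocks on another's mutex;
			// the data race is on the s.locks map itself.
			s.Lock("token", fmt.Sprintf("file-%d.txt", i))
		}(i)
	}
	wg.Wait()
}
```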

My interpretation is that this is a race condition in the locking code: two requests were trying to work with the metadata lock at the same time, one trying to create the mutex under `s.locks[key]` and the other trying to read that key, but because `s.locks` is a plain map, such concurrent access is not safe and the runtime aborts with the fatal error above.
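Assuming the structure sketched above, one straightforward fix would be to guard the lock map itself with a dedicated mutex; a sketch, not a patch against the real code:

```go
// Same imports as above ("path", "sync"); locksMu serializes access to the
// locks map only, not to the per-file mutexes.
type Server struct {
	locksMu sync.Mutex
	locks   map[string]*sync.Mutex
}

func (s *Server) Lock(token, filename string) {
	key := path.Join(token, filename)

	s.locksMu.Lock()
	lock, ok := s.locks[key]
	if !ok {
		lock = &sync.Mutex{}
		s.locks[key] = lock
	}
	s.locksMu.Unlock()

	// Take the per-key lock only after releasing locksMu, so a long-held
	// metadata lock doesn't block lock lookups for unrelated files.
	lock.Lock()
}
```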

@paolafrancesca commented on GitHub:

@JustAnotherArchivist makes sense, I will try to check over the weekend; probably need to replace `s.locks` with https://golang.org/pkg/sync/#Map
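For reference, a `sync.Map` version could look roughly like this (a sketch only, reusing the hypothetical names and imports from above; `LoadOrStore` makes the lookup-or-create step a single atomic operation):

```go
// s.locks becomes a sync.Map keyed by token/filename.
type Server struct {
	locks sync.Map // map[string]*sync.Mutex
}

func (s *Server) Lock(token, filename string) {
	key := path.Join(token, filename)

	// Returns the existing mutex for key, or stores and returns the new one,
	// atomically, so the read/write race on the map goes away.
	lock, _ := s.locks.LoadOrStore(key, &sync.Mutex{})
	lock.(*sync.Mutex).Lock()
}

func (s *Server) Unlock(token, filename string) {
	key := path.Join(token, filename)

	if lock, ok := s.locks.Load(key); ok {
		lock.(*sync.Mutex).Unlock()
	}
}
```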

Reference: dutchcoders/transfer.sh#170