I've added a locust.io config file to the #damn server and performed an "extreme" load test (i.e. I used the "extreme" load times from https://git.sr.ht/~qeef/damn-server/tree/master/item/tests/mapathon.py).
The outcomes are: (1) the Python server just waits for the database (and it should be like that), (2) about 1 GB RAM is sufficient, (3) the SELECT queries need work.
Query type: Average / Percentile(99)
INSERT: 2 ms / 634 ms
SELECT: 32 ms / 57 s 834 ms
Next up: the "average" load test, run overnight.
The overnight test failed (too many 404: no square to map -- that's not an error, but I don't want to test it). So I ran the "average" test (each mapper maps for 30 - 60 s, then waits 30 - 60 s before the next mapping) again for 3 hours.
For the database queries:
Max duration / Latency Percentile(99)
1 s 190 ms / 32 s 502 ms
From the locust.io testing: 31582 POST requests (the real work) with a 99th percentile of 420 ms!
2 uvicorn workers, 4 GB RAM, Intel i5-2520M @ 2.5 GHz. Locust.io runs on the same machine.
I ran the locust.io load testing with 100 mappers (all spawned in the first minute) and the "average" load scenario: 30 - 60 s mapping, 30 - 60 s waiting (https://git.sr.ht/~qeef/damn-server/tree/master/item/tests/mapathon.py).
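As a sanity check on runs like this, a back-of-envelope estimate (my own arithmetic, not from the test output): an "average" cycle takes 45 s mapping + 45 s waiting = 90 s on average, so:

```python
# Back-of-envelope: how many mapping cycles N "average" mappers
# complete in a run. Each cycle averages 45 s mapping + 45 s waiting
# = 90 s. A single cycle issues several requests, so this is a lower
# bound on the request count, not the request total.
def expected_cycles(mappers, hours, avg_cycle_s=90):
    """Rough number of completed mapping cycles in the given wall time."""
    return int(mappers * hours * 3600 / avg_cycle_s)

cycles = expected_cycles(100, 3)  # 12000 cycles for 100 mappers over 3 h
```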
After 3 hours, the statistics of the areas were (to map, to review, done): 0%/31%/69% and 0%/55%/45%, respectively.
I had to stop the load test with 200 mappers after approximately 20 minutes. (The test ran from 17:00 till 17:30.) The statistics of the areas when stopped were (to map, to review, done): 9%/66%/25% and 37%/47%/16%, respectively.
The reason for stopping the test was an increase in resource utilization. Unfortunately, I have no database logs this time.
The results of load testing for the server of https://www.damn-project.org/ with 4 areas and 200 mappers (all spawned in the first minute) and the "average" load scenario: 30 - 60 s mapping, 30 - 60 s waiting (https://git.sr.ht/~qeef/damn-server/tree/master/item/tests/mapathon.py) are attached.
It ran for approximately 1.5 hours -- after that time there was 0 % to map for all of the areas, so the results could become biased.
Server: 1x 2.2 GHz vCPU, 1 GB RAM, 25 GB SSD.
I think that 200 mappers (30 s mapping, 30 s waiting) with 4 areas is the upper limit for load testing. I've tested 300 mappers with 6 areas for about 10 minutes and I don't feel confident about it.
First round of endurance load testing. Started at 12:00, 0% to map at 15:00, non-recoverable CPU load after 18:00.
My guess is that it's the upkeep scripts.
Also, I think I should shorten the waiting time of the locust "average" mappers from 30 - 60 seconds to 2 - 5 seconds. I will keep the mapping time at 30 - 60 seconds, though.
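A quick estimate of what that change does to the load (my own arithmetic, assuming times drawn uniformly from the stated ranges):

```python
# Effect of shortening the wait from 30-60 s to 2-5 s while keeping
# the 30-60 s mapping time: the average cycle shrinks, so the
# per-mapper request rate grows by the ratio of cycle lengths.
def avg_cycle_s(map_range, wait_range):
    """Average cycle length for uniform (lo, hi) mapping/waiting times."""
    return sum(map_range) / 2 + sum(wait_range) / 2

old = avg_cycle_s((30, 60), (30, 60))  # 90.0 s per cycle
new = avg_cycle_s((30, 60), (2, 5))    # 48.5 s per cycle
factor = old / new                     # ~1.86x more load per mapper
```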
When the times are shifted towards the "extreme" load times (https://git.sr.ht/~qeef/damn-server/tree/master/item/tests/mapathon.py), the results look more like the "extreme" results, so I will keep "extreme" and "average" clearly distinguished.
The locust _spawn rate_ (mappers spawned per second) also influences the results. I will keep it such that all the mappers are spawned within the first minute.
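The spawn rate needed for that is simple to derive; a small helper (the result is what gets handed to locust, e.g. via its --spawn-rate option in recent versions):

```python
# Spawn rate (mappers/second) such that all mappers are started within
# the first minute of the test.
import math

def spawn_rate(mappers, window_s=60):
    """Smallest integer rate that starts all mappers within window_s."""
    return math.ceil(mappers / window_s)

rate = spawn_rate(200)  # 4 mappers/s -> all 200 started within 50 s
```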
Also, the current #postgis database parameters are in https://git.sr.ht/~qeef/damn-deploy/commit/5b5bf8cc7c819b1bb76f5a37d07e32ceac606616
I did another round of "average" load testing with 200 mappers and 2 areas to maintain compatibility with https://en.osm.town/web/statuses/106189249266359926 and https://en.osm.town/web/statuses/106189292966532711 (the same number of mappers, the same number of areas).
The interesting times: at the start, 11:40 -- the spawning of mappers. Next, around 12:13 to 12:18, when the mapping is finished (i.e. 0 % to map). Finally, 12:45 -- the upkeep script runs while the server is returning 404: no square to map. (I checked it with `journalctl -u damn_upkeep.service -f`.)
I've summarized the main points of the thread in the #OpenStreetMap diary https://www.openstreetmap.org/user/qeef/diary/396811