Purpose of SSH nodes in a cluster

Hi!

After reading the documentation on Clustering, and especially the part on SSH hosts, I don’t really understand what the purpose of the latter is. The article only mentions that:

SSH servers accept SSH requests from commands like git clone and relay them to hosts that can serve the requests.

I don’t think that simply forwarding requests to other hosts is particularly efficient. I mean, for that to work, those other hosts must also accept SSH requests, so why not make the clients directly connect to those other hosts?

Basically, I’m wondering whether the documentation omits some important logic that happens behind those SSH nodes. So my questions are: are SSH hosts there only for security purposes, or do they have an actual impact on cluster performance? If they do, what kind of logic do they follow?
Also, by any chance, is there more detailed documentation of all the cluster components than the one I linked above?

Thanks! :slight_smile:

…why not make the clients directly connect to those other hosts?

Suppose I’ve set up clustering with four hosts: A, B, C, and D. For whatever reason, I put repository X on hosts A and B, and repository Y on hosts C and D.

Now, I run git clone ssh://example.mycompany.com/src/repository-x.git, requesting repository X.

Where would you expect the logic that makes this request connect directly to host A or host B (but not host C or host D) to live?
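
To make the question concrete, here is a minimal sketch of the kind of lookup something in front of the hosts would have to do. It is not how the cluster actually implements this: the REPO_HOSTS table, the repo_name and pick_host helpers, and the random choice between hosts are all assumptions made up for illustration.

```python
import random
from urllib.parse import urlparse

# Hypothetical placement table: which hosts hold which repository.
# A real cluster would keep this somewhere shared, not in a dict.
REPO_HOSTS = {
    "repository-x": ["A", "B"],
    "repository-y": ["C", "D"],
}


def repo_name(clone_url: str) -> str:
    """Extract the repository name from an ssh:// clone URL."""
    path = urlparse(clone_url).path       # e.g. "/src/repository-x.git"
    name = path.rsplit("/", 1)[-1]        # e.g. "repository-x.git"
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


def pick_host(clone_url: str) -> str:
    """Choose one host that can serve the requested repository."""
    hosts = REPO_HOSTS.get(repo_name(clone_url))
    if not hosts:
        raise LookupError(f"no host serves {clone_url!r}")
    # Assumption: any host holding the repository is equally good.
    return random.choice(hosts)
```

Calling pick_host("ssh://example.mycompany.com/src/repository-x.git") returns either "A" or "B" and never "C" or "D". The client only knows the single hostname example.mycompany.com, so a lookup like this cannot live in git itself.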

Oh yeah, I see, that makes more sense. From the documentation, it’s not very clear that SSH hosts are responsible for that part: “relay them to hosts that can serve the requests” lacks some details, so it would be great if it described that logic in a bit more detail. :smile:

Thanks a lot!