[clc-devel] Rsync for Maintainers
I am just curious, what was the reason again for not using rsync for port distribution?

Just an idea, why can't we make people who submit ports have an rsync account:

---------------- rsyncd.cruxusers.conf -----------------
victord:pass1
cptn:pass2
---------------- rsyncd.cruxusers.conf -----------------

---------------- rsyncd.conf -----------------
#
# /etc/rsyncd.conf
#

uid = nobody
gid = nobody
# use chroot = no
max connections = 4
pid file = /var/run/rsyncd.pid
log file = /var/log/rsyncd.log

[victord]
    path = /data/PORTS/victord
    use chroot = true
    secrets file = /etc/rsyncd.cruxusers.secrets
    auth users = victord
    read only = false

[cptn]
    path = /data/PORTS/cptn
    use chroot = true
    secrets file = /etc/rsyncd.cruxusers.secrets
    auth users = cptn
    read only = false

# End of file
---------------- rsyncd.conf -----------------

(I checked, they don't seem to support %u to replace username)

Then people who want their repos to be public can ask for an account, and a simple script that is given the user:pass pairs can generate these config files. Then it's up to them to update their trees. A GUI (I have a PHP one that can handle this) can then list/search/etc. the ports.

Wouldn't this be an easy solution? It centralizes all the HTTPUP repos with minimal work. Am I missing something?

We can even make httpup take -user, -pass and -rsync server args so maintainers can distribute their ports without learning rsync commands. httpup can then just sync their repo to the central location, calling rsync with some default settings.

Victor
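A minimal sketch of the "simple script" mentioned above, assuming the account list is a plain text file of user:pass lines and that every maintainer gets a module under /data/PORTS as in the example config. The function names (parse_accounts, render_rsyncd_conf, render_secrets) and the default paths are illustrative assumptions, not part of any existing tool.

```python
from __future__ import annotations

from typing import Iterable

# Global part of rsyncd.conf, taken from the example in the mail above.
GLOBAL_SECTION = """\
#
# /etc/rsyncd.conf
#

uid = nobody
gid = nobody
max connections = 4
pid file = /var/run/rsyncd.pid
log file = /var/log/rsyncd.log
"""


def parse_accounts(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse 'user:pass' lines, skipping blank lines and '#' comments."""
    accounts = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        user, sep, password = line.partition(":")
        if not sep or not user or not password:
            raise ValueError(f"expected user:pass, got {line!r}")
        accounts.append((user, password))
    return accounts


def render_rsyncd_conf(
    accounts: list[tuple[str, str]],
    ports_root: str = "/data/PORTS",                      # assumed layout
    secrets_file: str = "/etc/rsyncd.cruxusers.secrets",  # path used in the mail
) -> str:
    """Return rsyncd.conf text with one writable module per account."""
    parts = [GLOBAL_SECTION]
    for user, _password in accounts:
        parts.append(
            f"\n[{user}]\n"
            f"    path = {ports_root}/{user}\n"
            f"    use chroot = true\n"
            f"    secrets file = {secrets_file}\n"
            f"    auth users = {user}\n"
            f"    read only = false\n"
        )
    parts.append("\n# End of file\n")
    return "".join(parts)


def render_secrets(accounts: list[tuple[str, str]]) -> str:
    """Return the secrets file text, one user:pass pair per line.

    rsyncd normally refuses a secrets file that other users can read,
    so whoever writes this text to disk should restrict its permissions.
    """
    return "".join(f"{user}:{password}\n" for user, password in accounts)
```

Usage would be along the lines of reading the account file with parse_accounts(open(path)) and writing the two rendered strings to wherever rsyncd expects them; the sketch deliberately leaves the file writing to the caller.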
Hi, On Fri, Aug 27, 2004 at 01:23:50 -0400, Victor wrote:
I am just curious, what was the reason again for not using rsync for port distribution?
Just an idea, why can't we make people who submit ports have an rsync account: [...] Then people who want their repos to be public, can ask for the account, and a simple script where user:pass is specified can generate these config files. Then it's up to them to update their trees. A gui (I have a php one that can handle this) can then list/search/etc the ports.
Wouldn't this be an easy solution? It centralizes all the HTTPUP repos with minimal work.
I don't really see any big advantage over the other proposals (httpup mirror collection, people collection). I'm somewhat missing a comparison, so I'll run my own:
Advantages:
- Changes are transmitted using diffs (httpup: whole files)

Disadvantages:
- Requires running a service (rsyncd)
- Configuration looks rather complicated
- If a repo maintainer decides to just not update his repo anymore, we have to detect that he's not accessing our repo anymore; this is a bit harder in a "push" model than it is in a "pull" model (as the httpup ideas implement).
- To merge those ports into one collection, you'll need another script (I don't think rsync does this); therefore, you'll end up with more components to maintain (rsync, script) which run independently, which means that you have to manually ensure that there are no "commits" to the collection while merging it (I know this is simple to do, but it will cause some disk load).

Furthermore, one obvious difference is the distributed vs. centralized approach, which has the following properties:
- Dist: People who want to keep their repository have to sync to two places (their webspace and our rsync space); Central: those that don't want a repository don't have to create one.
- Dist: People interested in maintaining ports can start at once and will have their repo synced eventually. Central: People depend on getting an account before they can get started.

I know I'm biased, so I'm probably missing something, but I know I'd be willing to maintain the system pointed out in the 'people' collection on a private server without concerns regarding security and effort required; in my case, this doesn't hold true for the rsync idea.

Whether distributed or centralized development is better is really a matter of taste. I think that contrib should be central, and 'people' distributed; and we should try to get more talented packagers to join CLC.

Kind regards, Johannes
--
Johannes Winkelmann mailto:jw@tks6.net
Bern, Switzerland http://jw.tks6.net
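The "push" model concern above (noticing that a maintainer has stopped updating) could be approached roughly as follows. This is only a sketch under assumptions: each maintainer is taken to own one directory below a common root such as /data/PORTS, and the newest file modification time in that tree is used as the activity signal. If maintainers push with rsync -a, those times are the ones from the maintainer's machine, so the check reflects when ports were last changed rather than when they were last uploaded. The names newest_mtime and stale_maintainers are hypothetical.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def newest_mtime(tree: Path) -> float | None:
    """Return the newest file mtime below tree, or None if it holds no files."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def stale_maintainers(
    ports_root: Path, max_age_days: float, now: float | None = None
) -> list[str]:
    """List maintainer directories with no file newer than max_age_days.

    Directories without any files are reported as stale as well.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in sorted(ports_root.iterdir()):
        if not entry.is_dir():
            continue
        newest = newest_mtime(entry)
        if newest is None or newest < cutoff:
            stale.append(entry.name)
    return stale
```

Run from cron, something like stale_maintainers(Path("/data/PORTS"), 90) would give a list of names to follow up on; the 90-day threshold is an arbitrary example value.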
Johannes Winkelmann wrote:
Hi,
[snip]
Advantages: - Changes are transmitted using diffs (httpup: whole files)
Well, also clients don't have to run any servers at all. There is a central location for all the ports (less downtime from people rebooting their boxes; not everyone has dedicated servers).
Disadvantages: - Requires running a service (rsyncd)
Yeah, but on the server
- Configuration looks rather complicated
It's not really :)
- If a repo maintainer decides to just not update his repo anymore, we have to detect that he's not accessing our repo anymore; this is a bit harder in a "push" model than it is in a "pull" model (as the httpup ideas implement).
Yeah, that's an issue... How is it solved now?
- To merge those ports into one collection, you'll need another script (I don't think rsync does this); therefore, you'll end up with more components to maintain (rsync, script), which run independently, which means that you have to manually ensure that there are no "commits" to the collection while merging it (I know this is simple to do, but it will cause some disk load).
Actually, I didn't think we would merge them into one collection. -Victor
Hi, On Wed, Sep 01, 2004 at 12:07:34 -0400, Victor wrote:
Johannes Winkelmann wrote:
Hi,
[snip]
Advantages: - Changes are transmitted using diffs (httpup: whole files)
Well, also clients don't have to run any servers at all.
I mentioned that in the difference section; this is both an advantage and a disadvantage: if you want an httpup repo, this solution would duplicate your work; furthermore, you can't develop independently. I'm not saying that distributed development is better, but I'm not convinced that centralized development is. Note that if you join CLC, you don't need a repo :-)
There is a central location for all the ports (less downtime from people rebooting their boxes; not everyone has dedicated servers)
Okay, I guess I wasn't too clear. Both suggestions I made would be hosted on a central server as well. Central access, but distributed (= independent) development.
- If a repo maintainer decides to just not update his repo anymore, we have to detect that he's not accessing our repo anymore; this is a bit harder in a "push" model than it is in a "pull" model (as the httpup ideas implement).
Yeah, that's an issue... How is it solved now?
Illustration of the 'people collection' idea:

                                    +------------------------------------------
                                    | central server
                                    |
+--------------+        sync        |  +----------------------+
+ httpup repos + --------------------> + separate collections +
+--------------+ n                1 |  +----------------------+
  ^                                 |     |
  | possible to subscribe directly  |     | - merge into one collection
  | to prefer a repo over base/opt/ |     | - check duplicates
  | contrib, or if you don't want   |     | - rm base/opt/contrib dups
  | everything from people          |     | - whatever checks you want
  |                                 |     | - notifications
  |                                 |     |
  |                                 |     V
+------+          fetches           |  +----------------------------+
+ User + <---------------------------- + httpup collection 'people' + <-> Mirrors
+------+ n                        1 |  +----------------------------+
                                    |
                                    +------------------------------------------

Note that this even allows maintainers of private repos to have duplicates over base/opt/contrib (my former mail said that such duplicates would cause a notification, coupled with the request to remove them); they just won't be propagated to the 'people' collection. Users can still get those by subscribing to the repo directly and putting it higher in prt-get.conf than /usr/ports/people. At the same time, people who have dups over ports from base/opt/contrib don't have to maintain two httpup repos. I guess there is a need for a better explanation; I'll try to put it together sometime soon.

The sync script updates the collections on the central server hourly (or at whatever interval we choose). If there's a 404, we'll find out instantly. If we want to introduce checking for changes, this can be implemented in this update script.
- To merge those ports into one collection, you'll need another script (I don't think rsync does this); therefore, you'll end up with more components to maintain (rsync, script), which run independently, which means that you have to manually ensure that there are no "commits" to the collection while merging it (I know this is simple to do, but it will cause some disk load).
Actually, I didn't think we would merge them into one collection.
Okay, but you need to check for duplicates, right? Unless you do that, it'll always be a number of independent repos, so the quality won't improve, and improving it is one of my main goals.
Regards, Johannes -- Johannes Winkelmann mailto:jw@tks6.net Bern, Switzerland http://jw.tks6.net
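The duplicate check discussed above might look roughly like the sketch below. It assumes that every synced repo and every official collection (base, opt, contrib) is a directory of ports, and that a port is a subdirectory containing a Pkgfile. The function names and the three-way result (merged, shadowed, conflicts) are illustrative choices and not the behaviour of an existing script. The function only computes a plan; copying files and sending notifications are left out.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def list_ports(collection: Path) -> set[str]:
    """Names of all ports in a collection (subdirectories with a Pkgfile)."""
    return {p.name for p in collection.iterdir() if (p / "Pkgfile").is_file()}


def plan_merge(
    repos: dict[str, Path], official: list[Path]
) -> tuple[dict[str, str], dict[str, list[str]], dict[str, list[str]]]:
    """Decide which ports from the individual repos go into 'people'.

    Returns three mappings keyed by port name:
      merged    -> the single repo that provides the port
      shadowed  -> repos whose port duplicates one in base/opt/contrib
      conflicts -> repos that provide the same port as each other
    """
    official_ports: set[str] = set()
    for collection in official:
        official_ports |= list_ports(collection)

    owners: dict[str, list[str]] = defaultdict(list)
    for repo_name in sorted(repos):
        for port in list_ports(repos[repo_name]):
            owners[port].append(repo_name)

    merged: dict[str, str] = {}
    shadowed: dict[str, list[str]] = {}
    conflicts: dict[str, list[str]] = {}
    for port, names in sorted(owners.items()):
        if port in official_ports:
            shadowed[port] = names      # kept out of 'people', maintainer notified
        elif len(names) > 1:
            conflicts[port] = names     # needs a decision before merging
        else:
            merged[port] = names[0]
    return merged, shadowed, conflicts
```

Only the ports in merged would be copied into the 'people' collection; the other two mappings are the input for the notifications mentioned in the diagram.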