#193 Split repository metadata per architecture
Closed: Completed 4 months ago by lkocman. Opened 5 months ago by dirkmueller.

Currently https://download.opensuse.org/distribution/leap/16.0/repo/oss/ , similar to how Leap 15.x worked, provides repository metadata covering all four primary architectures (aarch64, x86_64, ppc64le, s390x). This increases both the size and the complexity of the repository metadata. Just parsing the primary.xml takes excessive time today:

xmllint --timing --stream 0ad0e264fcbb98937b6b8d6e6bb8230d3d370fb444c6bd45bc43af5389d8c5ae0c22fec9862aaac8af5e978de8f51a53ad981ced57b212184988d29efd78e8a3-primary.xml
Parsing took 1396 ms

This is the timing on a relatively modern machine; it can easily be 5 s or more on less powerful hardware. If the metadata covered only a single architecture, the timing would be closer to 350 ms, which is a significant win. There is of course some room to optimize libxml2, but that is unlikely to gain more than a few percent here and there, because the project has a very wide, standards-compliant focus and parsing speed is secondary.
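To make the size argument concrete, here is a minimal Python sketch that stream-parses a primary.xml-style document and counts packages per architecture. The tiny inline sample and the function name `count_packages_by_arch` are illustrative assumptions; only the namespace and the package/arch element layout follow the rpm-md primary.xml format.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by rpm-md primary.xml files.
NS = "{http://linux.duke.edu/metadata/common}"

# Tiny stand-in for a real multi-arch primary.xml (illustrative data only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" packages="4">
  <package type="rpm"><name>foo</name><arch>x86_64</arch></package>
  <package type="rpm"><name>foo</name><arch>aarch64</arch></package>
  <package type="rpm"><name>foo</name><arch>s390x</arch></package>
  <package type="rpm"><name>foo-doc</name><arch>noarch</arch></package>
</metadata>
"""

def count_packages_by_arch(stream):
    """Stream-parse primary.xml and count <package> entries per <arch>."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == NS + "package":
            counts[elem.findtext(NS + "arch")] += 1
            elem.clear()  # free memory for already processed packages
    return counts

counts = count_packages_by_arch(io.BytesIO(SAMPLE))
# An x86_64 client only needs its own arch plus noarch entries.
needed = counts["x86_64"] + counts["noarch"]
print(dict(counts), "-> an x86_64 client needs", needed, "of", sum(counts.values()))
```

Running the same function over a real primary.xml would show what share of the parsed entries a single-arch client actually uses.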


So basically move to a Fedora/RHEL style repository structure where you have independent repositories for each architecture?

Honestly, that would be great. It would also make the Mock configs for SUSE distributions so much less complicated.

Metadata Update from @Pharaoh_Atem:
- Issue set to the milestone: 16.0

5 months ago

Hello Dirk, this is not so easy at the moment.
We'd have to build Leap in a similar way to Tumbleweed (in separate projects).

The conflict here is to decide between storage needs (esp. on mirrors) versus primary.xml size.

For example, we currently have for Backports SLE-15-SP7 x86_64:

23 GB noarch
15 GB x86_64

When we split the archs and push them to different places at different times, noarch rpms get duplicated for every architecture by default. src.rpms most likely also need to be duplicated if we want to match the exact rpm binaries at all times.

An alternative solution would be to enhance rpmmd.xml and at least createrepo_c and zypp tooling. We could have primary_$arch files so that only the matching architecture would be downloaded (we could also switch to json there to speed up parsing time). The full primary xml could stay to support clients not being able to handle the architecture ones.

(since such an implementation is backward compatible it could even be rolled out for existing SLE 15 repos)
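A rough Python sketch of how an adapted client could pick its metadata under this proposal. The `primary_x86_64` data type, the simplified hrefs, and the helper `pick_primary` are hypothetical; no such entry exists in today's repomd.xml, and only the namespace and the data/location layout are taken from the existing format.

```python
import xml.etree.ElementTree as ET

# Namespace used by rpm-md repomd.xml files.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"

# Illustrative repomd.xml; the "primary_x86_64" type is hypothetical and
# the hrefs are simplified (real ones carry a checksum prefix).
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary_x86_64"><location href="repodata/primary_x86_64.xml.gz"/></data>
</repomd>
"""

def pick_primary(repomd_xml, arch):
    """Return the href of primary_<arch> if listed, else the full primary."""
    root = ET.fromstring(repomd_xml)
    hrefs = {
        data.get("type"): data.find(REPO_NS + "location").get("href")
        for data in root.iter(REPO_NS + "data")
    }
    # Fall back to the full primary when no arch-specific entry is offered.
    return hrefs.get("primary_" + arch, hrefs["primary"])

print(pick_primary(SAMPLE_REPOMD, "x86_64"))
print(pick_primary(SAMPLE_REPOMD, "ppc64le"))
```

The fallback branch is what would keep clients working against repositories that only ship the full primary.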

The conflict here is to decide between storage needs (esp. on mirrors) versus primary.xml size.

For example, we currently have for Backports SLE-15-SP7 x86_64:

23 GB noarch
15 GB x86_64

When we split the archs and push them to different places at different times, noarch rpms get duplicated for every architecture by default. src.rpms most likely also need to be duplicated if we want to match the exact rpm binaries at all times.

The way this is typically solved is by hard-linking everything, and rsync can ensure that hard links are carried over when mirroring.
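As a small illustration of the hard-linking idea, the following Python sketch links one placeholder noarch file into four per-arch trees and checks that all paths share a single inode. The directory layout, file name, and helper `link_noarch` are made up for the example, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

def link_noarch(src, arch_dirs):
    """Hard-link one noarch file into every per-arch tree instead of copying."""
    linked = []
    for arch_dir in arch_dirs:
        os.makedirs(arch_dir, exist_ok=True)
        dest = os.path.join(arch_dir, os.path.basename(src))
        os.link(src, dest)  # new directory entry, same inode, no extra data
        linked.append(dest)
    return linked

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "noarch", "foo-doc-1.0-1.noarch.rpm")
    os.makedirs(os.path.dirname(src))
    with open(src, "wb") as f:
        f.write(b"\0" * 4096)  # placeholder payload, not a real rpm

    archs = ["aarch64", "x86_64", "ppc64le", "s390x"]
    copies = link_noarch(src, [os.path.join(root, a) for a in archs])

    # All five paths should resolve to one inode, so the data is stored once.
    inodes = {os.stat(p).st_ino for p in [src] + copies}
    nlink = os.stat(src).st_nlink
    print("distinct inodes:", len(inodes), "link count:", nlink)
```

On the mirroring side, rsync keeps such links only when run with `-H` (`--hard-links`), and only between files that are part of the same transfer.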

An alternative solution would be to enhance rpmmd.xml and at least createrepo_c and zypp tooling. We could have primary_$arch files so that only the matching architecture would be downloaded (we could also switch to json there to speed up parsing time). The full primary xml could stay to support clients not being able to handle the architecture ones.

Please don't do this.

Also, SRPMs can be their own "arch" rather than being duplicated everywhere.

An alternative solution would be to enhance rpmmd.xml and at least createrepo_c and zypp tooling. We could have primary_$arch files so that only the matching architecture would be downloaded (we could also switch to json there to speed up parsing time). The full primary xml could stay to support clients not being able to handle the architecture ones.

Doable, but a clear disadvantage is that tools using local metadata after a zypper ref will break, e.g. Suma's repo-sync. They expect to see all the packages the repo offers.

Well, tools that have not been adapted would still see the original primary.xml containing everything. Only adapted tools would get the speedup, though.

However, I have now implemented a mechanism to generate multiple repodata directories during a build. Arch-specific repodata lives in the architecture subdirectories (e.g. x86_64/repodata/... contains only x86_64 and noarch references). The disadvantage is that a single URL no longer works for every architecture, and more tools that modify the repodata will likely need to be adapted.
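The selection rule described above (each arch's repodata references its own packages plus noarch) can be sketched in Python as follows. This is only an illustration of the rule, not the actual build-service implementation; the `split_by_arch` helper and the package tuples are hypothetical.

```python
from collections import defaultdict

ARCHS = ("aarch64", "x86_64", "ppc64le", "s390x")

def split_by_arch(packages, archs=ARCHS):
    """Map each arch to the packages its own repodata would reference."""
    per_arch = defaultdict(list)
    for name, pkg_arch in packages:
        if pkg_arch == "noarch":
            for arch in archs:  # noarch is listed in every arch's metadata
                per_arch[arch].append((name, pkg_arch))
        elif pkg_arch in archs:
            per_arch[pkg_arch].append((name, pkg_arch))
    return dict(per_arch)

# Illustrative package list as (name, arch) pairs.
packages = [("foo", "x86_64"), ("foo", "aarch64"), ("foo-doc", "noarch")]
split = split_by_arch(packages)
print(split["x86_64"])
print(split["s390x"])
```

Each arch's list is what a tool like createrepo_c would then be pointed at to produce that arch's repodata directory.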

@Pharaoh_Atem: mirrors reported that they lose the hard links due to independent rsync runs for each directory, and therefore drop all non-x86_64 archs.

Do we not provide guidance on rsync and how to sync rsync modules? There are definitely ways to avoid this problem with the right flags. We're the only RPM distribution that doesn't split things up like this, so if other distributions aren't seeing this problem en masse, then I don't think it's that significant of a problem.

@lkocman Will you update the openSUSE services to use the ../${basearch} URLs then? This would IMO be smarter than letting zypper use the split primaries. Plugins downloading the repos changes and filelists would benefit from being directed to the smaller arch-specific versions.

This is the plan, Michael; however, we need to have a working Leap 16.0 build first. The recent builds are still red because of libzypp / https://bugzilla.suse.com/show_bug.cgi?id=1237172

The new setup is as follows:

The ftp-trees are built as part of https://build.opensuse.org/package/show/openSUSE:Leap:16.0/000productcompose

while offline media is built separately in 000productcompose.all, as it utilizes the all feature.
https://build.opensuse.org/package/show/openSUSE:Leap:16.0/000productcompose.all

Isn't it the other way around?

000productcompose.all is a single build building the ftp tree (aka online rpmmd repository).

000productcompose is building the iso files for offline installation via multibuild flavors.

Btw, it may make sense to rename 000productcompose.all; it was just my first shot. The name is maybe not helpful, and in the end it has no influence on the build.

Closing.

https://download.opensuse.org/distribution/leap/16.0/repo/oss/repodata/
The legacy repodata will continue to exist alongside

https://download.opensuse.org/distribution/leap/16.0/repo/oss/x86_64/repodata/ or generally repo/$basearch/repodata

This will help ease migration. Users are advised to use $basearch/repodata.
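For scripts that need to choose between the two locations, a minimal Python sketch could look like this. It assumes that the local machine name matches $basearch and that a subdirectory exists for each of the four architectures named in this ticket; the helper `repodata_url` and the fallback behaviour are illustrative, not an official client.

```python
import platform

BASE = "https://download.opensuse.org/distribution/leap/16.0/repo/oss"
# Assumption: one subdirectory per architecture named in this ticket.
SPLIT_ARCHS = {"aarch64", "x86_64", "ppc64le", "s390x"}

def repodata_url(base=BASE, machine=None):
    """Return the per-arch repodata URL, or the legacy one for unknown archs."""
    machine = machine or platform.machine()
    if machine in SPLIT_ARCHS:
        return f"{base}/{machine}/repodata/"
    return f"{base}/repodata/"  # legacy multi-arch metadata

print(repodata_url(machine="x86_64"))
```

Passing `machine` explicitly keeps the function testable; leaving it out uses the value reported by `platform.machine()`.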

Metadata Update from @lkocman:
- Issue close_status updated to: Completed
- Issue status updated to: Closed (was: Open)

4 months ago
