Fedora 16 upgrade
2011-11-13
A new Fedora, a new decision on which machines to upgrade. Usually I try to stagger the three machines I use most - my work desktop, my home desktop, and my laptop. I had updated my work machine and laptop to F-15 when it came out, and kept my home desktop at F-14.
I actually have two or three root partitions on each of those machines, and I typically do a fresh install on a separate root, so I can try things, poke around, and make sure everything I will need works. When I do the install, I don't mount my /home partition, because I don't want to have the new version upgrade things for me on my user config.
I have a pretty long checklist by now that I go through on each install/upgrade: installing the packages I use a lot, setting up specific configuration, copying over ssh keys, ...
I actually liked F-15 a lot, and though GNOME 3 has its issues (which I still want to document in a separate post), I overall enjoyed the experience. At home, I noticed myself using the windows key or moving my mouse to the top left corner expecting something to happen.
That is how you know you really are ready for GNOME 3.
So I thought, what the heck, let's get to upgrading all of them. I started with my laptop, as usual. That mostly went fine, except for hurdle number one. My laptop actually has /home encrypted. And I did not add it to my custom layout in anaconda. So, the system dropped me in a rescue shell after booting. It took me quite a while to figure out that I had to copy over /etc/crypttab from the old system. After that, things worked again.
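For the record, the crypttab fix can be sketched as a small helper. This is only an illustration: the function name copy_crypttab and the mount point in the example are made up, and it assumes the old root is already mounted somewhere readable.

```shell
#!/bin/sh
# Sketch: carry /etc/crypttab over from an old root that is mounted somewhere.
# copy_crypttab is a hypothetical helper; both arguments are directories.
copy_crypttab() {
    old_root=$1
    new_root=$2
    src="$old_root/etc/crypttab"
    dst="$new_root/etc/crypttab"

    # Refuse to continue if the old system has no (or an empty) crypttab.
    if [ ! -s "$src" ]; then
        echo "no crypttab entries found in $src" >&2
        return 1
    fi
    # Keep whatever the installer wrote, just in case.
    if [ -e "$dst" ]; then
        cp -p "$dst" "$dst.installer"
    fi
    cp -p "$src" "$dst"
}

# Example, with the old root mounted on /mnt/oldroot (a made-up mount point):
#   copy_crypttab /mnt/oldroot /
```

Keeping the installer's file next to the copied one makes it easy to compare the two afterwards.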
Arguably, hurdle #1 may not be Fedora's fault. Maybe normal users don't encrypt home drives, or use custom partitioning like I do (although on a few Fedora upgrades this saved my bacon when it turned out certain things I needed, like VMware, didn't work in the new Fedora).
And yes, GNOME 3.2 is a slight improvement. Enough to make a difference at least. All the usual applications seem to work, so I can now mount my old /home directory.
That's when I ran into hurdle number 2: the default uid/gid numbering change. My thomas user now was 1000:1000 as opposed to 500:500 on all my machines before Fedora 16.
In this day and age, I still have to shell it up to fix things like that:
find / -uid 500 -exec chown 1000 {} \;
find / -gid 500 -exec chown :1000 {} \;
If I had less shame I'd tell you how embarrassing it is if you do this for a few users on your system, and start thinking "let's put this in a for loop", and because it's already 1 AM you start doing things like
for a in 0 1 3; do find / -uid 50$a -exec chown 10$a {} \; ; find / -gid 50$a -exec chown 100$a {} \; done
Note how I got the number of 0's wrong in the first find, and how I actually forgot the : in the second. You can imagine how amusing it is to fix the effect of those commands.
But I'm a shameful person so I won't tell you about this bit. Instead, suffice it to say that this took a long time.
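A less 1-AM-prone way to do the same remapping is to print the commands first and only run them once they look right. The sketch below is an illustration, not what was actually run: remap_cmds is a made-up name, the offsets 0, 1 and 3 are taken from the loop above, and -xdev (stay on one filesystem) and chown -h (change symlinks themselves rather than their targets) are additions that seemed prudent.

```shell
#!/bin/sh
# Sketch: generate the find/chown commands that move old Fedora ids (500+N)
# to the new range (1000+N). Nothing is changed; the commands are only printed.
# remap_cmds is a hypothetical helper; ROOT should not contain spaces.
remap_cmds() {
    root=$1
    shift
    for n in "$@"; do
        old=$((500 + n))
        new=$((1000 + n))
        # Owner first, then group; the new range never overlaps the old one.
        echo "find $root -xdev -uid $old -exec chown -h $new {} +"
        echo "find $root -xdev -gid $old -exec chgrp -h $new {} +"
    done
}

# Dry run for users 500, 501 and 503 on the /home filesystem.
remap_cmds /home 0 1 3

# Once the output has been read (twice), it can be piped to a shell:
#   remap_cmds /home 0 1 3 | sh
```

Computing the ids with arithmetic avoids the wrong-number-of-zeroes mistake, and using chgrp for the group avoids forgetting the colon.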
Ok, so now /home is mounted on the laptop, and for the most part things worked fine.
On comes the weekend, so I turn to the home machine. I tend to keep the work machine for last, because I don't want to spend work time on fixing distro problems. And I usually take a whole weekend to upgrade at home. The home machine turned out to be more of a problem. I ran headlong into hurdle number three. You see, there is this new thing called GPT for your partition table, and it is now the default, and it means that fdisk will no longer work, and now you should use gdisk (which sadly is not installed on the rescue bit of the install DVD, boo!), and this is all so we can have grub2, which is supposed to be better or something.
I'm sure one day I will be thankful. But on my home machine, I didn't know any of this, and just had anaconda tell me something about the boot image being too large and there was no space for it and my system may not boot. (I am not sure why I did not run into this problem on my laptop - presumably, looking at the disk layout now, because I kept the original install, which includes Windows, and just shrunk that and added linux - so it's probably the windows thing doing the booting). And sure enough, the Fedora 16 install did not boot. It dropped me into my friend, the shell.
So here's the thing. This new way of doing things needs more space than your average MBR, so you actually need to create a primary partition for this, and it needs to be in the first 2 TiB. So you know what time it is now. It's resize-o-clock time - I get to learn the joys and mysteries of shrinking ext4-on-software-raid so I can make space for this new partition, which doesn't need to be big, apparently 5 MB is more than enough. Aren't I happy now that I stubbornly stuck to having a /boot partition as the primary one on my machines, so I can just shrink that a little?
So shrinking an ext partition I already had down pat. I learnt about shrinking software raid partitions, and again I got into the land of not understanding which of the many types of numbers (sectors ? blocks ? bytes ? cylinders ? Mebi vs Mega ?) are understood the same way by the tools, or not understanding how much of those numbers you need to count extra because of the layer of indirection being added (encryption on logical volume on LVM on software RAID anyone ?). So to be safe I end up shrinking 10% on each layer of the onion as I go deeper - then let the tools handle growing to the maximum space again, since that's the one thing they're usually decent at.
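To make the unit confusion concrete, here is a small sketch of the conversions involved. It assumes 512-byte sectors, which is an assumption and not something every disk guarantees; the function names are made up for the example.

```shell
#!/bin/sh
# Sketch: keep sectors, bytes, MiB and MB apart. Assumes 512-byte sectors.
SECTOR_BYTES=512

sectors_to_bytes() { echo $(( $1 * SECTOR_BYTES )); }
mib_to_sectors()   { echo $(( $1 * 1024 * 1024 / SECTOR_BYTES )); }
mb_to_sectors()    { echo $(( $1 * 1000 * 1000 / SECTOR_BYTES )); }  # rounds down

mib_to_sectors 5                 # prints 10240
mb_to_sectors 5                  # prints 9765 (5 MB is not a whole number of sectors)
# 32-bit sector numbers times 512 bytes is exactly 2 TiB,
# which is presumably where the 2 TiB figure comes from.
sectors_to_bytes $(( 1 << 32 ))  # prints 2199023255552
```

The gap between 5 MiB and 5 MB is already 475 sectors, which is the kind of difference that bites when one tool counts one way and the next layer counts the other.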
But you know, if I've done all this, I want to get it right. I don't want a stinking BIOS boot partition sitting after my /boot partitions. That's not how F16 sets it up by default. But I have never actually moved a partition. So, download gparted, look at it, figure out how it can let me do that, make sure I ask it to count by cylinders so it doesn't leave gaps, be puzzled at why it doesn't let me enter fractions for MiB sizes of partitions, and work around it in some other way. And so I finally have those two software raid /boot-wearing partitions where I want them - sitting right behind this new BIOS partition.
I create a new partition in fdisk (which is what I'm used to), but I can't actually set the partition type to EF02, which has four characters where I expect two. But really that is what BIOS BOOT should be.
And now the internet tells me I need to set some flag on it using a tool called parted - some flag called bios_grub. Except when I type that magical command that sets the flag, it tells me it can't exist:
[root@otto ~]# parted /dev/sda set 6 bios_grub
parted: invalid token: bios_grub
Flag to Invert?
Isn't this tool nicely written for only the writer of the tool instead of for human beings? Of course I don't know this when it barfs this at me, but at the end of this story I figured a bunch of things out that this tool could have told me.
You see, invalid token just means that it doesn't accept the flag named bios_grub. I know this because I'm a programmer so I know the programmer used a token parser - a thing normal people shouldn't have to know about. What's that you're asking? Flag to Invert? How about the Belgian flag, I would quite like to see the colors go in the opposite direction. No, that's a prompt to choose a different flag to invert than bios_grub. Apparently bios_grub is a flag, not a setting, and I'm trying to invert it, instead of setting it. Can you tell me what flags you do know about, dear parted ?
(parted) help set
set NUMBER FLAG STATE change the FLAG on partition NUMBER
NUMBER is the partition number used by Linux. On MS-DOS disk labels,
the primary partitions number from 1 to 4, logical partitions from 5
onwards.
FLAG is one of: boot, root, swap, hidden, raid, lvm, lba, hp-service,
palo, prep, msftres, bios_grub, atvrecv, diag, legacy_boot
STATE is one of: on, off
Wait, what ? You do know about bios_grub ? But you don't let me set it ?
I seriously spent 30 minutes on trying to figure that one out.
In the end, it's because a) I should run gdisk b) parted won't let you set that flag on a normal MBR drive c) gdisk should convert to using GPT and d) the messages gdisk prints by default are SUPER scary and the docs say that this is intentional to keep away stupid Windows users (I am not making this shit up). Well, that's why I use software RAID, isn't it ? How about we take our chances, dive in deep, and let this gdisk thing do the conversion to GPT on the first disk. Gulp.
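What parted could have said can be sketched as a check on the disk label before touching the flag. This is an illustration under assumptions: it relies on the "Partition Table:" line that parted prints in an English locale, it assumes (as found above) that this parted only accepts bios_grub on GPT labels, and the helper names, device and partition number are placeholders.

```shell
#!/bin/sh
# Sketch: look at the disk label before trying to set bios_grub.
# label_type and can_set_bios_grub are hypothetical helpers.

# Read `parted -s DEVICE print` output on stdin and print the label type
# (for example "msdos" or "gpt").
label_type() {
    sed -n 's/^Partition Table: //p'
}

# Succeeds only for the label type that accepted the flag in this story.
can_set_bios_grub() {
    [ "$1" = "gpt" ]
}

# Usage, with /dev/sda and partition 6 as placeholders:
#   label=$(LC_ALL=C parted -s /dev/sda print | label_type)
#   if can_set_bios_grub "$label"; then
#       parted -s /dev/sda set 6 bios_grub on
#   else
#       echo "disk label is $label; bios_grub wants gpt" >&2
#   fi
```

Note that the set command here spells out the STATE (on) that the help text asks for, so parted has no reason to prompt for a flag to invert.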
OK, I got lucky. That actually worked. I can now create this partition, with the proper flag set. While I'm at it, why don't we try this 'sort partitions' option in gdisk so that this new partition, which is now at the start, but listed as number 4 out of 4, shows up as number 1. Sure, it will renumber all other partitions, but let's just hope that most things use UUID's and labels and what not by now, and if not I should be able to figure things out.
In what feels like Day 5 in a two-day weekend, the system now boots! I actually see a new grub (wait, why is that text-mode only again ? Fedora guys, you spent years to make everything look graphical, because that was some huge important feature that mostly got in my way when it took longer than it was supposed to and I had no way to see why except reboot and remove quiet and rhgb from the options, and now you suddenly let grub2 take that back from you? Show us some spine, please), and the system shows me plymouth again. Until it doesn't anymore, and drops me into a terminal screen.
Hurdle number four. Can you guess what it is ? Go on, take a stab. If you've updated your system, I'm sure you know the answer. I'll give you some whitespace to think about it...
SELinux. Riding in to relabel my file system to save it from the evil people out there. And sure, it warns me. This may take a long time. And then it proceeds to throw asterisks in my face. Lots of asterisks. It's not the first time this happens. But every time it does, I cannot help but wonder one thing.
Who thought it was a great idea to throw asterisks at the user? How many asterisks am I supposed to expect? Never mind that you can't actually count them unless you glue your eyeball to the screen, because there are so many they actually scroll off at the top. You know, if you squint hard enough, you can see the maniacally laughing face of the programmer who thought this was a nice way of showing progress. Never mind that tools like fsck can show a progress bar that actually means something (if you trick it into sending data to file descriptor 0, with -C 0) in a sensible way - one line on the console, and visible progress towards an end goal of 100%.
If only I could guess what a long time is going to end up being. Is it a 'get a drink' amount of time? Or 'watch some dexter'. Or nookie time? Or, get the hell out of the house and do all the shopping for the next three hours because there's no way you'll be doing anything useful with this system for that long?
So I do all of those things, twice, and one even four times (I won't tell you which but I ended up having to pee a lot), and I come back, and the system has rebooted, and there's actually a GUI asking me to log in.
You know, this Fedora 16 better be frigging spectacular after this six day weekend.
I log in, follow my standard upgrade checklist, try out some of my tools. Media keys don't seem to work as before for my prototype music player (it flashes a nasty forbidden sign at me), and even though I set up to have nothing happen on inserting audio CD's (because my LEGO robot is inserting CD's into an external drive about fifty times a day), Rhythmbox craps on and FORCES me to select which of the many CD's with exactly the same name that audio CD might be. So, par for the course so far.
Maybe a reboot will fix that, it may not know about those settings until I have everything installed and upgraded. And if I reboot, I'd better convert my second drive to GPT and fix my /boot and set that flag and all that. So I do. And for some reason I can't figure out how to tell software raid that sda2 and sdb2 (which are both still perfectly mountable as ext file systems and were part of the previous RAID-1 /boot array before I resized them) really are a software raid. So there's this point where I've wasted more time on trying that than it would have taken me to actually manually type every byte on that /boot partition, and I just give up and recreate a software raid on those two partitions and copy stuff over.
And then I reboot. And won't you know it. Effing goddamn selinux relabel all over again. In fact, this way too long entry was typed completely in less than half the time selinux took to complete some work it had already done an hour ago.
I better have a working system after this last relabel finishes. Now excuse me while I go make some comfort food, potatoes and beans and runny eggs with butter sauce. I'm going to eat it while my good friend Dexter comes back from a long holiday. It's the only thing that is going to get me out of this weekend funk. And you know who I will be thinking about every time my friend Dexter tells me of a problem he solved...
selinux=0 is your friend. I use it every time.
Comment by Mark — 2011-11-13 @ 16:19
Grub2 works fine with traditional partitions. AFAIK GPT is needed so you can have disks larger than 2 terabytes.
Comment by Marius Gedminas — 2011-11-13 @ 17:23
Sounds like you spent too much time on this. I’m sure you’re getting what you deserve by having an overly complicated setup.
Don’t get me wrong, I do too, but I expect to pay the price for it sometimes.
Comment by James — 2011-11-13 @ 18:05
Upgrading rather than reinstalling would have saved you from hurdles 1 to 3 (1 because it’s a direct side effect of reinstalling, 2 and 3 because the UID changes and the partition table layout change only affect fresh installations, not upgrades), maybe even 4 too, though the best solution there is to just disable SELinux.
Reinstalling is rumored to be more reliable than upgrading, but it really isn’t.
Comment by Kevin Kofler — 2011-11-13 @ 18:51
Just remember to turn off SELinux before upgrading because there’s a bunch of scripts in RPMs that break when run with an old unupgraded SELinux config. Not sure if any of those would have been critical for the F15=>F16 I did recently, but it certainly warned me a lot while upgrading.
And of course, getattr on /sbin is highly suspicious…
Comment by Benjamin Otte — 2011-11-13 @ 22:30
Haha, I recently switched from Ubuntu to Fedora 16. Though my experience wasn’t QUITE so bad, the SELinux relabeling thing did happen. Actually it wasn’t automatic, first I had to figure out why when I logged in to my account it said an unrecoverable error had occurred and I must log out.
After figuring out what SELinux was, figuring out what a label was, figuring out my labels for my home directory were wrong, and figuring out how to automatically relabel the system, I was able to experience the awesome parade of asterisks for myself!
Comment by Michael Gauthier — 2011-11-13 @ 22:46
Whenever a new Fedora release comes out, I create a new installation using the installer DVD on a new LVM2 logical volume (which I create during installation). My /home partition is encrypted (and keeping this as /home in Anaconda works just fine afterwards), swap as well (although I always need to change /etc/crypttab afterwards to set up automatic key generation at boot time etc.). I reuse the existing /boot partition as /boot, not formatted.
I didn’t have any problems with GPT using this method (because the LVM PV partitions were there already, unchanged, I guess), nor with /home permissions/SElinux attributes: these were changed by Anaconda while creating my ‘new’ user at first boot (it told me /home/user already existed, asked whether this should be reused, then changed ownership and attributes accordingly).
Didn’t experience any issues whatsoever (and didn’t with the 14 to 15 upgrade before either, can’t remember about earlier releases).
The upside of this approach: my F15 installation is still there (until I delete its LV), so I can still boot into it if required.
The downside: need to reinstall all packages I use (but this is a minor issue IMHO: gets rid of all the things I no longer use etc).
Comment by Nicolas Trangez — 2011-11-13 @ 23:47
I do pretty much the same, except for enabling /home during anaconda install, because I want to avoid config and other things getting upgraded. I want to be able to roll back in case things go wrong.
Comment by Thomas — 2011-11-13 @ 23:54
I’d agree somewhat with James – the complicated setup is where things go wrong. Like you, I did the F15 to 16 upgrade over the weekend, and the only problem I hit was that /boot was too small for the upgrade tool (preupgrade) to put the new boot image. And fixing that was simple enough – unmount /boot, turn off swap, and then resize the /boot partition to grab a little of my swap space. GParted handled that just fine, no need to resort to command-line tools.
After that, no fussing with GPT or Grub – it coped quite happily with my existing partition scheme, and seems to have left Grub 1 in place. No problems with usernames / numbers – it’s an upgrade, after all, so quite happy to leave the existing data in place. And I didn’t notice anything about SELinux during the upgrade process…
Comment by Simon — 2011-11-13 @ 23:48
Your post made my day. Honestly I couldn’t agree more. These are already complex tasks and they are made even harder by tools that are unnecessarily cryptic and just plain freaking hard to use. Couple that with almost always having to deal with them at times when you would rather not (oh my god it’s 1 AM and I can no longer boot my system or access any of my files!!!) and you have a perfect storm of frustration.
Comment by Paul Eggleton — 2011-11-13 @ 23:57
great post, I know how you feel – upgrading is still such a pain
Comment by davidosomething — 2011-11-14 @ 02:18
Kev: *really* reinstalling is usually safer than an upgrade, but doing a ‘fresh’ install on top of your Very Own Kewl Partition Scheme and /home directory that started life as an AIX install in 1982 is every bit as complex and prone to breakage as an upgrade, only people invariably expect it to work perfectly for no good reason.
“You see, there is this new thing called GPT for your partition table, and it is now the default, and it means that fdisk will no longer work, and now you should use gdisk (which sadly is not installed on the rescue bit of the install DVD, boo!), and this is all so we can have grub2, which is supposed to be better or something.”
No, GPT is not ‘so we can have grub2’, grub2 works fine with MS-DOS disk labels. GPT is just flat out _better_: you can have zillions of partitions without the stupid physical/logical dodge, it works on really big disks, and it ditches a ton of legacy gunk which hasn’t made any sense since 1992. We actually mostly advise people to use parted, not gdisk, which *is* in the rescue boot and the anaconda shell. (Which is why gdisk isn’t).
“This new way of doing things needs more space than your average MBR, so you actually need to create a primary partition for this”
It doesn’t, actually. But because of how the MBR layout was done there was always a small bit of space between the MBR and the first partition on the disk, which bootloaders would stick themselves into. Because GPT isn’t crazy there is no such space, so instead of relying on a quirk of the disk label scheme to install themselves into a bit of space which to all intents and purposes doesn’t officially exist, on GPT disks, bootloaders get an actual identified partition to put themselves into. So you, like, know what it is and what’s in it, and junk. Progress!
“But you know, if I’ve done all this, I want to get it right. I don’t want a stinking BIOS boot partition sitting after my /boot partitions. That’s not how F16 sets it up by default.”
There’s nothing particularly ‘right’ or ‘wrong’ about any given location for the BIOS boot partition. It just has to be the right type. That’s one of the obvious advantages of it being a partition and not a magic bit of empty space.
“b) parted won’t let you set that flag on a normal MBR drive”
Wait, what? If you’re installing to an existing disk which has MS-DOS disk labelling what the hell’s your problem? Why all this stuff about GPT? F16 doesn’t *need* a GPT labelled disk, it just uses that format by default when it is entirely reformatting a disk. If you’re reusing an existing layout it won’t try to convert it to GPT and doesn’t need to. So…why all the stuff about GPT when you weren’t using a GPT disk at all? And why do you think it’s so crazy for parted to give you slightly weird output when you try and do something dumb like putting a BIOS boot partition on an MS-DOS labelled disk? Are you sure your issue wasn’t actually nothing at all to do with GPT and you just happened to ‘fix’ it by going in and completely mucking around with your disk layout? Given that you’re using software RAID it sounds somewhat more like https://bugzilla.redhat.com/show_bug.cgi?id=737508 to me.
“Who thought it was a great idea to throw asterisks at the user? How many asterisks am I supposed to expect?”
If the asterisks stop coming, you probably have a problem. It’s just about impossible for SELinux to know how long the relabel is actually going to take, but the asterisks at least tell you ‘I’m doing stuff, keep waiting, don’t force a reboot’.
Comment by Adam Williamson — 2011-11-14 @ 08:51
Hi Adam,
thanks for the clarifications. Yes, the bug you linked to sounds like it was my problem – I had the same error messages, and I’m not sure now (at work) but I do seem to remember the first sector being a low number on my drives. If it was low however it’s because Fedora originally put it there. The system was installed cleanly originally with Fedora 11, so I assume it created that layout. I used gdisk because most instructions I could find regarding this problem suggested gdisk.
I’m sure GPT is better and it may even be better to assign a partition to it – though I don’t particularly enjoy handing out new primary partitions given that in practice you really only get 3 as the 4th is used to make extended ones. Maybe GPT does away with that limitation too though?
By right or wrong, I meant matching what Fedora installs by default as closely as possible.
I didn’t understand your comment on the parted problem at all. I’m not sure why you think people are expected to know that putting a BIOS boot partition on an MS-DOS labeled disk is dumb, or why it is strange to expect a tool to tell you that you can’t set a particular flag instead of pretending it doesn’t exist in one command when it’s clearly telling you it knows about it when you ask it what flags it does understand with another command.
As for the asterisks – I disagree about the impossible. It’s walking a file system, the computer knows how many entries it needs to go through, looks pretty simple to me. Sure, it needs to count and keep track of stuff. So what? FYI, I had 4 more instances of selinux relabeling my whole file system, each taking well over an hour, for no apparent reason that I can tell. The only thing that changed over reboots was whether or not /boot was mounted from a partition… I’m still not doing selinux=0, but from the comments on my post that still seems to be the preferred option for a lot of people, and silly shit like this is exactly why… It’s called friction and SELinux is bad at eliminating it.
Comment by Thomas — 2011-11-14 @ 11:19
Ha, yes, amusing. No, I’m kidding, it isn’t. ESPECIALLY the parts about the uid reassignment and “we need a bigger /boot”, which, considering both have happened before, should have been a) foreseeable, b) avoidable c) treated well in an upgrade path. But no, they’re not, so yet again everything fails, pretty much in the same spectacular fashion it all failed the last times.
How many system ids are needed? If 500 isn’t enough, just like 100 weren’t enough the first time around, what possible reason could there be to think that 1000 would be enough? How about user ids are started from 100000 or something so that this time there REALLY is space at the bottom? Why do the system ids need to be at the bottom, anyway?
And /boot is even more laughably inadequate. Until F11 it was recommended 200MB. Then for F12 it was 300MB. F13 increased it to 500MB. Now it’s 1GB. These are all rounding errors with today’s disks (and arguably even with SSDs), but somehow the process continues to nickel-and-dime users with almost every upgrade. And why is a /boot required, anyway? Don’t even get me started on swap. How many more years are going to be needed before Fedora (or Linux in general) can boot, run and hibernate with one partition, just like pretty much every other operating system on the planet?
If I sound frustrated, that’s because those annoyances aren’t even the beginning of the trouble of getting Fedora running on a MacBook Air 2011, which I’ve tried on-and-off for the past couple of months. Many of the other problems are arguably as much or more Apple’s fault as they are Fedora’s, but these can not be blamed on Apple. And Fedora is the (in my mind, anyway), advanced, forward-looking distribution. I don’t even want to imagine how ugly it is somewhere else.
Comment by Osma — 2011-11-14 @ 14:25
osma: there’s no ‘bigger /boot’ requirement in F16, so I’m not sure where that idea comes from. The default size of /boot has been 500MB for a while. There was no 300MB ‘requirement’ that I recall, there was only one change in recent times, and that was from a default of 200MB to a default of 500MB. The BIOS boot partition is a *separate* thing that’s needed when booting a GPT-labelled disk from BIOS (rather than EFI).
/boot is not required, you can install without a separate /boot if you like. You can install Fedora to a single partition and it will work (you’d need to use MS-DOS disk labels or boot via EFI now, though, or else you’d need two partitions – BIOS boot and /); it’ll give you a warning about not using a swap partition, but it will work. We test this at every release.
We can’t stick UIDs arbitrarily high because there are some things that already use very high UIDs rather like very high port numbers; this was discussed on -devel when the UID change was proposed, in fact. Also, using 1000 synchronizes with Debian, which is worth doing.
Comment by Adam Williamson — 2011-11-14 @ 17:56
I stumbled on this blog when desperately searching for help to get Fedora 16 install and boot on an UEFI enabled system, a totally new install (no upgrade, a brand new desktop PC).
This is my 2nd or 3rd day I’m trying, and just as I saw the light when this different, colorful, penguin-banner boot screen from the install DVD appeared and I proceeded to go, in meticulous fashion, through all the installation steps – in particular when formatting the disk – to finally hit the reboot button, to once more get a self-resetting PC, right where the bootloader should start.
The only reason I’m even trying to get Fedora 16 work is that I found a blog on another forum explaining how to setup Xen with VMs and PCI/VGA passthrough that would finally allow me to ditch the dual-boot setup, and the howto is based on – guess what – Fedora 16, which comes strongly recommended by this guy.
So far, the ONLY good thing about Fedora 16 that I could find is LVM support out of the box, if only the damn thing would boot the installation (I can boot from USB stick and DVD, “just” not from the install).
Thanks to this blog I now know that I’m not the only one having a hard time with Fedora. That maybe I’m not just too dumb to figure it out. After all, reading through the endlessly long installation instructions on the Fedora site, I did discover that I made some mistakes during the first 2-3 attempts of getting it installed, such as live media is no good for UEFI etc.
Googling for help I found endless explanations on the great benefits of gpt and how wonderfully advanced Fedora 16 is. If only someone could just plain tell me WHAT WORKS (for setting up Fedora 16 64-bit on a UEFI enabled system).
P.S.: Linux Mint 12 LMDE installs and runs in a breeze, and comes ready for work. It also boots real fast from SSD.
Comment by heiko — 2012-05-06 @ 20:09