beagle help
2007-05-18
I want to like beagle, I really do. I want to use it, and I want it to work for me. Some of the smartest people in free software have hacked on it, and I can only say good stuff about them.
So where's the disconnect ? I have been going crazy for the last week trying to figure out why my home machine is consuming 100% CPU when it should be idling because I'm not doing anything on it. I can tell it's doing stuff because the fans are blowing like crazy. When I move my mouse to wake up the screens and X, I can see my CPU monitor suddenly going from 50% (on this HT machine) back to 0%.
It wazzz in my PC. Eatsing my CPU.
But I couldn't figure out what it was, as it went away when I checked, and I had no second computer in the house to log into the machine.
Tonight I brought my laptop back home, and it's poor old beagled. It's probably been Doing Stuff for the last week, and it has had 100% of one hyperthread for more than 75% of the time to do it in.
But it is Not Yet Done.
And it goes skulking off into the corner whenever the X session wakes up - typically the sort of nifty hack Joe or Jon would have come up with.
Who can tell me What It Is Doing ?
Here's a textdump of beagle-status output if it helps:
Every 5.0s: beagle-info --status                       Fri May 18 22:15:40 2007
Scheduler:
Count: 1755041513
Status: Finding next task to execute
Pending Tasks:
1 Delayed 0 (5/18/2007 10:15:41 PM)
File Crawler
2 Maintenance 100 (5/18/2007 8:55:15 AM)
Final Flush for FileSystemIndex
3 Maintenance 0 (5/18/2007 8:55:18 AM)
Optimize FileSystemIndex
Any tips on debugging beagle - from a user perspective - are welcome too.
http://beagle-project.org/Troubleshooting ;)
Comment by Serkan Hosca — 2007-05-18 @ 21:38
Many users have the 100% CPU usage bug with Beagle (which eventually forces the machine to require a reset). My guess is that it does not “like” some file formats when it indexes them, and so it goes bananas. There is a bug open in Ubuntu’s bugzilla about this too… Over here, it would happen 3-4 times a week.
I had emailed Joe Shaw about it and he kindly emailed me back with instructions on how to debug it, but each time I was up and ready to debug, the bug wouldn’t happen (Beagle was obviously not indexing that part of my drive at the time). Eventually, I gave up waiting to reproduce and I uninstalled Beagle. As far as I know, the bug still exists in the latest versions of Beagle too.
So, here is what Joe said. I think he would still like to have these logs to analyze them.
“Ok, when it’s spinning at 100% CPU, can you send SIGUSR2 to the beagled-helper process and see what it says in the logs? (~/.beagle/Log/current-IndexHelper) The logs should still be in ~/.beagle/Log, although it’s not necessarily the current-* ones. Even if that’s the case, just shut down beagle when you log in with beagle-shutdown, and then run it in the foreground with “beagled --debug --fg” and when the bug happens, you can either cut-and-paste the error or just shut it down again and look at the log file.”
Comment by Eugenia — 2007-05-18 @ 21:54
There’s a chance that you’ve hit one of the various bugs where beagle sees a certain file type – empty OpenOffice docs, for instance – and Freaks The Heck Out. See (e.g.) http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217031 for details.
Basically.. try restarting beagle, and cross yer fingers. Maybe try stracing it and see if it hits a particular file and Flips Out on that, and report it as a bug.
Someday Beagle will stop doing this stuff. Then it will be awesome.
Comment by Will Woods — 2007-05-18 @ 22:04
Just disable beagle and do this instead:
yum install tracker-search-tool
I know that might not be exactly the answer that you’re looking for but that saved me lots of hair pulling. :)
Comment by User — 2007-05-18 @ 22:10
Beagle indexes as hard as it can while the screensaver is on. The idea is that while you’re away from your computer, there’s no reason why it shouldn’t speed up and try to index all your data as fast as possible, so that it doesn’t have to do it while you *are* using it. So that’s why this is happening.
A scheduler count of 1755041513 is insanely high, so the likely culprit as to why it’s obsessively crawling your files is that you’ve exhausted your allocation of inotify watches. Beagle uses one per directory in your home directory, and the kernel gives you 8192 by default. If you have large, expansive source trees, you could easily be using up all 8192 of them. If this is the case, Beagle will fall back to continuously crawling over the non-watched directories for changes. Paired with the screensaver, this is probably why your machine is going batshit.
To increase the number of watches, put a large number in /proc/sys/fs/inotify/max_user_watches. If you’re running Beagle 0.2.15 or newer, you can look in your ~/.beagle/Log/current-Beagle file for a message like “Maximum inotify watch limit hit”. Prior to 0.2.15, you might get an error like that in a ~/.beagle/Log/current-BeagleConsole log file.
Hope this helps, let me know if you need any more help.
Joe
Comment by Joe Shaw — 2007-05-18 @ 22:12
Also, 8192 is too low of a number for the default single-user Linux desktop machine. Increasing the number has no effect on kernel memory usage unless they’re actually used, so for most people increasing the limit is harmless. I’d suggest filing a bug against your distro to get the number bumped up. I’ve thought about including an init script to do this with Beagle, but since every distro’s is different and it’d be literally one line it didn’t seem worth it.
Joe
Comment by Joe Shaw — 2007-05-18 @ 22:18
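A rough way to test Joe's diagnosis from a terminal is sketched below. It is only a sketch: the function names are made up for illustration, the log message is the one Joe quotes (the exact wording in your log may differ), and 65536 is an arbitrary example value, not a recommendation from the thread.

```shell
# Count the directories under a tree (default: your home directory).
# Beagle wants roughly one inotify watch per directory.
count_dirs() {
    find "${1:-$HOME}" -type d 2>/dev/null | wc -l | tr -d ' '
}

# Print the current per-user inotify watch limit (Linux only).
watch_limit() {
    cat /proc/sys/fs/inotify/max_user_watches
}

# Compare the two. Usage: check_watches DIR [LIMIT]
# Returns non-zero when the tree has more directories than watches.
check_watches() {
    dirs=$(count_dirs "$1")
    limit=${2:-$(watch_limit)}
    if [ "$dirs" -gt "$limit" ]; then
        echo "$dirs directories exceed the limit of $limit watches"
        return 1
    fi
    echo "$dirs directories fit within the limit of $limit watches"
}

# Look for the message Joe mentions in the Beagle log.
# Usage: hit_watch_limit [LOGFILE]
hit_watch_limit() {
    grep -q "Maximum inotify watch limit hit" \
        "${1:-$HOME/.beagle/Log/current-Beagle}"
}

# To raise the limit until the next reboot, as root
# (65536 is an example value only):
#   echo 65536 > /proc/sys/fs/inotify/max_user_watches
```

Running `check_watches "$HOME"` and then `hit_watch_limit && echo "limit was hit"` should show whether the watch limit is a plausible explanation on a given machine.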
Seriously I hate to say this, but use Tracker. I love beagle, and it indexes more files than Tracker does, but it is a pig. Tracker indexes faster and uses fewer resources than beagle. You might be best to pull the latest from svn as Jamie is going to do a release soon with some new features + a new gui:
http://jamiemcc.livejournal.com/6588.html
Beagle is sweet, but something as speed-critical as an indexer should be written in low-level code, not C#.
Comment by Jeff Schroeder — 2007-05-18 @ 22:32
FWIW, I just exclude my source trees from beagle, because they take fucking forever to index and they’re not the kind of thing I really want to search with Beagle (I’ll use other tools for that, e.g. IDEs or grep).
Comment by no one in particular — 2007-05-18 @ 22:58
Have to agree with the comments on Tracker. Beagle showed the way, and deserves credit for this, but it’s clearly not the way forward for a lot of reasons, not only that performance never gets any better even after years of development. This is how it often goes: Beagle was a great proof-of-concept, but now it’s time to have an actual, practical implementation for everyone, not just hyper-modern quad-core machines with tons of memory.
Also, Tracker is so much more than just a file indexer, it’s a complete metadatabase which the whole desktop will get benefits from. The *only* drawback of Tracker is that it has less file formats due to being started later… everything else it wins hands down side-by-side. And more file types is just a matter of manpower. So, ditch Beagle and go help Tracker.
I think the Beagle guys all know this, too, but there’s a lot of personal investment done here and I’m sure it’s hard to let go.
Comment by Stoffe — 2007-05-18 @ 23:07
Tracker and Beagle both have things in their favor; anybody who says either one has only one advantage (ahem, “less file formats”) doesn’t know what they’re talking about.
– Tracker seems to index only files, has no API (not to mention HLL bindings) for apps to use, has no documentation on how to write new filters (or anything else…), and doesn’t seem to allow filtering something based on an existing dump-to-text tool.
– Beagle is written in C# (higher memory requirements, troublesome for bindings), and seems to run all filters in one process.
Personally, my money is on Beagle, because it has only technical hurdles ahead (I’ll take a HLL and documentation any day), but maybe Tracker will catch up (it looks like a younger project). But both are cool projects, and both have many advantages over the other.
Sorry I can’t help with Beagle debugging. My experience is that when I tried to install it myself, it sucked horribly, but when I got it automatically as part of a distro (Ubuntu) it worked great. There must have been some magic I missed, but I have no idea what it was.
Comment by Mike — 2007-05-19 @ 04:29
I gave up on beagle after TWO WEEKS of continuous hard drive thrashing. One inotify handle per subdirectory of my homedir, eh? And if it can’t get them all it resorts to crawling continuously?
$ find . -type d | wc -l
32226
That would, perhaps, explain it.
I have to say, I don’t think this is sane fallback behavior.
Comment by Zack Weinberg — 2007-05-19 @ 06:21
i found that beagle took several days to index my home folder when i first installed it. i think the main reason was that i had tens of thousands of small files. most of these were documentation, eg the entire docs for python, php, gtk, pygtk, mysql etc etc, all in many-html-pages format. and then usually sat right next to it a tar.gz containing all the docs again. i did a clear out of my docs folder and removed ~50,000 files!!
also i suspect that beagle goes into all the .svn folders i have and indexes those.
Comment by sam — 2007-05-19 @ 09:43
[…] http://thomas.apestaart.org/log/?p=482 . […]
Pingback by divisioni and holy war « YANNB - yet another not needed blog — 2007-05-19 @ 14:10
Maybe there should be an option (checked on by default) to exclude anything that looks like a source code directory. For example, exclude a dir tree with a .svn in the top level parent. And exclude lengthy HTML doc directories too. I don’t think I’ve ever found it useful to search my source code using beagle, because that’s what grep is for. Beagle should be doing other (desktop-related) things, like searching my openoffice docs and music files, and then get the heck out of the way.
Comment by Brent — 2007-05-20 @ 05:14
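Brent's heuristic can be approximated by hand to get a list of candidate directories to exclude. The sketch below is not a Beagle feature, and `list_svn_roots` is a made-up name: it prints every directory that contains a .svn directory while its parent does not, which is one reading of "a .svn in the top level parent".

```shell
# List the top-most directories under a tree (default: $HOME)
# that look like Subversion working copies.
list_svn_roots() {
    # Find .svn metadata directories without descending into them.
    find "${1:-$HOME}" -type d -name .svn -prune 2>/dev/null |
    while IFS= read -r meta; do
        dir=$(dirname "$meta")        # directory that holds the .svn
        parent=$(dirname "$dir")      # its parent directory
        # Keep only directories whose parent is not itself a working copy.
        [ -d "$parent/.svn" ] || printf '%s\n' "$dir"
    done
}
```

The output of `list_svn_roots "$HOME"` could then be entered by hand in whatever exclude list your indexer offers.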
no really use tracker
Comment by Gavin — 2007-05-20 @ 10:55
@Joe: Here’s output of a find on my machine:
[thomas@ana pYsearch-3.0]$ find /home/thomas/ -type d | wc
8271279 8293398 985407071
Granted, among other things, I have in my home directory:
– lots of source code
– some nfs mounts with music
– an nfs mount with my dirvish backup directory, which for the purpose of this discussion contains the equivalent of multiple copies of this machine’s /home/thomas, as well as my laptop’s
Is there an easy way to tell beagle to exclude stuff ?
Comment by Thomas — 2007-05-20 @ 14:55
Thomas: Run “beagle-settings” (or the “Search and Indexing” item under the System->Settings menu), the second tab has a list of directories to exclude.
Comment by Mads Chr. Olesen — 2007-05-20 @ 14:58
[…] of issues people have seen with indexing, including the irritating loop people have seen (and which Thomas blogged about) in which a shortage of inotify watches combined with the higher speed indexing that kicks in when […]
Pingback by joe shaw / thunder thunder thunder thunder thunderbirds ho — 2007-08-06 @ 18:07
[…] don’t know what kind of tests Beagle has as part of its codebase and release procedure. I wrote about my experience with Beagle at some point and Joe commented to the effect that the kernel gives you 8192 inotify watches and […]
Pingback by thomas.apestaart.org » strongly typed — 2007-10-29 @ 09:49