First five minutes with tracker
2011-03-21
I have two projects on which I will eventually need some kind of file metadata management. One is my still-being-written music application, with one of its core features being distributed - I have three computers and a bunch of devices (like my phone) to put music on, and I want to stop doing that manually.
At its core it will need a distributed database (for example desktop couch) and some rules to decide what files to copy where.
A second application is yet to be started, and I hope it already exists and I can avoid having to write it. But at its core it would help me keep my offline backups up-to-date by indexing online data on all my machines, tracking a backup strategy for each machine and its folders (for example, documents and photos need really good backup, while downloaded ISOs for Fedora I can lose no problem), and telling me when I should bring an external drive from work and hook it up to the NAS because, according to the records, I have at least 100 MB of changes in important files that I care about.
If I am serious about wanting to avoid rolling my own, I should check what metadata projects exist out there, and the first one on the list is Tracker.
I'm surprised that, for all I've read about it on Planet GNOME, I never actually tried to use it.
So, my initial impressions:
- Tracker is a really ungoogleable name. No, you go and try to find instructions on how to start Tracker!
- I installed it on Fedora 14 - the packages were there, so that's a good start
- I found the search tool. Obviously it doesn't find anything. However, it doesn't actually tell me that. Surely a message telling me that there is nothing indexed yet would make sense.
- Strangely enough it doesn't just start indexing. How do I start using this thing ? I have my laptop running and it can index all night, but it just needs a kick! The documentation is not much help - the header says 'How can I use it' then goes on to mention tools but does not mention the very first step to take to get going.
- Since Google doesn't help much, let's check the package contents with rpm -ql tracker. The README mentions a tracker-miner-fs in /usr/libexec that should index. Starting that does something. A status icon now shows a looking glass blinking once every two seconds. Applications are 100% indexed, my file system 1%.
- It's hardly using any CPU at all. top doesn't show any tracker-related process doing anything. I want to throw more CPU at tracker. The preferences already have resource use maxed out. I'm sure what I'm asking is ironic to the devs who I'm sure have had gazillions of bug reports asking them to make tracker use less CPU.
- Still not sure how to use the search tool. Typing any letter and searching for it doesn't do anything. Surely one of the files or applications it has indexed has the letter a or e in it?
- Since Applications are indexed, maybe it can find 'terminal' ? Oh great, yes, it finds that.
- Playing around with the command line tools instead then.
$ tracker-search a
Search term 'a' is a stop word.
Stop words are common words which may be ignored during the indexing process.
Results: 0
Oh, ok, so you're ignoring one letter searches. Maybe your GUI should actually *tell the user*.
- Meanwhile, browsing the documentation to see if tracker is going to suit my two purposes mentioned above, I see it doesn't store any kind of checksums on files. I don't know if that is considered unnecessary overhead (and I'm wrong thinking I will need it for my purposes) or just not added yet.
- asked a question in the IRC channel, did not get a reply so far. Not meant as criticism - IRC help gets taken for granted easily by people, and I know how hard it is to make an IRC channel responsive.
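To make the stop-word complaint above concrete, here is a minimal sketch in Python of the kind of feedback a search UI could give. The stop-word list and the function names are made up for illustration; this is not Tracker's code or API, and Tracker's real stop-word lists are per-language and much longer.

```python
# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def split_query(query):
    """Split a query into (searchable terms, ignored stop words)."""
    terms = query.lower().split()
    kept = [t for t in terms if t not in STOP_WORDS]
    ignored = [t for t in terms if t in STOP_WORDS]
    return kept, ignored


def feedback(query):
    """Return a message to show the user, or None if nothing was ignored."""
    kept, ignored = split_query(query)
    if not kept and not ignored:
        return "Please enter a search term."
    if not kept:
        return ("All search terms are stop words and were ignored: "
                + ", ".join(ignored))
    if ignored:
        return "Ignored stop words: " + ", ".join(ignored)
    return None


if __name__ == "__main__":
    for q in ("a", "the terminal", "terminal"):
        print(repr(q), "->", feedback(q))
```

The point is only that the UI gets an explicit message back for a query like 'a', rather than an empty result list it cannot explain.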
I'm going to let tracker sit and see if it indexes something by tomorrow. It'd be great to use a tool that I know will get lots of love and care in GNOME.
I'm pretty sure underneath there's some excellent hacking, but for developers evaluating solutions to their problems the five-minute out-of-the-box experience is important. I think Tracker could make a few quick improvements and get some easy wins to get more people convinced.
Hey, how about picking up the task and improving the documentation then, instead of wasting time with blog posts? Just an idea ;-)
Comment by michael — 2011-03-21 @ 23:05
Michael,
I might if I knew what the right command was. Obviously starting something by hand in /usr/libexec is not the way things are supposed to work.
The best docs are no docs – and tracker just starts automatically or can be started from the preferences or search tool on your first search.
Comment by Thomas — 2011-03-21 @ 23:09
For distributed music you should also look at UPnP and rygel
Comment by qwefpnia — 2011-03-21 @ 23:23
I’m not sure if tracker is supposed to start indexing directly after being installed. However on Debian, tracker-miner-fs gets started by gnome-session after login. You can also start the indexing process with tracker-control -s.
Comment by bronte — 2011-03-21 @ 23:34
I actually ran into the same issue yesterday.
I restarted my machine and it started indexing things well.
I guess the installation script doesn’t start tracker-extract by default (which I think has something to do with the indexing).
Comment by Varun Madiath — 2011-03-21 @ 23:52
Hi,
If I am not mistaken you have to:
1) Install tracker
2) Open up tracker preferences (in my system, ubuntu, it is in System->Preferences->Search and indexing) and set it up
3) Log out and login again.
A good way to know if it is working is to check the preference to always show the icon in the notification area.
Another hint is to look for the command line program tracker-control. You can use it to stop indexing, delete the old index and start over, etc.
Comment by Paulo — 2011-03-22 @ 00:13
Were you able to have tracker start by scanning nothing, and then add folders for it to scan?
I really hate it how it tries to scan everything, a trick Google might pull
using 0.8 version on Maverick
Comment by pt — 2011-03-22 @ 03:05
By default it seems to track only the XDG dirs and things in your home dir, but not underneath.
Comment by Thomas — 2011-03-22 @ 10:27
Sorry, but my previous reply got all messed up by your blogging software. Let’s try this again:
In response:
#1, Tracker is a really ungoogleable name. No, you go and try to find instructions on how to start Tracker!
A1, This is the same for a lot of GNOME projects. If you put “project” or “gnome” after “tracker”, you find what you’re looking for.
—
#2, I installed it on Fedora 14 – the package were there so that’s a good start
A2, I presume 0.8.x of Tracker?
—
#3, I found the search tool. Obviously it doesn’t find anything. However, it doesn’t actually tell me that. Surely a message telling me that there is nothing indexed yet would make sense.
A3, Which one? We don’t just have one command line tool or just one UI for searching. About nothing being indexed, that really depends on the UI and it’s a fair point. We can improve that. However, that’s also a geeky thing, the average user won’t know what “indexing” means here and will likely just be confused by such a message.
—
#4, Strangely enough it doesn’t just start indexing. How do I start using this thing ? I have my laptop running and it can index all night, but it just needs a kick! The documentation is not much help – the header says ‘How can I use it’ then goes on to mention tools but does not mention the very first step to take to get going.
A4, Packaging error? It starts when the session starts, so unless you logged out or the packaging started it for you, why would it? How is this Tracker’s fault? You mean the documentation on the project website is out of date? I agree, that needs updating and I have been meaning to do that. However, the documentation it links to *FIRST* which is http://live.gnome.org/Tracker/Documentation does have the details you’re looking for and is updated quite regularly. Other than the fact that Tracker instantiates itself, anyone (with some Linux experience) could type tracker- in a shell, tab-complete, and find a bunch of commands available, like tracker-control, which seems an obvious place to start for someone not just using the UI.
—
#5, Since Google doesn’t help much, let’s check the package with rpm -qa tracker. The README mentions a tracker-miner-fs in /usr/libexec that should index. Starting that does something. A status icon now shows a looking glass blinking once every two seconds. Applications are 100% indexed, my file system 1%.
A5, Yes, this sounds like 0.8.x. This has been removed for 0.10.x.
—
#6, It’s hardly using any CPU at all. top doesn’t show any tracker-related process doing anything. I want to throw more CPU at tracker. The preferences already have resource use maxed out. I’m sure what I’m asking is ironic to the devs who I’m sure have had gazillions of bug reports asking them to make tracker use less CPU.
A6, Well, it really depends what it is doing. If it is at 1%, it’s usually crawling your file system. This is not an expensive operation. The indexing part (which it does subsequently) is. Yes we have had a lot of CPU related issues.
—
#7, Still not sure how to use the search tool. Typing any letter and searching for it doesn’t do anything. Surely one of the files or applications it has indexed has the letter a or e in it?
Since Applications are indexed, maybe it can find ‘terminal’ ? Oh great, yes, it finds that.
Playing around with the command line tools instead then.
$ tracker-search a
Search term ‘a’ is a stop word.
Stop words are common words which may be ignored during the indexing process.
Results: 0
Oh, ok, so you’re ignoring one letter searches. Maybe your GUI should actually *tell the user*.
A7, No, we’re ignoring “stopwords”, http://en.wikipedia.org/wiki/Stop_words, yes we should update the UI to do this. Are you willing to help out here?
—
#8, Meanwhile, browsing the documentation to see if tracker is going to suit my two purposes mentioned above, I see it doesn’t store any kind of checksums on files. I don’t know if that is considered unnecessary overhead (and I’m wrong thinking I will need it for my purposes) or just no added yet.
A8, We don’t store the checksum of a file, we store the mtime. We haven’t investigated if checksums would be faster or slower, but I would guess slower.
—
#9, asked a question in the IRC channel, did not get a reply so far. Not meant as criticism – IRC help gets taken for granted easily by people, and I know how hard it is to make an IRC channel responsive.
A9, Which channel? I am in there every day practically for the last few years and I don’t remember you coming to ask us any questions.
Comment by Martyn Russell — 2011-03-22 @ 10:41
Hi Martyn,
thanks for the detailed reply!
1) Ok, so I searched ‘gnome tracker start’ this time. I got here: http://projects.gnome.org/tracker/start.html Given that this references trackerd which is a program I don’t have, I assume this page is out of date. Probably needs deleting. It does link to the up-to-date docs, which as I mentioned still don’t actually say how to start using it.
2) yep, 0.8.17
3) Applications>Accessories>Tracker Search Tool. Philip told me yesterday this is just a test app and not actually meant to be the interface. If that’s the case, it shouldn’t be installed in the GNOME menu. I would think it a mistake however to not provide a good responsive usable UI for it, since I would assume most people that would need desktop search stuff would want a UI for it ?
4) I got a similar response from Philip yesterday – a packaging bug. Apparently both of you expect a user to log out and log back in to use Tracker. First of all, if it is possible for software to start working without logging out and in, why should a user have to ? That just means the developer punts the problem to the user. Second, how can the user know he’s supposed to log out and in again ? I don’t see it mentioned in any of the docs. Third, why can the tools that interact with the system not tell you exactly that ? Why can’t the tool that searches notice that there is no metadata available to search through, and either start the underlying system automatically, or tell the user what to do (log out and back in) ? I’m sure the reasoning ‘it’s a packaging bug’ makes sense from the side of the fence on which the Tracker developer working at this for years is sitting, but a new user trying out this system does not have that perspective. I still haven’t logged out of my session after installing tracker and don’t plan to either until my battery running out forces me to. I’m sure you have about as many terminals open as me to know why :)
The wiki you link to has useful stuff, but that first link (Getting Started) is more about explaining some components. I guess what I would have expected to see somewhere is ‘After installing tracker, either log out and back in, or run tracker-control -s’. I did look through the binaries as you suggested, but when there are 12 the one you need is easy to miss. I had tried tracker-import (assuming it would index something based on the name), tracker-stats to see if it was indexing (that’s the wrong one), tracker-status (which told me a little but not a lot). I’m sure I tried tracker-control too. Maybe ‘Start miners’ didn’t mean much to me at that point, although I later learned that the miner is the process that indexes.
5. What has been removed in 0.10 ? The status icon ?
6. With Philip pointing me to where it logs I figured out that I had less than 1% space in my home dir, so it was paused. However, neither the status applet nor the search tool told me the reason. A bit infuriating. Philip told me this is because tracker 0.8 completely blows up when it runs out of disk space, so stopping it was considered better.
7. When I read about Stop_words obviously it makes sense. But again, this is after the fact. The same problem exists to a lesser extent in any kind of search tool – for example LDAP search in Evolution. As a user you want to convince yourself that this software you’re trying to use works. If it’s search then you want to see some results based on a query, but you don’t yet know what is in the list of things that can be searched or what the proper search queries are. Most search tools don’t allow saying ‘give me anything’ because in the steady state that would return too many results. Most of them require you to already know what to search for. So as a user you do as wide a search as possible – for example one letter searches. LDAP will limit the number of results. Evolution’s address completion only kicks in after three characters, so similar problem. Tracker seems to silently ignore those searches. It’s frustrating from a user perspective even if it makes perfect sense once you understand the technology.
Curiously I had never tried this in Google for example, so I just did. Google simply returns relevant results for the letter ‘a’ or ‘e’. I had expected it to give me a message telling me to refine my search.
Bottom line to me – if tracker ignores part or all of the intention of the user’s search request, it would be good that the ui tells the user. Not doing anything and not showing anything will lead the average user to think it doesn’t work.
As for whether I’m willing to help out here – as you can imagine when I’m still in the pre-honeymoon stage looking for good candidates, I’m not, and I doubt anyone else doing his preliminary investigation of systems would. If the first experiences convince me in some way that this is the way to go, then that might change. I’m guessing you already knew that answer though :)
8. I’m sure checksums would make it slower. If I were to add it in my system it would be in a second step, after first indexing the rest. There definitely are situations in which a file can be changed in-place, without the mtime being updated, and the resulting file having the same length. id3v1 tag changing comes to mind as an example.
9. in #tracker on gimpnet; Zeenix answered with some hints after a while. Again, not a criticism as such, I know this is hard to expect.
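A minimal sketch of the two-step idea from point 8, in Python: compare the cheap mtime and size first, and only compute a checksum in an optional second pass. All names here (fingerprint, make_record, has_changed, the record layout) are hypothetical and made up for this sketch; none of it is Tracker code, and SHA-256 is just one possible choice of checksum.

```python
import hashlib
import os


def fingerprint(path):
    """Cheap first-pass fingerprint: (mtime in nanoseconds, size in bytes)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def checksum(path, chunk_size=1 << 20):
    """SHA-256 of the file contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path):
    """What a hypothetical index would store for a file at indexing time."""
    return {"fingerprint": fingerprint(path), "sha256": checksum(path)}


def has_changed(path, record, verify=False):
    """First pass: mtime and size. Second pass (verify=True): checksum."""
    if fingerprint(path) != tuple(record["fingerprint"]):
        return True
    if verify:
        return checksum(path) != record["sha256"]
    return False
```

With verify=False this behaves like an mtime-only index and misses a same-length in-place edit whose mtime was put back; with verify=True it catches it, at the cost of reading the whole file.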
My main concern, as always, is focused on the 5 minutes out of the box experience. I tried tracker first because a) I read about it on Planet GNOME and b) it’s GNOME technology so I should try it first. But I’m sure I can find a whole bunch of metadata-gathering projects and I’m not going to spend two hours on each of them before I decide which one I want to use for my software.
I think Tracker can make some easy wins here by improving this 5 minutes out of the box experience vastly. I have been surprised at the amount of friction Tracker has generated in the GNOME community, not really understanding either side of the discussion. Both sides have people and viewpoints that I respect a lot. My opinion is just one out of many, but I think it would help a lot in removing that friction if this 5 minute experience were good.
Obviously just take this opinion for what it’s worth, because I’m sure you can find a hundred other people each with a different opinion on why Tracker adoption is not friction-free. I just happen to be a developer that, for a few personal projects, decided to get interested enough in the technology to start learning and trying a little. So I guess I’m the target audience for Tracker.
Comment by Thomas — 2011-03-22 @ 11:26
As Paulo said, you can use the GUI (`tracker-preferences`) for configuration. In there you’ll find you can tell it where to scan and what not to scan (recursively or not, as well). The config file ($HOME/.config/tracker/tracker-miner-fs.cfg) might be a better bet b/c, IIRC, it is commented nicely. As for starting it, you use `tracker-control`. It has various options and plug-ins so you’ll need to decide for yourself what you need. Read the man page for tracker-control.
BTW, tracker-1.0 was released recently, IIRC, and if you plan on using it you might want to go with the latest version (besides performance improvements you’ll be able to use the latest ontologies — go here http://projects.gnome.org/tracker/features.html).
I really wish Gnome would fully embrace Tracker. GVFS’ store isn’t sufficient, IMHO, and Gnome-Shell is desperate for its integration.
Comment by liam — 2011-03-22 @ 11:10
Liam, Yea 0.10 was released recently. We agree, thank you for the support.
—
Thomas, :) thanks for replying, more answers:
1) This is a bug with the gnomeweb-wml (IIRC) module and how things are processed there. Because of the way I removed the old website, it seems some of the old HTML files still exist and I can’t get rid of them (so they’re still indexed by Google I presume). Note, the git repository doesn’t have these files any more. :/
2) Yea, really need 0.10.x I would say, much nicer :) and 9 months more development.
3) Avoid t-s-t like the plague I would say. It isn’t nice. The “tracker-needle” is the new version of this app which I wrote. There is also tracker-search-bar (but you need ultra new GNOME for that). Some people want a UI, others (like the Tracker team) prefer app integration.
4) You don’t have to log out and log back in. You can use ‘tracker-control -ts’ to terminate and start (restart) the processes. The -s is enough to just start things otherwise. The tools in the system (I presume you mean UI tools) shouldn’t tell the user about command line tools, that’s really geeky and shouldn’t be necessary.
5) The list of features is on the roadmap (added/removed): http://live.gnome.org/Tracker/Roadmap
6) Yes, we don’t tell you if you’re paused due to disk space. It’s a known issue, there is a bug somewhere about it. The question is, how to deal with this properly. My thoughts were to use libnotify so the user is notified (now we don’t have tracker-status-icon) and this would work with GNOME shell too. Do you have any free time to work on this? :)
7) Agree. But we only have 6 people full time working on Tracker, more hands would help and it should be an easy hack ;)
8) Sounds like a broken file system then *cough* FAT *cough* :) Usually we first check the parent folder mtime, not each file (sorry, I should have been clearer), to see if anything in the directory changed since the last index. This isn’t foolproof. If you change the file’s contents, the mtime should be updated, otherwise I consider that just broken.
9) Agree. But if you’re developing Tracker it’s hard to notice these things, so your input is valued and we would welcome any patches on this and/or bug reports.
—
We’re always trying to get people on board, and we have been technically driven by performance and getting the core right – which means the UI and outside user experience is a little diminished by comparison. Sorry about that. Any help you can provide would be great.
Comment by Martyn Russell — 2011-03-22 @ 16:20
Take a look at git-annex too.
Comment by foo — 2011-03-23 @ 05:30
I wish Tracker used Xapian for searching.
Comment by foo — 2011-03-23 @ 05:32
Is there a PPA for the 0.10 version for Ubuntu 10.10?
Comment by pt — 2011-03-23 @ 06:48