November 2001

11.27.01:
OK, I haven't been keeping this log up to date very well. I was away quite a lot in the last 2 weeks with the conference and Thanksgiving. I've mainly been working on the GUI for the filtering stuff, using Java Swing. Not too many complications there, which I guess is why I didn't have much to say anyway. The GUI is mostly done on the input end, but I still need to actually run Yi's code from it. We also need to figure out what callback information to display while a process is running, and how to display it.

I've also been compiling John's new param code on Windows and am getting a link error with __builtin_next_arg. This is used by va_start, which is in stdarg.h. The documentation (on Windows) says that to use va_start, which supports functions with variable-length argument lists (like printf), I should include stdio.h and stdarg.h (or varargs.h for some flavors of Unix). Anyway, it claims to be compatible with WinNT, but I cannot figure out where __builtin_next_arg is supposed to be defined. I tried including stdlib.h just to see, but that didn't work. If I include varargs.h instead, I get a different error about va_alist not being defined, which is a bit strange because we do not use that style of declaration. So anyway, I've been searching the web and the documentation, but I have not found a solution yet.
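For reference, a minimal sketch of the standard stdarg.h usage being discussed. The function names (sumInts, formatMessage) are made up for illustration and are not from John's param code. One thing worth checking, offered as a guess only: __builtin_next_arg is a GCC compiler builtin, so an unresolved reference to it may mean that a GCC-flavored stdarg.h is being picked up by a non-GCC compiler.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: adds up `count` int arguments.
int sumInts(int count, ...) {
    va_list args;
    va_start(args, count);            // `count` is the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);   // caller must really pass ints
    }
    va_end(args);
    return total;
}

// Hypothetical helper: printf-style formatting into a std::string.
// Output longer than the fixed buffer is truncated.
std::string formatMessage(const char* fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    return std::string(buf);
}
```

With the ANSI style above only stdarg.h (cstdarg in C++) is needed. varargs.h is the older pre-ANSI interface, which is where va_alist comes from.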

Meanwhile we've set a date for the toolkit release. We found a memory leak due to an inconsistency between my implementation of nextEntry (in list iteration) and Cheng's. We handle memory in different ways, which forces the user to free memory in different ways. It should be consistent, though, so that the same retrieval code can be used with any underlying index. After a long discussion where we couldn't really decide on the best way of handling memory, I decided that I would change my code to match Cheng's for the release (on Dec. 7th). An issue came up during the discussion about having multiple iterators down the same list. Our design does not handle that. In the early stages, we had intended to use STL-type iterators but ended up not doing that, even though I remember that we figured out how to do it technically. Nobody could remember why we ended up doing it the way we did. (If only I had logs! :)
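To make the multiple-iterator point concrete, here is a rough sketch of an STL-style iterator over a posting list. It is not the toolkit's actual design; Posting, PostingList, and countDocPairs are hypothetical names. The idea is that each iterator carries its own position, so several can walk the same list at once, and entries are handed out by const reference, so the caller never has to free anything.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical entry type: one document and its term count.
struct Posting {
    int docId;
    int count;
};

class PostingList {
public:
    explicit PostingList(std::vector<Posting> entries)
        : entries_(std::move(entries)) {}

    // Each Iterator keeps its own cursor; the list itself holds no
    // iteration state, so iterators do not interfere with each other.
    class Iterator {
    public:
        Iterator(const std::vector<Posting>* entries, std::size_t pos)
            : entries_(entries), pos_(pos) {}
        // The entry stays owned by the list; nothing for the caller to free.
        const Posting& operator*() const { return (*entries_)[pos_]; }
        Iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const Iterator& other) const { return pos_ != other.pos_; }
    private:
        const std::vector<Posting>* entries_;
        std::size_t pos_;
    };

    Iterator begin() const { return Iterator(&entries_, 0); }
    Iterator end() const { return Iterator(&entries_, entries_.size()); }

private:
    std::vector<Posting> entries_;
};

// Uses two iterators over the same list at the same time:
// counts pairs of entries where the first docId is smaller.
int countDocPairs(const PostingList& list) {
    int pairs = 0;
    for (PostingList::Iterator a = list.begin(); a != list.end(); ++a) {
        for (PostingList::Iterator b = list.begin(); b != list.end(); ++b) {
            if ((*a).docId < (*b).docId) ++pairs;
        }
    }
    return pairs;
}
```

The trade-off is that the list has to own its entries for as long as any iterator is alive, which may or may not fit how the different indexes load their data.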

11.06.01:
Meeting with Jamie today. I told him about what Yi and I did yesterday, but I think he'd heard it before because he did not seem very impressed. We went over the GUI design and he clarified some things, which I think will be very useful. I showed him the results from the pruning experiments, but they are not what he expected. I've read the paper on pruning from this year's SIGIR, and what I ended up doing is like their threshold type of pruning, where entire inverted lists get dropped. But the method that works better is one that modifies each inverted list to contain only the more frequent words. That's what Jamie was expecting too. Anyway, I suppose it was good to try this, because now we know. The results are actually pretty good, except we don't save enough in disk size or anything else to make it worthwhile. So I will have to run the other experiments too, but not until I finish the GUI. I said that I would finish in 1.5 weeks.
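The difference between the two kinds of pruning can be sketched roughly as follows. This is an illustration only, not the code used in the experiments or the method from the SIGIR paper. The in-memory Index type and the function names are hypothetical, and "trim each list to its highest-count entries" is just one reading of the second method.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Posting {
    int docId;
    int termCount;
};

// Hypothetical in-memory index: term -> inverted list.
typedef std::map<std::string, std::vector<Posting> > Index;

long totalCount(const std::vector<Posting>& list) {
    long total = 0;
    for (std::size_t i = 0; i < list.size(); ++i) total += list[i].termCount;
    return total;
}

bool moreFrequentTerm(const std::pair<long, std::string>& a,
                      const std::pair<long, std::string>& b) {
    if (a.first != b.first) return a.first > b.first;  // higher count first
    return a.second < b.second;                        // ties: alphabetical
}

bool higherCount(const Posting& a, const Posting& b) { return a.termCount > b.termCount; }
bool lowerDocId(const Posting& a, const Posting& b) { return a.docId < b.docId; }

// Whole-list pruning: keep the k most frequent terms, drop all other lists.
Index keepTopTerms(const Index& index, std::size_t k) {
    std::vector<std::pair<long, std::string> > ranked;
    for (Index::const_iterator it = index.begin(); it != index.end(); ++it) {
        ranked.push_back(std::make_pair(totalCount(it->second), it->first));
    }
    std::sort(ranked.begin(), ranked.end(), moreFrequentTerm);
    if (ranked.size() > k) ranked.resize(k);
    Index pruned;
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        pruned[ranked[i].second] = index.find(ranked[i].second)->second;
    }
    return pruned;
}

// Per-list pruning: keep every term, but only its n highest-count entries.
Index trimEachList(const Index& index, std::size_t n) {
    Index pruned;
    for (Index::const_iterator it = index.begin(); it != index.end(); ++it) {
        std::vector<Posting> list = it->second;
        std::stable_sort(list.begin(), list.end(), higherCount);
        if (list.size() > n) list.resize(n);
        std::sort(list.begin(), list.end(), lowerDocId);  // restore docId order
        pruned[it->first] = list;
    }
    return pruned;
}
```

In the first function a dropped term can no longer match anything, while in the second every term stays searchable and only the lists get shorter.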

11.05.01:
Long meeting with Yi today. But we successfully built a JNI layer between Java and her existing C++ code, using the same "architecture" that I tried out last week. We also went over the requirements for the GUI. It was really difficult for me to do a task analysis or requirements analysis or anything like that because I have very little understanding of what the application actually does. There are pages and pages of settings. She said that the users found them confusing, but there aren't too many ways to present them. If you don't have a good understanding of the underlying functionality, then the settings will be hard to understand. This is not something that can really be intuitive or "walk up and use" for a novice user. It definitely requires learning the science behind it all. So the best solution we have so far is to group settings and parameters into a tabbed pane instead of presenting them all at once. I tried my best to figure out which combinations of parameters are possible, so that if you select something that excludes others, they become disabled. That is a bit of help to the user. I can't think of what could be more helpful except a user's manual. Maybe tooltips.

11.01.01:
Ran many pruning tests today on the collection selection database, taking the top 30K words, 25K words, etc., down to 5K words. The retrieval performance is not affected much, but the benefits are also unclear. The database sizes do not shrink by a significant amount, and the query times are almost the same. Still, I am glad that the pruning code works and that I am able to run all the scripts and such OK. I need to read that paper from SIGIR that Jamie was talking about and see what kind of pruning techniques they used.

Also, wrote a little JNI layer between Java and C++ today. It was very simple, but it proves that the concept is doable. We can have Java code call a native method that is implemented in C++. That method stores the JNI env and the calling object in globals. Then it calls another C++ class. That class then calls a static method in the native implementation, which calls a Java method (on the same original Java object) using the stored globals. I now need to write, probably with Yi, the real layer that will fit between the GUI I build and her C++ code. We still need to figure out exactly how to access parameters, like Java strings, but it shouldn't be too hard. The stuff is well-documented on Java's site.
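The call chain described above can be mimicked in plain C++ without any JNI, just to show the shape of it. Everything here is a stand-in with hypothetical names: GuiObject plays the Java object, NativeBridge plays the native implementation, and FilterJob plays the existing C++ code. In real JNI the stored object would have to be kept alive with NewGlobalRef, and a JNIEnv pointer is only valid on the thread it was handed to, so this is a sketch of the control flow only.

```cpp
#include <string>
#include <vector>

// Stand-in for the Java object that made the native call.
struct GuiObject {
    std::vector<std::string> received;
    // Stand-in for the Java method that gets called back.
    void onProgress(const std::string& message) { received.push_back(message); }
};

// Stand-in for the native implementation.
class NativeBridge {
public:
    // Plays the role of the native method invoked from Java.
    static void start(GuiObject* caller);

    // Static callback: C++ code can call this without holding any
    // reference to the caller, because the bridge remembers it globally.
    static void reportProgress(const std::string& message) {
        if (caller_ != nullptr) caller_->onProgress(message);
    }

private:
    static GuiObject* caller_;  // the "global" calling object
};

GuiObject* NativeBridge::caller_ = nullptr;

// Stand-in for the existing C++ class that does the real work.
class FilterJob {
public:
    void run() {
        NativeBridge::reportProgress("started");
        NativeBridge::reportProgress("finished");
    }
};

void NativeBridge::start(GuiObject* caller) {
    caller_ = caller;   // remember who called us
    FilterJob job;
    job.run();          // the job reports back through reportProgress
    caller_ = nullptr;  // forget the caller once the call returns
}
```

Calling NativeBridge::start(&gui) on a GuiObject named gui leaves the two messages in gui.received, in order. A single global caller also means only one call can be in flight at a time.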
