




November 2001
11.27.01:
OK, I haven't been keeping this log up to date. I was away a lot in the last 2 weeks for the conference and
Thanksgiving. I've mainly been working on the GUI for the filtering code, using Java Swing. There weren't many
complications there, which I guess is why I didn't have much to say anyway. The input side of the GUI is mostly
done, but I still need to actually run Yi's code from it. We also need to figure out what callback information to
display while a process is running, and how to display it.
I've also been compiling John's new param code on Windows, and I'm getting a link error for __builtin_next_arg. It is
used by va_start, which is defined in the stdarg.h header. The Windows documentation says that to use stdarg.h, which
supports functions with variable-length argument lists (like printf), I should include stdio.h and stdarg.h (or
varargs.h on some flavors of Unix). It claims to be compatible with WinNT, but I cannot figure out where
__builtin_next_arg is supposed to be defined. I tried including stdlib.h just to see, but that didn't help. If I
include varargs.h instead, I get a different error about va_alist not being defined. That is a bit strange, because
we don't use the old varargs.h style at all. I've been searching the web and the documentation, but I haven't found
a solution yet.
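For comparison, a correct variadic function needs only <cstdarg> (the C++ spelling of stdarg.h), and nothing extra has to be linked. As far as I know, __builtin_next_arg is a GCC compiler intrinsic that GCC's own stdarg.h used inside va_start. A link error for it therefore hints at one possible cause: a GCC-flavored stdarg.h may be getting picked up by a different Windows compiler through the include path. That diagnosis is an assumption and is not confirmed in this log. The sketch below shows portable usage. The functions sumInts and formatMessage are hypothetical names.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Sums `count` int arguments passed after the named parameter `count`.
// va_start is given the last named parameter so it knows where the
// variable arguments begin; the header maps it to the compiler's intrinsic.
int sumInts(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // each extra argument is read as an int
    }
    va_end(args);
    return total;
}

// printf-style formatting into a std::string by forwarding the va_list
// to vsnprintf. Output longer than 255 characters is truncated.
std::string formatMessage(const char* fmt, ...) {
    char buffer[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buffer, sizeof(buffer), fmt, args);
    va_end(args);
    return std::string(buffer);
}
```

If this compiles and links with the Windows toolchain on its own, the problem is more likely which header John's code pulls in than stdarg support itself.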
Meanwhile, we've set a date for the toolkit release. We found a memory leak caused by an inconsistency between my
implementation of nextEntry (used in list iteration) and Cheng's. The two implementations handle memory differently,
so the caller has to free memory differently depending on which index is underneath. That should be consistent, so
that the same retrieval code can be used with any underlying index. After a long discussion in which we couldn't
agree on the best way to handle memory, I decided to change my code to match Cheng's for the release (on Dec. 7th).
Another issue came up during the discussion: having multiple iterators walk the same list at once, which our design
does not handle. Early on, we had intended to use STL-style iterators but didn't, even though I remember that we had
worked out how to do it technically. Nobody could remember why we ended up doing it the way we did. (If only I
had logs! :)
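Here is one possible convention that makes memory handling uniform and also allows several iterators over one list. It is a sketch under two assumptions:

- The list owns every entry, and nextEntry returns a borrowed pointer that the caller never frees.
- Each iterator carries its own position, so iterators do not interfere with each other.

InvertedList, Entry and Iterator are hypothetical names. This is not Cheng's convention or the toolkit's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical posting: a document id and the term's count in that document.
struct Entry {
    int docId;
    int count;
};

// Hypothetical inverted list. The list owns all entries, so callers
// never free anything an iterator hands back.
class InvertedList {
public:
    explicit InvertedList(std::vector<Entry> entries) : entries_(std::move(entries)) {}

    // Each iterator keeps its own position, so several iterators can
    // walk the same list independently.
    class Iterator {
    public:
        explicit Iterator(const InvertedList& list) : list_(list), pos_(0) {}

        // Returns a pointer into the list's storage, or nullptr at the end.
        // The pointer is borrowed: valid while the list is alive and unmodified.
        const Entry* nextEntry() {
            if (pos_ >= list_.entries_.size()) return nullptr;
            return &list_.entries_[pos_++];
        }

    private:
        const InvertedList& list_;
        std::size_t pos_;
    };

    Iterator iterator() const { return Iterator(*this); }

private:
    std::vector<Entry> entries_;
};
```

The trade-off is that a borrowed pointer must not outlive the list. The alternative, where nextEntry returns a caller-owned copy, is simpler to reason about but costs an allocation or copy per entry.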
11.06.01:
Met with Jamie today. I told him what Yi and I did yesterday, but I think he'd already heard about it, because he
didn't seem very impressed. We went over the GUI design and he clarified some things, which I think will be very
useful.
I showed him the results from the pruning experiments, but they weren't what he expected. I've now read the pruning
paper from this year's SIGIR, and what I ended up doing resembles their threshold-style pruning, where entire
inverted lists get dropped. The method that works better modifies each inverted list to contain only the more
frequent words, which is also what Jamie was expecting. Still, I suppose it was good to try this, because now we
know how it does. The retrieval results are actually pretty good, except we don't save enough disk space, or
anything else, to make it worthwhile. So I will have to run the other experiments too, but not until I finish the
GUI, which I said I would do in 1.5 weeks.
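Below is a minimal sketch of per-list pruning, for contrast with dropping whole lists. It assumes that "contain only the more frequent words" means each term keeps its inverted list but loses the postings whose within-document count falls below a cutoff. That reading is an assumption, and this is not the SIGIR paper's exact algorithm. Posting, Index, pruneEachList and totalPostings are hypothetical names, not the toolkit's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical posting and index types: term -> list of (docId, count).
struct Posting {
    int docId;
    int count;
};
using Index = std::map<std::string, std::vector<Posting>>;

// Per-list pruning: every term keeps its list, but postings whose
// within-document count is below minCount are dropped (order preserved).
Index pruneEachList(const Index& index, int minCount) {
    Index pruned;
    for (const auto& kv : index) {
        std::vector<Posting>& kept = pruned[kv.first];
        for (const Posting& p : kv.second) {
            if (p.count >= minCount) kept.push_back(p);
        }
    }
    return pruned;
}

// Total number of postings, a rough stand-in for index size.
std::size_t totalPostings(const Index& index) {
    std::size_t total = 0;
    for (const auto& kv : index) total += kv.second.size();
    return total;
}
```

Unlike whole-list pruning, every query term still has a list to look up. The savings come from shortening long lists rather than removing rare terms.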
11.05.01:
Long meeting with Yi today, but we successfully built a JNI layer between Java and her existing C++ code, using the
same "architecture" that I tried out last week. We also went over the requirements for the GUI. It was really
difficult for me to do a task analysis, requirements analysis, or anything like that, because I understand very
little of what the application actually does. There are pages and pages of settings. She said the users found them
confusing, but there aren't many ways to present them: if you don't understand the underlying functionality, the
settings will be hard to understand however they're laid out. This is not something that can be intuitive or "walk
up and use" for a novice user. It definitely requires learning the science behind it all. So the best solution we
have so far is to group the settings and parameters into a tabbed pane instead of presenting them all at once. I
also tried my best to work out which combinations of parameters are valid, so that if you select something that
excludes other options, those options become disabled. That helps the user a bit. I can't think of what could be
more helpful except a user's manual. Maybe tooltips.
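The enable/disable rule can be kept in a small model separate from the Swing widgets. In the GUI, each widget's state would then be set from that model, for example with JComponent.setEnabled. The model is sketched in C++ here only to match the other examples; the real GUI is Java. ParameterRules and the option names are hypothetical. The sketch assumes exclusions are pairwise and symmetric, which the actual parameter set may not be.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model behind the settings tabs: selecting a parameter
// disables every parameter it excludes.
class ParameterRules {
public:
    // Exclusion is recorded both ways: if a excludes b, b excludes a.
    void addExclusion(const std::string& a, const std::string& b) {
        excludes_[a].insert(b);
        excludes_[b].insert(a);
    }

    void select(const std::string& p) { selected_.insert(p); }
    void deselect(const std::string& p) { selected_.erase(p); }

    // A parameter is enabled unless some currently selected parameter excludes it.
    bool isEnabled(const std::string& p) const {
        for (const std::string& s : selected_) {
            auto it = excludes_.find(s);
            if (it != excludes_.end() && it->second.count(p) > 0) return false;
        }
        return true;
    }

private:
    std::map<std::string, std::set<std::string>> excludes_;
    std::set<std::string> selected_;
};
```

Keeping the rules in one table also makes it easy to attach a tooltip explaining why an option is disabled.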
11.01.01:
Ran many pruning tests today on the collection selection database, keeping the top 30K words, then 25K, and so on
down to 5K words. Retrieval performance is not affected much, but the benefits are unclear: the database sizes do
not shrink significantly, and the query times are almost the same. Still, I am glad that the pruning code works and
that I can run all the scripts OK. I need to read the SIGIR paper Jamie was talking about and see what kinds of
pruning techniques they used.
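Below is a sketch of the whole-list pruning described in this entry: keep the K most frequent terms and drop every other term's inverted list. It assumes "top" means highest collection frequency; the log does not say which frequency was used. keepTopK is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Whole-list (vocabulary) pruning: keep only the k terms with the highest
// frequency; every other term's list is dropped entirely.
std::map<std::string, int> keepTopK(const std::map<std::string, int>& termFreq, std::size_t k) {
    std::vector<std::pair<std::string, int>> terms(termFreq.begin(), termFreq.end());
    // Descending frequency; ties broken alphabetically so the result is deterministic.
    std::sort(terms.begin(), terms.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) -> bool {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (terms.size() > k) terms.resize(k);
    return std::map<std::string, int>(terms.begin(), terms.end());
}
```

Because the dropped terms are the rare ones with short lists, removing them saves little space. That fits the small size reductions seen in these runs.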
Also wrote a little JNI layer between Java and C++ today. It was very simple, but it proves that the concept is
doable. Java code calls a native method that is implemented in C++. That method stores the JNI env and the calling
object in globals, then calls another C++ class. That class calls a static method in the native implementation,
which uses the stored globals to call a Java method on the same original Java object. Now I need to write, probably
with Yi, the real layer that will sit between the GUI I build and her C++ code. We still need to figure out exactly
how to access parameters, like Java strings, but it shouldn't be too hard. It's well documented on Java's site.
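One caveat is worth checking against the JNI specification. A JNIEnv* is only valid on the thread that received it, and the jobject passed to a native method is a local reference that stops being valid once the method returns. Storing both in globals works as long as the callback happens during that native call, on the same thread. For anything longer-lived, the usual approach is:

- keep the JavaVM* (obtained with GetJavaVM) and a NewGlobalRef of the object;
- re-obtain an env with GetEnv or AttachCurrentThread when calling back.

The sketch below mirrors the call chain described above without jni.h, so it compiles on its own; a std::function stands in for the env/object pair. nativeRun, Engine, reportProgress and g_callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the Java callback target (env + object in the real layer).
using ProgressCallback = std::function<void(const std::string&)>;

// Global set by the "native method" entry point before calling into C++.
static ProgressCallback g_callback;

// Static trampoline the C++ code calls; forwards to whatever was registered.
static void reportProgress(const std::string& message) {
    if (g_callback) g_callback(message);
}

// Hypothetical existing C++ class that knows nothing about Java.
class Engine {
public:
    void run(int steps) {
        for (int i = 1; i <= steps; ++i) {
            reportProgress("step " + std::to_string(i));
        }
    }
};

// Plays the role of the JNI native method: store the callback, run, then clear it
// so no stale reference survives past the call (as with a JNI local reference).
void nativeRun(ProgressCallback cb, int steps) {
    g_callback = std::move(cb);
    Engine().run(steps);
    g_callback = nullptr;
}
```

The same trampoline is a natural place to deliver the progress callbacks the GUI needs to display while a process is running.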