February 2002

2.25.02: Found the "linux" bug, although the problem might be more related to different versions of gcc (our linux machines run a newer version). The problem was in the merge: when I iterate through the lists, there's a for loop to increment the iterator. But I don't want to start at the first item, I want the loop to increment itself first, so the increment step normally says iter++, which works. But for some bone-headed reason, some of them say iter=iter++, which doesn't increment. So all that "junk" being added was the values for the first item merged into the list a second time. iter+=1 and iter=iter+1 also work. Also revamped the pruning code. It now does as many passes as necessary to get each db within x% error rate of the desired rank. It can also do runs with the top N% of terms instead of just the top n terms. Need to make runs on the kmeans data to get generalizable results.
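For the record, here's a tiny standalone demo of the iterator mistake (not the actual merge code; the list values are made up):

    #include <iostream>
    #include <list>

    // Demo of the bug: for a class-type iterator, "it = it++" calls
    // operator++(int), which returns a copy of the OLD position, and the
    // assignment then writes that old position back -- so the loop keeps
    // re-reading the same element, which matches the first item's values
    // getting merged into the list a second time.
    int main() {
        std::list<int> vals;
        vals.push_back(10);
        vals.push_back(20);
        vals.push_back(30);

        std::list<int>::iterator it = vals.begin();
        for (int i = 0; i < 3; ++i) {
            std::cout << "buggy loop sees " << *it << std::endl;   // 10, 10, 10
            it = it++;                                             // BUG: no net increment
        }

        for (it = vals.begin(); it != vals.end(); it++) {          // plain it++ works
            std::cout << "fixed loop sees " << *it << std::endl;   // 10, 20, 30
        }
        return 0;
    }

If the iterator is really just a raw pointer, iter=iter++ modifies the same object twice with no intervening sequence point, which is undefined behavior, so different compilers (or gcc versions) are free to do different things with it; that would fit the "works on solaris, breaks on linux" pattern.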
2.20.02:
On LA, the indexes are exactly the same whether it goes through the
merge or not. But the same problem occurs on Ringtail, which leads
me to believe that it is a platform issue. The sample data, being small,
worked beautifully. Offhand I don't know what the problem is. When we first tried to
compile on linux, there were problems with the setbuf functionality
of fstream (forcing it to use a given buffer; see the sketch below). I commented that stuff
out (letting it manage its own buffer), but the same problem occurs.
I need to step through the merge code line by line to see what the
problem is. The doc frequency and ctf counts are correct. The read
also reads in the correct length list, but the data is incorrect.
When I compare inverted lists from the 2 dbs, the beginning looks the
same, then the corrupted one has a bunch of garbage (wrong-value
integers), then continues on the right path again. So some wrong
values got spliced into the middle of the list.
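For reference, this is roughly what forcing an fstream onto a caller-supplied buffer looks like through the standard pubsetbuf interface (a minimal sketch, not the indexer code; whether the original used this or the older setbuf member directly is a guess, and the buffer size and filename are made up):

    #include <fstream>

    // Hand the file stream our own buffer instead of letting it allocate one.
    // pubsetbuf has to be called before the file is opened to take effect
    // portably; older classic-iostreams code sometimes called the setbuf
    // member directly, which newer gcc/libstdc++ no longer exposes the same
    // way -- one plausible source of the porting trouble.
    int main() {
        static char buf[64 * 1024];                  // made-up 64 KB buffer
        std::ofstream out;
        out.rdbuf()->pubsetbuf(buf, sizeof(buf));    // must precede open()
        out.open("invlist.tmp");                     // made-up filename
        out << "some inverted list data\n";
        return 0;
    }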
Could there be a memory leak on linux that is not on solaris? I have downloaded some purify-type programs
for linux, but have not tried them out. I think I should step through
the merge first to see that there isn't a glaring error. First, I have to get gdb to work predictably.
2.04.02: Jamie's guess is that there is a memory leak somewhere, which is causing a problem on linux and not
on solaris. I ran the pushindexer through purify over the weekend (on solaris, since there is no purify
for linux) and there aren't really any memory leaks. There are a few UMRs and some array-out-of-bounds issues,
all in Paul's text handling stuff. I guess I'll have to either grab Paul or fix them myself. There are some
small memory leaks relating to the param stuff. I kind of remember Cheng mentioning something about them, but
then dismissing them because they are so small. There was one significant memory leak from my code: not
releasing all the document id strings. But this would happen at the very end of the process anyway, so I can't
see how that would make much of a difference. In any case, that is easy to fix. Currently I'm looking into whether
or not I need to allocate that memory anyway. I can't remember if I have to do my own copying before I push into a
vector or if the vector takes care of all of that. I was doing strdup just to be safe. I'm guessing that I have
to since I vaguely remember looking into that and I would have changed it if it wasn't the case. Anyway, I'm running
that test now, and it's taking forever because there are people on LA.
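Since the copying question keeps coming up, here's a small standalone sketch of what push_back does and doesn't copy (not the pushindexer code; the DOC-1/DOC-2 ids are made up):

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <string>
    #include <vector>

    // push_back copies the *element*.  For a char* that is only the pointer,
    // so the characters need their own strdup -- and a matching free, or they
    // leak, which is consistent with the doc-id leak purify reported.  A
    // vector of std::string copies the characters itself, so no strdup there.
    int main() {
        char docid[16];
        std::strcpy(docid, "DOC-1");

        std::vector<char*> rawIds;
        rawIds.push_back(strdup(docid));   // duplicate: the vector only stores the pointer
        std::strcpy(docid, "DOC-2");       // reusing the buffer would otherwise clobber the stored id

        std::vector<std::string> ids;
        ids.push_back(docid);              // std::string makes its own copy of the characters

        std::printf("%s %s\n", rawIds[0], ids[0].c_str());   // prints: DOC-1 DOC-2

        std::free(rawIds[0]);              // the strdup'd copy has to be released explicitly
        return 0;
    }

So the strdup is only necessary if the vector holds raw char* and the source buffer gets reused or freed before the vector is done with it; if the element type already owns its characters, each strdup is just a small leak unless it gets a matching free.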
Speaking of that, Jamie and I are in the process of looking into new servers to buy. I guess we will go
with linux servers because they are cheaper. We're still waiting on some quotes from purchasing and some
answers from facilities regarding cross-mounting of disks. Since it looks like we're moving over to linux,
I also checked to see what memory leak detectors are available on linux. I downloaded 2 of them to try out
at some point.
I'll need to re-run the indexing using a smaller cache so that it uses the hierarchical merge. Although
that code is almost identical to the final merge method, I should still check to make sure there isn't anything
weird going on there. It just takes so long to run one of these processes.