





09.25.01:
So. I've been a bit bad about keeping this up to date. I don't remember what I've been doing in the last week.
I moved over to the new CVS tree. I checked everything in. The newest version has the following differences:
1) removes temp files 2) pre-loads doclength, ctf, and df 3) has bag of words support. Paul is supposed to run
the retrieval tests to see how much preloading makes everything faster.
OPEN_MAX replaced with 32. Not sure how to allow override.
Also looked into full path names. Not sure how to do this to work on both platforms. In "stdlib.h" on NT,
there is a function called _fullpath that gives you a full path from a relative path. This is not included on
unix. Well, there must be a command somewhere on unix that will give you the full path or the current working
directory. Just need to find it.
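
A minimal sketch of one way to get a full path on both platforms, not tested against the toolkit. On unix the usual calls are getcwd (declared in <unistd.h>) for the current working directory and realpath (declared in <stdlib.h>) for resolving a relative path. The function name fullPath and the _WIN32 check are assumptions for illustration only.

```cpp
#include <string>
#include <stdlib.h>   // _fullpath and _MAX_PATH on NT, realpath on unix
#ifndef _WIN32
#include <limits.h>   // PATH_MAX
#endif

// Hypothetical helper: returns the absolute form of `relative`,
// or an empty string if the platform call reports an error.
std::string fullPath(const std::string& relative) {
#ifdef _WIN32
    char buf[_MAX_PATH];
    if (_fullpath(buf, relative.c_str(), _MAX_PATH) == NULL)
        return std::string();
#else
    char buf[PATH_MAX];
    // realpath fails (returns NULL) when the path does not exist.
    if (realpath(relative.c_str(), buf) == NULL)
        return std::string();
#endif
    return std::string(buf);
}
```

Something like fullPath(".") would then give the current working directory on either platform. One thing to keep in mind is that realpath needs the path to exist; I have not checked whether _fullpath behaves the same way.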
Also read some of Jamie's papers about query-based sampling. And looked at some of John's papers. Just trying
to get a better idea of what language modeling is all about. I should know, right?
09.18.01:
So people are downloading the toolkit and trying to compile it too. I got two help requests. (both linux users.) One
was about a missing definition for OPEN_MAX. Somewhat prepared for that since I got the same error on NT. The other
person had some syntax problems with the Makerule file (using FreeBSD). I forwarded that one on to Cheng because I have no
idea.
BTW, OPEN_MAX on solaris (at least on our server LA) is 64. Windows is 256.
Also read the documentation again today because one test site didn't know about my indexer. True enough, no mention
of PushIndex API anywhere. Odd and strange because I remember talking to Cheng about it before. It must have been in
an older version. Found an error, kind of. Sent mail to Cheng and John about it. Need to also write something for InvFPMain
ASAP and distribute.
09.17.01:
Sigh.
Took the code that worked on NT and compiled it on unix. Or tried to. The error I got: no matching function for call
to "filebuf::pubsetbuf". Why is there a standard when it doesn't keep anything standard??? I'm particularly frustrated
because I didn't really anticipate any problems. So I guess I can #ifdef __PLATFORM this part since I have code that
will do the same thing on each platform. But how annoying.
Also today is small housekeeping chores. Checked out the code from the new cvs tree. Compiled. Have to see what code
I've checked into the old tree that needs to be moved to the new tree. My fault for not paying more attention to this
new tree email when it went out last month. Need to remind Cheng to give permissions to Paul and Kevin. (Sent out email.)
Other chores: updating the lemur mailing lists and handing out alpha download logins.
09.12.01:
Created a new header file today called stl_headers.hpp, which includes <iostream> <fstream> <cstdlib> <map> <vector>.
I was still trying to experiment with including <fstream.h> so that I could use setbuf. This resulted in many
ambiguous symbol errors in files where I was trying to use ifstream or ofstream. The using namespace statement does
not get rid of the ambiguous symbol errors. Only using "std::ifstream" worked. This problem was mentioned a few days ago.
Although this could be a solution, it is a real pain in the butt to do that everywhere I want to use it. Also, it
does not solve my problem of having to include <fstream.h> which I really shouldn't do if we want to stick with
the standard. I have a sneaking suspicion that these things are not supported because they are not part of the standard.
(ie, a constructor that allows you to specify the buffer).
I think I was wrong before in assuming (if I did) that rdbuf is not a member function. While setbuf and fd are not, rdbuf is
indeed part of basic_ifstream. It returns the filebuf object that is connected with the stream. However, the setbuf
method is a protected member of filebuf, so I can't use that either. Searching through the documentation, I found that
in basic_streambuf, which basic_filebuf is based on, there is actually another method called pubsetbuf, that is public.
I don't know why they have 2, and what is the difference. Some documentation says that it just calls setbuf, which makes
it more confusing as to why they'd need 2. I guess the implementation can decide not to.
So, instead of trying to say ifstream->setbuf, I changed that line to ifstream->rdbuf()->pubsetbuf, and that compiles.
I'm in the middle of testing it to see if it actually works. Although, I guess I have no real way of knowing if it is
using the buffer I told it to use. Do I? I can check for error returns, I guess.
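
A minimal sketch of the rdbuf()->pubsetbuf call, assuming the request is made before the file is opened. The function name readWithOwnBuffer is hypothetical. What the stream does with the buffer is largely up to the implementation, so the return value only says the request was not rejected; it does not prove the buffer is actually in use.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file through a caller-sized buffer.
// Returns false if the buffer request or the open fails.
bool readWithOwnBuffer(const std::string& path, std::size_t bufBytes,
                       std::string& out) {
    if (bufBytes == 0) return false;

    // Declared before the stream so it outlives the stream.
    std::vector<char> buffer(bufBytes);
    std::ifstream in;

    // rdbuf() returns the filebuf attached to the stream; pubsetbuf is
    // the public entry point that forwards to the protected setbuf.
    if (in.rdbuf()->pubsetbuf(&buffer[0],
            static_cast<std::streamsize>(buffer.size())) == 0)
        return false;

    in.open(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;

    out.clear();
    char c;
    while (in.get(c)) out += c;   // read everything, byte by byte
    return true;
}
```

Comparing the returned contents with what was written to the file is about the only portable check available here.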
From the changes that I need to do to experiment with the headers, it does seem that having all of them in one
centralized header file makes that easier. Maybe that is the way to do it to avoid ambiguity problems.
So I ran it, and it merged, and everything looks normal. Frustrating to think that there's not been anything fundamentally
wrong with my code. Just a matter of sifting through documentation and tweaking the names.
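
A sketch of what the centralized header described in this entry might look like, using only the five headers listed above. The include-guard name is an assumption, and whether a "using namespace std" belongs in this file is left open here.

```cpp
// stl_headers.hpp (sketch): one place to control standard includes.
#ifndef STL_HEADERS_HPP   // guard name is an assumption
#define STL_HEADERS_HPP

// Standard C++ headers: no ".h" suffix.
#include <iostream>
#include <fstream>
#include <map>
#include <vector>

// C library headers: "c" prefix, no ".h" suffix.
#include <cstdlib>

#endif // STL_HEADERS_HPP
```

Every other file would then include only stl_headers.hpp, so experimenting with a different header means changing one line in one place.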
09.11.01:
Didn't do any work today.
No work related revelations at all.
09.06.01:
Quick lesson today. The standard name for <algo.h> is not just <algo> but actually <algorithm>. Why? I don't know.
Learned this while compiling the retrieval and language modeling code. Not too many problems with it. Had to comment out another <unistd.h>.
09.05.01:
I have some kind of stomach bug today so not too much progress. The most productive thing I've
done is talk to Kevin, show him how to telnet to LA, give him a brief introduction to CVS, and show him where
the toolkit lives. I've asked him to read the documentation and make notes on where we can improve it.
We sat down together and he tried to help me with the NT compilation. It resulted in clarifying one of
the mysteries I discussed yesterday, but I haven't found a solution yet.
So it turns out that for whatever reasons, windows has 2 ifstream classes. One is included by
<fstream>, which just typedefs it to basic_ifstream, which is why it does not have members like setbuf, fd, etc.
There is another, which has all that stuff, that comes from <fstream.h>. After reviewing the ANSI C++ standard, I
was under the impression that <fstream> is supposed to be equivalent to <fstream.h> so I am pretty
confused by this. This is also a problem since last week, we decided on using the standard way of including headers,
which means to include everything minus the .h, prepend C libraries with "c" (ie cstdlib instead of stdlib.h) and means
that we have to declare using namespace std. Kevin suggests that I use a separate header file solely for the
purposes of including STL stuff so that I can have a centralized place for controlling it. This may not be a
bad idea, since currently when I have include problems I have to trace each file. It is quite painful.
I've also discovered what to do about the unistd.h file. Apparently, all you have to do is have an empty
file called unistd.h that gets included to make it work on windows. Now, the one on unix is not empty, which
means that we should not include unistd.h in our distribution. So, why does windows need a unistd.h file if
it's just going to be empty? Perhaps a better solution is not to have it try to include that file when it is
being compiled on windows. This means a #ifdef. I haven't been able to test this yet because since I was working
on the fstream problem, everything is a mess and does not compile at all. I am getting all sorts of ambiguous
symbol errors.
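
A minimal sketch of the #ifdef approach, untested on NT. It assumes _WIN32 is predefined by the Windows compiler; HAVE_UNISTD_H and haveUnistd are hypothetical names used only to make the guard visible.

```cpp
// Only pull in <unistd.h> on platforms that ship it.
#ifdef _WIN32
#  define HAVE_UNISTD_H 0      // nothing to include on windows
#else
#  include <unistd.h>
#  define HAVE_UNISTD_H 1
#endif

// Hypothetical helper so the result of the guard can be checked at run time.
bool haveUnistd() { return HAVE_UNISTD_H == 1; }
```

Since the include in question lives in auto-generated code, the guard would have to survive regeneration, which this sketch does not address.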
Another strange thing is that in one of the files (invfpdoclist), I can get rid of the ambiguous symbol message
by declaring std::ostream, where I use it. However, the "using namespace std" at the beginning of the file is
supposed to do the same thing. But that doesn't work. I'm not really sure why. Kevin doesn't know either and
he says he will read up on namespaces. No matter how much reading I've done on namespaces, it doesn't seem to
help me. The standard seems like more trouble than it's worth. I hope that someone finds that namespaces are a
very useful thing because all it's done is give me problems.
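
A possible explanation, sketched with stand-in names rather than the real headers: if <fstream.h> puts a class in the global namespace and <fstream> puts one with the same name in std, then "using namespace std" makes both visible under the unqualified name, and the compiler cannot choose. Qualifying the name picks one explicitly. The names demo_stream and demo_std are made up for the illustration, and this is only a guess at what is happening in invfpdoclist.

```cpp
#include <string>

// Stand-in for the class an old ".h" header puts in the global namespace.
struct demo_stream {
    std::string origin() const { return "global"; }
};

// Stand-in for namespace std and the class the new header puts there.
namespace demo_std {
    struct demo_stream {
        std::string origin() const { return "demo_std"; }
    };
}

using namespace demo_std;

// demo_stream s;   // would not compile: 'demo_stream' is ambiguous

// Qualified names are not ambiguous, which matches what was observed.
std::string qualifiedOrigin() {
    demo_std::demo_stream s;
    return s.origin();
}

std::string globalOrigin() {
    ::demo_stream s;
    return s.origin();
}
```

If this is the cause, the using directive is working as designed, and the real fix is to stop including both header families in the same file.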
09.04.01:
I'm trying to compile things on NT. Things that have previously compiled on unix. I am running into
a lot of problems, even with my own code. In the merge code, I had added "limits.h" so that I could
use OPEN_MAX, a system constant that tells me the maximum number of file handles that can be opened at one time.
However, even though windows does not complain about the include line, it doesn't know what OPEN_MAX is.
I think the windows limits.h is a file for integer consts (like max int, etc). Anyway, Jamie told me today
that probably we cannot guess how many file handles are available at any moment (even if we can manage to
get the MAX) so it might be better to set a reasonable default value that can be overridden by a command
line argument. I need to figure out now what the reasonable default should be.
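
A rough sketch of the default-plus-override idea. The flag name "-maxfiles", the function name, and the default of 32 are all assumptions for illustration; the right default is still an open question.

```cpp
#include <climits>
#include <cstdlib>
#include <cstring>

// Assumed default; small enough to be safe on the platforms seen so far.
const int kDefaultMaxOpenFiles = 32;

// Hypothetical helper: returns the value given after "-maxfiles" on the
// command line, or the default if the flag is missing or malformed.
int maxOpenFiles(int argc, const char* const argv[]) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "-maxfiles") == 0) {
            char* end = 0;
            long n = std::strtol(argv[i + 1], &end, 10);
            // Accept only a complete, positive number that fits in an int.
            if (end != argv[i + 1] && *end == '\0' && n > 0 && n <= INT_MAX)
                return static_cast<int>(n);
        }
    }
    return kDefaultMaxOpenFiles;
}
```

The merge code would call this once at startup and use the result wherever it currently uses OPEN_MAX.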
Another problem I'm having is with an include file called "unistd.h". I'm not sure what this file is
for, but it's being included by the lex_parser (auto-generated code). Strange thing is, I've compiled this
before on the other machine. It might be that I downloaded that file to get rid of the error message and
so it is not on this machine. It might be that these machines have different releases of VC++. I think
the first is more likely. It means though that we would need to distribute "unistd.h" if it doesn't usually
come with windows. One article I read about unix/NT development suggested not including this file because
it might cause problems on NT. I should note then that the previous builds on NT (presumably with this
file included) executed fine. There is no way for me to check my old machine because it has been re-formatted.
But wait. I remember now that I did use yet another machine to successfully compile the same project. I need
to check that machine.
This last problem is annoying. I've been using ifstream for file input and used the setbuf member function
to specify my own buffer area for use. (It uses a filebuf object to manage the buffer.) All of this compiled
and executed fine on unix. Now, on NT, it claims that "setbuf" is not a member of basic_ifstream. Apparently
ifstream is an alias for basic_ifstream. I don't really understand why I can't use setbuf since I found out
about it through the Windows documentation. I tried to get around it by creating a filebuf object and attaching
it to the stream. But you need a file descriptor, and "fd" apparently isn't a member of basic_ifstream either, even
though, like filebuf, it is in the documentation. My suspicion so far is that it is not an actual part of the
C++ standard. I need to do more research on that. I think this because there are some constructors that allow
you to specify your own buffer reserved space, but I found some documentation on the web saying those constructors
are not part of the standard. Like I said, more research is needed. (Another strange thing is that the autocomplete
on VC++ lists setbuf (fd, rdbuf) as members, which makes it seem like I should be able to use them.)