LINGUIST List 2.821
Sat 23 Nov 1991
FYI: Brown and LOB Corpora
Editor for this issue: <>
Directory
Henry Kucera, Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Steve Fligelstone, Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Message 1: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Date: Thu, 21 Nov 91 09:49:23 EST
From: Henry Kucera <HENRY@brownvm.brown.edu>
Subject: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
This concerns the query re the Brown and LOB corpora:
The Brown corpus (American English) is available to
non-profit organizations (such as universities), essentially in two formats.
The text-only (so-called "untagged") version is available on tape or diskettes
from our friends at the Norwegian Centre for Humanistic Research, P.O. Box 54,
University of Bergen, Bergen, Norway. The cost varies depending on format and
the dollar exchange rate; it is in the range of $100-$200. Their e-mail address
(for Bitnet) is FAFSRV@NOBERGEN. You would, however, have to sign a written
agreement (no copying, no commercial use, etc.). The size varies depending on
format, but the untagged, uncompressed Brown corpus (without grammatical
designators) is about 8 MB.
The "tagged" version of the corpus (in which every word is annotated with an
expanded grammatical class, 82 classes in all) is available from Text
Research, 196 Bowen Street, Providence, RI 02906. Because of its size, it
comes on magnetic tape only (1600 or 6250 bpi, ASCII or EBCDIC), and its cost
to academic institutions is $1,000. The reason for the price difference is that
the tagged corpus provides much more information and carries a separate
copyright. There are also some restrictions: no copying, no commercial use, etc.
A written agreement must be signed by a responsible official of the department
or university administration.
Text Research has no connection with Brown University and has no e-mail
address. However, you can either send e-mail to me for forwarding or send a fax
to Text Research at 401-751-8958. The tagged database is quite large, about
53 MB, but it can be compressed fairly easily by a skilled programmer. A large
manual giving a detailed description of the tags, etc., is included.
Incidentally, there are no discounts available for either the tagged or the
untagged version. These are fixed prices. Non-academic use is possible only
by obtaining a license from Text Research.
As for the LOB corpus (British English): both untagged and tagged versions
are available, but only to non-profit institutions, from the Bergen address
given above. As far as I remember, there are fairly severe restrictions on its
use (because of British copyright law). I can't cite the prices right now, but
the Bergen people are pretty good at answering e-mail.
Hope this helps. Henry Kucera.
Message 2: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Date: Thu, 21 Nov 91 16:48:41 GMT
From: Steve Fligelstone <eia002@cent1.lancs.ac.uk>
Subject: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Mark Sanderson asks about availability of tagged versions of the Brown
and LOB (Lancaster/Oslo-Bergen) Corpora. The tagged LOB Corpus, along
with several other widely used corpora, can be obtained by writing to
ICAME (International Computer Archive of Modern English) at this address:
Knut Hofland,
ICAME
Norwegian Computing Centre for the Humanities
Harald Harfagresgt. 31
Postboks 53
Universitetet
N-5027 Bergen
NORWAY
email (earn/bitnet): fafkh@nobergen
The Brown Corpus is also available from this source, but not in tagged
format. However, I understand that the tagged version may be obtained from:
TEXT RESEARCH,
186 Bowen St.,
Providence RI 02906,
U.S.A.
There is furthermore a grammatically analysed (parsed as opposed to merely
part-of-speech tagged) version of part of the Brown Corpus. This is
referred to as the Gothenburg Corpus. For details contact:
Gudrun Magnusdottir
Sprakdata
Goteborgs Universitet
S-412 98 Goteborg
Sweden
Finally, here at Lancaster work is nearing completion (honestly!) on
a parsed version of part of the LOB Corpus. Write to me if you want
to be kept informed of its progress and availability.
Steve Fligelstone
UCREL
Linguistics Department
Bowland College
Lancaster University
GB-Lancaster LA1 4XZ
email: eia002@uk.ac.lancaster