Natural Language Processing: Movie Review Data
Movie Review Data
Data summary:
This page is a distribution site for movie-review data for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., stars) and sentences labeled with respect to their subjectivity status (subjective or objective) or polarity.
Keywords:
Movie, Sentiment analysis, Sentiment polarity, Subjective rating, Status
Data format:
TEXT
Data use:
The data set can be used for natural language processing and analysis.
Detailed description:
Movie Review Data
This page is a distribution site for movie-review data for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., stars) and sentences labeled with respect to their subjectivity status (subjective or objective) or polarity.
These data sets were introduced in the following papers:
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, Thumbs up? Sentiment Classification using Machine Learning Techniques, Proceedings of EMNLP 2002.
Bo Pang and Lillian Lee, A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, Proceedings of ACL 2004.
Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of ACL 2005.
We also have available an additional sentiment-analysis dataset, Congressional floor-debate transcripts, with support/oppose labels.
If you have results to report on these corpora, please send email to Bo Pang and/or Lillian Lee so we can add you to our list of other papers using this data. Thanks!
Please cite the version
number of the dataset you used in any
publications, in order to facilitate
comparison of results. Thank you.
Sentiment polarity datasets
polarity dataset v2.0 (3.0Mb) (includes README v2.0): 1000 positive and 1000 negative processed reviews. Introduced in Pang/Lee ACL 2004. Released June 2004.
Pool of 27886 unprocessed HTML files (81.1Mb) from which the polarity dataset v2.0 was derived. (This file is identical to the corresponding file from data release v1.0.)
sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences/snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.
archive:
o polarity dataset v1.0 (2.8Mb) (includes README): 700 positive and 700 negative processed reviews. Released July 2002.
o polarity dataset v1.1 (2.2Mb) (includes README.1.1): approximately 700 positive and 700 negative processed reviews. Released November 2002. This alternative version was created by Nathan Treloar, who removed a few non-English/incomplete reviews and changed some of the labels (judging some polarities to be different from the original author's rating). The complete list of changes made to v1.1 can be found in .
o polarity dataset v0.9 (2.8Mb) (includes a README): 700 positive and 700 negative processed reviews. Introduced in Pang/Lee/Vaithyanathan EMNLP 2002. Released July 2002. Please read the README.
o (81.1Mb): all HTML files we collected from the IMDb archive.
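The document-level polarity datasets listed above are described as equal numbers of positive and negative processed reviews. As a minimal sketch of how such a release might be read into labeled pairs, the following assumes a layout that this page does not state: one review per `.txt` file, placed in `pos` and `neg` subdirectories under a root directory. The function names `load_polarity_reviews` and `class_counts` are hypothetical; the README shipped with each version is the authority on the actual layout.

```python
from pathlib import Path


def load_polarity_reviews(root, encoding="utf-8"):
    """Return a list of (text, label) pairs; label is 1 for positive, 0 for negative.

    Assumption (not stated on this page): `root` contains `pos/` and `neg/`
    subdirectories holding one review per .txt file.
    """
    root = Path(root)
    examples = []
    for subdir, label in (("pos", 1), ("neg", 0)):
        # Sort file names so the order of examples is reproducible.
        for path in sorted((root / subdir).glob("*.txt")):
            # Undecodable bytes are replaced rather than raising an error.
            text = path.read_text(encoding=encoding, errors="replace")
            examples.append((text, label))
    return examples


def class_counts(examples):
    """Return (number of positive examples, number of negative examples)."""
    positives = sum(label for _, label in examples)
    return positives, len(examples) - positives
```

If the layout assumption holds, comparing `class_counts(...)` against the figures in the listing (for example, 1000 and 1000 for v2.0) is a quick check that the archive was unpacked completely.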
Sentiment scale datasets
scale dataset v1.0 (includes scale data README v1.0): a collection of documents whose labels come from a rating scale. Introduced in Pang/Lee ACL 2005. Released July 2005.
o Sep 30, 2009: Yanir Seroussi points out that due to some misformatting in the raw HTML files, six reviews are misattributed to Dennis Schwartz (29411 should be Max Messier, 29412 should be Norm Schrager, 29418 should be Steve Rhodes, 29419 should be Blake French, 29420 should be Pete Croatto, 29422 should be Rachel Gordon) and one (23982) is blank.
original reviews for scale dataset v1.0 (includes scale data README v1.0): original reviews from which the subjective extracts in scale dataset v1.0 were extracted.
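The Sep 30, 2009 note on scale dataset v1.0 can be applied mechanically when author identity matters for an experiment. The sketch below encodes the six corrected attributions and the blank review exactly as listed in that note. The record shape `(review_id, author, text)` and the function names are hypothetical, chosen only for illustration; how IDs and author names are actually stored is described in the scale data README, not here.

```python
# Corrections from the Sep 30, 2009 note: review ID -> actual author.
# All six reviews are misattributed to Dennis Schwartz in the release.
AUTHOR_CORRECTIONS = {
    29411: "Max Messier",
    29412: "Norm Schrager",
    29418: "Steve Rhodes",
    29419: "Blake French",
    29420: "Pete Croatto",
    29422: "Rachel Gordon",
}

# The note reports this review as blank.
BLANK_REVIEW_IDS = {23982}


def corrected_author(review_id, listed_author):
    """Return the corrected author for a review, or the listed one if unaffected."""
    return AUTHOR_CORRECTIONS.get(review_id, listed_author)


def clean_records(records):
    """Fix misattributed authors and drop blank reviews.

    `records` is an iterable of (review_id, author, text) tuples; this
    shape is an assumption made for the example.
    """
    cleaned = []
    for review_id, author, text in records:
        # Skip the known blank review and any other record with empty text.
        if review_id in BLANK_REVIEW_IDS or not text.strip():
            continue
        cleaned.append((review_id, corrected_author(review_id, author), text))
    return cleaned
```

Keeping the corrections in a lookup table rather than editing the data files leaves the released version untouched, which keeps results comparable with work that cites the same version number.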
Subjectivity datasets
subjectivity dataset v1.0 (508K) (includes subjectivity README v1.0): 5000 subjective and 5000 objective processed sentences. Introduced in Pang/Lee ACL 2004. Released June 2004.
Pool of unprocessed source documents (9.3Mb) from which the sentences in the subjectivity dataset v1.0 were extracted.
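The sentence-level releases (the subjectivity dataset here and the sentence polarity dataset earlier) are described as collections of labeled processed sentences. A minimal sketch for reading such data follows, assuming one sentence per line and one file per label; neither the file names nor the `latin-1` default encoding is given on this page, so both are assumptions, and the function names are hypothetical.

```python
from pathlib import Path


def read_sentences(path, encoding="latin-1"):
    """Return the non-empty lines of a file, stripped of surrounding whitespace.

    The default encoding is an assumption; check the dataset README.
    """
    lines = Path(path).read_text(encoding=encoding).splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_labeled_sentences(files_by_label, encoding="latin-1"):
    """Return (sentence, label) pairs.

    `files_by_label` maps a label string to the path of the file that
    holds the sentences for that label (assumed one sentence per line).
    """
    pairs = []
    for label, path in files_by_label.items():
        pairs.extend((sentence, label) for sentence in read_sentences(path, encoding))
    return pairs
```

A call might look like `load_labeled_sentences({"subjective": subj_path, "objective": obj_path})`, where both paths are placeholders for files in the unpacked archive.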
The creation of this website is based upon work supported in part by the National Science Foundation (NSF) under grant no. ITR/IM IIS-0081334, IIS-0329064, CCR-0122581, and BES-0329549; SRI International under subcontract no. 03-000211 on their project funded by the Department of the Interior, National Business Center; a Cornell Graduate Fellowship in Cognitive Studies; and by an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation or Sloan Foundation and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.
If you have any questions
or comments regarding this site, please send email
to Bo Pang or Lillian Lee.