Movie Review Data

Data summary:
This page is a distribution site for movie-review data for use in
sentiment-analysis experiments. Available are collections of
movie-review documents labeled with respect to their overall sentiment
polarity (positive or negative) or subjective rating (e.g., a star
rating), and sentences labeled with respect to their subjectivity status
(subjective or objective) or polarity.

Keywords:

Movie, Sentiment analysis, Sentiment polarity, Subjective rating, Status
Data format:

TEXT
Data use:

The data set can be used for natural language processing and analysis.

Detailed description:
Movie Review Data
This page is a distribution site for movie-review data for use in
sentiment-analysis experiments. Available are collections of movie-review
documents labeled with respect to their overall sentiment polarity (positive or
negative) or subjective rating (e.g., a star rating), and sentences
labeled with respect to their subjectivity status (subjective or objective) or
polarity. These data sets were introduced in the following papers:
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, Thumbs up?
Sentiment Classification using Machine Learning Techniques,
Proceedings of EMNLP 2002.

Bo Pang and Lillian Lee, A Sentimental Education: Sentiment Analysis
Using Subjectivity Summarization Based on Minimum Cuts,
Proceedings of ACL 2004.

Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for
sentiment categorization with respect to rating scales, Proceedings of
ACL 2005.

We also have available an additional sentiment-analysis dataset,
Congressional floor-debate transcripts, with support/oppose labels.
If you have results to report on these corpora, please send email to Bo Pang
and/or Lillian Lee so we can add you to our list of other papers using this data.
Thanks!

Please cite the version number of the dataset you used in any
publications, in order to facilitate comparison of results. Thank you.
Sentiment polarity datasets
polarity dataset v2.0 (3.0Mb) (includes README v2.0): 1000 positive
and 1000 negative processed reviews. Introduced in Pang/Lee ACL
2004. Released June 2004.

Pool of 27886 unprocessed html files (81.1Mb) from which the polarity
dataset v2.0 was derived. (This file is identical to the one from data
release v1.0.)

sentence polarity dataset v1.0 (includes sentence polarity dataset
README v1.0): 5331 positive and 5331 negative processed
sentences/snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.

archive:

o polarity dataset v1.0 (2.8Mb) (includes README): 700 positive
  and 700 negative processed reviews. Released July 2002.
o polarity dataset v1.1 (2.2Mb) (includes README.1.1):
  approximately 700 positive and 700 negative processed reviews.
  Released November 2002. This alternative version was created
  by Nathan Treloar, who removed a few non-English/incomplete
  reviews and changed some of the labels (judging some
  polarities to be different from the original author's rating). A
  complete list of the changes made in v1.1 is also available.
o polarity dataset v0.9 (2.8Mb) (includes a README): 700
  positive and 700 negative processed reviews. Introduced in
  Pang/Lee/Vaithyanathan EMNLP 2002. Released July 2002.
  Please read the README.
o Pool of html files (81.1Mb): all html files we collected from the
  IMDb archive.
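
A minimal sketch of how a document-level polarity dataset like the ones above might be loaded for a classification experiment. It assumes the archive has been unpacked so that each review is a separate plain-text file under `pos` and `neg` subdirectories; that layout, the `latin-1` encoding, and the function name `load_polarity_reviews` are assumptions for illustration, so check the README of the version you downloaded.

```python
from pathlib import Path


def load_polarity_reviews(root):
    """Return (texts, labels) read from root/pos/*.txt and root/neg/*.txt.

    The pos/neg directory layout is an assumption, not something this
    page specifies; adjust the paths to match the README.
    """
    texts, labels = [], []
    for label in ("pos", "neg"):
        # Sort the paths so the order is reproducible across runs.
        for path in sorted(Path(root, label).glob("*.txt")):
            # latin-1 is assumed here because it decodes any byte sequence.
            texts.append(path.read_text(encoding="latin-1"))
            labels.append(label)
    return texts, labels
```

If the layout matches, polarity dataset v2.0 should give `labels.count("pos") == 1000` and `labels.count("neg") == 1000`, which is a quick way to confirm that the right version was loaded before reporting results.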
Sentiment scale datasets
scale dataset v1.0 (includes scale data README v1.0): a collection of
documents whose labels come from a rating scale. Introduced in
Pang/Lee ACL 2005. Released July 2005.
o Sep 30, 2009: Yanir Seroussi points out that due to some
  misformatting in the raw html files, six reviews are misattributed
  to Dennis Schwartz (29411 should be Max Messier, 29412
  should be Norm Schrager, 29418 should be Steve Rhodes,
  29419 should be Blake French, 29420 should be Pete Croatto,
  29422 should be Rachel Gordon) and one (23982) is blank.

original reviews for scale dataset v1.0 (includes scale data README
v1.0): original reviews from which the subjective extracts in scale
dataset v1.0 were extracted.
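
The Sep 30, 2009 correction above can be applied mechanically when author identity matters in an experiment. The sketch below hard-codes the six reassignments and the one blank review listed in that note. How review IDs and author names are stored in the release is not described on this page, so the `(review_id, author)` pair interface and the names `corrected_author` and `apply_errata` are assumptions for illustration.

```python
# Corrections reported by Yanir Seroussi (Sep 30, 2009): reviews that were
# misattributed to Dennis Schwartz, keyed by review ID.
AUTHOR_CORRECTIONS = {
    29411: "Max Messier",
    29412: "Norm Schrager",
    29418: "Steve Rhodes",
    29419: "Blake French",
    29420: "Pete Croatto",
    29422: "Rachel Gordon",
}

# The same note reports that this review is blank.
BLANK_REVIEW_IDS = {23982}


def corrected_author(review_id, author):
    """Return the corrected author for a review, or the given one."""
    return AUTHOR_CORRECTIONS.get(review_id, author)


def apply_errata(records):
    """Drop blank reviews and fix authors in (review_id, author) pairs."""
    return [
        (review_id, corrected_author(review_id, author))
        for review_id, author in records
        if review_id not in BLANK_REVIEW_IDS
    ]
```

Keeping the corrections in a separate table, rather than editing the downloaded files, leaves the released data untouched and makes it easy to state in a publication which errata were applied.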

Subjectivity datasets
subjectivity dataset v1.0 (508K) (includes subjectivity README v1.0):
5000 subjective and 5000 objective processed sentences. Introduced in
Pang/Lee ACL 2004. Released June 2004.

Pool of unprocessed source documents (9.3Mb) from which the
sentences in the subjectivity dataset v1.0 were extracted.
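
For sentence-level data such as the subjectivity dataset, a reader along the following lines may be enough. It is a sketch under the assumption that each class is stored in its own plain-text file with one processed sentence per line; the file format, the `latin-1` encoding, and the name `load_labeled_sentences` are assumptions, not details taken from this page.

```python
def load_labeled_sentences(files_by_label, encoding="latin-1"):
    """Read one sentence per line from each file.

    files_by_label maps a label (e.g. "subjective") to a file path.
    Returns a list of (sentence, label) pairs in file order.
    """
    pairs = []
    for label, path in files_by_label.items():
        with open(path, encoding=encoding) as handle:
            for line in handle:
                sentence = line.strip()
                if sentence:  # skip blank lines
                    pairs.append((sentence, label))
    return pairs
```

A call might look like `load_labeled_sentences({"subjective": "subjective.txt", "objective": "objective.txt"})`, where both paths are hypothetical placeholders. For subjectivity dataset v1.0, each label should then appear 5000 times.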

The creation of this website is based upon work supported in part by the
National Science Foundation (NSF) under grant nos. ITR/IM IIS-0081334,
IIS-0329064, CCR-0122581, and BES-0329549; SRI International under
subcontract no. 03-000211 on their project funded by the Department of the
Interior, National Business Center; a Cornell Graduate Fellowship in Cognitive
Studies; and by an Alfred P. Sloan Research Fellowship. Any opinions,
findings, and conclusions or recommendations expressed above are those of
the authors and do not necessarily reflect the views of the National Science
Foundation or Sloan Foundation and should not be interpreted as representing
the official policies, either expressed or implied, of any sponsoring institution,
the U.S. government or any other entity.
If you have any questions or comments regarding this site, please send email
to Bo Pang or Lillian Lee.
