Book Description
Textual Statistics with R comprehensively covers the main multidimensional methods in textual statistics, supported by a specially written R package. Methods discussed include correspondence analysis, clustering, and multiple factor analysis for contingency tables. Each method is illustrated by applications. The book is aimed at researchers and students in statistics, social sciences, history, literature, and linguistics. It will be of interest to anyone from practitioners who need to extract information from texts to students in the field of massive data, where the ability to process textual data is becoming essential.
1. Encoding: from a corpus to statistical tables
Textual and contextual data; Textual data; Contextual data; Documents and aggregate documents; Examples and notation; Choosing textual units; Graphical forms; Lemmas; Stems; Repeated segments; In practice; Preprocessing; Unique spellings; Partially-automated preprocessing; Word selection; Word and segment indexes; The Life UK corpus: preliminary results; Verbal content through word and repeated segment indexes; Univariate description of contextual variables; A note on the frequency range; Implementation with the Xplortext package; In summary
2. Correspondence analysis of textual data
Data and goals; Correspondence analysis: a tool for linguistic data analysis; Data: a small example; Objectives; Associations between documents and words; Profile comparisons; Independence of documents and words; The χ² test; Association rates between columns and words; Active row and column clouds; Row and column profile spaces; Distributional equivalence and the χ² distance; Inertia of a cloud; Fitting document and word clouds; Factorial axes; Visualizing rows and columns; Category representation; Word representation; Transition formulas; Superimposed representation of rows and columns; Interpretation aids; Eigenvalues and representation quality of the clouds; Contribution of documents and words to axis inertia; Representation quality of a point; Supplementary rows and columns; Supplementary tables; Supplementary frequency rows and columns; Supplementary quantitative and qualitative variables; Validating the visualization; Interpretation scheme for textual CA results; Implementation with Xplortext; Summary of the CA approach
3. Applications of correspondence analysis
Choosing the level of detail for analyses; Correspondence analysis on aggregate free text answers; Data and objectives; Word selection; CA on the aggregate table; Document representation; Word representation; Simultaneous interpretation of the plots; Supplementary elements; Supplementary words; Supplementary repeated segments; Supplementary categories; Implementation with Xplortext; Direct analysis; Data and objectives; The main features of direct analysis; Direct analysis of the culture question; Implementation with Xplortext
4. Clustering in textual analysis
Clustering documents; Dissimilarity measures between documents; Measuring partition quality; Document clusters in the factorial space; Partition quality; Dissimilarity measures between document clusters; The single-linkage method; The complete-linkage method; Ward’s method; Agglomerative hierarchical clustering; Hierarchical tree construction algorithm; Selecting the final partition; Interpreting clusters; Direct partitioning; Combining clustering methods; Consolidating partitions; Direct partitioning followed by AHC; A procedure for combining CA and clustering; Example: joint use of CA and AHC; Data and objectives; Data preprocessing using CA; Constructing the hierarchical tree; Choosing the final partition; Contiguity-constrained hierarchical clustering; Principles and algorithm; AHC of age groups with a chronological constraint; Implementation with Xplortext; Example: clustering free text answers; Data and objectives; Data preprocessing; CA: eigenvalues and total inertia; Interpreting the first axes; AHC: building the tree and choosing the final partition; Describing cluster features; Lexical features of clusters; Describing clusters in terms of characteristic words; Describing clusters in terms of characteristic documents; Describing clusters using contextual variables; Describing clusters using contextual qualitative variables; Describing clusters using quantitative contextual variables; Implementation with Xplortext; Summary of the use of AHC on factorial coordinates coming from CA
5. Lexical characterization of parts of a corpus
Characteristic words; Characteristic words and CA; Characteristic words and clustering; Clustering based on verbal content; Clustering based on contextual variables; Hierarchical words; Characteristic documents; Example: characteristic elements and CA; Characteristic words for the categories; Characteristic words and factorial planes; Documents that characterize categories; Characteristic words in addition to clustering; Implementation with Xplortext
6. Multiple factor analysis for textual analysis
Multiple tables in textual analysis; Data and objectives; Data preprocessing; Problems posed by lemmatization; Description of the corpora data; Indexes of the most frequent words; Notation; Objectives; Introduction to MFACT; The limits of CA on multiple contingency tables; How MFACT works; Integrating contextual variables; Analysis of multilingual free text answers; MFACT: eigenvalues of the global analysis; Representation of documents and words; Superimposed representation of the global and partial configurations; Links between the axes of the global analysis and the separate analyses; Representation of the groups of words; Implementation with Xplortext; Simultaneous analysis of two open-ended questions: impact of lemmatization; Objectives; Preliminary steps; MFACT on the left and right: lemmatized or nonlemmatized; Implementation with Xplortext; Other applications of MFACT in textual analysis; MFACT summary
7. Applications and analysis workflows
General rules for presenting results; Analyzing bibliographic databases; Introduction to the lupus data; The corpus; Exploratory analysis of the corpus; CA of the documents × words table; The eigenvalues; Meta-keys and doc-keys; Analysis of the year-aggregate table; Eigenvalues and CA of the lexical table; Chronological study of drug names; Implementation with Xplortext; Conclusions from the study; Badinter’s speech: a discursive strategy; Methods; Breaking up the corpus into documents; The speech trajectory unveiled by CA; Results; Argument flow; Conclusions on the study of Badinter’s speech; Implementation with Xplortext; Political speeches; Data and objectives; Methodology; Results; Data preprocessing; Lexicometric characteristics of the speeches and lexical table coding; Eigenvalues and Cramer’s V; Speech trajectory; Word representation; Remarks; Hierarchical structure of the corpus; Conclusions; Implementation with Xplortext; Corpus of sensory descriptions; Introduction; Data; Eight Catalan wines; Jury; Verbal categorization; Encoding the data; Objectives; Statistical methodology; MFACT and constructing the mean configuration; Determining consensual words; Results; Data preprocessing; Some initial results; Individual configurations; MFACT: directions of inertia common to the majority of groups; MFACT: representing words and documents on the first plane; Word contributions; MFACT: group representation; Consensual words; Conclusion
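Trade Policy: Buyer Information
- About the products:
- ● Authenticity guarantee: this website belongs to China International Book Trading Corporation, and all books are guaranteed to be 100% genuine.
- ● Eco-friendly paper: most imported books are printed on eco-friendly lightweight paper, which is slightly yellowish and relatively light.
- ● Deckle-edge editions: the page edges are deliberately left uneven. These are usually hardcover editions and have greater collectible value.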
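About returns and exchanges:
- Because of the special nature of pre-ordered products, once a purchase order has been formally placed, the buyer may not cancel all or part of the order without valid reason.
- Because of the special nature of imported books, if any of the following occurs, please refuse the delivery so that the courier returns it:
- ● Damaged outer packaging / wrong item sent / items missing / damaged book exterior / incomplete book accessories (e.g., discs)
and contact us by phone at 400-008-1110 on a business day.
- After signing for the delivery, if any of the following occurs, please contact customer service within 5 business days of signing to arrange a return or exchange:
- ● Missing pages / misordered pages / misprints / loose binding threads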
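About shipping time:
- In general:
- ● [In stock] Shipped by courier from Beijing (warehouse) within 48 hours of ordering.
- ● [Pre-order] [Pre-sale] Shipped from overseas after the order is placed; estimated arrival is about 5-8 weeks. The store ships via ZTO Express by default; SF Express is available on request, with shipping paid on delivery.
- ● For customers who need an invoice, shipping may be delayed by a further 1-2 business days (for urgent invoice requests, please call 010-68433105/3213).
- ● If other special circumstances affect shipping time, we will post a notice on the website as soon as possible; please watch for it.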
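About delivery time:
- Because imported books are shipped by third-party couriers after they clear customs and enter the warehouse, we can only guarantee that orders are dispatched within the stated time; we cannot guarantee an exact delivery date.
- ● Major cities: usually 2-4 days
- ● Remote areas: usually 4-7 days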
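About telephone enquiry hours:
- Calls to 010-68433105/3213 are answered Monday to Friday, 8:30 a.m. to 5:00 p.m. We are closed on Saturdays, Sundays, and public holidays and cannot take calls then; thank you for your understanding.
- At other times you can also reach us by email at customer@readgo.cn; emails are handled with priority on business days.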
About couriers:
- ● Paid orders: delivered mainly by ZTO Express and ZJS Express; to check order progress, call 010-68433105/3213.