Book Description
This textbook is designed for an undergraduate course in data science that emphasizes topics in both statistics and computer science.
Preface
   Background and motivation; Intended audience; Key features of this book; Changes in the second edition; Key role of technology; How to use this book; Acknowledgments

Part I: Introduction to Data Science
1. Prologue: Why data science?
   What is data science?; Case study: The evolution of sabermetrics; Datasets; Further resources
2. Data visualization
   The federal election cycle; Composing data graphics; Importance of data graphics: Challenger; Creating effective presentations; The wider world of data visualization; Further resources; Exercises; Supplementary exercises
3. A grammar for graphics
   A grammar for data graphics; Canonical data graphics in R; Extended example: Historical baby names; Further resources; Exercises; Supplementary exercises
4. Data wrangling on one table
   A grammar for data wrangling; Extended example: Ben’s time with the Mets; Further resources; Exercises; Supplementary exercises
5. Data wrangling on multiple tables
   inner_join(); left_join(); Extended example: Manny Ramirez; Further resources; Exercises; Supplementary exercises
6. Tidy data
   Tidy data; Reshaping data; Naming conventions; Data intake; Further resources; Exercises; Supplementary exercises
7. Iteration
   Vectorized operations; Using across() with dplyr functions; The map() family of functions; Iterating over a one-dimensional vector; Iteration over subgroups; Simulation; Extended example: Factors associated with BMI; Further resources; Exercises; Supplementary exercises
8. Data Science Ethics
   Introduction; Truthful falsehoods; Role of data science in society; Some settings for professional ethics; Some principles to guide ethical action; Algorithmic bias; Data and disclosure; Reproducibility; Ethics, collectively; Professional guidelines for ethical conduct; Further resources; Exercises; Supplementary exercises

Part II: Statistics and Modeling
9. Statistical foundations
   Samples and populations; Sample statistics; The bootstrap; Outliers; Statistical models: Explaining variation; Confounding and accounting for other factors; The perils of p-values; Further resources; Exercises; Supplementary exercises
10. Predictive modeling
   Predictive modeling; Simple classification models; Evaluating models; Extended example: Who has diabetes?; Further resources; Exercises; Supplementary exercises
11. Supervised learning
   Non-regression classifiers; Parameter tuning; Example: Evaluation of income models redux; Extended example: Who has diabetes this time?; Regularization; Further resources; Exercises; Supplementary exercises
12. Unsupervised learning
   Clustering; Dimension reduction; Further resources; Exercises; Supplementary exercises
13. Simulation
   Reasoning in reverse; Extended example: Grouping cancers; Randomizing functions; Simulating variability; Random networks; Key principles of simulation; Further resources; Exercises; Supplementary exercises

Part III: Topics in Data Science
14. Dynamic and customized data graphics
   Rich Web content using D3.js and htmlwidgets; Animation; Flexdashboard; Interactive Web apps with Shiny; Customization of ggplot2 graphics; Extended example: Hot dog eating; Further resources; Exercises; Supplementary exercises
15. Database querying using SQL
   From dplyr to SQL; Flat-file databases; The SQL universe; The SQL data manipulation language; Extended example: FiveThirtyEight flights; SQL vs R; Further resources; Exercises; Supplementary exercises
16. Database administration
   Constructing efficient SQL databases; Changing SQL data; Extended example: Building a database; Scalability; Further resources; Exercises; Supplementary exercises
17. Working with geospatial data
   Motivation: What’s so great about geospatial data?; Spatial data structures; Making maps; Extended example: Congressional districts; Effective maps: How (not) to lie; Projecting polygons; Playing well with others; Further resources; Exercises; Supplementary exercises
18. Geospatial computations
   Geospatial operations; Geospatial aggregation; Geospatial joins; Extended example: Trail elevations at MacLeish; Further resources; Exercises; Supplementary exercises
19. Text as data
   Regular expressions using Macbeth; Extended example: Analyzing textual data from arXiv.org; Ingesting text; Further resources; Exercises; Supplementary exercises
20. Network science
   Introduction to network science; Extended example: Six degrees of Kristen Stewart; PageRank; Extended example: men’s college basketball; Further resources; Exercises; Supplementary exercises
21. Epilogue: Towards "big data"
   Notions of big data; Tools for bigger data; Alternatives to R; Closing thoughts; Further resources

Part IV: Appendices
A. Packages used in this book
   The mdsr package; Other packages; Further resources
B. Introduction to R and RStudio
   Installation; Learning R; Fundamental structures and objects; Add-ons: Packages; Further resources; Exercises; Supplementary exercises
C. Algorithmic thinking
   Introduction; Simple example; Extended example: Law of large numbers; Non-standard evaluation; Debugging and defensive coding; Further resources; Exercises; Supplementary exercises
D. Reproducible analysis and workflow
   Scriptable statistical computing; Reproducible analysis with R Markdown; Projects and version control; Further resources; Exercises; Supplementary exercises
E. Regression modeling
   Multiple regression; Inference for regression; Assumptions underlying regression; Logistic regression; Further resources; Exercises; Supplementary exercises
F. Setting up a database server
   SQLite; MySQL; PostgreSQL; Connecting to SQL
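Trade Policy: Notes for Buyers
- About the products:
- ● Authenticity guarantee: This website belongs to China International Book Trading Corporation, which ensures that all books are 100% genuine.
- ● Eco-friendly paper: Most imported books are printed on eco-friendly lightweight paper, which is slightly yellowish and relatively light.
- ● Deckle edge: The page edges are deliberately cut unevenly. This finish is usually found on hardcover editions and adds collectible value.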
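About returns and exchanges:
- Because pre-ordered products are special-order items, once a purchase order has been formally placed, the buyer may not cancel all or part of the order without cause.
- Because of the special nature of imported books, if any of the following occurs, please refuse the delivery outright so that the courier returns it:
- ● Damaged outer packaging / wrong item shipped / items missing from the shipment / visible damage to the book / incomplete accessories (for example, a CD)
Please also contact us by phone at 400-008-1110 on a working day.
- After signing for the delivery, if any of the following occurs, please contact customer service within 5 working days of signing to arrange a return or exchange:
- ● Missing pages / misplaced pages / misprints / loose binding threads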
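About shipping time:
- Under normal circumstances:
- ● [In stock] Shipped by courier from the Beijing warehouse within 48 hours of ordering.
- ● [Pre-order] [Pre-sale] Shipped from overseas after the order is placed; estimated arrival is about 5-8 weeks. The store's default courier is ZTO Express; if you need SF Express, shipping is paid on delivery.
- ● For customers who need an invoice, shipping may be delayed by a further 1-2 working days (for urgent invoice requests, please call 010-68433105/3213);
- ● If other special circumstances affect shipping time, we will post a notice on the website as soon as possible; please keep an eye out.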
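About delivery time:
- Because imported books are shipped by third-party couriers after they clear customs and enter our warehouse, we can only guarantee dispatch within the stated time; we cannot guarantee an exact delivery date.
- ● Major cities: usually 2-4 days
- ● Remote areas: usually 4-7 days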
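About phone support hours:
- Calls to 010-68433105/3213 are answered Monday to Friday, 8:30 a.m. to 5:00 p.m. We are closed on Saturdays, Sundays, and public holidays and cannot take calls then; thank you for your understanding.
- At other times you can also reach us by email at customer@readgo.cn; emails are handled with priority on working days.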
About couriers:
- ● Paid orders: Delivered mainly by ZTO Express and ZJS Express. To check order progress, call 010-68433105/3213.