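[Breaking] What is table r? Pros, cons, and an archive digest

Why this table r post was added to the archive: among the many posts discussing table r, this one is the most useful reference. Author: celestialgod (攸藍), board: R_Language, title: [Notes] Introduction to data-wrangling packages, Chapter 1...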

Author: celestialgod (攸藍)

Board: R_Language

Title: [Notes] Introduction to data-wrangling packages, Chapter 1: data.table

Time: Tue Jul 21 16:24:57 2015

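data.table contains a lot of functionality, but much of it can be replaced by functions from plyr and dplyr, so there are many data.table functions I am not very familiar with.

This is only a brief introduction to data.table. If you want to know more, please read the manual.

To understand data.table, we can start with the package description: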

"Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins,
fast add/modify/delete of columns by group using no copies at all, list
columns and a fast file reader (fread). Offers a natural and flexible syntax,
for faster development."

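Roughly: for large data (for example, 100GB held in memory), it performs fast aggregation, ordered joins, and adding/modifying/deleting columns by group variable, all without creating copies. ...

The key point is "no copies". When R modifies a data.frame, it typically copies the object first, modifies the copy, and returns it. That wastes memory and easily slows things down, so data.table provides more efficient operations here.

(For a speed comparison, see #1LeXNCKV (R_Language) [Share] Data processing and modification)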

1. data.table

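This function is used in much the same way as data.frame, and it accepts data.frame's arguments, such as the commonly used stringsAsFactors. The difference is that data.table defaults to stringsAsFactors = FALSE, while data.frame defaulted to TRUE at the time of writing (since R 4.0.0, data.frame also defaults to FALSE). For example:

```r
library(data.table)

t = data.table(a = LETTERS[1:3])
str(t)
# Classes ‘data.table’ and 'data.frame': 3 obs. of 1 variable:
#  $ a: chr "A" "B" "C"
#  - attr(*, ".internal.selfref")=<externalptr>
t2 = data.frame(a = LETTERS[1:3])
str(t2)
# Output from R < 4.0.0, where strings became factors by default:
# 'data.frame': 3 obs. of 1 variable:
#  $ a: Factor w/ 3 levels "A","B","C": 1 2 3
```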

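The second difference is that a data.table has no rownames, so watch out for this when converting a data.frame to a data.table. The next chapter covers a function that turns rownames into a column.

One note: a data.table also carries the data.frame class, so methods that work on a data.frame generally work on a data.table too.

data.table also has an extra argument, "key". I think of it as a kind of index: operations that go through the key are sped up.

A key can be a single variable or several variables, depending on your needs.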
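As a rough illustration of the key idea (a minimal sketch; the table and the column names id and v are made up for this example), a key can be set when the table is created and then used for lookups:

```r
library(data.table)

# Hypothetical example data; 'id' and 'v' are illustrative names.
DT <- data.table(id = c("b", "a", "c"), v = 1:3, key = "id")

key(DT)   # the key column(s) currently set on DT
DT["a"]   # keyed lookup: rows whose key column 'id' equals "a"
```

Setting a key also sorts the table by the key columns, so the row order may differ from the order of the input.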

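Next is data.table's '[', which behaves differently from data.frame's and so needs a separate explanation. I am not very familiar with this part myself, so I can only give a rough overview.

a. With a data.frame we often select several columns by position. In data.table this did not work at the time of writing (versions before 1.9.8; newer versions return the columns). For example:

```r
vars = data.frame(X = rnorm(3), Y = rnorm(3), Z = rnorm(3))
vars[,1:2]
#            X         Y
# 1 -0.5677575 2.1831285
# 2 -0.7161529 0.3714633
# 3  1.2665120 0.7837508

vars_dt = data.table(vars)
vars_dt[,1:2]
# [1] 1 2   (data.table < 1.9.8: j is evaluated as a plain expression)
```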

So what if you do want this? Add with=FALSE, or wrap the column names in list():

```r
vars_dt[,1:2,with=FALSE]
#             X         Y
# 1: -0.5677575 2.1831285
# 2: -0.7161529 0.3714633
# 3:  1.2665120 0.7837508

vars_dt[j=list(X, Y)]
#             X         Y
# 1: -0.5677575 2.1831285
# 2: -0.7161529 0.3714633
# 3:  1.2665120 0.7837508
```

For the rest, such as by, .SD, and .SDcols, see ?data.table.
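For orientation only, here is a minimal sketch of by, .SD, and .SDcols; the data and the column names g, x, and y are assumptions made up for this example, not part of the original post:

```r
library(data.table)

# Hypothetical example data; the names g, x, y are illustrative.
DT <- data.table(g = c("a", "a", "b"), x = c(1, 3, 5), y = c(2, 4, 6))

# by: compute one summary row per group of g
sums <- DT[, list(total_x = sum(x)), by = g]

# .SD holds each group's subset of the data;
# .SDcols chooses which columns .SD contains
means <- DT[, lapply(.SD, mean), by = g, .SDcols = c("x", "y")]

sums
means
```

The second call applies mean to every column listed in .SDcols, once per group, which saves writing out each column by hand.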

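That is all for data.table() itself. Next, some related functions.

b. setkey: sets the key; setnames: changes column names. Both work without making a copy.

c. copy: makes a copy of a data.table

d. setDF: changes the class of a data.table to data.frame without making a copy

For example: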

```r
DT = data.table(X = rnorm(3), Y = rnorm(3))
str(DT)
# Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
#  $ X: num -1.3738 0.167 -0.0578
#  $ Y: num 0.487 1.728 0.646
#  - attr(*, ".internal.selfref")=<externalptr>

setDF(DT)
str(DT)
# 'data.frame': 3 obs. of 2 variables:
#  $ X: num -1.3738 0.167 -0.0578
#  $ Y: num 0.487 1.728 0.646

DT = data.table(X = rnorm(3), Y = rnorm(3))
tracemem(DT)
# [1] "<0000000006A1BE28>"
setDF(DT) # no copy is made

DF = data.frame(DT)
retracemem(DF, retracemem(DT))
# tracemem[<0000000006A1BE28> -> 0x00000000061ec928]:
## the memory address changed, so DT was copied once
```

This part may be hard to follow, but that is fine. Just remember one thing: to convert to a data.frame, use setDF.

e. setDT: the reverse of setDF
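To make "no copy" concrete for setDT, setnames, and copy, here is a small sketch; the object names DF, alias, and snapshot are made up for illustration:

```r
library(data.table)

# Hypothetical example data.
DF <- data.frame(x = 1:3, y = c("a", "b", "c"))

setDT(DF)              # DF becomes a data.table in place, without a copy
alias <- DF            # plain assignment: another name for the same table
snapshot <- copy(DF)   # copy(): an independent duplicate

setnames(DF, "x", "id")   # renames the column by reference

names(alias)      # follows the rename, because alias and DF are the same object
names(snapshot)   # keeps the old names, because it is a separate copy
```

This is why copy is worth knowing: after a by-reference change, every name pointing at the same table sees the change.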

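f. duplicated, unique

duplicated returns a logical vector as long as the number of rows in the data.table. TRUE means an identical row appears earlier; FALSE means it does not.

unique keeps only the non-duplicated rows. For example: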

```r
set.seed(100)
DT = data.table(A = rbinom(5, 1, 0.5), B = rbinom(5, 1, 0.5))
#    A B
# 1: 0 0
# 2: 0 1
# 3: 1 0
# 4: 0 1
# 5: 0 0

duplicated(DT)
# [1] FALSE FALSE FALSE TRUE TRUE

unique(DT)
#    A B
# 1: 0 0
# 2: 0 1
# 3: 1 0

DT[!duplicated(DT)]
#    A B
# 1: 0 0
# 2: 0 1
# 3: 1 0
```

unique can do more than that: it can deduplicate on chosen variables only. For example:

```r
unique(DT, by = "A")
#    A B
# 1: 0 0
# 2: 1 0

unique(DT, by = "B")
#    A B
# 1: 0 0
# 2: 0 1
```
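By the way, when the input to dplyr's distinct is a data.table, it is implemented with unique (this applies to the dplyr version current at the time of writing; the data.table backend has since moved to the dtplyr package).

```r
library(dplyr)
distinct(DT)
#    A B
# 1: 0 0
# 2: 0 1
# 3: 1 0
```
If you want to see how distinct does it, type dplyr:::distinct_.data.table in R:

> dplyr:::distinct_.data.table
function (.data, ..., .dots)
{
    dist <- distinct_vars(.data, ..., .dots = .dots)
    if (length(dist$vars) == 0) {
        unique(dist$data)
    }
    else {
        unique(dist$data, by = dist$vars)
    }
}

We will come back to distinct when it comes up later.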

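Other related functions include subset, setcolorder, and setorder (setorderv).

If you are interested in these three, see the manual; I will not go into detail.

They roughly correspond to dplyr's filter, select, and arrange, which we will cover later.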
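For readers who want a quick look before the manual, here is a minimal sketch of the three functions; the table and the column names a and b are made up for this example:

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(a = c(2L, 3L, 1L), b = c("y", "z", "x"))

big <- subset(DT, a > 1)       # keep rows where a > 1 (returns a new table)
setorder(DT, a)                # sort the rows of DT by a, in place
setcolorder(DT, c("b", "a"))   # reorder the columns of DT, in place

big
DT
```

Note that subset returns a new table, while setorder and setcolorder change DT itself, in line with the other set* functions above.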

g. transform: changes a column's type, values, and so on. For example:

```r
library(magrittr) # provides the %<>% assignment pipe used below

DT = data.table(a = 1:3, b = 2:4, c = LETTERS[1:3])
DT2 = copy(DT)
DT[, b := b^2]
DT2 %<>% transform(b = b^2)
all.equal(DT, DT2) # TRUE
DT %<>% transform(c = as.factor(c))
str(DT)
# Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
#  $ a: int 1 2 3
#  $ b: num 4 9 16
#  $ c: Factor w/ 3 levels "A","B","C": 1 2 3
#  - attr(*, ".internal.selfref")=<externalptr>
```

h. set: changes the values of a particular column for certain rows. A simple example:

```r
DT = data.table(a = 1:3, b = 2:4)
DT2 = copy(DT)
DT[, b := 1]
set(DT2,, "b", value = 1)
all.equal(DT, DT2) # TRUE
```
Normally you would do this with '[', but if you need to do it inside a for loop, use set.
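A typical for-loop use might look like the following sketch, which replaces missing values with 0 in every column; the data and the choice of 0 as the fill value are assumptions for illustration:

```r
library(data.table)

# Hypothetical example data with some missing values.
DT <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))

# Loop over the columns and overwrite the NA rows of each one in place.
for (col in names(DT)) {
  na_rows <- which(is.na(DT[[col]]))
  set(DT, i = na_rows, j = col, value = 0)
}

DT
```

Inside a loop, set avoids the overhead of calling '[' on the data.table at every iteration, which is the reason it is suggested here.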

There is also a function J, which I will not cover here; again, see the manual.

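Finally, there is the operator ':=', which adds columns to a data.table.

It also avoids making a copy, so adding columns is faster.

What about deleting a column? Assign NULL to it with the same operator, for example DT[, X := NULL]. Alternatively, keep only the columns you want with the DT[, list(X, Y)] form shown earlier, which returns a new table.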
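A minimal sketch of adding and deleting a column with ':='; the table and the column names X, Y, and Z are made up for this example:

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(X = 1:3, Y = 4:6)

DT[, Z := X + Y]   # add column Z by reference
DT[, X := NULL]    # delete column X by reference

names(DT)
```

Neither call needs its result assigned back to DT, because both modify the table in place.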

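Next, some other functions in data.table.

2. fread

It can be used in place of read.table and read.csv.

It can split columns on a variety of separators and read them into R, and it reads much faster than read.table or read.csv.

Note, however, that irregular files can fail to read.

A few parameters:

a. sep: the separator between columns; ',' for csv, '\t' for tab-separated values

b. na.strings: strings to treat as NA; this can be a vector

c. stringsAsFactors: whether to convert strings to factors; the default is no

d. colClasses: the class of each column, which you can set yourself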
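The four parameters above can be combined as in this sketch. The tab-separated content, the column names id and score, and the "N/A" marker are assumptions made up for the example:

```r
library(data.table)

# Hypothetical tab-separated content, passed directly as a string.
txt <- "id\tscore\n001\t3.5\n002\tN/A\n"

DT <- fread(txt,
            sep = "\t",                        # columns are tab-separated
            na.strings = c("NA", "N/A"),       # both markers are read as NA
            stringsAsFactors = FALSE,          # keep strings as character
            colClasses = c(id = "character"))  # keep the leading zeros in id

str(DT)
```

Setting colClasses for id matters here because values such as 001 would otherwise be guessed as integers and lose their leading zeros.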

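Another reason I like fread is that its first argument can directly take the string I want to read, whereas read.table needs a more roundabout way (I could not be bothered to memorize it; in fact I never have).

For example: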

```r
text = "a b
1 2
3 4"
DT = fread(text)
setDF(DT) # convert to data.frame; we covered this earlier, remember?
DF = read.table(header = TRUE, text = text) # text format
DF2 = read.table(textConnection(text), header = TRUE) # file format

all.equal(DT, DF) # TRUE
all.equal(DT, DF2) # TRUE
```

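fread is well suited to reading large data. So when you need to write a table out as text and process it as text, reading it back in is convenient; see #1LegOjwB (R_Language).

That leaves dcast.data.table, melt, and merge.

They will be covered later together with tidyr.

The next chapter will focus on dplyr.

Addendum:

I am not very familiar with key and rarely use it, so I have covered very little of it here.

If you are interested in key, you may need to look into it yourself.

[Keywords]: data.table, reshape2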

--
※ Origin: 批踢踢實業坊(ptt.cc), from: 123.205.27.107
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1437467101.A.E6D.html

Push cywhale: great~ thanks for sharing~ 07/21 21:42

Push Edster: Languages really do grow 07/21 22:40

Push dreler1: +1 07/22 09:26

※ Edited: celestialgod (123.205.27.107), 07/22/2015 09:36:46

Push fifish89: +1 +1~ 07/22 10:09
