2 Data Preparation

2.1 Overview of Data Formats

exametrika supports the following four data types.

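Data type	`response.type`	Description	Example
Binary	`binary`	Correct (1) / incorrect (0)	Scored test results
Nominal	`nominal`	Unordered categories	Multiple options (no ordering)
Ordinal	`ordinal`	Ordered categories	Likert scales (3-point, 5-point, etc.)
Multiple-choice	`rated`	Nominal data + correct-answer key	Multiple-choice tests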

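2.2 The dataFormat Function

The dataFormat function converts data into exametrika's internal format. Each analysis function also calls it automatically, but you can run it explicitly to inspect the structure of your data.

2.2.1 Binary Data Example

Here is how it works with the sample data J15S500.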

dat <- dataFormat(J15S500)

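The converted object contains the following components.

$U: data matrix (correct/incorrect for binary data)
$Z: missing indicator matrix (1 = observed, 0 = missing)
$ItemLabel: item labels
$ID: examinee IDs
$response.type: data type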

dat$U |> head()

     Item01 Item02 Item03 Item04 Item05 Item06 Item07 Item08 Item09 Item10
[1,]      0      1      1      0      1      1      0      0      0      1
[2,]      1      1      1      1      1      1      0      1      0      1
[3,]      1      1      1      1      1      1      0      0      0      1
[4,]      1      1      1      1      1      1      1      1      0      0
[5,]      1      1      0      1      1      0      0      0      0      1
[6,]      1      1      1      1      1      1      1      1      0      0
     Item11 Item12 Item13 Item14 Item15
[1,]      1      0      1      0      1
[2,]      0      0      1      0      1
[3,]      0      0      1      1      1
[4,]      0      1      1      1      0
[5,]      0      0      0      1      0
[6,]      1      0      1      1      1

dat$Z |> head()

     Item01 Item02 Item03 Item04 Item05 Item06 Item07 Item08 Item09 Item10
[1,]      1      1      1      1      1      1      1      1      1      1
[2,]      1      1      1      1      1      1      1      1      1      1
[3,]      1      1      1      1      1      1      1      1      1      1
[4,]      1      1      1      1      1      1      1      1      1      1
[5,]      1      1      1      1      1      1      1      1      1      1
[6,]      1      1      1      1      1      1      1      1      1      1
     Item11 Item12 Item13 Item14 Item15
[1,]      1      1      1      1      1
[2,]      1      1      1      1      1
[3,]      1      1      1      1      1
[4,]      1      1      1      1      1
[5,]      1      1      1      1      1
[6,]      1      1      1      1      1

dat$response.type

[1] "binary"
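
To make the role of the missing indicator matrix concrete, here is a minimal base R sketch that does not call exametrika at all. The small matrix `raw` is hypothetical, and the code only illustrates the 1 = observed / 0 = missing convention described above; it is not the package's implementation.

```r
# Hypothetical 4 x 3 response matrix; NA marks an unanswered item.
raw <- matrix(
  c( 1, 0, NA,
     1, 1,  0,
    NA, 1,  1,
     0, 1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("Item01", "Item02", "Item03"))
)

# Missing indicator in the same convention as $Z: 1 = observed, 0 = missing.
Z <- ifelse(is.na(raw), 0, 1)

# Number of observed responses and correct responses per item.
n_observed <- colSums(Z)
n_correct <- colSums(raw == 1, na.rm = TRUE)

# Correct response rate computed over observed cells only.
correct_rate <- n_correct / n_observed
print(round(correct_rate, 3))
```

Dividing by the column sums of the indicator matrix rather than by the number of rows keeps unanswered items from being counted as incorrect.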

2.2.2 Specifying Missing Values

If missing values are represented by a code other than NA (for example, -99 or 99), specify that code with the na argument.

data_with_na <- data.frame(
  item1 = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1),
  item2 = c(1, 0, 1, 0, 99, 99, 1, 1, 99, 0),
  id = paste0("student", 1:10)
)
result <- dataFormat(data_with_na, id = 3, na = 99)
result

Response Type: binary 
Binary Response Pattern
      item1 item2
 [1,]     0     1
 [2,]     1     0
 [3,]     1     1
 [4,]     0     0
 [5,]     1    -1
 [6,]     1    -1
 [7,]     1     1
 [8,]     0     1
 [9,]     0    -1
[10,]     1     0

Missing Pattern
      item1 item2
 [1,]     1     1
 [2,]     1     1
 [3,]     1     1
 [4,]     1     1
 [5,]     1     0
 [6,]     1     0
 [7,]     1     1
 [8,]     1     1
 [9,]     1     0
[10,]     1     1

Weight
[1] 1 1
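
The recoding shown in this output can be imitated by hand in base R. The sketch below is only an illustration of the idea, under the assumption that missing cells are displayed as -1 in the response pattern and as 0 in the missing pattern, as in the output above; the variable names are hypothetical and this is not how dataFormat is implemented internally.

```r
# Same item responses as in the example above (ID column omitted).
data_with_na <- data.frame(
  item1 = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1),
  item2 = c(1, 0, 1, 0, 99, 99, 1, 1, 99, 0)
)
na_code <- 99  # the code that stands for "missing" in this data

responses <- as.matrix(data_with_na)
is_missing <- responses == na_code

# Missing pattern: 1 = observed, 0 = missing.
Z <- ifelse(is_missing, 0, 1)

# Response pattern with missing cells shown as -1.
U <- ifelse(is_missing, -1, responses)

print(U)
print(Z)
```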

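2.2.3 Specifying the ID Column

By default the first column is used as the ID column, but you can point to a different column by passing its column number to the id argument. The example in the previous subsection passed id = 3 because the IDs are stored in the third column.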

result$ID

 [1] "student1"  "student2"  "student3"  "student4"  "student5"  "student6" 
 [7] "student7"  "student8"  "student9"  "student10"

result$U

      item1 item2
 [1,]     0     1
 [2,]     1     0
 [3,]     1     1
 [4,]     0     0
 [5,]     1    -1
 [6,]     1    -1
 [7,]     1     1
 [8,]     0     1
 [9,]     0    -1
[10,]     1     0

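2.2.4 Ordinal Data Example

The sample data J15S3810 is ordinal data on a 4-point scale. For ordinal data, the data matrix is stored in $Q. As in the other printed response patterns, -1 marks a missing response.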

dat_ord <- J15S3810
dat_ord$response.type

[1] "ordinal"

dat_ord$Q |> head()

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
[1,]  2  2  3  0  3  1 -1 -1 -1  -1  -1  -1   1   2   2
[2,]  2  1  2 -1 -1 -1  2  1  3  -1  -1  -1   2   1   2
[3,]  3  2  2  2  3  3 -1 -1 -1  -1  -1  -1   3   3   2
[4,]  2  0  3 -1 -1 -1 -1 -1 -1   2   2   2   2   2   1
[5,]  2  2  1 -1 -1 -1  3  1  1  -1  -1  -1   2   1   3
[6,]  1  2  0  0  0  0 -1 -1 -1  -1  -1  -1   1   1   0
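
When working with an ordinal matrix like this one, a common first check is the category frequency of each item. The following base R sketch uses three columns copied from the six rows printed above and assumes that -1 marks a missing response and that the four categories are coded 0 to 3; both assumptions are read off the printed output rather than taken from the package documentation.

```r
# Six rows of V1, V4, and V7 copied from the printed output above.
Q <- cbind(
  V1 = c(2, 2, 3, 2, 2, 1),
  V4 = c(0, -1, 2, -1, -1, 0),
  V7 = c(-1, 2, -1, -1, 3, -1)
)
categories <- 0:3  # assumed coding of the 4-point scale

# Count how often each category occurs per item, skipping the -1 cells.
freq <- sapply(colnames(Q), function(item) {
  observed <- Q[Q[, item] != -1, item]
  as.vector(table(factor(observed, levels = categories)))
})
rownames(freq) <- categories

print(freq)
```

Each column of `freq` sums to the number of observed responses for that item, so sparse items stand out immediately.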

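2.2.5 Multiple-Choice (rated) Data Example

The sample data J35S5000 is multiple-choice data with a designated correct answer for each item. The selected options are stored in $Q, the correct answers in $CA, and the scored (correct/incorrect) results in $U.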

dat_rated <- J35S5000
dat_rated$response.type

[1] "rated"

dat_rated$Q |> head(3)

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
[1,]  3  4 -1  5  3  5  3  2  3   1   3   1   2   5   4   5  -1   2   2   3   2
[2,]  1  4  4  3  3  5  3  2  2   3   3   6   1   4   3   5   1   2   2   4   4
[3,]  3  3  3  3  3  5  3  6  4   1   3   1   1   4   4   4   3   2   2   3   1
     V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35
[1,]   1   4   3   1   3   3   3   3   4   3   3   6   4   4
[2,]   2   3   2   3   1   5   5   2   4   3   1   4   2   3
[3,]   3   3   4   1   4   3   2   3   4   3   4   1   3   3

dat_rated$CA

 [1] 3 3 3 3 2 5 3 4 4 4 3 6 2 4 4 5 1 2 3 1 1 1 3 4 3 4 3 2 2 4 3 4 5 2 1

dat_rated$U |> head(3)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    1    0    0    0    0    1    1    0    0     0     1     0     1     0
[2,]    0    0    0    1    0    1    1    0    0     0     1     1     0     1
[3,]    1    1    1    1    0    1    1    0    1     0     1     0     0     1
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     1     1     0     1     0     0     0     1     0     0     0     0
[2,]     0     1     1     1     0     0     0     0     1     0     1     0
[3,]     1     0     0     1     0     0     1     0     1     1     0     1
     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
[1,]     1     0     0     1     1     0     0     0     0
[2,]     0     0     1     1     1     0     0     1     0
[3,]     1     1     0     1     1     1     0     0     0

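2.2.6 Converting Your Own Polytomous Data

You can specify the data type explicitly with the response.type argument.

# Treat the data as ordinal (Science comes from the ltm package, which must be installed)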
result_ord <- dataFormat(ltm::Science[1:20, ], response.type = "ordinal")
result_ord

Response Type: ordinal 
Polytomous Response Pattern ( ordinal )
      Comfort Environment Work Future Technology Industry Benefit
 [1,]       4           4    4      3          4        3       2
 [2,]       3           4    3      3          3        3       3
 [3,]       3           2    2      2          4        4       3
 [4,]       3           3    2      2          4        4       3
 [5,]       3           1    4      4          2        3       1
 [6,]       4           3    4      3          3        4       3
 [7,]       3           2    2      3          4        4       4
 [8,]       3           2    2      3          3        4       4
 [9,]       3           3    3      4          4        4       2
[10,]       4           3    3      3          3        3       3
[11,]       3           3    3      4          2        3       3
[12,]       3           3    1      2          2        4       3
[13,]       3           2    3      3          4        3       3
[14,]       3           2    3      3          2        2       2
[15,]       3           4    2      3          4        3       2
[16,]       3           4    3      3          4        4       3
[17,]       3           3    1      2          3        2       2
[18,]       3           3    3      3          2        3       3
[19,]       3           2    2      3          3        3       2
[20,]       3           3    3      3          2        2       3

Missing Pattern
      Comfort Environment Work Future Technology Industry Benefit
 [1,]       1           1    1      1          1        1       1
 [2,]       1           1    1      1          1        1       1
 [3,]       1           1    1      1          1        1       1
 [4,]       1           1    1      1          1        1       1
 [5,]       1           1    1      1          1        1       1
 [6,]       1           1    1      1          1        1       1
 [7,]       1           1    1      1          1        1       1
 [8,]       1           1    1      1          1        1       1
 [9,]       1           1    1      1          1        1       1
[10,]       1           1    1      1          1        1       1
[11,]       1           1    1      1          1        1       1
[12,]       1           1    1      1          1        1       1
[13,]       1           1    1      1          1        1       1
[14,]       1           1    1      1          1        1       1
[15,]       1           1    1      1          1        1       1
[16,]       1           1    1      1          1        1       1
[17,]       1           1    1      1          1        1       1
[18,]       1           1    1      1          1        1       1
[19,]       1           1    1      1          1        1       1
[20,]       1           1    1      1          1        1       1

Weight
[1] 1 1 1 1 1 1 1

result_ord$CategoryLabel[1]

$Comfort
[1] "strongly disagree" "disagree"          "agree"            
[4] "strongly agree"

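If the items have correct answers, specify them with the CA argument. As the output below shows, the data are then treated as rated.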

data_ca <- data.frame(
  id = 1:10,
  item1 = c(1, NA, 2, 4, 1, 3, 5, 2, 2, 1),
  item2 = c(1, 3, 2, 4, 5, 1, 1, 2, 3, 5)
)
result_ca <- dataFormat(data_ca, CA = c(4, 2))
result_ca

Response Type: rated 
Polytomous Response Pattern ( rated )
      item1 item2
 [1,]     1     1
 [2,]    -1     3
 [3,]     2     2
 [4,]     4     4
 [5,]     1     5
 [6,]     3     1
 [7,]     5     1
 [8,]     2     2
 [9,]     2     3
[10,]     1     5

Correct Answers
[1] 4 2

Missing Pattern
      item1 item2
 [1,]     1     1
 [2,]     0     1
 [3,]     1     1
 [4,]     1     1
 [5,]     1     1
 [6,]     1     1
 [7,]     1     1
 [8,]     1     1
 [9,]     1     1
[10,]     1     1

Weight
[1] 1 1

result_ca$U

      [,1] [,2]
 [1,]    0    0
 [2,]   -1    0
 [3,]    0    1
 [4,]    1    0
 [5,]    0    0
 [6,]    0    0
 [7,]    0    0
 [8,]    0    1
 [9,]    0    0
[10,]    0    0
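
The scoring step behind this result can be sketched in base R: each column of responses is compared with its own correct answer. This is only an illustration of the logic, with hypothetical variable names, and it assumes that a missing response is shown as -1 in the scored matrix, as in the output above.

```r
# Same item responses and answer key as in the example above (ID column omitted).
data_ca <- data.frame(
  item1 = c(1, NA, 2, 4, 1, 3, 5, 2, 2, 1),
  item2 = c(1, 3, 2, 4, 5, 1, 1, 2, 3, 5)
)
correct_answers <- c(4, 2)

responses <- as.matrix(data_ca)

# Compare every column with its own key: TRUE/FALSE becomes 1/0.
scored <- sweep(responses, 2, correct_answers, FUN = "==") * 1

# A missing response cannot be scored; mark it with -1.
scored[is.na(scored)] <- -1

print(scored)
```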

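2.3 The longdataFormat Function

This function converts long-format data (one response per row) into the exametrika format. Specify Sid (examinee ID column), Qid (item ID column), and Resp (response column) by column number or column name.

# Example of long-format data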
long_data <- data.frame(
  ID = rep(paste0("S", 1:5), each = 3),
  Item = rep(paste0("Item", 1:3), times = 5),
  Response = c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)
long_data

   ID  Item Response
1  S1 Item1        1
2  S1 Item2        0
3  S1 Item3        1
4  S2 Item1        0
5  S2 Item2        1
6  S2 Item3        1
7  S3 Item1        1
8  S3 Item2        1
9  S3 Item3        0
10 S4 Item1        0
11 S4 Item2        0
12 S4 Item3        1
13 S5 Item1        1
14 S5 Item2        0
15 S5 Item3        0

result_long <- longdataFormat(long_data, Sid = "ID", Qid = "Item", Resp = "Response")
result_long
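
To see what the long-to-wide conversion amounts to, the same reshaping can be sketched in base R with tapply. This is a conceptual illustration only and says nothing about how longdataFormat is implemented; `wide` is a hypothetical name.

```r
# Same long-format data as in the example above.
long_data <- data.frame(
  ID = rep(paste0("S", 1:5), each = 3),
  Item = rep(paste0("Item", 1:3), times = 5),
  Response = c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)

# One row per examinee, one column per item.
# Each ID/Item pair occurs once here, so taking the first value is enough;
# a pair that never occurs would come out as NA.
wide <- with(long_data, tapply(Response, list(ID, Item), function(x) x[1]))

print(wide)
```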

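2.4 Reading from a CSV File

The following example reads data from a CSV file and converts it with dataFormat. It assumes that missing values are coded as -99 in the file.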

rawData <- read.csv("your_data.csv")
dat <- dataFormat(rawData, na = -99)
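
Before converting, it can help to check how many cells actually carry the missing code. The sketch below writes a small hypothetical CSV to a temporary file so that it runs on its own, counts the -99 cells, and then shows the base R alternative of turning the code into NA at read time with the na.strings argument of read.csv. The file contents and variable names are made up for illustration.

```r
# Write a tiny hypothetical CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,item1,item2",
    "s1,1,0",
    "s2,-99,1",
    "s3,0,-99"),
  csv_path
)

# Read as-is: -99 stays in the data as an ordinary number.
rawData <- read.csv(csv_path)
n_coded <- sum(rawData[, -1] == -99)

# Alternative: let read.csv turn the code into NA while reading.
rawNA <- read.csv(csv_path, na.strings = c("NA", "-99"))
n_na <- sum(is.na(rawNA[, -1]))

print(c(coded = n_coded, na = n_na))
```

With the first approach the code is handed to dataFormat through the na argument; with the second the cells are already NA when the data frame is created.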

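2.5 List of Sample Datasets

The exametrika package includes the following sample datasets. The naming convention is JxxSxxx, where the number after J is the number of items and the number after S is the number of examinees.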

Dataset	Items	Examinees	Type	Main use
`J5S10`	5	10	Binary	Quick tests, BNM
`J5S1000`	5	1,000	Ordinal	GRM
`J12S5000`	12	5,000	Binary	LDLRA
`J15S500`	15	500	Binary	IRT, LCA, LRA
`J15S3810`	15	3,810	Ordinal (4-point)	Ordinal LRA
`J20S400`	20	400	Binary	BNM
`J20S600`	20	600	Nominal (4 categories)	Nominal Biclustering
`J35S500`	35	500	Ordinal (5-point)	Ordinal Biclustering
`J35S515`	35	515	Binary	Biclustering, Network
`J35S5000`	35	5,000	Multiple-choice	Nominal LRA
`J50S100`	50	100	Binary (simulated)	Large-scale test
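
Because the names follow the JxxSxxx pattern, the two sizes can be recovered from a name with a regular expression. The base R sketch below parses the dataset names listed in the table; `pattern` and `sizes` are hypothetical helper names, and the code does not need the package to be loaded.

```r
# Dataset names as listed in the table above.
dataset_names <- c(
  "J5S10", "J5S1000", "J12S5000", "J15S500", "J15S3810", "J20S400",
  "J20S600", "J35S500", "J35S515", "J35S5000", "J50S100"
)

# J<number of items>S<number of examinees>
pattern <- "^J([0-9]+)S([0-9]+)$"

sizes <- data.frame(
  dataset = dataset_names,
  items = as.integer(sub(pattern, "\\1", dataset_names)),
  examinees = as.integer(sub(pattern, "\\2", dataset_names))
)

print(sizes)
```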

Each dataset is available as soon as the package is loaded.

# Example of using a sample dataset
dat <- J15S500
dat$U |> head()

     Item01 Item02 Item03 Item04 Item05 Item06 Item07 Item08 Item09 Item10
[1,]      0      1      1      0      1      1      0      0      0      1
[2,]      1      1      1      1      1      1      0      1      0      1
[3,]      1      1      1      1      1      1      0      0      0      1
[4,]      1      1      1      1      1      1      1      1      0      0
[5,]      1      1      0      1      1      0      0      0      0      1
[6,]      1      1      1      1      1      1      1      1      0      0
     Item11 Item12 Item13 Item14 Item15
[1,]      1      0      1      0      1
[2,]      0      0      1      0      1
[3,]      0      0      1      1      1
[4,]      0      1      1      1      0
[5,]      0      0      0      1      0
[6,]      1      0      1      1      1