This function performs Biclustering structure learning using the Infinite Relational Model (IRM) to automatically determine the optimal number of classes C and the optimal number of fields F. Both are found in a single run of the analysis, but the computation takes a long time when the sample size S is large. The method combines the Chinese restaurant process (CRP) with Gibbs sampling. For details, see Section 7.8 in Shojima (2022).
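As a rough illustration of the Chinese restaurant process prior mentioned above, the following base-R sketch computes the probability that the next student joins each occupied class or opens a new one. The function name `crp_seat_probs` is hypothetical and not part of exametrika. In a Gibbs sampler these prior weights would also be combined with the data likelihood, which is omitted here.

```r
# Prior seating probabilities under a Chinese restaurant process (CRP).
# counts: sizes of the currently occupied classes
# gamma:  concentration parameter (attractiveness of a new class)
crp_seat_probs <- function(counts, gamma = 1) {
  stopifnot(all(counts > 0), gamma > 0)
  # Existing class k gets weight n_k, a new class gets weight gamma.
  probs <- c(counts, gamma) / (sum(counts) + gamma)
  names(probs) <- c(paste0("class", seq_along(counts)), "new")
  probs
}

# A larger gamma shifts probability mass toward opening a new class.
crp_seat_probs(c(10, 5, 1), gamma = 1)
crp_seat_probs(c(10, 5, 1), gamma = 4)
```

With three classes of sizes 10, 5, and 1, raising `gamma` from 1 to 4 raises the prior probability of a new class from 1/17 to 4/20.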
Usage
Biclustering_IRM(
U,
Z = NULL,
w = NULL,
na = NULL,
gamma_c = 1,
gamma_f = 1,
max_iter = 100,
stable_limit = 5,
minSize = 20,
EM_limit = 20,
seed = 123,
verbose = TRUE
)
Arguments
- U
U is either a data class of exametrika, or raw data. When raw data is given, it is converted to the exametrika class with the dataFormat function.
- Z
Z is a missing indicator matrix of type matrix or data.frame.
- w
w is the item weight vector.
- na
The na argument specifies the numbers or characters to be treated as missing values.
- gamma_c
\(\gamma_C\) is the hyperparameter of the CRP and represents the attractiveness of a new Class. As \(\gamma_C\) increases, a student is more likely to be seated at a vacant (new) class. The default is 1.
- gamma_f
\(\gamma_F\) is the hyperparameter of the CRP and represents the attractiveness of a new Field. The greater this value, the more likely an item is to be classified into a new field. The default is 1.
- max_iter
The maximum number of iterations of the IRM process. The default is 100.
- stable_limit
The IRM process exits the loop when the FRM stabilizes and no longer changes significantly. This option sets the maximum number of stable iterations, with a default of 5.
- minSize
A value used for readjusting the number of classes. If the size of a class is less than minSize, the number of classes will be reduced. Note that this lower limit on size is not applied to the all-correct or all-incorrect class.
- EM_limit
After the IRM process, the number of classes is readjusted using the EM algorithm. EM_limit is the maximum number of EM iterations, with a default of 20.
- seed
Random seed for reproducibility. When a numeric value is provided, set.seed(seed) is called before the Gibbs sampling begins, ensuring reproducible results. The default is 123, which guarantees deterministic output. Set to NULL to disable seed setting and let the results depend on the current state of the random number generator.
- verbose
Flag for verbose output. The default is TRUE.
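The seed behaviour described for the seed argument can be sketched with a small stand-in sampler. The function `simulate_crp` below is hypothetical and is not the package's sampler; it only draws a partition from the CRP prior, and it assumes the same convention as above: a non-NULL seed triggers `set.seed(seed)`, while `NULL` leaves the random number generator untouched.

```r
# Draw class assignments for n students from a CRP prior (illustration only).
simulate_crp <- function(n, gamma = 1, seed = 123) {
  # Mirror the documented convention: seed only when a value is supplied.
  if (!is.null(seed)) set.seed(seed)
  assignment <- integer(n)
  assignment[1] <- 1L  # the first student always opens the first class
  for (s in seq_len(n)[-1]) {
    counts <- tabulate(assignment[seq_len(s - 1)])
    # Existing classes are weighted by size, a new class by gamma.
    weights <- c(counts, gamma)
    assignment[s] <- sample.int(length(weights), size = 1, prob = weights)
  }
  assignment
}

a <- simulate_crp(50, gamma = 1, seed = 123)
b <- simulate_crp(50, gamma = 1, seed = 123)
identical(a, b)               # same seed, same partition
table(simulate_crp(50, gamma = 5, seed = 123))  # class sizes under a larger gamma
```

Passing `seed = NULL` makes repeated calls depend on the current state of the generator, so successive runs will generally differ.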
Value
- nobs
Sample size. The number of rows in the dataset.
- msg
A character string indicating the model type.
- testlength
Length of the test. The number of items included in the test.
- n_class
Optimal number of classes (new naming convention).
- n_field
Optimal number of fields (new naming convention).
- em_cycle
Number of EM algorithm iterations (new naming convention).
- Nclass
Optimal number of classes (deprecated, use n_class).
- Nfield
Optimal number of fields (deprecated, use n_field).
- EM_Cycle
Number of EM algorithm iterations (deprecated, use em_cycle).
- BRM
Bicluster Reference Matrix
- FRP
Field Reference Profile
- FRPIndex
Index of FRP. Includes the item location parameters B and Beta, the slope parameters A and Alpha, and the monotonicity indices C and Gamma.
- TRP
Test Reference Profile
- FMP
Field Membership Profile
- Students
Rank Membership Profile matrix. The s-th row vector of \(\hat{M}_R\), \(\hat{m}_R\), is the rank membership profile of Student s, namely the posterior probability distribution representing the student's belonging to the respective latent classes. It also includes the rank with the maximum estimated membership probability, as well as the rank-up odds and rank-down odds.
- LRD
Latent Rank Distribution. See also plot.exametrika
- LFD
Latent Field Distribution. See also plot.exametrika
- RMD
Rank Membership Distribution.
- TestFitIndices
Overall fit index for the test. See also TestFit
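Because the class and field counts are returned under both the new names (n_class, n_field) and the deprecated ones (Nclass, Nfield), code that must work across versions can read them defensively. The helper `irm_dims` below is a hypothetical sketch, not part of exametrika, and it assumes the returned object can be indexed like a list with `[[`. The `mock` list uses made-up placeholder values rather than real output.

```r
# Hypothetical helper: prefer the new field names, fall back to deprecated ones.
irm_dims <- function(result) {
  pick <- function(new, old) {
    if (!is.null(result[[new]])) result[[new]] else result[[old]]
  }
  c(n_class = pick("n_class", "Nclass"),
    n_field = pick("n_field", "Nfield"))
}

# Stand-in object carrying only the deprecated names (placeholder values).
mock <- list(Nclass = 3, Nfield = 4)
irm_dims(mock)
```

Applied to an actual fit, the call would be `irm_dims(result)`, where `result` is the value returned by Biclustering_IRM.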
Examples
# \donttest{
# Fit a Biclustering model with automatic structure learning using IRM
# gamma_c and gamma_f are concentration parameters for the Chinese Restaurant Process
result <- Biclustering_IRM(J35S515, gamma_c = 1, gamma_f = 1, verbose = TRUE)
#> iter 1: match=0 nfld=15 ncls=30
#> iter 2: match=0 nfld=12 ncls=27
#> iter 3: match=1 nfld=12 ncls=24
#> iter 4: match=2 nfld=12 ncls=23
#> iter 5: match=3 nfld=12 ncls=23
#> iter 6: match=0 nfld=12 ncls=23
#> iter 7: match=1 nfld=12 ncls=23
#> iter 8: match=2 nfld=12 ncls=23
#> iter 9: match=3 nfld=12 ncls=21
#> iter 10: match=4 nfld=12 ncls=21
#> iter 11: match=5 nfld=12 ncls=21
#> Adjusting classes: BIC=-99592.5 ncls=21 (min size < 20)
#> Adjusting classes: BIC=-99980.4 ncls=20 (min size < 20)
#> Adjusting classes: BIC=-99959.7 ncls=19 (min size < 20)
#> Adjusting classes: BIC=-99988.3 ncls=18 (min size < 20)
#> Adjusting classes: BIC=-100001.3 ncls=17 (min size < 20)
# Display the Bicluster Reference Matrix (BRM) as a heatmap
# Shows the discovered clustering structure of items and students
plot(result, type = "Array")
# Plot Field Reference Profiles (FRP) in a 3-column grid
# Shows the probability patterns for each automatically determined field
plot(result, type = "FRP", nc = 3)
# Plot Test Reference Profile (TRP)
# Shows the overall response pattern across all fields
plot(result, type = "TRP")
# }
