Outputs the given dataset in a CEFA-friendly format, for factor analysis.

  outdata = ensemble_export_cefa(indata,defs)

This function currently takes 'indata' (either an N x M data matrix, with N observations and M variables, or an ensemble data struct), optionally calculates correlations between the variables, and writes either the correlational data, the raw data, or both, as well as header information, to a text file in a format that CEFA (the Comprehensive Exploratory Factor Analysis program, Browne & Cudeck) can easily import.

NOTE: if your indata is in ensemble data struct format, please use defs.var_idxs to identify the numerical vars that you wish to output (to exclude the non-numerical vars). If indata is an ensemble data struct, the function will skip all rows that contain non-numerical characters in a column value.

% % % % CEFA FORMAT

nobs nvars
data_format
0

var_names
(var names)

confirmatory_structure
(confirmatory structure)

covariance_structure
(covariance structure)

data

% % % % END CEFA FORMAT

FEATURE REQUESTS:
  - accept ensemble data structures and other formats
  - write out variable formats (polychor)
  - specify var names in 'defs'
  - specify a confirmatory factor structure
  - deal with non-num values in ensemble data struct indata formats (maybe automatically identify non-num columns?)

REQUIRES
  indata - either an N x M data matrix (N obs and M vars) or an ensemble data struct
  defs.init_fid
  defs.init_fid.write2file
  defs.init_fid.print
  defs.init_fid.fname
  defs.init_fid.filemode
  defs.var_idxs - the numerical indices of columns (or vars) of indata that you would like to export to CEFA format. Default: 1:size(indata,2) or 1:length(indata.vars), depending on indata format.
  defs.rev_score_idxs (optional) - a vector of indices that identifies which vars are negatively worded items. If this parameter is specified, then the given variables will be reverse-scored. See: defs.rev_score_max. NOTE: rev_score_idxs refer to the variables that are present AFTER indata are masked by defs.var_idxs. THEREFORE, if your indata has 100 variables and you specify defs.var_idxs = [10 20 30 40 50], then to indicate that the original variables 20 and 50 need to be reverse-scored, you would set defs.rev_score_idxs = [2 5], NOT [20 50].
  defs.rev_score_max (optional) - if defs.rev_score_idxs is specified, then for each index in that parameter, the given variable's values will be subtracted from the matching value in rev_score_max, plus 1, to reverse that variable's score. If rev_score_idxs is specified but rev_score_max is not, then the values will be subtracted from 1+max(var_values).
  defs.parcel_idxs (optional) - cell array of vectors containing indices of variables to include in each parcel. If parcel_idxs{1} = [1 3 5], then the first, third, and fifth variables (after taking into account defs.var_idxs) will be added together to construct parcel 1. If parcel_idxs{2} = [2 4 6], then the second, fourth, and sixth variables (after taking into account defs.var_idxs) will be added together to construct parcel 2.
  defs.var_names - cell array of strings identifying output variables. If your input data format is ensemble data struct and you do not specify defs.var_names, it will be set to indata.vars{var_idxs}. If your input data format is a numerical data matrix, var_names will be set to sprintf('var%d',iv) for iv=var_idxs. If you specify defs.parcel_idxs, var_names will be applied to parcels.
  defs.output_format - {'raw','corr'}, default: raw

RETURNS
  outdata - contents of the CEFA import file

FB 2010.03.30
FB 2010.05.25 - added support for raw output format (now the default), both numerical matrix and ensemble data struct input, reverse scoring, parcel creation, variable subsets, and variable naming
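The index bookkeeping for defs.var_idxs, defs.rev_score_idxs, and defs.parcel_idxs can be illustrated with a small stand-alone sketch. It does not call ensemble_export_cefa; it only mirrors the masking, reverse-scoring, and parcel arithmetic described above. The data values, the 1-5 response scale, and the local variable names are assumptions made for illustration.

```matlab
% Hypothetical data: 3 observations of 6 variables on an assumed 1-5 scale.
indata = [1 2 3 4 5 1;
          5 4 3 2 1 5;
          2 2 4 4 1 3];

% Mask the columns, as defs.var_idxs would.
var_idxs = [1 2 4 6];
data = indata(:,var_idxs);

% Reverse-score original column 4. After masking it is column 3,
% so the index to give is 3, NOT 4.
rev_score_idxs = 3;
rev_score_max  = 5;   % assumed scale maximum
for k = 1:length(rev_score_idxs)
  ridx = rev_score_idxs(k);
  % reversed value = (max + 1) - original value
  data(:,ridx) = (rev_score_max(k) + 1) - data(:,ridx);
end

% Build two parcels by summing masked (and reverse-scored) columns.
parcel_idxs = {[1 3], [2 4]};
parcels = zeros(size(data,1), length(parcel_idxs));
for k = 1:length(parcel_idxs)
  parcels(:,k) = sum(data(:,parcel_idxs{k}), 2);
end
disp(parcels)
```

Because parcels replace the individual variables in the output, defs.var_names would need one name per parcel in a case like this.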
function outdata = ensemble_export_cefa(indata,defs)

% outputs given dataset in a CEFA-friendly format, for factor analysis
%
%   outdata = ensemble_export_cefa(indata,defs)
%
% This function currently takes 'indata' (in the format of either: an N x M
% data matrix, N observations and M variables; or an ensemble data struct),
% optionally calculates correlations between the variables, and writes
% either the correlational data, the raw data, or both as well as header
% information, to a text file in a format that CEFA (the Comprehensive
% Exploratory Factor Analysis program, Browne & Cudeck) can easily import.
%
% NOTE: if your indata is in ensemble data struct format, please use
% defs.var_idxs to identify those numerical vars that you wish to output
% (to exclude the non-numerical vars). If indata format is ensemble data
% struct, it will skip all rows that contain non-numerical characters in a
% column value.
%
% % % % % CEFA FORMAT
%
% nobs nvars
% data_format
% 0
%
% var_names
% (var names)
%
% confirmatory_structure
% (confirmatory structure)
%
% covariance_structure
% (covariance structure)
%
% data
%
% % % % % END CEFA FORMAT
%
% FEATURE REQUESTS:
%   - accept ensemble data structures and other formats
%   - write out variable formats (polychor)
%   - specify var names in 'defs'
%   - specify a confirmatory factor structure
%   - deal with non-num values in ensemble data struct indata formats
%     (maybe automatically identify non-num columns?)
%
% REQUIRES
%   indata - either an N x M data matrix (N obs and M vars) or an ensemble
%       data struct
%   defs.init_fid
%   defs.init_fid.write2file
%   defs.init_fid.print
%   defs.init_fid.fname
%   defs.init_fid.filemode
%   defs.var_idxs - the numerical indices of columns (or vars) of indata
%       that you would like to export to CEFA format. Default:
%       1:size(indata,2) or 1:length(indata.vars), depending on indata
%       format
%   defs.rev_score_idxs (optional) - a vector of indices that identifies
%       which vars are negatively worded items. If this parameter is
%       specified, then the given variables will be reverse-scored. See:
%       defs.rev_score_max. NOTE: rev_score_idxs refer to the variables
%       that are present AFTER indata are masked by defs.var_idxs.
%       THEREFORE, if your indata has 100 variables, and you specify
%       defs.var_idxs = [10 20 30 40 50], then to indicate that the
%       original variables 20 and 50 need to be reverse-scored, you would
%       set defs.rev_score_idxs = [2 5], NOT [20 50].
%   defs.rev_score_max (optional) - if defs.rev_score_idxs is specified,
%       then for each index in that parameter, the given variable's values
%       will be subtracted from the matching value in rev_score_max, plus
%       1, to reverse that variable's score. If rev_score_idxs is
%       specified but rev_score_max is not, then the values will be
%       subtracted from 1+max(var_values).
%   defs.parcel_idxs (optional) - cell array of vectors containing indices
%       of variables to include in each parcel. If parcel_idxs{1} = [1 3
%       5], then the first, third, and fifth variables (after taking into
%       account defs.var_idxs) will be added together to construct parcel
%       1. If parcel_idxs{2} = [2 4 6], then the second, fourth, and sixth
%       variables (after taking into account defs.var_idxs) will be added
%       together to construct parcel 2.
%   defs.var_names - cell array of strings identifying output variables. If
%       your input data format is ensemble data struct, and you do not
%       specify defs.var_names, it will be set to indata.vars{var_idxs}. If
%       your input data format is a numerical data matrix, var_names will
%       be set to sprintf('var%d',iv) for iv=var_idxs. If you specify
%       defs.parcel_idxs, var_names will be applied to parcels.
%   defs.output_format {'raw','corr'} default: raw
%
% RETURNS
%   outdata - contents of the CEFA import file
%
% FB 2010.03.30
% FB 2010.05.25 - added support for raw output format (now the default),
%   both numerical matrix and ensemble data struct input, reverse scoring,
%   parcel creation, variable subsets, and variable naming

%% init vars
outdata = [];

% init output file
fid = ensemble_init_fid(defs.init_fid);

% get var_idxs
if isfield(defs,'var_idxs'), var_idxs = defs.var_idxs; end
if isfield(defs,'var_names'), var_names = defs.var_names; end

% get data
if isstruct(indata) && isfield(indata,'data')
  % ensemble data struct input
  if ~exist('var_idxs','var')
    var_idxs = 1:length(indata.vars);
  end
  data = [indata.data{var_idxs}];
  if ~exist('var_names','var')
    var_names = {indata.vars{var_idxs}};
  end
elseif isnumeric(indata)
  % numerical matrix input
  if ~exist('var_idxs','var')
    var_idxs = 1:size(indata,2);
  end
  data = indata(:,var_idxs);
  if ~exist('var_names','var')
    % default names: var<original column index>
    var_names = cell(1,length(var_idxs));
    for iv=1:length(var_idxs)
      var_names{iv} = sprintf('var%d',var_idxs(iv));
    end
  end
else
  error('unknown indata format');
end

% describe data
nobs = size(data,1);
nvar = size(data,2);
if isfield(defs,'output_format')
  fmt = defs.output_format;
else
  fmt = 'raw';
end

% set data_type, 1=corr, 2=raw, 3=?, 4=?
switch fmt
  case 'corr'
    data_type = 1;
  case 'raw'
    data_type = 2;
  otherwise
    error('unknown output format: %s\n',fmt);
end

%% reverse score?
if isfield(defs,'rev_score_idxs')
  rsis = defs.rev_score_idxs;
  nrsi = length(rsis);
  if isfield(defs,'rev_score_max')
    rsms = defs.rev_score_max;
  else
    rsms = [];
  end
  nrsm = length(rsms);
  for rsi=1:nrsi
    ridx = rsis(rsi);
    if rsi <= nrsm
      % reversed score = (given max + 1) - score
      data(:,ridx) = (rsms(rsi)+1) - data(:,ridx);
    else
      % no max given: use the observed max of this variable
      data(:,ridx) = (max(data(:,ridx))+1) - data(:,ridx);
    end
  end
end % if isfield(defs,'rev_score_idxs')

%% create parcels?
if isfield(defs,'parcel_idxs')
  npidx = length(defs.parcel_idxs);
  parcels = zeros(nobs,npidx);
  for k=1:npidx
    pidx = defs.parcel_idxs{k};
    parcels(:,k) = sum(data(:,pidx),2);
  end % for k=1:npidx
  data = parcels;
  nvar = npidx;
end % if isfield(defs,'parcel_idxs')

%% output headers

% write header info to file
fprintf(fid,'%d %d\n%d\n0\n\n',nobs,nvar,data_type);

% write variable names
if isempty(var_names)
  fprintf(fid,'0\n\n');
else
  fprintf(fid,'1\n');
  fprintf(fid,'%s\n\n',cell2str(var_names,' '));
end

% confirmatory factor structure
% % % currently not supported
fprintf(fid,'0\n0\n\n');

%% write out data

% data format?
switch fmt
  case 'raw'
    % write out data
    for k=1:nobs
      fprintf(fid,'%f ',data(k,:));
      fprintf(fid,'\n');
    end
  case 'corr'
    % calculate correlation
    r = corrcoef(data);

    % write out data (lower triangle, including the diagonal)
    for j=1:nvar
      for k=1:j
        fprintf(fid,'%1.2f',r(j,k));
        if k<j, fprintf(fid,' '); end
      end
      fprintf(fid,'\n');
    end
end


fclose(fid);

%% save outdata
outdata = ensemble_init_data_struct();
outdata.type = 'cefa_out';
if isfield(defs,'outDataName')
  outdata.name = defs.outDataName;
else
  outdata.name = outdata.type;
end
outdata.vars = var_names;
for k=1:length(outdata.vars)
  outdata.data{k} = data(:,k);
end

fprintf(1,'ensemble_export_cefa: DONE!\n');
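To make the file layout easier to inspect, the following sketch assembles the same header and lower-triangle correlation rows in a string instead of writing them to a file. It is a stand-alone approximation of the 'corr' branch above, not part of the function. The data matrix and variable names are made up, and strjoin is used here as a stand-in for the toolbox's cell2str on the assumption that both join the names with spaces.

```matlab
% Hypothetical data: 4 observations of 3 variables.
data = [1 2 3; 2 4 1; 3 6 2; 4 8 5];
var_names = {'a','b','c'};   % illustrative names
[nobs,nvar] = size(data);
data_type = 1;               % 1 = corr, as in the function's switch

% Header: "nobs nvar", data type, a literal 0, then a blank line.
txt = sprintf('%d %d\n%d\n0\n\n',nobs,nvar,data_type);

% Variable-name flag (1 = names follow) and the names themselves.
txt = [txt sprintf('1\n%s\n\n',strjoin(var_names,' '))];

% Confirmatory structure is not supported, so two zeros are written.
txt = [txt sprintf('0\n0\n\n')];

% Lower triangle of the correlation matrix, one matrix row per line.
r = corrcoef(data);
for j = 1:nvar
  row = sprintf('%1.2f ',r(j,1:j));
  txt = [txt strtrim(row) sprintf('\n')];
end

fprintf(1,'%s',txt);
```

Printing the text to the console first is a cheap way to check the layout before pointing CEFA at a real export.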