ensemble_export_cefa

PURPOSE

outputs given dataset in a CEFA-friendly format, for factor analysis

SYNOPSIS

function outdata = ensemble_export_cefa(indata,defs)

DESCRIPTION

 outputs given dataset in a CEFA-friendly format, for factor analysis
 
   outdata = ensemble_export_cefa(indata,defs)
 
 This function currently takes 'indata' (in the format of either: an N x M
 data matrix, N observations and M variables; or an ensemble data struct),
 optionally calculates correlations between the variables, and writes
 either the correlation matrix or the raw data, together with header
 information, to a text file in a format that CEFA (the Comprehensive
 Exploratory Factor Analysis program, Browne & Cudeck) can easily import.
 
 NOTE: if your indata is in ensemble data struct format, please use
 defs.var_idxs to identify those numerical vars that you wish to output
 (to exclude the non-numerical vars). If indata format is ensemble data
 struct, it will skip all rows that contain non-numerical characters in a
 column value.
 
 % % % % CEFA FORMAT
 
 nobs nvars
 data_format
 0
 
 var_names
 (var names)
 
 confirmatory_structure
 (confirmatory structure)
 
 covariance_structure
 (covariance structure)
 
 data
 
 % % % % END CEFA FORMAT
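
 The layout above can be illustrated with a minimal sketch that builds the
 same header and raw-data body as a string. The dataset and variable names
 are hypothetical, and strjoin stands in for the cell2str helper that the
 function itself calls; the meaning of the literal 0 on the third line is
 not documented here, so it is simply reproduced as the source writes it.

```matlab
% Hypothetical 3-observation, 2-variable dataset (illustrative values only).
data = [1 2; 3 4; 5 6];
var_names = {'item1','item2'};   % assumed names, not from the original
data_type = 2;                   % per the source: 1 = corr, 2 = raw

[nobs, nvar] = size(data);

% Header: "nobs nvar", the data type, a literal 0, then a blank line.
hdr = sprintf('%d %d\n%d\n0\n\n', nobs, nvar, data_type);

% Variable-name block: flag 1, then the space-separated names.
% strjoin is used here in place of the cell2str helper in the source.
hdr = [hdr sprintf('1\n%s\n\n', strjoin(var_names, ' '))];

% No confirmatory or covariance structure: two zero flags.
hdr = [hdr sprintf('0\n0\n\n')];

% Raw data, one observation per line, values separated by spaces.
body = '';
for k = 1:nobs
  body = [body sprintf('%f ', data(k,:)) sprintf('\n')];
end

fprintf('%s%s', hdr, body);
```

 Building the text with sprintf rather than writing to a file keeps the
 sketch independent of ensemble_init_fid.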
 
 FEATURE REQUESTS:
   - accept ensemble data structures and other formats
   - write out variable formats (polychor)
   - specify var names in 'defs'
   - specify a confirmatory factor structure
   - deal with non-num values in ensemble data struct indata formats
       (maybe automatically identify non-num columns?)
 
 REQUIRES
   indata - either an N x M data matrix (N obs and M vars) or an ensemble
       data struct
   defs.init_fid
       defs.init_fid.write2file
       defs.init_fid.print
       defs.init_fid.fname
       defs.init_fid.filemode
   defs.var_idxs - the numerical indices of columns (or vars) of indata
       that you would like to export to CEFA format. Default:
       1:size(indata,2) or 1:length(indata.vars), depending on indata
       format
   defs.rev_score_idxs (optional) - a vector of column indices that
       identifies which vars are negatively worded items. If this
       parameter is specified, then the given variables will be
       reverse-scored. See: defs.rev_score_max. NOTE: rev_score_idxs
       refer to the variables that are present AFTER indata are masked by
       defs.var_idxs. THEREFORE, if your indata has 100 variables, and you
       specify defs.var_idxs = [10 20 30 40 50], to specify the fact that
       the original variables 20 and 50 need to be reverse scored, you
       would set defs.rev_score_idxs = [2 5], NOT [20 50].
   defs.rev_score_max (optional) - if defs.rev_score_idxs is specified,
       then for each index in that parameter, the given variable is
       reversed by subtracting its values from the matching value in
       rev_score_max, +1. If no rev_score_max is specified, but
       rev_score_idxs is specified, then the values are subtracted from
       1+max(var_values).
   defs.parcel_idxs (optional) - cell array of vectors containing indices
       of variables to include in each parcel. If parcel_idxs{1} = [1 3
       5], then the first, third and fifth variables (after taking into
       account defs.var_idxs) will be added together to construct parcel
       1. If parcel_idxs{2} = [2 4 6], then the second, fourth, and sixth
       variables (after taking into account defs.var_idxs) will be added
       together to construct parcel 2.
   defs.var_names (optional) - cell array of strings identifying output
       variables. If your input data format is ensemble data struct, and
       you do not specify defs.var_names, it will be set to
       indata.vars{var_idxs}. If your input data format is a numerical
       data matrix, var_names will be set to sprintf('var%d',iv) for
       iv=var_idxs. If you specify defs.parcel_idxs, var_names will be
       applied to the parcels, so supply one name per parcel.
   defs.output_format {'raw','corr'} default: raw
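
 The reverse-scoring and parcel arithmetic described above can be sketched
 on a small example. This is only an illustration under stated assumptions:
 the response matrix, the 1-5 scale, and the index values are hypothetical,
 and the local variable names mirror the defs fields rather than calling
 the function itself.

```matlab
% Hypothetical responses on a 1-5 scale: 3 observations x 4 retained vars
% (that is, the columns left after masking by defs.var_idxs).
data = [1 5 2 4;
        3 3 3 3;
        5 1 4 2];

rev_score_idxs = [2 4];   % positions in the masked data, not in indata
rev_score_max  = [5 5];   % scale maximum for each reverse-scored column

% Reverse score: new = (max + 1) - old, so 1<->5, 2<->4, and 3 stays 3.
for i = 1:numel(rev_score_idxs)
  c = rev_score_idxs(i);
  data(:,c) = (rev_score_max(i) + 1) - data(:,c);
end

% Parcels: sum the listed columns within each observation.
parcel_idxs = {[1 3], [2 4]};
parcels = zeros(size(data,1), numel(parcel_idxs));
for k = 1:numel(parcel_idxs)
  parcels(:,k) = sum(data(:,parcel_idxs{k}), 2);
end

disp(parcels)
```

 Note that reverse scoring runs before parcels are built, so parcel sums
 use the already reversed values.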
       
 RETURNS
   outdata - contents of the CEFA import file
 
 FB 2010.03.30
 FB 2010.05.25 - added support for raw output format (now the default),
   both numerical matrix and ensemble data struct input, reverse scoring,
   parcel creation, variable subsets, and variable naming

SOURCE CODE

function outdata = ensemble_export_cefa(indata,defs)

% outputs given dataset in a CEFA-friendly format, for factor analysis
%
%   outdata = ensemble_export_cefa(indata,defs)
%
% This function currently takes 'indata' (in the format of either: an N x M
% data matrix, N observations and M variables; or an ensemble data struct),
% optionally calculates correlations between the variables, and writes
% either the correlation matrix or the raw data, together with header
% information, to a text file in a format that CEFA (the Comprehensive
% Exploratory Factor Analysis program, Browne & Cudeck) can easily import.
%
% NOTE: if your indata is in ensemble data struct format, please use
% defs.var_idxs to identify those numerical vars that you wish to output
% (to exclude the non-numerical vars). If indata format is ensemble data
% struct, it will skip all rows that contain non-numerical characters in a
% column value.
%
% % % % % CEFA FORMAT
%
% nobs nvars
% data_format
% 0
%
% var_names
% (var names)
%
% confirmatory_structure
% (confirmatory structure)
%
% covariance_structure
% (covariance structure)
%
% data
%
% % % % % END CEFA FORMAT
%
% FEATURE REQUESTS:
%   - accept ensemble data structures and other formats
%   - write out variable formats (polychor)
%   - specify var names in 'defs'
%   - specify a confirmatory factor structure
%   - deal with non-num values in ensemble data struct indata formats
%       (maybe automatically identify non-num columns?)
%
% REQUIRES
%   indata - either an N x M data matrix (N obs and M vars) or an ensemble
%       data struct
%   defs.init_fid
%       defs.init_fid.write2file
%       defs.init_fid.print
%       defs.init_fid.fname
%       defs.init_fid.filemode
%   defs.var_idxs - the numerical indices of columns (or vars) of indata
%       that you would like to export to CEFA format. Default:
%       1:size(indata,2) or 1:length(indata.vars), depending on indata
%       format
%   defs.rev_score_idxs (optional) - a vector of column indices that
%       identifies which vars are negatively worded items. If this
%       parameter is specified, then the given variables will be
%       reverse-scored. See: defs.rev_score_max. NOTE: rev_score_idxs
%       refer to the variables that are present AFTER indata are masked by
%       defs.var_idxs. THEREFORE, if your indata has 100 variables, and you
%       specify defs.var_idxs = [10 20 30 40 50], to specify the fact that
%       the original variables 20 and 50 need to be reverse scored, you
%       would set defs.rev_score_idxs = [2 5], NOT [20 50].
%   defs.rev_score_max (optional) - if defs.rev_score_idxs is specified,
%       then for each index in that parameter, the given variable is
%       reversed by subtracting its values from the matching value in
%       rev_score_max, +1. If no rev_score_max is specified, but
%       rev_score_idxs is specified, then the values are subtracted from
%       1+max(var_values).
%   defs.parcel_idxs (optional) - cell array of vectors containing indices
%       of variables to include in each parcel. If parcel_idxs{1} = [1 3
%       5], then the first, third and fifth variables (after taking into
%       account defs.var_idxs) will be added together to construct parcel
%       1. If parcel_idxs{2} = [2 4 6], then the second, fourth, and sixth
%       variables (after taking into account defs.var_idxs) will be added
%       together to construct parcel 2.
%   defs.var_names (optional) - cell array of strings identifying output
%       variables. If your input data format is ensemble data struct, and
%       you do not specify defs.var_names, it will be set to
%       indata.vars{var_idxs}. If your input data format is a numerical
%       data matrix, var_names will be set to sprintf('var%d',iv) for
%       iv=var_idxs. If you specify defs.parcel_idxs, var_names will be
%       applied to the parcels, so supply one name per parcel.
%   defs.output_format {'raw','corr'} default: raw
%
% RETURNS
%   outdata - contents of the CEFA import file
%
% FB 2010.03.30
% FB 2010.05.25 - added support for raw output format (now the default),
%   both numerical matrix and ensemble data struct input, reverse scoring,
%   parcel creation, variable subsets, and variable naming

%% init vars
outdata = [];

% init output file
fid = ensemble_init_fid(defs.init_fid);

% get var_idxs and var_names, if the caller supplied them
if isfield(defs,'var_idxs'), var_idxs = defs.var_idxs; end
if isfield(defs,'var_names'), var_names = defs.var_names; end

% get data
if isstruct(indata) && isfield(indata,'data')
  % ensemble data struct: concatenate the selected columns into a matrix
  if ~exist('var_idxs','var')
    var_idxs = 1:length(indata.vars);
  end
  data = [indata.data{var_idxs}];
  if ~exist('var_names','var')
    var_names = {indata.vars{var_idxs}};
  end
elseif isnumeric(indata)
  % numerical matrix: select the requested columns
  if ~exist('var_idxs','var')
    var_idxs = 1:size(indata,2);
  end
  data = indata(:,var_idxs);
  if ~exist('var_names','var')
    % default names: var<N>, where N is the original column index
    var_names = cell(1,length(var_idxs));
    for iv=1:length(var_idxs)
      var_names{iv} = sprintf('var%d',var_idxs(iv));
    end
  end
else
  error('unknown indata format');
end

% describe data
nobs = size(data,1);
nvar = size(data,2);
if isfield(defs,'output_format')
  fmt = defs.output_format;
else
  fmt = 'raw';
end

% set data_type, 1=corr, 2=raw, 3=?, 4=?
switch fmt
  case 'corr'
    data_type = 1;
  case 'raw'
    data_type = 2;
  otherwise
    error('unknown output format: %s\n',fmt);
end

%% reverse score?
if isfield(defs,'rev_score_idxs')
  rsis = defs.rev_score_idxs;
  nrsi = length(rsis);
  if isfield(defs,'rev_score_max')
    rsms = defs.rev_score_max;
  else
    rsms = [];
  end
  nrsm = length(rsms);
  for rsi=1:nrsi
    ridx = rsis(rsi);
    if rsi <= nrsm
      % use the supplied scale maximum: new = (max+1) - old
      data(:,ridx) = (ones(nobs,1)*(rsms(rsi)+1)) - data(:,ridx);
    else
      % no maximum supplied: fall back to the observed maximum
      data(:,ridx) = (ones(nobs,1)*(max(data(:,ridx))+1)) - data(:,ridx);
    end
  end
end % if isfield(defs,'rev_score_idxs')

%% create parcels?
if isfield(defs,'parcel_idxs')
  npidx = length(defs.parcel_idxs);
  parcels = zeros(nobs,npidx);
  for k=1:npidx
    pidx = defs.parcel_idxs{k};
    % each parcel is the row-wise sum of its member variables
    parcels(:,k) = sum(data(:,pidx),2);
  end % for k=1:npidx
  data = parcels;
  nvar = npidx;
end % if isfield(defs,'parcel_idxs')

%% output headers

% write header info to file
fprintf(fid,'%d %d\n%d\n0\n\n',nobs,nvar,data_type);

% write variable names
if isempty(var_names)
  fprintf(fid,'0\n\n');
else
  fprintf(fid,'1\n');
  fprintf(fid,'%s\n\n',cell2str(var_names,' '));
end

% confirmatory factor structure
% % % currently not supported
fprintf(fid,'0\n0\n\n');

%% write out data

% data format?
switch fmt
  case 'raw'
    % write out data, one observation per line
    for k=1:nobs
      fprintf(fid,'%f ',data(k,:));
      fprintf(fid,'\n');
    end
  case 'corr'
    % calculate correlation
    r = corrcoef(data);

    % write out the lower triangle, including the diagonal
    for j=1:nvar
      for k=1:j
        fprintf(fid,'%1.2f',r(j,k));
        if k<j, fprintf(fid,' '); end
      end
      fprintf(fid,'\n');
    end
end


fclose(fid);

%% save outdata
outdata = ensemble_init_data_struct();
outdata.type = 'cefa_out';
if isfield(defs,'outDataName')
  outdata.name = defs.outDataName;
else
  outdata.name = outdata.type;
end
outdata.vars = var_names;
for k=1:length(outdata.vars)
  outdata.data{k} = data(:,k);
end

fprintf(1,'ensemble_export_cefa: DONE!\n');
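
The 'corr' branch above writes only the lower triangle of the correlation matrix, one row per line. A minimal standalone sketch of that layout follows; the data matrix is hypothetical, and the rows are collected into a string with sprintf and strtrim rather than written through a file identifier, which is a simplification of what the function does.

```matlab
% Hypothetical data: 4 observations x 3 variables (illustrative only).
% The second column is exactly twice the first.
data = [1 2 3; 2 4 5; 3 6 9; 4 8 8];

r = corrcoef(data);      % full symmetric correlation matrix
nvar = size(r,1);

% Emit row j with entries 1..j, two decimals, separated by single spaces.
out = '';
for j = 1:nvar
  row = sprintf('%1.2f ', r(j,1:j));
  out = [out strtrim(row) sprintf('\n')];
end

fprintf('%s', out);
```

Two decimal places matches the '%1.2f' format in the source; if more precision is needed for the factor analysis, a wider format would be a reasonable change, though that is a suggestion rather than something the original does.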

Generated on Fri 22-Mar-2019 04:00:52 by m2html © 2003