
ensemble_reshape_data

PURPOSE

Reshapes a data struct by making new variables out of the unique levels in xfmVar.

SYNOPSIS

function out_st = ensemble_reshape_data(data_st,params)

DESCRIPTION

 Reshapes a data struct by making new variables out of the unique levels
 in xfmVar.

 out_st = ensemble_reshape_data(data_st,params);
 
 USAGE:
   data_st - an Ensemble data struct
   params - see below
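The function reads every column by name through two fields of the data struct: data_st.vars (a cell array of variable names) and data_st.data (a cell array holding one column per variable). A minimal sketch of that layout on hypothetical long-format data follows. Real structs built by ensemble_init_data_struct carry further fields not shown here, and the name-to-index loop is only an assumed stand-in for set_var_col_const, whose exact output this page does not document.

```matlab
% Hypothetical long-format data: one row per (subject, question) response
subject_id    = {'s01'; 's01'; 's02'; 's02'};
compqid_str   = {'q1'; 'q2'; 'q1'; 'q2'};
response_enum = [3; NaN; 5; NaN];
response_text = {''; 'blue'; ''; 'red'};

% Column-oriented struct: vars holds the names, data holds one column per name
data_st = struct();
data_st.vars = {'subject_id', 'compqid_str', 'response_enum', 'response_text'};
data_st.data = {subject_id, compqid_str, response_enum, response_text};

% Assumed stand-in for set_var_col_const: a struct mapping each variable
% name to its column index in data_st.data
cols = struct();
for icol = 1:numel(data_st.vars)
  cols.(data_st.vars{icol}) = icol;
end

% Fetch a column by name, the same way ensemble_reshape_data does
enumCol = data_st.data{cols.response_enum};
```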

 This transformation requires 3 sets of variables, specified in params.
 The following fields must be present in params.ensemble_reshape_data

 REQUIRED:
 xfmVar - the name of the variable whose unique values will form
          new variables in the out_st Ensemble data struct. This will
          typically be a 'compqid_str' variable (created by
          ensemble_attach_compqid_str)

 valueVars - the variable(s) whose values will populate the new
          variables. This would typically be 'response_enum' and/or
          'response_text'. If multiple variables are specified, each is
          checked for data at every level of xfmVar. If more than one
          contains data for a level, or none does, an error is thrown;
          otherwise the values of the single variable with data are
          copied. Output cells with no matching input row are left as
          NaN, false, or an empty cell, depending on type.

 keyVars - the set of variables which, in association with xfmVar,
          identify single (unique) rows in the data struct. If unique
          rows are not found, an error is generated.

 OPTIONAL:
 var_name_map - a struct whose field names are levels of xfmVar and
          whose values are the names to use for the corresponding
          variables in the output data struct
 copyVars - additional variables from data_st that will be copied to
          the output structure. If a copied variable takes more than one
          value within an output row, an error is thrown; groups
          containing NaN are skipped.
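As a sketch of how the parameter block might be assembled: the variable names below are hypothetical and would have to exist as columns in your data. Note that the function reads its settings from params.(mfilename), that is, params.ensemble_reshape_data, and that the source also honors the optional top-level fields params.verbose and params.filt. The required-field check repeats the one the function performs on entry; the actual call is left as a comment because it needs the Ensemble toolbox on the path.

```matlab
% Hypothetical variable names; each must be a column in your data struct.
% The function reads its settings from params.(mfilename).
fname = 'ensemble_reshape_data';
params = struct();
params.verbose = 0;   % optional; passed on to the uniqueness check
params.(fname).xfmVar    = 'compqid_str';
params.(fname).valueVars = {'response_enum', 'response_text'};
params.(fname).keyVars   = {'subject_id', 'session_id'};
params.(fname).copyVars  = {'age'};
% Optional renaming: field = level of xfmVar, value = output variable name
params.(fname).var_name_map = struct('q1', 'favorite_number', ...
                                     'q2', 'favorite_color');

% The same required-field check the function performs on entry
requiredVars = {'xfmVar', 'valueVars', 'keyVars'};
missingMask = ~ismember(requiredVars, fieldnames(params.(fname)));
if any(missingMask)
  error('Required fields missing from params.%s: %s', fname, ...
        strjoin(requiredVars(missingMask), ','));
end

% With the Ensemble toolbox on the path:
% out_st = ensemble_reshape_data(data_st, params);
```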

 See also: ensemble_data_by_question(), ensemble_export_respnstim(), ensemble_vals2vars()

 The difference between ensemble_reshape_data and ensemble_vals2vars is
 that the latter will not handle multi-variable keys.
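The reshape is a long-to-wide pivot on a composite key. The plain-MATLAB sketch below illustrates the idea on hypothetical data with two key variables. It is not the function's implementation, which builds nominal arrays and relies on the Ensemble helpers check_unique_rows and make_mask_mtx; here the composite key is encoded by string concatenation, values are placed with sub2ind, and value-variable selection and copyVars are left out.

```matlab
% Hypothetical long-format data with a two-variable key (subject, session)
subject = {'s01'; 's01'; 's01'; 's02'; 's02'};
session = {'a';   'a';   'b';   'a';   'a'};
qid     = {'q1';  'q2';  'q1';  'q1';  'q2'};
value   = [3; 7; 4; 5; 9];

% Encode each key as one string per row; assumes '|' never occurs in values
fullKey = strcat(subject, '|', session, '|', qid);   % keyVars + xfmVar
outKey  = strcat(subject, '|', session);             % keyVars only

% keyVars together with xfmVar must identify unique rows
if numel(unique(fullKey)) ~= numel(fullKey)
  error('Key variables do not identify unique rows');
end

% One output row per unique key, one output column per level of xfmVar
[outKeys, ~, rowIdx] = unique(outKey);
[levels,  ~, colIdx] = unique(qid);
wide = nan(numel(outKeys), numel(levels));   % unmatched cells stay NaN
wide(sub2ind(size(wide), rowIdx, colIdx)) = value;
```

Here wide(2,2) stays NaN because subject s01 has no q2 response in session b, mirroring the NaN fill used for numeric output variables.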

SOURCE CODE

function out_st = ensemble_reshape_data(data_st,params)
% Reshapes a data struct by making new variables out of the unique levels
% in xfmVar.
%
% out_st = ensemble_reshape_data(data_st,params);
%
% USAGE:
%   data_st - an Ensemble data struct
%   params - see below
%
% This transformation requires 3 sets of variables, specified in params.
% The following fields must be present in params.ensemble_reshape_data
%
% REQUIRED:
% xfmVar - the name of the variable whose unique values will form
%          new variables in the out_st Ensemble data struct. This will
%          typically be a 'compqid_str' variable (created by
%          ensemble_attach_compqid_str)
%
% valueVars - the variable(s) whose values will populate the new
%          variables. This would typically be 'response_enum' and/or
%          'response_text'. If multiple variables are specified, each is
%          checked for data at every level of xfmVar. If more than one
%          contains data for a level, or none does, an error is thrown;
%          otherwise the values of the single variable with data are
%          copied. Output cells with no matching input row are left as
%          NaN, false, or an empty cell, depending on type.
%
% keyVars - the set of variables which, in association with xfmVar,
%          identify single (unique) rows in the data struct. If unique
%          rows are not found, an error is generated.
%
% OPTIONAL:
% var_name_map - a struct whose field names are levels of xfmVar and
%          whose values are the names to use for the corresponding
%          variables in the output data struct
% copyVars - additional variables from data_st that will be copied to
%          the output structure. If a copied variable takes more than one
%          value within an output row, an error is thrown; groups
%          containing NaN are skipped.
%
% See also: ensemble_data_by_question(), ensemble_export_respnstim(), ensemble_vals2vars()
%
% The difference between ensemble_reshape_data and ensemble_vals2vars is
% that the latter will not handle multi-variable keys.

% 04Sep2014 Petr Janata

% Turn off categorical class warnings
warning('off','stats:categorical:subsasgn:NewLevelsAdded')

%% Check input parameters
if nargin < 2
  error('%s: data_st and params inputs required', mfilename)
end

if ~isfield(params,mfilename)
  error('params.%s structure is required',mfilename)
end

requiredVars = {'xfmVar','valueVars','keyVars'};
missingMask = ~ismember(requiredVars,fieldnames(params.(mfilename)));
if any(missingMask)
  error('Required fields missing from params.%s: %s', mfilename, cell2str(requiredVars(missingMask),','))
end

if isfield(params,'verbose')
  verbose = params.verbose;
else
  verbose = 1;
end

% Make variable lists a bit more accessible
xfmVar = params.(mfilename).xfmVar;

% Key variables plus xfmVar (last column) identify unique input rows.
% cellstr() lets a single key variable be given as a plain string.
keyVars = [cellstr(params.(mfilename).keyVars) xfmVar];
nkeys = length(keyVars);

% Allow a single value variable to be given as a plain string
valueVars = cellstr(params.(mfilename).valueVars);

if isfield(params.(mfilename),'copyVars')
  copyVars = cellstr(params.(mfilename).copyVars);
else
  copyVars = {};
end
ncopy = length(copyVars);

% Get column indexing for input data
cols = set_var_col_const(data_st.vars);

%% Perform any desired filtering
if isfield(params,'filt')
  data_st = ensemble_filter(data_st, params.filt);
end

%% Extract the key variables and xfmVar and check for uniqueness
nrows = size(data_st.data{cols.(keyVars{1})},1);

% Convert each variable to a categorical type
for ikey = 1:nkeys
  keymtx(:,ikey) = nominal(data_st.data{cols.(keyVars{ikey})});
end

% Check for uniqueness
nonUniqueMask = check_unique_rows(keymtx, verbose);

if any(nonUniqueMask)
  error('Key variables (keyVars plus xfmVar) do not identify unique rows')
end

%% Get the set of output keys (all keys but xfmVar key)
outkeymtx = unique(keymtx(:,1:end-1),'rows');
noutRows = size(outkeymtx,1);

%% Extract the value variables
% Index them in the order listed in valueVars (not the order of
% data_st.vars) so that valtype{ival} always describes valueVars{ival}
nvals = length(valueVars);
valmtx = cell(1,nvals);
for ival = 1:nvals
  valmtx{ival} = data_st.data{cols.(valueVars{ival})};
end

% Get types for each of the value variables
valtype = cell(1,nvals);
for ival = 1:nvals
  valtype{ival} = class(valmtx{ival});
end

%% Initialize the output structure
out_st = ensemble_init_data_struct;

% Add the variables we want to copy to the list and initialize their types
if ncopy
  out_st.vars = copyVars;
  ocols = set_var_col_const(out_st.vars);
  for icopy = 1:ncopy
    currVar = copyVars{icopy};
    currType = class(data_st.data{cols.(currVar)});
    switch currType
      case {'numeric','double'}
        out_st.data{ocols.(currVar)} = nan(noutRows,1);
      case 'logical'
        out_st.data{ocols.(currVar)} = false(noutRows,1);
      case 'cell'
        out_st.data{ocols.(currVar)} = cell(noutRows,1);
      otherwise
        error('No initialization for type: %s', currType)
    end
  end
end

%% Create new variables in output structure
[levelMaskMtx, newVars] = make_mask_mtx(data_st.data{cols.(xfmVar)});
numNew = length(newVars);
newSrc = cell(1,numNew);

for inew = 1:numNew
  currLevel = newVars{inew};

  % Find rows in the input data corresponding to this variable
  levelMask = levelMaskMtx(:,strcmp(newVars,currLevel));

  % See if we want to remap its name (var_name_map is optional)
  if isfield(params.(mfilename),'var_name_map') && ...
      isfield(params.(mfilename).var_name_map,currLevel)
    varName = params.(mfilename).var_name_map.(currLevel);
  else
    varName = currLevel;
  end

  % Append the name to the output variable list
  out_st.vars{end+1} = varName;
  ocols = set_var_col_const(out_st.vars);

  % Check which of the value variables we have data in for this level
  haveData = false(1,nvals);
  for ival = 1:nvals
    currData = data_st.data{cols.(valueVars{ival})}(levelMask);
    switch valtype{ival}
      case {'numeric','double','logical'}
        haveData(ival) = any(currData);
      case 'cell'
        haveData(ival) = any(~cellfun('isempty', currData));
    end
  end

  if ~any(haveData)
    error('No data available in any of the value variables for level: %s', currLevel)
  end

  if sum(haveData) > 1
    error('More than one value variable has data for level (%s): %s', ...
      currLevel, cell2str(valueVars(haveData),','))
  end

  valVar = valueVars{haveData};
  currType = valtype{haveData};
  newSrc{inew} = valVar;

  % Initialize the output variable
  switch currType
    case {'numeric','double'}
      out_st.data{ocols.(varName)} = nan(noutRows,1);
    case 'logical'
      out_st.data{ocols.(varName)} = false(noutRows,1);
    case 'cell'
      out_st.data{ocols.(varName)} = cell(noutRows,1);
    otherwise
      error('No initialization for type: %s', currType)
  end
end % new variable initialization


%% Copy data to the new variables

% Loop over values in the key matrix
for irow = 1:noutRows
  currKey = outkeymtx(irow,:);
  outkeyMask = ismember(keymtx(:,1:end-1),currKey,'rows');

  for inew = 1:numNew
    currLevel = newVars{inew};

    % See if we want to remap its name (var_name_map is optional)
    if isfield(params.(mfilename),'var_name_map') && ...
        isfield(params.(mfilename).var_name_map,currLevel)
      varName = params.(mfilename).var_name_map.(currLevel);
    else
      varName = currLevel;
    end

    % Find rows corresponding to this variable
    levelMask = levelMaskMtx(:,strcmp(newVars,currLevel));

    % Get the composite mask
    compMask = outkeyMask & levelMask;

    % Skip row if not relevant
    if ~any(compMask)
      continue
    end

    % Copy the value variable
    out_st.data{ocols.(varName)}(irow) = data_st.data{cols.(newSrc{inew})}(compMask);

  end % for inew

  % Now copy the other variables we want to carry over
  for icopy = 1:ncopy
    tmp = data_st.data{cols.(copyVars{icopy})}(outkeyMask);
    if ~iscell(tmp) && any(isnan(tmp))
      continue
    end

    outval = unique(tmp);
    if numel(outval) > 1
      error('Too many (%d) values found for variable we want to copy (%s)', numel(outval), copyVars{icopy})
    else
      out_st.data{ocols.(copyVars{icopy})}(irow) = outval;
    end
  end % for icopy

end % for irow


end % function
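Before copying anything, the loop over new variables decides which single value variable supplies each level of xfmVar (the haveData test). The sketch below reproduces that rule stand-alone on hypothetical data. For portability it spells out the numeric test as "non-NaN and non-zero", which is what MATLAB's any() reports for the numeric columns in the source; treat that equivalence, and all variable names, as assumptions of this example.

```matlab
% Hypothetical long-format rows: q1 answered via response_enum, q2 via text
qid           = {'q1'; 'q2'; 'q1'; 'q2'};
response_enum = [3; NaN; 5; NaN];
response_text = {''; 'blue'; ''; 'red'};
valueVars     = {'response_enum', 'response_text'};

levels = unique(qid);
srcVar = cell(size(levels));
for ilev = 1:numel(levels)
  levelMask = strcmp(qid, levels{ilev});
  enumVals = response_enum(levelMask);
  % Numeric column has data if any value is non-NaN and non-zero
  % (what MATLAB's any() reports); cell column if any entry is non-empty
  haveData = [any(~isnan(enumVals) & enumVals ~= 0), ...
              any(~cellfun('isempty', response_text(levelMask)))];
  if ~any(haveData)
    error('No data available in any value variable for level %s', levels{ilev});
  elseif sum(haveData) > 1
    error('More than one value variable has data for level %s', levels{ilev});
  end
  srcVar{ilev} = valueVars{haveData};   % exactly one true entry here
end
```

A consequence of this rule is that a level whose numeric values are all NaN or 0 counts as having no data, so an all-zero numeric level raises the "no data" error rather than being copied.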

Generated on Wed 20-Sep-2023 04:00:50 by m2html © 2003