Filters data in data_st according to criteria specified in filt.

[data_st] = ensemble_filter(data_st,filt);

Filters data in the data structure (data_st) according to the exclusion and inclusion parameters specified in filt.exclude and filt.include. Each of those structures contains an .all field, an .any field, or both. Each of these in turn contains a set of fields whose names are matched against the list of variable names in data_st.vars in order to find the column of data in data_st.data to filter.

When specified in the .all structure, all of the conditions in the fields must evaluate to true in order for the data to be removed (if part of an exclude structure) or retained (if part of an include structure). When specified in the .any structure, it is enough for any one of the conditions to evaluate to true.

Examples:

  filt.exclude.any.subject_id = {'^tmp_.*','^01ttf.*','01zin79271','08tgs78071'};
  filt.exclude.any.session_id = [1873 1984 1523:1576];

would remove any rows whose subject IDs begin with tmp_ or 01ttf, along with the specific subject IDs given by the 3rd and 4th elements in the cell array of strings, as well as any sessions that match the session IDs given in the list.

Note: string criteria that contain *, ^, or $ are matched with regexp, so they must conform to regexp rules. Other strings are matched exactly.

DATE/TIME FILTERING PARAMETERS

  filt.include.all.date_time.start=datenum('01-Jan-1901');
  filt.include.all.date_time.stop=datenum('01-Jan-2020');

start and stop serve as greater-than and less-than operators, respectively. To include the start or stop value, use start_inc and stop_inc. The names gt, gte, lt, and lte are accepted as synonyms for start, start_inc, stop, and stop_inc.
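The structures described above can be sketched with a toy example. The data values below are hypothetical, and the mask is rebuilt by hand with built-in functions purely to illustrate what an exclude.any specification is expected to select; it is not the function's own code.

```matlab
% Toy data structure (hypothetical values): one column of data per variable.
data_st.vars = {'subject_id','session_id'};
data_st.data = {{'tmp_001';'01abc12345';'01zin79271';'02xyz00001';'03def00002'}, ...
                [1001; 1873; 1002; 1003; 1004]};

% Filter specification in the form described above.
filt.exclude.any.subject_id = {'^tmp_.*','01zin79271'};
filt.exclude.any.session_id = [1873 1523:1576];

% Hand-built equivalent of the exclude.any logic, for illustration only.
subj = data_st.data{1};
sess = data_st.data{2};

% Exact string matches, then the one criterion that contains wildcards.
subj_mask = ismember(subj, filt.exclude.any.subject_id);
subj_mask = subj_mask | ~cellfun('isempty', regexp(subj, '^tmp_.*'));

% Numeric criteria are matched by membership in the list.
sess_mask = ismember(sess, filt.exclude.any.session_id);

% "any": a row is removed as soon as one condition is true.
remove_mask = subj_mask | sess_mask;

kept_subjects = subj(~remove_mask);
kept_sessions = sess(~remove_mask);
```

With these values the first three rows are flagged for removal and the last two remain. Passing the same data_st and filt to ensemble_filter should leave the same two rows in every column of data_st.data.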
function [data_st] = ensemble_filter(data_st,filt)
% Filters data in data_st according to criteria specified in filt.
%
% [data_st] = ensemble_filter(data_st,filt);
%
% Filters data in the data structure (data_st) according to the exclusion and
% inclusion parameters specified in filt.exclude and filt.include. Each of
% those structures contains an .all field, an .any field, or both, which in
% turn contains a set of fields whose names are matched against the
% list of variable names in data_st.vars in order to find the column of data in
% data_st.data to filter.
%
% When specified in the .all structure, all of the conditions in the fields
% must evaluate to true in order for the data to be removed (if part of an
% exclude structure) or retained (if part of an include structure). When
% specified in the .any structure, it is enough for any one of the conditions
% to evaluate to true.
%
% Examples:
% filt.exclude.any.subject_id = {'^tmp_.*','^01ttf.*','01zin79271','08tgs78071'};
% filt.exclude.any.session_id = [1873 1984 1523:1576];
% would remove any rows whose subject IDs begin with tmp_ or 01ttf,
% along with the specific subject IDs given by the 3rd and 4th elements in the
% cell array of strings, as well as any sessions that match the session IDs
% given in the list.
%
% Note: string criteria that contain *, ^, or $ are matched with regexp, so
% they must conform to regexp rules. Other strings are matched exactly.
%
% DATE/TIME FILTERING PARAMETERS
%
% filt.include.all.date_time.start=datenum('01-Jan-1901');
% filt.include.all.date_time.stop=datenum('01-Jan-2020');
%
% start and stop serve as greater-than and less-than operators,
% respectively. To include the start or stop value, use start_inc and
% stop_inc. gt, gte, lt, and lte are accepted as synonyms for start,
% start_inc, stop, and stop_inc.

% 01/31/07 Petr Janata - adapted from ensemble_apply_crit.m (which didn't have
%                        the added layer of and/or logic)
%
% 02/08/07 Stefan Tomic - added 'exact' argument to strmatch
% 03/15/07 S.T. - added support for using NaN as a filtering criteria
% 09/25/07 PJ - returns if filt spec is empty
% 11/14/11 PJ - added handling of scalar data embedded in cells
% 24Jul2013 PJ - Fixed handling of start/stop (greater than, less than)
%                to properly follow the all/any logic.
% 27Sep2013 PJ - minor bug fix associated with failure to initialize tmp
%                variable during evaluation of lt gt criteria
% 09Jan2014 PJ - expanded list of wildcards that trigger regexp usage to
%                '[*^$]' from '[*]'
% 10Feb2015 PJ - added gt, gte, lt, lte for greater than, greater than
%                equal to, less than, less than equal to logic. Previously,
%                only use start, start_inc, stop, stop_inc

% Deal with the possibility that the params struct was specified
% directly rather than passing "params.filt"
if(isfield(filt,'filt'))
  filt = filt.filt;
end

% If no filtering is specified, then return
if isempty(filt)
  return
end

% hack to accommodate ensemble_jobman_parallel_wrapper passing in the hash
% fb 2010.06.19
% pj 2010.11.16 - extended to remove other bad variables
bad_fields = {'hash','ensemble_jobman_interactive'};
for ibad = 1:length(bad_fields)
  if isfield(filt,bad_fields{ibad})
    filt = rmfield(filt,bad_fields{ibad});
  end
end

if isempty(data_st)
  fprintf('%s: empty data struct\n', mfilename);
  return
end

if(iscell(data_st))
  data_st = data_st{1};
end

crit_types_to_proc = fieldnames(filt);
ntypes = length(crit_types_to_proc);

nvars = length(data_st.vars);

% Make sure we have data to filter
if all(cellfun('isempty', data_st.data))
  fprintf('%s: data structure contains no data\n', mfilename);
  return
end

% Loop over include and exclude structures
for itype = 1:ntypes
  type_str = crit_types_to_proc{itype};

  % Determine which of the logic operations we're going to perform
  logic_types = fieldnames(filt.(type_str));
  nlog = length(logic_types);

  if ~all(ismember(logic_types,{'all','any'}))
    msgstr = sprintf(['ensemble_filter: Found logic types other than ''all'' and' ...
      ' ''any''\n']);
    error(msgstr)
  end

  for ilog = 1:nlog
    logic_str = logic_types{ilog};

    % Get a list of the fields to construct masks for
    flds = fieldnames(filt.(type_str).(logic_str));
    nflds = length(flds);

    % Loop over all of the fields associated with this criterion type
    curr_mask = [];
    for ifld = 1:nflds
      fld_str = flds{ifld};

      % Find the field string in the list of variable names
      data_col = strmatch(fld_str,data_st.vars,'exact');
      if isempty(data_col)
        fprintf('ensemble_filter: Did not find criterion field (%s) in list of variables\n',fld_str);
        continue
      end

      % Check to see if fld_str is a structure containing limits
      if isstruct(filt.(type_str).(logic_str).(fld_str))
        limit_flds = fieldnames(filt.(type_str).(logic_str).(fld_str));
        nlim = length(limit_flds);

        % One mask column per limit; columns default to true
        tmp = true(size(data_st.data{data_col},1),nlim);
        for ilim = 1:nlim
          limit_str = limit_flds{ilim};
          crit_val = filt.(type_str).(logic_str).(fld_str).(limit_str);

          % Make sure there is only one criterion value
          if length(crit_val) > 1
            fprintf('ensemble_filter: Too many criterion values\n');
            continue
          end

          % Make sure some criteria are specified for this field
          if isempty(crit_val)
            continue
          end

          switch limit_str
            case {'start','gt'}
              tmp2 = data_st.data{data_col} > crit_val;
            case {'start_inc','gte'}
              tmp2 = data_st.data{data_col} >= crit_val;
            case {'stop','lt'}
              tmp2 = data_st.data{data_col} < crit_val;
            case {'stop_inc','lte'}
              tmp2 = data_st.data{data_col} <= crit_val;
            otherwise
              error('Unknown limit string: %s', limit_str)
          end % switch limit_str

          tmp(:,ilim) = tmp2; % store the mask for this limit
        end % for ilim

        % Finalize tmp based on the desired logic
        if strcmp(logic_str,'all')
          tmp = all(tmp,2);
        else
          tmp = any(tmp,2);
        end

      else
        crit_vals = filt.(type_str).(logic_str).(fld_str);

        % Make sure some criteria are specified for this field
        if isempty(crit_vals)
          continue
        end

        % If the crit_val is NaN, need to use isnan function,
        % otherwise we can use ismember
        if ~iscell(crit_vals) && any(isnan(crit_vals))
          tmp = isnan(data_st.data{data_col});
        else
          % Check to see if we are dealing with numbers embedded in cells
          if iscell(data_st.data{data_col})
            numMask = cellfun(@isnumeric,data_st.data{data_col});
            if all(numMask)
              lengthMask = cellfun('length',data_st.data{data_col}) == 1;
              if all(lengthMask)
                tmp = ismember(cat(1,data_st.data{data_col}{:}), crit_vals);
              else
                error('Cannot handle non-scalar numeric data')
              end
            else
              tmp = ismember(data_st.data{data_col}, crit_vals);
            end
          else
            tmp = ismember(data_st.data{data_col}, crit_vals);
          end
        end

        % Check to see if any of the criterion values have wildcards, in which case
        % we need to switch to regexp
        if iscellstr(crit_vals) || ischar(crit_vals)
          % Wrap a lone string in a cell so that the cell-based calls below work
          if ischar(crit_vals)
            crit_vals = {crit_vals};
          end
          is_wild = ~cellfun('isempty',regexp(crit_vals,'[*$^]'));
          wild_idxs = find(is_wild);
          for iwild = 1:length(wild_idxs)
            tmp = tmp|~cellfun('isempty',regexp(data_st.data{data_col}, ...
              crit_vals{wild_idxs(iwild)}));
          end
        end
      end % if isstruct(filt.(type_str).(logic_str).(fld_str))
      curr_mask(:,end+1) = tmp;
    end % for ifld

    %
    % Now construct the final mask that will be used for removing unwanted data
    %

    if strcmp(type_str,'include')
      if strcmp(logic_str,'all')
        mask_vect = all(curr_mask,2);
      else
        mask_vect = any(curr_mask,2);
      end
      mask_vect = ~mask_vect; % toggle the mask to be an exclusion mask
    else
      if strcmp(logic_str,'all')
        mask_vect = all(curr_mask,2);
      else
        mask_vect = any(curr_mask,2);
      end
    end

    % Perform row extraction
    for ivar = 1:nvars
      data_st.data{ivar}(mask_vect,:) = [];
    end
  end % for ilog
end % for itype
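
The handling of start/stop limits under include logic can be sketched in isolation. The timestamps below are hypothetical, and the variable names limit_masks, keep_all, keep_any, and keep_inc are introduced here for illustration only; the snippet mirrors the structure of the limit branch in the listing rather than reproducing it.

```matlab
% Hypothetical column of timestamps, stored as serial date numbers.
date_time = datenum([1999; 2005; 2010; 2021], [6; 1; 12; 3], [15; 1; 31; 1]);

start = datenum(2005, 1, 1);   % exclusive lower bound ("start" / "gt")
stop  = datenum(2020, 1, 1);   % exclusive upper bound ("stop" / "lt")

% One mask column per limit.
limit_masks = [date_time > start, date_time < stop];

% include.all: a row is retained only if every limit is satisfied.
keep_all = all(limit_masks, 2);

% include.any: a row is retained if at least one limit is satisfied.
keep_any = any(limit_masks, 2);

% An inclusive lower bound ("start_inc" / "gte") also keeps the boundary row.
keep_inc = (date_time >= start) & (date_time < stop);

% Include masks are inverted to give the rows that get deleted.
remove_mask = ~keep_all;
filtered = date_time(~remove_mask);
```

Note that with these bounds every row satisfies at least one limit, so placing a start/stop pair under .any retains everything. A date range therefore normally belongs under .all.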