ensemble_filter

PURPOSE

Filters data in data_st according to criteria specified in filt.

SYNOPSIS

function [data_st] = ensemble_filter(data_st,filt)

DESCRIPTION

 Filters data in data_st according to criteria specified in filt.

 [data_st] = ensemble_filter(data_st,filt);

 Filters data in the data structure (data_st) according to the exclusion and
 inclusion parameters specified in filt.exclude and filt.include.  Each of
 those structures contains an .all field, an .any field, or both, each of
 which in turn contains a set of fields whose names are matched against the
 list of variable names in data_st.vars in order to find the column of data in
 data_st.data to filter.
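
 As a minimal sketch of the layout described above, the following builds a small data structure and looks up a column by name. The variable names and values are made up for illustration, and the lookup uses strcmp rather than the strmatch call found in the source.

```matlab
% Hypothetical data structure: one cell of data_st.data per name in data_st.vars
data_st.vars = {'subject_id', 'session_id'};
data_st.data = {{'tmp_01'; '01abc'; '02xyz'}, [1873; 1500; 1600]};

% A filter field name is matched exactly against data_st.vars ...
fld_str = 'session_id';
data_col = find(strcmp(fld_str, data_st.vars));

% ... and the matching cell of data_st.data is the column that gets tested
column = data_st.data{data_col};
```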

 When specified in the .all structure, all of the conditions in the fields
 must evaluate to true in order for the data to be removed (if part of an
 exclude structure) or retained (if part of an include structure).  When
 specified in a .any structure, at least one of the conditions has to
 evaluate to true.
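
 The difference between the two logic types can be sketched with a small mask matrix. The mask values below are hypothetical; each column stands for the result of one criterion and each row for one row of data.

```matlab
% Hypothetical per-field masks: one row per data row, one column per criterion
curr_mask = [true  true;
             true  false;
             false false];

all_mask = all(curr_mask, 2);   % rows meeting every criterion (.all)
any_mask = any(curr_mask, 2);   % rows meeting at least one criterion (.any)

% exclude removes the masked rows; include removes the unmasked rows
rows_removed_by_exclude_all = find(all_mask);
rows_removed_by_include_any = find(~any_mask);
```

 With these masks, an exclude.all filter would remove only the first row, while an include.any filter would remove only the third.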

 Examples:
 filt.exclude.any.subject_id = {'^tmp_.*','^01ttf.*','01zin79271','08tgs78071'};
 filt.exclude.any.session_id = [1873 1984  1523:1576];
 would cause any rows that have subject IDs beginning with tmp_ or 01ttf to
 be removed, along with the specific subject IDs given by the 3rd and 4th
 elements in the cell array of strings, as well as any sessions that match
 the session IDs given in the list.

 Note: string criteria containing any of the characters *, ^ or $ are
 matched with regexp, so they must conform to regexp rules. All other
 strings are matched exactly.
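
 The matching behind the example filter can be sketched as follows. This is a standalone illustration rather than the function itself, and the data columns are hypothetical.

```matlab
% Hypothetical data columns
subject_id = {'tmp_pilot'; '01ttf12345'; '01zin79271'; '05abc00001'; '06def00002'};
session_id = [1000; 1001; 1002; 1550; 1003];

crit_ids = {'^tmp_.*', '^01ttf.*', '01zin79271', '08tgs78071'};
crit_sessions = [1873 1984 1523:1576];

% Literal strings are compared exactly
id_mask = ismember(subject_id, crit_ids);

% Criteria containing *, ^ or $ are additionally applied as regular expressions
is_wild = ~cellfun('isempty', regexp(crit_ids, '[*$^]'));
for k = find(is_wild)
    id_mask = id_mask | ~cellfun('isempty', regexp(subject_id, crit_ids{k}));
end

session_mask = ismember(session_id, crit_sessions);

% .any logic: a row is removed if either criterion matches
remove = id_mask | session_mask;
kept_ids = subject_id(~remove);
```

 Here the first three rows are removed because of their subject IDs and the fourth because session 1550 falls in the range 1523:1576, so only the last row is kept.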

 DATE/TIME FILTERING PARAMETERS

 filt.include.all.date_time.start=datenum('01-Jan-1901');
 filt.include.all.date_time.stop=datenum('01-Jan-2020');

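 start and stop serve as strict greater-than and less-than operators,
 respectively. To include the start or stop value itself, use start_inc and
 stop_inc. The fields gt, gte, lt, and lte are synonyms for start,
 start_inc, stop, and stop_inc.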
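
 The effect of strict versus inclusive limits can be sketched on a hypothetical date_time column. The comparisons below mirror the operators the limits stand for, under .all logic; they do not call the function.

```matlab
% Hypothetical date_time column, stored as serial date numbers
date_time = [datenum('31-Dec-1900'); datenum('01-Jan-1901'); ...
             datenum('15-Jun-2010'); datenum('01-Jan-2020')];

start_val = datenum('01-Jan-1901');
stop_val  = datenum('01-Jan-2020');

% start/stop (gt/lt) are strict, so rows equal to either limit are dropped
keep_strict = date_time > start_val & date_time < stop_val;

% start_inc/stop_inc (gte/lte) keep rows equal to the limits
keep_inclusive = date_time >= start_val & date_time <= stop_val;
```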

SOURCE CODE

function [data_st] = ensemble_filter(data_st,filt)
% Filters data in data_st according to criteria specified in filt.
%
% [data_st] = ensemble_filter(data_st,filt);
%
% Filters data in the data structure (data_st) according to the exclusion and
% inclusion parameters specified in filt.exclude and filt.include.  Each of
% those structures contains an .all field, an .any field, or both, each of
% which in turn contains a set of fields whose names are matched against the
% list of variable names in data_st.vars in order to find the column of data in
% data_st.data to filter.
%
% When specified in the .all structure, all of the conditions in the fields
% must evaluate to true in order for the data to be removed (if part of an
% exclude structure) or retained (if part of an include structure).  When
% specified in a .any structure, at least one of the conditions has to
% evaluate to true.
%
% Examples:
% filt.exclude.any.subject_id = {'^tmp_.*','^01ttf.*','01zin79271','08tgs78071'};
% filt.exclude.any.session_id = [1873 1984  1523:1576];
% would cause any rows that have subject IDs beginning with tmp_ or 01ttf to
% be removed, along with the specific subject IDs given by the 3rd and 4th
% elements in the cell array of strings, as well as any sessions that match
% the session IDs given in the list.
%
% Note: string criteria containing any of the characters *, ^ or $ are
% matched with regexp, so they must conform to regexp rules. All other
% strings are matched exactly.
%
% DATE/TIME FILTERING PARAMETERS
%
% filt.include.all.date_time.start=datenum('01-Jan-1901');
% filt.include.all.date_time.stop=datenum('01-Jan-2020');
%
% start and stop serve as strict greater-than and less-than operators,
% respectively. To include the start or stop value itself, use start_inc and
% stop_inc. The fields gt, gte, lt, and lte are synonyms for start,
% start_inc, stop, and stop_inc.

% 01/31/07 Petr Janata - adapted from ensemble_apply_crit.m (which didn't have
%                        the added layer of and/or logic)
%
% 02/08/07 Stefan Tomic - added 'exact' argument to strmatch
% 03/15/07 S.T. - added support for using NaN as a filtering criterion
% 09/25/07 PJ - returns if filt spec is empty
% 11/14/11 PJ - added handling of scalar data embedded in cells
% 24Jul2013 PJ - Fixed handling of start/stop (greater than, less than)
%                to properly follow the all/any logic.
% 27Sep2013 PJ - minor bug fix associated with failure to initialize tmp
%                variable during evaluation of lt gt criteria
% 09Jan2014 PJ - expanded list of wildcards that trigger regexp usage to
%                '[*^$]' from '[*]'
% 10Feb2015 PJ - added gt, gte, lt, lte for greater than, greater than or
%                equal to, less than, less than or equal to logic. Previously,
%                only start, start_inc, stop, stop_inc were supported

% Deal with the possibility that the params struct was specified
% directly rather than passing "params.filt"
if isfield(filt,'filt')
    filt = filt.filt;
end

% If no filtering is specified, then return
if isempty(filt)
    return
end

% hack to accommodate ensemble_jobman_parallel_wrapper passing in the hash
% fb 2010.06.19
% pj 2010.11.16 - extended to remove other bad variables
bad_fields = {'hash','ensemble_jobman_interactive'};
for ibad = 1:length(bad_fields)
    if isfield(filt,bad_fields{ibad})
        filt = rmfield(filt,bad_fields{ibad});
    end
end

if isempty(data_st)
    fprintf('%s: empty data struct\n', mfilename);
    return
end

if iscell(data_st)
    data_st = data_st{1};
end

crit_types_to_proc = fieldnames(filt);
ntypes = length(crit_types_to_proc);

nvars = length(data_st.vars);

% Make sure we have data to filter
if all(cellfun('isempty', data_st.data))
    fprintf('%s: data structure contains no data\n', mfilename);
    return
end

% Loop over include and exclude structures
for itype = 1:ntypes
    type_str = crit_types_to_proc{itype};

    % Determine which of the logic operations we're going to perform
    logic_types = fieldnames(filt.(type_str));
    nlog = length(logic_types);

    if ~all(ismember(logic_types,{'all','any'}))
        error('ensemble_filter: Found logic types other than ''all'' and ''any''')
    end

    for ilog = 1:nlog
        logic_str = logic_types{ilog};

        % Get a list of the fields to construct masks for
        flds = fieldnames(filt.(type_str).(logic_str));
        nflds = length(flds);

        % Loop over all of the fields associated with this criterion type
        curr_mask = [];
        for ifld = 1:nflds
            fld_str = flds{ifld};

            % Find the field string in the list of variable names
            data_col = strmatch(fld_str,data_st.vars,'exact');
            if isempty(data_col)
                fprintf('ensemble_filter: Did not find criterion field (%s) in list of variables\n',fld_str);
                continue
            end

            % Check to see if fld_str is a structure containing limits
            if isstruct(filt.(type_str).(logic_str).(fld_str))
                limit_flds = fieldnames(filt.(type_str).(logic_str).(fld_str));
                nlim = length(limit_flds);

                % One mask column per limit. Limits that are skipped below
                % leave their column set to true.
                tmp = true(size(data_st.data{data_col},1),nlim);
                for ilim = 1:nlim
                    limit_str = limit_flds{ilim};
                    crit_val = filt.(type_str).(logic_str).(fld_str).(limit_str);

                    % Make sure there is only one criterion value
                    if length(crit_val) > 1
                        fprintf('ensemble_filter: Too many criterion values\n');
                        continue
                    end

                    % Make sure some criteria are specified for this field
                    if isempty(crit_val)
                        continue
                    end

                    switch limit_str
                        case {'start','gt'}
                            tmp2 = data_st.data{data_col} > crit_val;
                        case {'start_inc','gte'}
                            tmp2 = data_st.data{data_col} >= crit_val;
                        case {'stop','lt'}
                            tmp2 = data_st.data{data_col} < crit_val;
                        case {'stop_inc','lte'}
                            tmp2 = data_st.data{data_col} <= crit_val;
                        otherwise
                            error('Unknown limit string: %s', limit_str)
                    end % switch limit_str

                    tmp(:,ilim) = tmp2;  % store the mask for this limit
                end % for ilim

                % Finalize tmp based on the desired logic
                if strcmp(logic_str,'all')
                    tmp = all(tmp,2);
                else
                    tmp = any(tmp,2);
                end

            else
                crit_vals = filt.(type_str).(logic_str).(fld_str);

                % Make sure some criteria are specified for this field
                if isempty(crit_vals)
                    continue
                end

                % If the crit_val is NaN, we need to use the isnan function,
                % otherwise we can use ismember
                if ~iscell(crit_vals) && any(isnan(crit_vals))
                    tmp = isnan(data_st.data{data_col});
                else
                    % Check to see if we are dealing with numbers embedded in cells
                    if iscell(data_st.data{data_col})
                        numMask = cellfun(@isnumeric,data_st.data{data_col});
                        if all(numMask)
                            lengthMask = cellfun('length',data_st.data{data_col}) == 1;
                            if all(lengthMask)
                                tmp = ismember(cat(1,data_st.data{data_col}{:}), crit_vals);
                            else
                                error('Cannot handle non-scalar numeric data')
                            end
                        else
                            tmp = ismember(data_st.data{data_col}, crit_vals);
                        end
                    else
                        tmp = ismember(data_st.data{data_col}, crit_vals);
                    end
                end

                % Check to see if any of the criterion values have wildcards, in which case
                % we need to switch to regexp
                if iscellstr(crit_vals) || ischar(crit_vals)
                    % Wrap a single string in a cell so that the cell-based
                    % calls below also work for char criteria
                    if ischar(crit_vals)
                        crit_vals = {crit_vals};
                    end
                    is_wild = ~cellfun('isempty',regexp(crit_vals,'[*$^]'));
                    wild_idxs = find(is_wild);
                    for iwild = 1:length(wild_idxs)
                        tmp = tmp|~cellfun('isempty',regexp(data_st.data{data_col}, ...
                            crit_vals{wild_idxs(iwild)}));
                    end
                end
            end % if isstruct(filt.(type_str).(logic_str).(fld_str))
            curr_mask(:,end+1) = tmp;
        end % for ifld

        %
        % Now construct the final mask that will be used for removing unwanted data
        %

        if strcmp(logic_str,'all')
            mask_vect = all(curr_mask,2);
        else
            mask_vect = any(curr_mask,2);
        end
        if strcmp(type_str,'include')
            mask_vect = ~mask_vect; % toggle the mask to be an exclusion mask
        end

        % Perform row extraction
        for ivar = 1:nvars
            data_st.data{ivar}(mask_vect,:) = [];
        end
    end % for ilog
end % for itype
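
A short usage sketch may help tie the listing together. It assumes ensemble_filter.m is on the MATLAB path; the variable names and values in data_st are hypothetical and chosen only to exercise one exclude.any string filter and one include.all range filter.

```matlab
% Hypothetical data: three variables, four rows
data_st.vars = {'subject_id', 'session_id', 'score'};
data_st.data = {{'tmp_pilot'; '01abc'; '02def'; '03ghi'}, ...
                [1; 2; 3; 4], ...
                [10; 20; 30; 40]};

filt = struct();
% Remove rows whose subject ID begins with tmp_
filt.exclude.any.subject_id = {'^tmp_.*'};
% Keep only rows with 20 <= score < 40
filt.include.all.score.gte = 20;
filt.include.all.score.lt  = 40;

% Assumes ensemble_filter (the function listed above) is on the path
filtered = ensemble_filter(data_st, filt);
```

The exclude filter drops the tmp_pilot row and the include filter then drops the row with score 40, leaving the 01abc and 02def rows in every column of filtered.data.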

Generated on Wed 20-Sep-2023 04:00:50 by m2html © 2003