[Pipelines] Mask Generation Parameters

I noticed that the arguments you can pass to the __call__ method of MaskGenerationPipeline in mask_generation.py are similar but not identical to the parameters passed to the __init__ method of SamAutomaticMaskGenerator in the official SAM GitHub repo. Some of the parameters, like box_nms_thresh or points_per_side, don’t exist in the Hugging Face repo.

I was wondering what’s the reason behind this omission. As far as I can tell these parameters are still used by SAM2 and it seems like the models that can be used for the mask generation task on HuggingFace are basically all SAM derivatives.


The missing parameters are intentional.
Hugging Face’s MaskGenerationPipeline is not a thin wrapper around Meta’s SamAutomaticMaskGenerator / SAM2AutomaticMaskGenerator. It is a separate automatic-mask-generation implementation with a different set of knobs, designed to work uniformly across multiple SAM-style models. Parameters that the HF implementation does not actually use, like box_nms_thresh and points_per_side, are not exposed.

I will unpack that in concrete terms.


1. What each implementation actually exposes

Meta: SamAutomaticMaskGenerator

From the official SAM repo:

Constructor parameters include:

  • Sampling:

    • points_per_side
    • points_per_batch
    • point_grids
  • Crop schedule:

    • crop_n_layers
    • crop_overlap_ratio
    • crop_n_points_downscale_factor
  • Quality and stability:

    • pred_iou_thresh
    • stability_score_thresh
    • stability_score_offset
  • NMS:

    • box_nms_thresh (within a crop)
    • crop_nms_thresh (between crops)
  • Small-region cleanup:

    • min_mask_region_area
  • Output format:

    • output_mode (binary_mask, uncompressed_rle, coco_rle)

So Meta’s automatic mask generator (AMG) is a fairly “expert” interface with distinct knobs for:

  • grid density
  • crop strategy
  • per-crop and cross-crop NMS
  • morphological cleanup
  • and output encoding.
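To get a feel for how grid density and the crop schedule interact, here is a rough prompt-count sketch. It assumes the crop schedule of Meta's reference generator as I understand it: layer 0 is the full image, layer i has 2^i × 2^i crops, and the grid side is divided by crop_n_points_downscale_factor**i. The helper name prompts_per_layer is mine, not part of any library.

```python
def prompts_per_layer(points_per_side, crop_n_layers, downscale_factor):
    """Approximate number of point prompts per crop layer.

    Assumption: layer i holds (2**i)**2 crops and uses a grid with
    int(points_per_side / downscale_factor**i) points along each axis.
    """
    counts = []
    for layer in range(crop_n_layers + 1):
        n_crops = (2 ** layer) ** 2
        side = int(points_per_side / (downscale_factor ** layer))
        counts.append(n_crops * side * side)
    return counts


# Full image only: a 32 x 32 grid.
print(prompts_per_layer(32, 0, 1))  # [1024]
# One extra crop layer with the grid halved per crop: 4 crops x 16 x 16.
print(prompts_per_layer(32, 1, 2))  # [1024, 1024]
```

The point is only that adding crop layers multiplies the number of forward passes quickly, which is why the downscale factor exists as a separate knob.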

SAM2 and derivatives keep essentially the same parameter style (plus SAM2-specific flags such as use_m2m). For example, geospatial wrappers around SAM2 still configure points_per_side, box_nms_thresh, crop_n_points_downscale_factor, min_mask_region_area, and use_m2m. (samgeo.gishub.org)

Hugging Face: MaskGenerationPipeline.__call__

From the Transformers mask_generation.py you attached:

Parameters picked up by _sanitize_parameters:

  • Preprocess kwargs (grid and crops):

    • points_per_batch
    • points_per_crop
    • crops_n_layers
    • crop_overlap_ratio
    • crop_n_points_downscale_factor
    • timeout
  • Forward kwargs (quality and cleanup):

    • pred_iou_thresh
    • stability_score_thresh
    • stability_score_offset
    • mask_threshold
    • max_hole_area
    • max_sprinkle_area
  • Postprocess kwargs:

    • crops_nms_thresh
    • output_rle_mask
    • output_bboxes_mask

Any other keyword argument, such as points_per_side or box_nms_thresh, is silently ignored by the pipeline because it is not added in _sanitize_parameters.
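A minimal sketch of that routing behaviour, to make the "silently ignored" part concrete. This is not the Transformers source; route_kwargs is a hypothetical helper that only mimics the pattern of copying known keys into three groups, using the parameter names listed above.

```python
PREPROCESS_KEYS = (
    "points_per_batch", "points_per_crop", "crops_n_layers",
    "crop_overlap_ratio", "crop_n_points_downscale_factor", "timeout",
)
FORWARD_KEYS = (
    "pred_iou_thresh", "stability_score_thresh", "stability_score_offset",
    "mask_threshold", "max_hole_area", "max_sprinkle_area",
)
POSTPROCESS_KEYS = ("crops_nms_thresh", "output_rle_mask", "output_bboxes_mask")


def route_kwargs(**kwargs):
    """Split kwargs into preprocess/forward/postprocess groups.

    Hypothetical helper: keys that are not in any group are dropped,
    and returned separately here so the caller can see what was lost.
    """
    groups = {
        "preprocess": PREPROCESS_KEYS,
        "forward": FORWARD_KEYS,
        "postprocess": POSTPROCESS_KEYS,
    }
    routed = {
        name: {key: kwargs[key] for key in keys if key in kwargs}
        for name, keys in groups.items()
    }
    known = set(PREPROCESS_KEYS) | set(FORWARD_KEYS) | set(POSTPROCESS_KEYS)
    ignored = sorted(key for key in kwargs if key not in known)
    return routed, ignored


routed, ignored = route_kwargs(
    points_per_side=32, box_nms_thresh=0.7, pred_iou_thresh=0.9
)
print(routed["forward"])  # {'pred_iou_thresh': 0.9}
print(ignored)            # ['box_nms_thresh', 'points_per_side']
```

If you are porting a Meta AMG config, running your kwargs through a check like this before calling the pipeline is a cheap way to notice which settings will have no effect.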

The HF tasks page for mask generation and the SAM/SAM2 docs show the same public surface: you configure points_per_batch, points_per_crop, crops_n_layers, pred_iou_thresh, stability_score_thresh, crops_nms_thresh, etc., not Meta’s full set. (Hugging Face)


2. Concrete parameter mismatches

2.1 points_per_side vs points_per_crop

  • Meta:

    • points_per_side controls how many points are sampled along each axis. Total points per crop are points_per_side**2.
    • There is also point_grids for custom grids.
  • HF:

    • Uses points_per_crop directly, interpreted as “how many points to sample in this crop” when calling image_processor.generate_crop_boxes.
    • Grid construction is handled internally by the SamImageProcessor / Sam2ImageProcessorFast, and the pipeline does not expose a point_grids concept. (GitHub)

So HF reparameterizes the sampling density:

  • Same idea (uniform point grid over each crop), but:

    • No explicit points_per_side.
    • No ability to pass custom point_grids through the pipeline.
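  • If you want equivalent density, check how your installed processor interprets points_per_crop before converting. The docstring describes it as the number of points per crop, which would suggest points_per_crop ≈ points_per_side**2, but the default of 32 matches Meta’s default points_per_side, and the SamImageProcessor grid builder appears to treat the value as a per-side count. Either way, the exact layout is controlled by the processor, not the pipeline.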
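For reference, the kind of uniform grid both implementations sample over a crop can be sketched in a few lines. This is a plain-Python illustration under my own naming (build_point_grid), with n_per_side playing the role of Meta's points_per_side; it is not copied from either code base.

```python
def build_point_grid(n_per_side):
    """Return n_per_side**2 points evenly spaced in the unit square.

    Points sit at cell centres, so the first coordinate is
    0.5 / n_per_side and the last is 1 - 0.5 / n_per_side.
    """
    coords = [(i + 0.5) / n_per_side for i in range(n_per_side)]
    # Row-major order: y varies slowest, x fastest.
    return [(x, y) for y in coords for x in coords]


grid = build_point_grid(4)
print(len(grid))  # 16
print(grid[0])    # (0.125, 0.125)
```

The coordinates are normalized; a real generator scales them by the crop width and height before using them as point prompts.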

2.2 box_nms_thresh + crop_nms_thresh vs crops_nms_thresh

  • Meta:

    • box_nms_thresh = per-crop NMS IoU threshold. Removes duplicates within one crop.
    • crop_nms_thresh = cross-crop NMS IoU threshold. Removes duplicates between different crops.

    Logic:

    1. Generate masks for each crop.
    2. NMS within each crop (box_nms_thresh).
    3. NMS across all crops (crop_nms_thresh).
  • HF:

    • Only exposes crops_nms_thresh.
    • After processing all batches, the pipeline calls a single image_processor.post_process_for_mask_generation(all_masks, all_scores, all_boxes, crops_nms_thresh).
    • There is no separate per-crop NMS stage; the processor runs one global NMS pass over all candidate masks.

Because HF does not implement per-crop NMS, there is no meaningful place to plug in box_nms_thresh. Adding a parameter that the algorithm never uses would be confusing, so it is omitted.
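The practical difference between the two schemes can be shown with a toy greedy NMS. This is a simplified sketch, not either library's code: the function names are mine, boxes are (x0, y0, x1, y1) tuples, and I use the same score in both stages, whereas Meta's cross-crop stage ranks candidates differently.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(items, thresh):
    """Greedy NMS over (box, score) pairs; drops boxes with IoU > thresh."""
    kept = []
    for box, score in sorted(items, key=lambda t: t[1], reverse=True):
        if all(iou(box, kept_box) <= thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


def two_stage(crops, box_nms_thresh, crop_nms_thresh):
    """Meta-style: NMS inside each crop, then NMS across crops."""
    merged = []
    for crop_items in crops:
        merged.extend(nms(crop_items, box_nms_thresh))
    return nms(merged, crop_nms_thresh)


def single_pass(crops, crops_nms_thresh):
    """HF-style: one NMS pass over all candidates from all crops."""
    return nms([item for crop_items in crops for item in crop_items], crops_nms_thresh)


crop_a = [((0, 0, 10, 10), 0.90), ((1, 1, 11, 11), 0.80)]  # IoU between them ~0.68
crop_b = [((0, 0, 10, 10), 0.85)]                          # duplicate of crop_a's best box

print(len(two_stage([crop_a, crop_b], 0.5, 0.7)))  # 1
print(len(single_pass([crop_a, crop_b], 0.7)))     # 2
```

With two thresholds you can be strict inside a crop and lenient across crops; with one threshold the same value has to serve both purposes, so results from a Meta config will not transfer exactly.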

2.3 min_mask_region_area vs max_hole_area / max_sprinkle_area

  • Meta:

    • min_mask_region_area controls postprocessing that removes small islands and small holes in masks, via remove_small_regions.
  • HF:

    • _forward optionally calls image_processor.post_process_masks with max_hole_area and max_sprinkle_area, then calls it again to resize and optionally binarize.

    • These parameters control:

      • Filling holes up to a given area.
      • Removing tiny “sprinkles” up to a given area.

So HF splits the single “minimum region area” concept into two more explicit morphological thresholds. The effect is similar (remove small artifacts), but it is not a direct 1:1 mapping, and the parameter name changes.
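Conceptually, the two thresholds do something like the following. This is a plain-Python sketch of the idea only, not HF's or Meta's implementation: clean_mask and components are hypothetical helpers, and the 4-connectivity and the "a hole is a background region that does not touch the border" rule are my simplifying assumptions.

```python
from collections import deque


def components(grid, value):
    """4-connected components of cells equal to `value`, as lists of (row, col)."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != value or seen[r][c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and grid[ny][nx] == value:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(comp)
    return found


def clean_mask(mask, max_hole_area, max_sprinkle_area):
    """Fill small enclosed holes, then drop small foreground islands."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for comp in components(mask, 0):
        touches_border = any(r in (0, h - 1) or c in (0, w - 1) for r, c in comp)
        if not touches_border and len(comp) <= max_hole_area:
            for r, c in comp:
                out[r][c] = 1
    for comp in components(out, 1):
        if len(comp) <= max_sprinkle_area:
            for r, c in comp:
                out[r][c] = 0
    return out


mask = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],  # one-pixel hole at (1, 1)
    [1, 1, 1, 0, 1],  # one-pixel sprinkle at (2, 4)
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = clean_mask(mask, max_hole_area=1, max_sprinkle_area=1)
print(cleaned[1][1], cleaned[2][4])  # 1 0
```

Setting both arguments to the same value gets you close to the behaviour of a single min_mask_region_area, which is the nearest thing to a mapping between the two APIs.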

2.4 Other AMG-only arguments

Meta’s AMG and SAM2’s SAM2AutomaticMaskGenerator also expose parameters that have no stable meaning across all HF-supported mask-generation models:

  • point_grids
  • output_mode (binary vs RLE variants)
  • SAM2-specific knobs such as use_m2m, multimask_output, etc. (samgeo.gishub.org)

HF’s MaskGenerationPipeline does not use these concepts at all. Instead it offers:

  • output_rle_mask and output_bboxes_mask flags to optionally return extra outputs.
  • A fixed internal representation for masks and bounding boxes.

Again, parameters that have no effect on the HF algorithm are not exposed.


3. Why Hugging Face omits those SAM/SAM2 parameters

There is no official HF comment that says “we deliberately removed box_nms_thresh and points_per_side because X”, but the design is clear from the code and docs.

Reasoning, point by point:

3.1 HF pipeline is a separate implementation, not a wrapper

  • The HF task “mask generation” defines its own 3-stage pipeline:

    1. preprocess: generate_crop_boxes + point grid + cropping.
    2. forward: run model, get pred_masks and iou_scores.
    3. postprocess: mask resizing, filtering, and a single NMS step.
  • That logic lives in MaskGenerationPipeline and in the vision processors (SamImageProcessor, Sam2ImageProcessorFast). It is not calling Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator anywhere. (GitHub)

Because it is an independent implementation, it:

  • Keeps the general idea (grid of points, crops, thresholds, NMS, small region cleanup).
  • But it is free to choose its own parameterization and internal steps.

3.2 Single API across multiple SAM-like models

HF wants one mask-generation task that works for:

  • SAM v1 models (facebook/sam-vit-*).
  • SAM-HQ, MedSAM, other fine-tuned variants.
  • SAM2-based models. (Hugging Face)

To do that, they define a small set of operations that all supporting processors can implement:

  • generate_crop_boxes
  • post_process_masks
  • filter_masks
  • post_process_for_mask_generation (GitHub)

Then they expose only those parameters that make sense for every supported backend:

  • Grid and crop density in terms of points_per_crop, crops_n_layers, etc.
  • Quality thresholds (pred_iou_thresh, stability_score_thresh).
  • One NMS threshold (crops_nms_thresh).
  • Generic cleanup knobs (max_hole_area, max_sprinkle_area).

Things that are specific to one particular implementation (e.g. SAM’s per-crop NMS, SAM2’s use_m2m, custom point grids) would complicate that common API, so they are left out.

3.3 Pipeline philosophy: “few powerful knobs”

Hugging Face pipelines are designed as high-level, opinionated interfaces. The docs for the mask-generation task describe it as:

  • “Automatic mask generation for images using SamForMaskGeneration.”
  • With a small set of configuration options. (Hugging Face)

This fits the general Transformers design:

  • Pipelines expose a minimal parameter set.

  • For advanced control, you drop down to:

    • the model + processor level, or
    • the original third-party library (Meta’s SAM/SAM2 repos in this case). (Hugging Face Forums)

So from HF’s perspective:

  • Parameters like points_per_batch, points_per_crop, crops_n_layers, and the filtering thresholds cover the most common use cases.
  • Extra “power user” knobs from Meta’s AMG are better handled in custom code or directly in the SAM/SAM2 repos.

4. What this means for SAM2 and other derivatives

You are correct that SAM2’s own automatic mask generator still uses the full AMG-style set of parameters. For example, geospatial SAM2 wrappers show configuration like: (samgeo.gishub.org)

from samgeo import SamGeo2

sam2 = SamGeo2(
    model_id="sam2-hiera-large",
    points_per_side=64,
    points_per_batch=128,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
    stability_score_offset=0.7,
    crop_n_layers=1,
    box_nms_thresh=0.7,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=25,
    use_m2m=True,
)

So:

  • Meta SAM2 code path: still has points_per_side, box_nms_thresh, min_mask_region_area, use_m2m, etc.
  • HF mask-generation pipeline for SAM2: uses the same model weights but its own sampling and postprocessing pipeline, which ignores those SAM2-specific AMG knobs and uses HF’s parameter set instead. (Hugging Face)

This is why you see the discrepancy:

  • The models are SAM derivatives.
  • The automatic mask generator logic is HF’s own, not SAM2’s SAM2AutomaticMaskGenerator.

5. If you need the “missing” knobs

If you actually require per-crop NMS tuning or explicit points_per_side, there are only a few real options:

  1. Use Meta’s SAM/SAM2 repos directly

    • Call SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator with all their original kwargs.
  2. Use a wrapper that ports Meta’s AMG to HF checkpoints

    • Several projects load HF model weights into SAM/SAM2 code and keep the full AMG parameterization (for example in geospatial and research code). (samgeo.gishub.org)
  3. Reimplement the AMG logic on top of HF models

    • Use SamModel / SAM2 model classes plus the HF processors.
    • Port the grid, crop, NMS, and region-cleanup logic from automatic_mask_generator.py.
    • Then you can expose exactly the parameter set you want.

If you stay on pipeline("mask-generation"), you are constrained to HF’s reduced parameter surface by design.
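For completeness, this is roughly what staying on the pipeline looks like with HF's own parameter names. Treat it as a sketch: the checkpoint (facebook/sam-vit-base) and the threshold values are just illustrative choices of mine, and the function is not called here because it downloads model weights.

```python
# Keyword arguments taken from the HF parameter list above.
# Note there is no points_per_side or box_nms_thresh here.
HF_KWARGS = {
    "points_per_batch": 64,
    "pred_iou_thresh": 0.88,
    "stability_score_thresh": 0.95,
    "crops_nms_thresh": 0.7,
}


def generate_masks(image_path, model_id="facebook/sam-vit-base"):
    """Run the HF mask-generation pipeline on one image and return its masks."""
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import pipeline

    generator = pipeline("mask-generation", model=model_id)
    outputs = generator(image_path, **HF_KWARGS)
    return outputs["masks"]


# Usage (requires transformers, torch, and network access for the weights):
# masks = generate_masks("path/to/image.png")
```

Anything outside this parameter surface means moving to one of the three options listed above.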


Summary

  • The Hugging Face MaskGenerationPipeline re-implements automatic mask generation on top of SamForMaskGeneration and the SAM/SAM2 image processors. It does not call Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator.

  • Because the internal algorithm is different, some Meta AMG knobs have no direct counterpart:

    • points_per_side is replaced by points_per_crop and internal grid generation.
    • Per-crop box_nms_thresh is gone; HF uses a single global crops_nms_thresh.
    • min_mask_region_area is replaced by max_hole_area and max_sprinkle_area.
  • HF also wants one stable pipeline API that works for SAM, SAM-HQ, SAM2, and other derivatives, so it exposes only parameters that are meaningful across all of them. (Hugging Face)

  • SAM2 and many downstream projects still use the full AMG parameter set, including points_per_side and box_nms_thresh. If you need those, you must use Meta’s AMG implementation or a custom wrapper, not the HF mask-generation pipeline. (samgeo.gishub.org)

