Batch-Compress a Photo Folder and Bundle It as a ZIP
Point this Python script at a photo folder and it batch-compresses JPG, PNG, WebP, BMP, and TIFF images at one of three levels, runs the encoding in parallel, and bundles the results into a ZIP. A CRC plus file-count integrity check gates the optional deletion step, so the compressed images are deleted only after the archive verifies as intact.
Built for the moment you need to shrink hundreds of site photos or product shots without torching the quality. `Pillow` handles the encoding, `ProcessPoolExecutor` spreads the work across all CPU cores but one, `zipfile` wraps the output for distribution, and `testzip()` backs the integrity check. The whole "compress everything before uploading" chore collapses into a single run.
What this script can do
- Batch compression of a photo folder (JPG / PNG / WebP / BMP / TIFF)
- Three presets: Light, Balanced, and Strong
- Multiprocess parallelism using all CPU cores but one
- JPEG EXIF preservation and PNG lossless optimization
- CRC + file-count integrity check after zip creation
- Compressed-image deletion only when the archive passes verification
- Full summary log written alongside the output
Import the .pybes file into Pybes and the script — along with its config fields — loads automatically.
Config fields
These are the config fields this script uses. Enter values through the Pybes GUI at runtime.
photo_dir (Folder, required): Photo folder
Folder holding the images to compress. Files ending in .jpg, .jpeg, .png, .webp, .bmp, .tif, or .tiff (any letter case) are picked up from the top level of the folder; subfolders are not scanned.
output_dir (Folder, required): Output folder
Destination for the compressed images, the zip, and the log. Use a folder separate from the source: compressed files keep their original names, so sharing the source folder would overwrite the originals.
compression_level (Dropdown, required): Compression level
Pick one: Light (quality 85, no resize) / Balanced (quality 75, longest edge capped at 2560px) / Strong (quality 60, longest edge capped at 1920px).
Default: Balanced (recommended)
create_zip (Checkbox, required): Create ZIP archive
Bundle the compressed images into a single zip. Recommended when sharing the output.
Default: true
delete_compressed (Checkbox, required): Delete compressed images
After the zip is created, delete the compressed image files (never the originals), but only if the CRC and file-count checks pass. Has no effect when create_zip is off.
Default: false
zip_basename (Text, required): ZIP base name (required even if ZIP is off)
Prefix for the zip and log filenames: the zip is named `{value}_photos_YYYYMMDD_HHMMSS.zip` and the log `{value}_log_YYYYMMDD_HHMMSS.txt`.
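The script reads these fields from a JSON file whose path arrives as the first command-line argument (sys.argv[1]), and it compares the two checkboxes against the string "true". For a test run outside Pybes, a sketch like the one below can produce such a file. The exact file Pybes writes is an assumption inferred from how main() parses it, and the folder paths and script filename are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example values; the keys match the config fields above.
# Checkbox values are written as the strings "true"/"false" because the
# script compares them against "true".
example_inputs = {
    "photo_dir": "C:/work/site_photos",
    "output_dir": "C:/work/site_photos_compressed",
    "compression_level": "Balanced (recommended)",
    "create_zip": "true",
    "delete_compressed": "false",
    "zip_basename": "site_visit",
}


def write_inputs_file(inputs: dict, folder: str) -> Path:
    """Write the config dict as UTF-8 JSON and return the file path."""
    path = Path(folder) / "inputs.json"
    path.write_text(json.dumps(inputs, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_inputs_file(example_inputs, tmp)
        print(path.read_text(encoding="utf-8"))
        # Then run: python compress_photos.py <path>  (hypothetical script name)
```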
Code walkthrough
import sys
import json
import os
import zipfile
from datetime import datetime
from concurrent.futures import ProcessPoolExecutor, as_completed
from PIL import Image
# Assumptions:
# - Supported formats: JPG/JPEG, PNG, WebP, BMP, TIFF
# - Output filenames keep the original name and extension
# - PNG stays lossless; only optimize=True is applied (to preserve transparency)
# - Per-level parameters use common quality/size tradeoffs
# - Parallelism uses a process pool (image encoding is CPU-bound, so not threads)
# - Worker count is CPU cores - 1 (minimum 1)
# - zip filename is "{basename}_photos_{timestamp}.zip" under the output folder
# - zip stores the compressed images flat (no subfolders)
# - zip compression is ZIP_DEFLATED (pre-compressed images do not shrink much, but it is the standard choice)
# - Compressed images are deleted only when zip creation succeeds AND testzip() is OK AND the file counts match
# - A summary log "{basename}_log_{timestamp}.txt" is written to the output folder
# Per-level settings
compression_settings = {
    # Each preset is listed twice: ASCII "()" and full-width "（）" parentheses
    "Light (preserve quality)": {"jpeg_quality": 85, "max_size": None},
    "Light （preserve quality）": {"jpeg_quality": 85, "max_size": None},
    "Balanced (recommended)": {"jpeg_quality": 75, "max_size": 2560},
    "Balanced （recommended）": {"jpeg_quality": 75, "max_size": 2560},
    "Strong (smallest size)": {"jpeg_quality": 60, "max_size": 1920},
    "Strong （smallest size）": {"jpeg_quality": 60, "max_size": 1920},
}
supported_extensions = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff"}
def compress_image(args):
    """Compress one file. Called by the process pool."""
    input_path, output_path, jpeg_quality, max_size = args
    filename = os.path.basename(input_path)
    ext = os.path.splitext(filename)[1].lower()
    try:
        original_size = os.path.getsize(input_path)
        with Image.open(input_path) as img:
            exif = img.info.get("exif")
            # Downscale so the longest edge fits the preset cap (None = keep size)
            if max_size is not None:
                w, h = img.size
                longest_edge = max(w, h)
                if longest_edge > max_size:
                    ratio = max_size / longest_edge
                    new_w = int(w * ratio)
                    new_h = int(h * ratio)
                    img = img.resize((new_w, new_h), Image.LANCZOS)
            if ext in (".jpg", ".jpeg"):
                if img.mode not in ("RGB", "L"):
                    img = img.convert("RGB")
                save_kwargs = {
                    "format": "JPEG",
                    "quality": jpeg_quality,
                    "optimize": True,
                    "progressive": True,
                }
                if exif:
                    save_kwargs["exif"] = exif
                img.save(output_path, **save_kwargs)
            elif ext == ".png":
                img.save(output_path, format="PNG", optimize=True)
            elif ext == ".webp":
                img.save(output_path, format="WEBP", quality=jpeg_quality, method=6)
            elif ext == ".bmp":
                img.save(output_path, format="BMP")
            elif ext in (".tif", ".tiff"):
                img.save(output_path, format="TIFF", compression="tiff_lzw")
        compressed_size = os.path.getsize(output_path)
        return {
            "ok": True,
            "filename": filename,
            "original_size": original_size,
            "compressed_size": compressed_size,
        }
    except Exception as e:
        # Failures come back as data so one bad file never stops the pool
        return {
            "ok": False,
            "filename": filename,
            "error": str(e),
        }
def main():
    # Config values arrive as a JSON file whose path is the first argument
    with open(sys.argv[1], encoding="utf-8") as f:
        inputs = json.load(f)
    photo_dir = inputs["photo_dir"]
    output_dir = inputs["output_dir"]
    compression_level = inputs["compression_level"]
    basename = inputs["zip_basename"]
    # Checkboxes are compared as text; str().lower() also accepts a JSON true/false
    create_zip_flag = str(inputs["create_zip"]).lower() == "true"
    delete_flag = str(inputs["delete_compressed"]).lower() == "true"
    if compression_level not in compression_settings:
        # Retry with full-width parentheses converted to ASCII
        normalized = compression_level.replace("（", "(").replace("）", ")")
        if normalized in compression_settings:
            compression_level = normalized
        else:
            print(f"Error: invalid compression level: {compression_level}", file=sys.stderr)
            sys.exit(1)
    setting = compression_settings[compression_level]
    jpeg_quality = setting["jpeg_quality"]
    max_size = setting["max_size"]
    cpu_cores = os.cpu_count() or 1
    worker_count = max(1, cpu_cores - 1)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    print(f"Base name: {basename}")
    print(f"Compression level: {compression_level}")
    print(f"JPEG quality: {jpeg_quality}")
    print(f"Longest edge cap: {max_size if max_size else 'keep original size'}")
    print(f"CPU cores: {cpu_cores} / worker count: {worker_count}")
    print(f"Create zip: {'yes' if create_zip_flag else 'no'}")
    print(f"Delete compressed images: {'yes (after zip integrity + file-count check)' if delete_flag else 'no'}")
    print(f"Input folder: {photo_dir}")
    print(f"Output folder: {output_dir}")
    print("-" * 50)
    os.makedirs(output_dir, exist_ok=True)
    # Collect supported images from the top level of the photo folder
    target_files = []
    for filename in os.listdir(photo_dir):
        full_path = os.path.join(photo_dir, filename)
        if os.path.isfile(full_path):
            ext = os.path.splitext(filename)[1].lower()
            if ext in supported_extensions:
                target_files.append(filename)
    total = len(target_files)
    if total == 0:
        print("No supported image files were found.")
        return
    print(f"Target files: {total}")
    print("-" * 50)
    tasks = [
        (
            os.path.join(photo_dir, filename),
            os.path.join(output_dir, filename),
            jpeg_quality,
            max_size,
        )
        for filename in target_files
    ]
    success_count = 0
    fail_count = 0
    total_original = 0
    total_compressed = 0
    done = 0
    success_files = []
    fail_files = []
    per_file_results = []  # used for the summary log
    with ProcessPoolExecutor(max_workers=worker_count) as executor:
        future_to_name = {
            executor.submit(compress_image, task): task[0] for task in tasks
        }
        # Results arrive in finish order, not submit order
        for future in as_completed(future_to_name):
            done += 1
            result = future.result()
            if result["ok"]:
                orig = result["original_size"]
                comp = result["compressed_size"]
                total_original += orig
                total_compressed += comp
                reduction = (1 - comp / orig) * 100 if orig > 0 else 0
                print(
                    f"[{done}/{total}] {result['filename']} "
                    f"{orig/1024:.1f}KB → {comp/1024:.1f}KB "
                    f"({reduction:+.1f}%)"
                )
                success_count += 1
                success_files.append(result["filename"])
                per_file_results.append({
                    "filename": result["filename"],
                    "original_size": orig,
                    "compressed_size": comp,
                    "reduction_rate": reduction,
                    "status": "success",
                })
            else:
                print(
                    f"[{done}/{total}] {result['filename']} failed: {result['error']}",
                    file=sys.stderr,
                )
                fail_count += 1
                fail_files.append(result["filename"])
                per_file_results.append({
                    "filename": result["filename"],
                    "error": result["error"],
                    "status": "failed",
                })
    print("-" * 50)
    print(f"Done: success {success_count} / failed {fail_count}")
    total_reduction = 0.0
    if total_original > 0:
        total_reduction = (1 - total_compressed / total_original) * 100
        print(
            f"Total size: {total_original/1024/1024:.2f}MB → "
            f"{total_compressed/1024/1024:.2f}MB "
            f"({total_reduction:+.1f}%)"
        )
    print(f"Output folder: {output_dir}")
    # zip archive
    zip_ok = False
    zip_path = None
    zip_filename = None
    if create_zip_flag and success_files:
        print("-" * 50)
        print("Creating zip archive...")
        zip_filename = f"{basename}_photos_{timestamp}.zip"
        zip_path = os.path.join(output_dir, zip_filename)
        try:
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
                for i, filename in enumerate(success_files, 1):
                    file_path = os.path.join(output_dir, filename)
                    zf.write(file_path, arcname=filename)
                    print(f" [{i}/{len(success_files)}] added {filename}")
            zip_size = os.path.getsize(zip_path)
            print(f"zip created: {zip_filename} ({zip_size/1024/1024:.2f}MB)")
            print(f"zip path: {zip_path}")
            zip_ok = True
        except Exception as e:
            print(f"zip creation failed: {e}", file=sys.stderr)
    elif not create_zip_flag:
        print("Skipped zip creation.")
    # zip integrity check + file-count match -> delete compressed images if OK
    zip_integrity_result = "not checked"
    deletion_result = "not run"
    if delete_flag and zip_ok and zip_path:
        print("-" * 50)
        print("Verifying zip integrity and file count...")
        try:
            with zipfile.ZipFile(zip_path, "r") as zf:
                bad_file = zf.testzip()  # first member with a bad CRC, or None
                if bad_file is not None:
                    zip_integrity_result = f"CRC failed: {bad_file}"
                    print(
                        f"Integrity check failed: {bad_file} may be corrupted.",
                        file=sys.stderr,
                    )
                    print("Skipping compressed-image deletion for safety.", file=sys.stderr)
                    deletion_result = "skipped (CRC failure)"
                else:
                    zip_contents = set(zf.namelist())
                    expected_files = set(success_files)
                    missing_files = expected_files - zip_contents
                    extra_files = zip_contents - expected_files
                    print(f" Compressed targets: {len(expected_files)} / in zip: {len(zip_contents)}")
                    if missing_files:
                        zip_integrity_result = f"file count mismatch: {len(missing_files)} missing"
                        print(
                            f"File-count check failed: {len(missing_files)} files missing from the zip.",
                            file=sys.stderr,
                        )
                        for f in sorted(missing_files):
                            print(f" missing: {f}", file=sys.stderr)
                        print("Skipping compressed-image deletion for safety.", file=sys.stderr)
                        deletion_result = "skipped (file count mismatch)"
                    else:
                        zip_integrity_result = "OK"
                        if extra_files:
                            print(f"Warning: zip contains {len(extra_files)} unexpected files.")
                            for f in sorted(extra_files):
                                print(f" unexpected: {f}")
                        print("Integrity and file count OK. Deleting compressed images...")
                        deleted_count = 0
                        delete_fail_count = 0
                        for filename in success_files:
                            file_path = os.path.join(output_dir, filename)
                            try:
                                os.remove(file_path)
                                print(f" deleted: {filename}")
                                deleted_count += 1
                            except Exception as e:
                                print(f" delete failed: {filename} - {e}", file=sys.stderr)
                                delete_fail_count += 1
                        print(f"Deletion complete: {deleted_count} deleted / {delete_fail_count} failed")
                        deletion_result = f"complete (deleted {deleted_count} / failed {delete_fail_count})"
        except Exception as e:
            zip_integrity_result = f"error during check: {e}"
            print(f"Error during integrity check: {e}", file=sys.stderr)
            print("Skipping compressed-image deletion for safety.", file=sys.stderr)
            deletion_result = "skipped (check error)"
    elif delete_flag and not zip_ok:
        print("zip creation did not succeed; skipping compressed-image deletion.")
        deletion_result = "skipped (zip not created)"
    # Summary log
    print("-" * 50)
    print("Writing summary log...")
    log_filename = f"{basename}_log_{timestamp}.txt"
    log_path = os.path.join(output_dir, log_filename)
    try:
        with open(log_path, "w", encoding="utf-8") as log:
            log.write(f"{'=' * 50}\n")
            log.write(f" Run summary\n")
            log.write(f"{'=' * 50}\n")
            log.write(f"Base name        : {basename}\n")
            log.write(f"Run at           : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            log.write(f"Compression level: {compression_level}\n")
            log.write(f"JPEG quality     : {jpeg_quality}\n")
            log.write(f"Longest edge cap : {max_size if max_size else 'keep original size'}\n")
            log.write(f"Input folder     : {photo_dir}\n")
            log.write(f"Output folder    : {output_dir}\n")
            log.write(f"\n{'─' * 50}\n")
            log.write(f" Results\n")
            log.write(f"{'─' * 50}\n")
            log.write(f"Target files     : {total}\n")
            log.write(f"Success          : {success_count}\n")
            log.write(f"Failed           : {fail_count}\n")
            if total_original > 0:
                log.write(f"Total size       : {total_original/1024/1024:.2f}MB → {total_compressed/1024/1024:.2f}MB ({total_reduction:+.1f}%)\n")
            log.write(f"\n{'─' * 50}\n")
            log.write(f" zip / deletion\n")
            log.write(f"{'─' * 50}\n")
            log.write(f"Create zip       : {'yes' if create_zip_flag else 'no'}\n")
            if zip_filename:
                log.write(f"zip filename     : {zip_filename}\n")
                log.write(f"zip path         : {zip_path}\n")
            log.write(f"zip integrity    : {zip_integrity_result}\n")
            log.write(f"Image deletion   : {deletion_result}\n")
            log.write(f"\n{'─' * 50}\n")
            log.write(f" Per-file results\n")
            log.write(f"{'─' * 50}\n")
            for r in sorted(per_file_results, key=lambda x: x["filename"]):
                if r["status"] == "success":
                    log.write(
                        f" [success] {r['filename']} "
                        f"{r['original_size']/1024:.1f}KB → {r['compressed_size']/1024:.1f}KB "
                        f"({r['reduction_rate']:+.1f}%)\n"
                    )
                else:
                    log.write(f" [failed] {r['filename']} error: {r['error']}\n")
            log.write(f"{'=' * 50}\n")
        print(f"Log saved: {log_filename}")
        print(f"Log path: {log_path}")
    except Exception as e:
        print(f"Log write failed: {e}", file=sys.stderr)
if __name__ == "__main__":
    main()

The header pulls in the standard-library essentials (sys / json / os / zipfile / datetime), concurrent.futures for the process pool, and Pillow (PIL) for the image work itself. The # Assumptions: block is the design contract in comment form: PNG stays lossless, deletion runs only after the zip is verified, and ZIP_DEFLATED is chosen as the standard method rather than for size. The compression_settings dict maps each preset to a jpeg_quality / max_size pair, so changing a preset later is a one-line dictionary edit.
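Because the preset lookup is a plain dictionary, the label matching can be exercised on its own. The sketch below mirrors main()'s fallback of converting full-width parentheses to ASCII before giving up. resolve_preset is a hypothetical helper, the table is a trimmed copy holding only the ASCII keys, and raising KeyError instead of calling sys.exit is a choice made for testability.

```python
from typing import Dict, Tuple

# Trimmed copy of the preset table (ASCII-parenthesis keys only)
COMPRESSION_SETTINGS = {
    "Light (preserve quality)": {"jpeg_quality": 85, "max_size": None},
    "Balanced (recommended)": {"jpeg_quality": 75, "max_size": 2560},
    "Strong (smallest size)": {"jpeg_quality": 60, "max_size": 1920},
}

# Full-width to ASCII parenthesis map for str.translate
FULLWIDTH_PARENS = str.maketrans({"（": "(", "）": ")"})


def resolve_preset(label: str, settings: Dict[str, dict] = COMPRESSION_SETTINGS) -> Tuple[str, dict]:
    """Return (canonical_label, preset), or raise KeyError for an unknown label."""
    if label in settings:
        return label, settings[label]
    normalized = label.translate(FULLWIDTH_PARENS)
    if normalized in settings:
        return normalized, settings[normalized]
    raise KeyError(f"invalid compression level: {label}")
```

With normalization in place, a full-width label such as "Strong （smallest size）" resolves to the ASCII entry even when the table lists each preset only once.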
compress_image() is the per-file worker invoked by the process pool. Image.open sits in a with block so the file handle always closes. If the preset has a size cap, the image is first downscaled with Lanczos so its longest edge fits. JPEG then gets optimize=True + progressive=True and keeps its EXIF, PNG uses optimize=True alone (lossless, so quality knobs do not apply), WebP uses method=6 for the slowest/smallest setting, TIFF uses LZW compression, and BMP is re-saved uncompressed. The function returns {"ok": True/False, ...}, so when a single image throws, the exception is captured as data and the pool keeps running.
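The return-a-dict convention is what keeps one bad file from aborting the batch: an exception that escaped the worker would instead be re-raised by future.result() in the parent. Here is a stripped-down sketch of the same contract without Pillow, so it runs anywhere; measure_file is a hypothetical stand-in for compress_image that only reports a file's size.

```python
import os


def measure_file(path: str) -> dict:
    """Hypothetical worker: report a file's size and never raise.

    Mirrors compress_image's contract: success and failure both come back
    as plain dicts, so the parent loop can tally them without try/except.
    """
    filename = os.path.basename(path)
    try:
        size = os.path.getsize(path)
        if size == 0:
            raise ValueError("empty file")
        return {"ok": True, "filename": filename, "size": size}
    except Exception as e:  # broad on purpose: any failure becomes data
        return {"ok": False, "filename": filename, "error": str(e)}
```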
ProcessPoolExecutor(max_workers=worker_count) creates the pool, and every task is submitted up front into the future_to_name dict. as_completed then yields results in finish order, not submit order, so a line like "[3/120] IMG_0042.jpg …" prints the moment each worker finishes. Successes are appended to per_file_results with status "success"; failures are appended with status "failed" and the error message. That list feeds the summary log at the end.
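The parent-side bookkeeping is plain arithmetic over those result dicts: per-file reduction is (1 - compressed / original) × 100, and the total is computed from the summed sizes rather than by averaging per-file percentages, so large files weigh more. A minimal sketch of that tally; summarize is a hypothetical name and the sizes in the usage note are made up for illustration.

```python
from typing import List


def summarize(results: List[dict]) -> dict:
    """Tally worker results the way main() does (sizes in bytes)."""
    ok = [r for r in results if r["ok"]]
    failed = [r for r in results if not r["ok"]]
    total_original = sum(r["original_size"] for r in ok)
    total_compressed = sum(r["compressed_size"] for r in ok)
    # Positive value = space saved, matching the script's "+x.x%" output
    total_reduction = (1 - total_compressed / total_original) * 100 if total_original else 0.0
    return {
        "success": len(ok),
        "failed": len(failed),
        "total_original": total_original,
        "total_compressed": total_compressed,
        "total_reduction": total_reduction,
        "failed_files": sorted(r["filename"] for r in failed),
    }
```

For example, a 4,000,000-byte file shrunk to 1,000,000 bytes (75% saved) plus a 1,000,000-byte file shrunk to 900,000 bytes (10% saved) gives a size-weighted total of 62%, whereas a plain average of the two percentages would report 42.5%.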
The delete path is gated by a three-stage check. zf.testzip() reads every member and verifies its CRC. If that passes, the script compares set(zf.namelist()) against set(success_files), looking for anything the compression step produced that did not make it into the archive. Only when both checks come back clean does it move on to os.remove. Any single failure (a CRC mismatch, a missing filename, or an exception during the check) falls straight through to a skip with a recorded reason.
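The same gate can be pulled out into a function that returns a go/no-go decision plus a reason, which makes it easy to test against a deliberately damaged archive. A sketch under that assumption; safe_to_delete is a hypothetical name, and unlike the script it only decides and never deletes anything itself.

```python
import zipfile
from typing import Iterable, Tuple


def safe_to_delete(zip_path: str, expected_names: Iterable[str]) -> Tuple[bool, str]:
    """Return (ok, reason): ok only if every CRC checks out and no expected file is missing."""
    try:
        with zipfile.ZipFile(zip_path, "r") as zf:
            bad = zf.testzip()  # name of the first member with a bad CRC, or None
            if bad is not None:
                return False, f"CRC failed: {bad}"
            missing = set(expected_names) - set(zf.namelist())
            if missing:
                return False, f"missing from zip: {sorted(missing)}"
    except Exception as e:  # an unreadable archive also counts as a failure
        return False, f"error during check: {e}"
    return True, "OK"
```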
How it works
Compression presets trade quality for size
compression_settings carries three levels. Light runs at quality 85 with no resize, Balanced at quality 75 with the longest edge capped at 2560px, and Strong at quality 60 capped at 1920px. Balanced is a sensible default for reports and SharePoint uploads, Light fits archival work where resizing is unwanted, and Strong fits email attachments where the inbox size limit is the hard constraint.
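The size cap scales both edges by the same ratio, so the longest edge lands on max_size and the aspect ratio is preserved. Worked through for a 4032 × 3024 phone photo, Balanced gives 2560 × 1920 and Strong gives 1920 × 1440. A small sketch of that arithmetic; fit_longest_edge is a hypothetical helper, and it rounds an integer product where the script truncates w * ratio with int(), so the two can differ by a pixel on some sizes.

```python
from typing import Optional, Tuple


def fit_longest_edge(w: int, h: int, max_size: Optional[int]) -> Tuple[int, int]:
    """Scale (w, h) so the longest edge is at most max_size, keeping the aspect ratio."""
    longest = max(w, h)
    if max_size is None or longest <= max_size:
        return w, h  # Light preset, or already small enough: no resize
    # Multiply before dividing so the longest edge maps exactly onto max_size
    return (max(1, round(w * max_size / longest)),
            max(1, round(h * max_size / longest)))
```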
ProcessPoolExecutor keeps every CPU core busy
Image re-encoding is CPU-bound, so ProcessPoolExecutor is the right tool — threading would hit the GIL and stay single-core. Worker count defaults to os.cpu_count() - 1 (minimum 1), leaving one core free for the OS and other apps. as_completed streams results as each worker finishes, so the log shows real progress rather than going silent until the whole batch wraps.
Per-extension branches inside the worker
Inside compress_image() each extension picks its own img.save() parameters. JPEG uses optimize=True + progressive=True with EXIF passthrough, PNG uses optimize=True alone (lossless, so the quality knob does not apply), WebP reuses jpeg_quality and adds method=6 for the highest compression effort, and TIFF uses LZW. BMP has no compression option and is re-saved uncompressed. PNG ignoring jpeg_quality is intentional, not a bug.
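The if/elif chain could equally be written as a lookup table from extension to img.save() keyword arguments, which keeps every per-format choice in one place. This is a refactoring idea rather than how the script is written; save_options is a hypothetical name, and the keyword values are copied from compress_image.

```python
def save_options(ext: str, quality: int) -> dict:
    """Return the img.save() keyword arguments compress_image uses for ext."""
    table = {
        ".jpg": {"format": "JPEG", "quality": quality, "optimize": True, "progressive": True},
        ".png": {"format": "PNG", "optimize": True},  # lossless: quality unused
        ".webp": {"format": "WEBP", "quality": quality, "method": 6},
        ".bmp": {"format": "BMP"},  # no compression option
        ".tif": {"format": "TIFF", "compression": "tiff_lzw"},
    }
    aliases = {".jpeg": ".jpg", ".tiff": ".tif"}
    ext = ext.lower()
    # Return a copy so callers can add "exif" without touching the table
    return dict(table[aliases.get(ext, ext)])
```

Inside the worker this would become img.save(output_path, **opts) after adding opts["exif"] = exif for JPEG files that carry EXIF data.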
Three-stage integrity check before any delete
Even with delete_compressed turned on, nothing is removed until (1) zipfile.ZipFile.testzip() reports no CRC errors, (2) every successfully compressed filename is present in the zip, and (3) neither check raised an exception. Any single failure (a corrupt member, a missing file, or an error during the check) falls through to a skip, so a half-written zip never gets the chance to trigger deletion.
Customization
Add or tune a compression preset
Extend the compression_settings dict with any combination you need, for example "Ultra (email)": {"jpeg_quality": 45, "max_size": 1280}, and add the same label to the field's options list so it shows up in the UI. Rough jpeg_quality guidelines: 90+ is visually lossless, around 70 is fine for the web, and anything under 50 starts to look like a thumbnail.
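When adding presets, a quick sanity check of the table at startup can catch typos before a batch runs. Pillow's documentation describes JPEG quality on a 0 to 95 scale and advises against values above 95, so the sketch below treats 1 to 95 as the usable range (an assumption based on that guidance). validate_presets is a hypothetical helper; the "Ultra (email)" entry is the example from the text above.

```python
from typing import Dict, List

compression_settings = {
    "Light (preserve quality)": {"jpeg_quality": 85, "max_size": None},
    "Balanced (recommended)": {"jpeg_quality": 75, "max_size": 2560},
    "Strong (smallest size)": {"jpeg_quality": 60, "max_size": 1920},
}
# New preset from the customization example
compression_settings["Ultra (email)"] = {"jpeg_quality": 45, "max_size": 1280}


def validate_presets(settings: Dict[str, dict]) -> List[str]:
    """Return a list of problems; an empty list means every preset looks usable."""
    problems = []
    for label, s in settings.items():
        q = s.get("jpeg_quality")
        if not isinstance(q, int) or not 1 <= q <= 95:
            problems.append(f"{label}: jpeg_quality should be an int in 1..95, got {q!r}")
        m = s.get("max_size")
        if m is not None and (not isinstance(m, int) or m <= 0):
            problems.append(f"{label}: max_size should be None or a positive int, got {m!r}")
    return problems
```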
Pin the worker count
Replace worker_count = max(1, cpu_cores - 1) with a fixed value like worker_count = 4 when you want headroom for other work, or set it to cpu_cores to use every core. On a shared server, 2 to 4 is usually the polite upper bound.
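If the fixed value should come from somewhere other than an edit to the code, the choice can be isolated in a helper. resolve_worker_count and its override parameter are hypothetical (the script has no such config field); the default branch reproduces the script's all-cores-but-one rule.

```python
import os
from typing import Optional


def resolve_worker_count(override: Optional[int] = None, cpu_count: Optional[int] = None) -> int:
    """Default to all cores but one (minimum 1); an explicit override wins."""
    if override is not None:
        return max(1, override)
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores - 1)
```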
Swap the resize algorithm
img.resize((new_w, new_h), Image.LANCZOS) uses Lanczos, the quality-first option. For a faster middle ground, use Image.BILINEAR; for the fastest (but visibly blocky) result, Image.NEAREST. NEAREST is fine for pixel art but unusable for photos.
Troubleshooting
PIL.UnidentifiedImageError: cannot identify image file
Pillow cannot decode the file even though the extension looks right. Either it is corrupt, or the extension lies about the real format (an iPhone HEIC saved as .jpg is the classic case). Confirm it opens in Explorer preview; if that fails, re-export the source. For HEIC, install pillow-heif and add from pillow_heif import register_heif_opener; register_heif_opener() near the imports to unlock the format.
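To check whether a file's extension lies about its contents, the first few bytes (the file signature) are usually enough. A rough sketch assuming the standard signatures for these formats; sniff_format and sniff_file are hypothetical helpers that only recognize the formats the script handles plus common HEIF-family brands, and a short prefix like "BM" can produce false positives on non-image files.

```python
def sniff_format(header: bytes) -> str:
    """Guess an image format from its first 12+ bytes (not exhaustive)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    if header.startswith(b"BM"):
        return "BMP"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # HEIC/HEIF: ISO base media file with an "ftyp" box at offset 4
    if header[4:8] == b"ftyp" and header[8:12] in (b"heic", b"heix", b"mif1", b"msf1", b"hevc"):
        return "HEIF"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of path and guess its real format."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(16))
```

A .jpg that sniffs as HEIF is the iPhone case described above.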
BrokenProcessPool stops the run partway through
One worker ran out of memory (often on a 4K-plus image) or Pillow crashed on a malformed file, and the whole pool went down with it. Drop worker_count = max(1, cpu_cores - 1) to worker_count = 2 and rerun. To isolate the offender, temporarily set worker_count = 1: files are then processed one at a time, and because a filename is printed only after it finishes, the culprit is the file processed right after the last one logged.
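As written, a BrokenProcessPool raised by future.result() propagates out of main(), so the zip and log steps never run. One way to keep going is to treat that exception as a per-file failure. When a pool breaks, its pending futures are all failed with BrokenProcessPool, so the loop drains quickly and the affected images show up as failures to rerun. collect_results is a hypothetical helper that does not recover the crashed work; it only keeps the run from aborting.

```python
import os
from concurrent.futures import Future, as_completed
from concurrent.futures.process import BrokenProcessPool
from typing import Dict, List


def collect_results(future_to_name: Dict[Future, str]) -> List[dict]:
    """Drain futures in completion order; a crashed pool becomes per-file failures."""
    results = []
    for future in as_completed(future_to_name):
        name = os.path.basename(future_to_name[future])
        try:
            results.append(future.result())
        except BrokenProcessPool as e:
            results.append({"ok": False, "filename": name, "error": f"worker pool crashed: {e}"})
    return results
```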
PermissionError [WinError 32] when deleting compressed images
A viewer (Explorer preview, the Photos app, an image editor) still holds the file open, so os.remove fails. The console shows delete failed for those files and the summary log records the failure count, but the zip is already complete and validated, so no data is lost. Close the viewer (and the Explorer preview pane), then manually clean up the remaining files in the output folder. Everything worth keeping is still inside the zip.
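If viewers holding files open is a recurring problem, the delete loop could retry a few times before giving up. A minimal sketch, assuming short waits are acceptable; remove_with_retry is a hypothetical helper, its attempt count and delay are illustrative, and a file that is already gone is treated as deleted.

```python
import os
import time


def remove_with_retry(path: str, attempts: int = 3, delay: float = 0.5) -> bool:
    """Try to delete path, retrying on PermissionError; return True once it is gone."""
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already gone counts as success
        except PermissionError:
            if attempt < attempts - 1:
                time.sleep(delay)  # give the viewer a moment to release the handle
    return False
```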
The zip is barely smaller than the source folder
Expected: JPEG and WebP are already entropy-compressed, so ZIP_DEFLATED has almost nothing left to squeeze out of them. The zip step is there for bundling, not for extra compression. To shrink the actual output, switch compression_level to Strong (smallest size), or edit compression_settings and lower max_size (for example to 1280px) on the preset you use.
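To see how much ZIP_DEFLATED actually saved on a given archive, the per-member sizes recorded in the zip can be compared. A sketch using ZipInfo.file_size and ZipInfo.compress_size; zip_savings is a hypothetical name, and the demo uses random bytes as a stand-in for already-compressed image data, which is an assumption for illustration only.

```python
import io
import os
import zipfile


def zip_savings(zip_source) -> float:
    """Return the packed/unpacked byte ratio over all members (1.0 means no savings)."""
    with zipfile.ZipFile(zip_source) as zf:
        members = [info for info in zf.infolist() if not info.is_dir()]
        unpacked = sum(info.file_size for info in members)
        packed = sum(info.compress_size for info in members)
    return packed / unpacked if unpacked else 1.0


if __name__ == "__main__":
    # Random bytes behave like entropy-compressed data; zeros are highly redundant
    for label, payload in [("random", os.urandom(200_000)), ("zeros", bytes(200_000))]:
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(f"{label}.bin", payload)
        print(label, round(zip_savings(buf), 3))
```

zip_savings also accepts a path, so it can be pointed at the {basename}_photos_*.zip the script produced.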
FAQ
Why processes instead of threads?
Python's GIL (Global Interpreter Lock) prevents threading.Thread from running CPU-bound work across multiple cores. Image re-encoding is entirely CPU-bound, so only ProcessPoolExecutor delivers real parallelism here. Threads are still the right tool for I/O-bound work like network calls — just not for this workload.
Is jpeg_quality ignored on PNG?
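Yes. PNG is lossless and has no quality parameter, so the jpeg_quality value never enters the PNG branch. PNG only honors optimize=True, which spends extra effort finding the smallest encoding while keeping the pixels bit-exact. That makes it the safe format for transparent icons or screenshots with text. One caveat: Balanced and Strong still downscale PNGs whose longest edge exceeds the cap, so choose Light when PNG dimensions must stay untouched.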
Why do some preset keys appear twice in the dict?
It is belt-and-braces for UI values that differ only in parenthesis style: one entry of each pair spells the label with ASCII () and the other with full-width （） parentheses, which look nearly identical in many fonts. Both entries carry identical settings, so whichever form arrives resolves to the same preset. Combined with the normalize step in main(), which converts full-width parentheses to ASCII, the script does not fail on a punctuation mismatch from the input form.
Are the original (uncompressed) images ever deleted?
No. The delete step only removes compressed files in output_dir; files in photo_dir (the source) are never written to or deleted. One caveat: compressed files keep their original names, so if output_dir points at the same folder as photo_dir, the compression step itself overwrites the originals. Keep the two folders separate. To clear the originals as well, move or archive them manually after the run, or add a separate cleanup script.
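Since output files keep their original names, the one setup that can touch the originals is pointing output_dir at the source folder. Below is a minimal guard one could call near the top of main(). same_folder is a hypothetical helper that compares resolved paths, so trailing separators, ".." segments, and symlinks do not fool it, and on Windows the comparison is case-insensitive via os.path.normcase.

```python
import os


def _resolved(path: str) -> str:
    """Absolute, symlink-free, case-normalized form of path."""
    return os.path.normcase(os.path.realpath(path))


def same_folder(a: str, b: str) -> bool:
    """True if a and b resolve to the same directory."""
    return _resolved(a) == _resolved(b)
```

Used right after reading the inputs, a check such as `if same_folder(photo_dir, output_dir):` followed by an error message and sys.exit(1) would turn the "use a separate folder" advice into a hard stop.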