File Operations Intermediate Updated: 2026-04-26

List Folder Files Script

Q: Does this work on folders with tens of thousands of files?

Yes. `os.scandir` is memory-efficient, so ~50,000 files typically finish within a minute. That said, the script holds `records` entirely in memory — for folders exceeding a million files, refactor to streaming writes as you walk the tree.

Q: Are hidden files and .DS_Store included?

Yes. `os.scandir` returns every entry the OS exposes, including Windows hidden-attribute files and macOS `.DS_Store`. Add `if entry.name.startswith("."): continue` inside `collect_files` if you want to exclude dotfiles.

List every file in a Windows folder with name, extension, size, and modified time, then export to Excel, CSV, TXT, or Markdown. Recursive subfolder traversal and per-extension aggregation make this ideal for file audits and inventory checks.

A Python script that walks a folder, captures every file's name, extension, size, and modified time, and exports the list to Excel, CSV, TXT, or Markdown. Recursive subfolder traversal is supported out of the box — perfect for auditing shared drives or running deliverable checks without clicking through Explorer properties one file at a time.

What this script can do

List files with modified time, size, and extension
Recursive subfolder traversal
Automatic per-extension count and total size aggregation
Export to xlsx / csv / txt / md
Skip permission-denied folders automatically

Download list-folder-files.pybes

Import the .pybes file into Pybes and the script — along with its config fields — loads automatically.

Config fields

These are the config fields this script uses. Enter values through the Pybes GUI at runtime.

target_dir Folder Required

Target folder

Folder to list files from

output_format Dropdown Required

Output format

One of xlsx / csv / txt / md

Options: xlsx, csv, txt, md

Default: xlsx

include_detail Checkbox Required

Include details

Include extension, size, and modified time (true / false)

Default: true

include_subfolders Checkbox Required

Include subfolders

Recursively walk into subfolders (true / false)

Default: true

output_dir Folder Required

Output folder

Destination folder for the result file

Code walkthrough

import sys
import json
import os
import datetime

with open(sys.argv[1], encoding="utf-8") as f:
    inputs = json.load(f)

target_dir = inputs["target_dir"]
output_format = inputs["output_format"]
include_detail = inputs["include_detail"] == "true"
include_subfolders = inputs["include_subfolders"] == "true"
output_dir = inputs["output_dir"]

print(f"Target folder: {target_dir}")
print(f"Output format: {output_format}")
print(f"Include details: {include_detail}")
print(f"Include subfolders: {include_subfolders}")

# Collect the file list
records = []

def collect_files(root):
    try:
        entries = os.scandir(root)
    except PermissionError as e:
        print(f"Access denied (skipping): {root}", file=sys.stderr)
        return

    for entry in entries:
        try:
            if entry.is_file(follow_symlinks=False):
                rel_path = os.path.relpath(entry.path, target_dir)
                stat = entry.stat()
                size_kb = round(stat.st_size / 1024, 2)
                modified = datetime.datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
                ext = os.path.splitext(entry.name)[1].lower() or "(none)"
                if include_detail:
                    records.append({
                        "filename": entry.name,
                        "relative_path": rel_path,
                        "extension": ext,
                        "size_kb": size_kb,
                        "modified_at": modified,
                    })
                else:
                    records.append({
                        "filename": entry.name,
                        "relative_path": rel_path,
                        "extension": ext,
                        "size_kb": size_kb,  # kept internally for summary calculations even when detail is off
                    })
            elif entry.is_dir(follow_symlinks=False) and include_subfolders:
                collect_files(entry.path)
        except Exception as e:
            print(f"Skipping: {entry.path} - {e}", file=sys.stderr)

collect_files(target_dir)
print(f"{len(records)} files detected")

if len(records) == 0:
    print("No files found. Exiting.")
    sys.exit(0)

# Summary calculation
total_count = len(records)
total_size_kb = round(sum(r["size_kb"] for r in records), 2)
total_size_mb = round(total_size_kb / 1024, 2)

# Per-extension aggregation
ext_summary = {}
for r in records:
    ext = r["extension"]
    if ext not in ext_summary:
        ext_summary[ext] = {"extension": ext, "file_count": 0, "total_size_kb": 0.0}
    ext_summary[ext]["file_count"] += 1
    ext_summary[ext]["total_size_kb"] += r["size_kb"]
for v in ext_summary.values():
    v["total_size_kb"] = round(v["total_size_kb"], 2)
    v["total_size_mb"] = round(v["total_size_kb"] / 1024, 2)
ext_summary_rows = sorted(ext_summary.values(), key=lambda x: x["file_count"], reverse=True)

# Strip size from output records when detail is off
if not include_detail:
    output_records = [{"filename": r["filename"], "relative_path": r["relative_path"]} for r in records]
else:
    output_records = records

# Timestamped filename
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
os.makedirs(output_dir, exist_ok=True)

try:
    if output_format == "xlsx":
        import openpyxl
        from openpyxl.styles import Font, PatternFill, Alignment
        from openpyxl.utils import get_column_letter

        output_path = os.path.join(output_dir, f"filelist_{timestamp}.xlsx")
        wb = openpyxl.Workbook()

        # ── Sheet 1: File list ──
        ws1 = wb.active
        ws1.title = "File list"
        headers = list(output_records[0].keys())
        ws1.append(headers)
        for cell in ws1[1]:
            cell.font = Font(bold=True)
            cell.fill = PatternFill("solid", fgColor="D9E1F2")
        for rec in output_records:
            ws1.append([rec[h] for h in headers])
        # Summary row
        ws1.append([])
        summary_row_idx = ws1.max_row + 1
        ws1.append(["Total", "", f"Files: {total_count}  Total size: {total_size_kb} KB ({total_size_mb} MB)"])
        for cell in ws1[ws1.max_row]:
            cell.font = Font(bold=True)
            cell.fill = PatternFill("solid", fgColor="FFF2CC")
        # Auto-adjust column widths
        for col in ws1.columns:
            max_len = max((len(str(cell.value or "")) for cell in col), default=10)
            ws1.column_dimensions[get_column_letter(col[0].column)].width = min(max_len + 4, 60)

        # ── Sheet 2: Per-extension summary ──
        ws2 = wb.create_sheet("Per-extension summary")
        ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
        ws2.append(ext_headers)
        for cell in ws2[1]:
            cell.font = Font(bold=True)
            cell.fill = PatternFill("solid", fgColor="D9E1F2")
        for row in ext_summary_rows:
            ws2.append([row[h] for h in ext_headers])
        # Total row
        ws2.append([])
        ws2.append(["Total", total_count, total_size_kb, total_size_mb])
        for cell in ws2[ws2.max_row]:
            cell.font = Font(bold=True)
            cell.fill = PatternFill("solid", fgColor="FFF2CC")
        for col in ws2.columns:
            max_len = max((len(str(cell.value or "")) for cell in col), default=10)
            ws2.column_dimensions[get_column_letter(col[0].column)].width = min(max_len + 4, 40)

        wb.save(output_path)

    elif output_format == "csv":
        import csv
        output_path = os.path.join(output_dir, f"filelist_{timestamp}.csv")
        headers = list(output_records[0].keys())
        with open(output_path, "w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=headers)
            writer.writeheader()
            writer.writerows(output_records)
            # Summary rows
            f.write("\n")
            f.write(f"Total files,{total_count}\n")
            f.write(f"Total size (KB),{total_size_kb}\n")
            f.write(f"Total size (MB),{total_size_mb}\n")
            f.write("\nPer-extension summary\n")
            ext_writer = csv.DictWriter(f, fieldnames=["extension", "file_count", "total_size_kb", "total_size_mb"])
            ext_writer.writeheader()
            ext_writer.writerows(ext_summary_rows)
            f.write(f"Total,{total_count},{total_size_kb},{total_size_mb}\n")

    elif output_format == "txt":
        output_path = os.path.join(output_dir, f"filelist_{timestamp}.txt")
        headers = list(output_records[0].keys())
        lines = []
        lines.append("=" * 60)
        lines.append("File list")
        lines.append("=" * 60)
        lines.append(f"Target folder : {target_dir}")
        lines.append(f"Generated     : {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        lines.append(f"File count    : {total_count}")
        lines.append(f"Total size    : {total_size_kb} KB ({total_size_mb} MB)")
        lines.append("=" * 60)
        lines.append("")
        # Compute column widths for padding
        col_widths = {h: max(len(h), max(len(str(r[h])) for r in output_records)) for h in headers}
        header_line = "  ".join(h.ljust(col_widths[h]) for h in headers)
        lines.append(header_line)
        lines.append("-" * len(header_line))
        for rec in output_records:
            lines.append("  ".join(str(rec[h]).ljust(col_widths[h]) for h in headers))
        lines.append("")
        lines.append("=" * 60)
        lines.append("Per-extension summary")
        lines.append("=" * 60)
        ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
        ext_col_widths = {h: max(len(h), max(len(str(r[h])) for r in ext_summary_rows)) for h in ext_headers}
        ext_header_line = "  ".join(h.ljust(ext_col_widths[h]) for h in ext_headers)
        lines.append(ext_header_line)
        lines.append("-" * len(ext_header_line))
        for row in ext_summary_rows:
            lines.append("  ".join(str(row[h]).ljust(ext_col_widths[h]) for h in ext_headers))
        lines.append("-" * len(ext_header_line))
        lines.append("  ".join(str(v).ljust(ext_col_widths[h]) for h, v in zip(ext_headers, ["Total", total_count, total_size_kb, total_size_mb])))
        with open(output_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))

    elif output_format == "md":
        output_path = os.path.join(output_dir, f"filelist_{timestamp}.md")
        headers = list(output_records[0].keys())
        lines = []
        lines.append("# File list")
        lines.append("")
        lines.append(f"Target folder: `{target_dir}`  ")
        lines.append(f"Generated: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}  ")
        lines.append(f"Files: **{total_count}**  Total size: **{total_size_kb} KB ({total_size_mb} MB)**  ")
        lines.append("")
        # File list table
        lines.append("## File list")
        lines.append("")
        lines.append("| " + " | ".join(headers) + " |")
        lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
        for rec in output_records:
            lines.append("| " + " | ".join(str(rec[h]) for h in headers) + " |")
        lines.append("")
        # Per-extension summary table
        lines.append("## Per-extension summary")
        lines.append("")
        ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
        lines.append("| " + " | ".join(ext_headers) + " |")
        lines.append("| " + " | ".join(["---"] * len(ext_headers)) + " |")
        for row in ext_summary_rows:
            lines.append("| " + " | ".join(str(row[h]) for h in ext_headers) + " |")
        lines.append(f"| **Total** | **{total_count}** | **{total_size_kb}** | **{total_size_mb}** |")
        with open(output_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))

    else:
        print(f"Unsupported output format: {output_format}", file=sys.stderr)
        sys.exit(1)

    print(f"Output completed: {output_path}")

except Exception as e:
    print(f"Output error: {e}", file=sys.stderr)
    sys.exit(1)

print("Done")

L1–18

After importing standard-library modules (sys / json / os / datetime), the script reads the JSON config passed in sys.argv[1] into an inputs dictionary. Note that Pybes delivers checkbox values as the strings "true" / "false" — the script explicitly compares to "true" to convert them into Python booleans.

L20–56

collect_files recursively walks the folder tree, appending each file's name, relative path, extension, size, and modified time to records. A PermissionError on a folder is caught at the os.scandir call, and per-entry exceptions are swallowed and logged so one bad file does not abort the entire walk.

L65–81

Computes total count and total size (KB / MB), plus a per-extension summary of { extension, count, total size }. The summary dictionary is built first and then sorted() — relying on the Python 3.7+ guarantee that dictionaries preserve insertion order.

L93–240

Dispatches on output_format for xlsx / csv / txt / md. xlsx uses openpyxl with two sheets (file list + extension summary); csv uses csv.DictWriter; txt builds a padded text table via ljust; md writes a Markdown table. The whole block is wrapped in try / except so output-folder permission errors surface in the log rather than aborting silently.

How it works

Using os.scandir for fast traversal

os.scandir is faster than os.listdir because each directory entry exposes stat (size, modified time) without an extra system call. The collect_files function calls itself recursively to walk the entire tree while collecting each file's metadata in a single pass.

Per-extension aggregation via a dictionary

An ext_summary dictionary accumulates { extension → count / total size } as files are discovered, then sorted() orders the result by count descending. Rounding is applied once at the end to avoid floating-point drift during accumulation.

Branching output by format

xlsx uses openpyxl with two sheets (file list + extension summary); csv uses csv.DictWriter; txt builds a text table via ljust padding; md writes a Markdown table. Every output includes a YYYYMMDD_HHMMSS timestamp in the filename, so repeated runs never overwrite prior results.

Customization

Filter by specific extensions

Add a one-line filter such as if ext not in (".xlsx", ".pdf"): continue just before records.append(...). To exclude extensions instead, flip the condition: if ext in (".tmp", ".log"): continue.

Sort by modified time or size

After collect_files(target_dir), add records.sort(key=lambda r: r["更新日時"], reverse=True) to sort newest first. Swap the key to r["サイズ(KB)"] to sort by size.

Add a regex filename filter

Add import re at the top, build pattern = re.compile(r"^invoice_.*"), then add if not pattern.match(entry.name): continue. Handy for constraints like "only files starting with invoice_" without restructuring the rest of the script.

Troubleshooting

PermissionError is raised and the script aborts

collect_files already catches PermissionError from os.scandir and skips the folder after logging it to sys.stderr. If the script still aborts, the output folder itself is likely unwritable — point it to a writable location such as C:\Users\<you>\Documents.

PermissionError: [Errno 13] when saving xlsx

A previous output file is likely open in Excel, which holds a write lock. Close the file or choose a different output folder and rerun. The timestamped filename prevents collisions unless runs happen within the same second.

Japanese characters are garbled when the CSV is opened in Excel

The CSV is UTF-8 encoded but Excel assumes Shift_JIS. Either import via Data → From Text/CSV with UTF-8 selected explicitly, or change encoding="utf-8" to encoding="utf-8-sig" (adds a BOM) so Excel opens it correctly by default.

FAQ

Does this work on folders with tens of thousands of files?

Yes. os.scandir is memory-efficient, so ~50,000 files typically finish within a minute. That said, the script holds records entirely in memory — for folders exceeding a million files, refactor to streaming writes as you walk the tree.

Are hidden files and .DS_Store included?

Yes. os.scandir returns every entry the OS exposes, including Windows hidden-attribute files and macOS .DS_Store. Add if entry.name.startswith("."): continue inside collect_files if you want to exclude dotfiles.

Why openpyxl instead of pandas?

Because we want to apply cell styles (fill colors, bold headers). pandas.to_excel works for plain data, but serious formatting eventually requires dropping into openpyxl anyway — so using it directly keeps the code shorter. Pair with pandas only if you need DataFrame-style transforms on the rows.

See more common questions →