List Folder Files Script
List every file in a Windows folder with name, extension, size, and modified time, then export to Excel, CSV, TXT, or Markdown. Recursive subfolder traversal and per-extension aggregation make this ideal for file audits and inventory checks.
A Python script that walks a folder, captures every file's name, extension, size, and modified time, and exports the list to Excel, CSV, TXT, or Markdown. Recursive subfolder traversal is supported out of the box — perfect for auditing shared drives or running deliverable checks without clicking through Explorer properties one file at a time.
What this script can do
- List files with modified time, size, and extension
- Recursive subfolder traversal
- Automatic per-extension count and total size aggregation
- Export to xlsx / csv / txt / md
- Skip permission-denied folders automatically
Import the .pybes file into Pybes and the script — along with its config fields — loads automatically.
Config fields
These are the config fields this script uses. Enter values through the Pybes GUI at runtime.
target_dir Folder Required Target folder
Folder to list files from
output_format Dropdown Required Output format
One of xlsx / csv / txt / md
Default: xlsx
include_detail Checkbox Required Include details
Include extension, size, and modified time (true / false)
Default: true
include_subfolders Checkbox Required Include subfolders
Recursively walk into subfolders (true / false)
Default: true
output_dir Folder Required Output folder
Destination folder for the result file
Code walkthrough
import sys
import json
import os
import datetime
with open(sys.argv[1], encoding="utf-8") as f:
inputs = json.load(f)
target_dir = inputs["target_dir"]
output_format = inputs["output_format"]
include_detail = inputs["include_detail"] == "true"
include_subfolders = inputs["include_subfolders"] == "true"
output_dir = inputs["output_dir"]
print(f"Target folder: {target_dir}")
print(f"Output format: {output_format}")
print(f"Include details: {include_detail}")
print(f"Include subfolders: {include_subfolders}")
# Collect the file list
records = []
def collect_files(root):
try:
entries = os.scandir(root)
except PermissionError as e:
print(f"Access denied (skipping): {root}", file=sys.stderr)
return
for entry in entries:
try:
if entry.is_file(follow_symlinks=False):
rel_path = os.path.relpath(entry.path, target_dir)
stat = entry.stat()
size_kb = round(stat.st_size / 1024, 2)
modified = datetime.datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
ext = os.path.splitext(entry.name)[1].lower() or "(none)"
if include_detail:
records.append({
"filename": entry.name,
"relative_path": rel_path,
"extension": ext,
"size_kb": size_kb,
"modified_at": modified,
})
else:
records.append({
"filename": entry.name,
"relative_path": rel_path,
"extension": ext,
"size_kb": size_kb, # kept internally for summary calculations even when detail is off
})
elif entry.is_dir(follow_symlinks=False) and include_subfolders:
collect_files(entry.path)
except Exception as e:
print(f"Skipping: {entry.path} - {e}", file=sys.stderr)
collect_files(target_dir)
print(f"{len(records)} files detected")
if len(records) == 0:
print("No files found. Exiting.")
sys.exit(0)
# Summary calculation
total_count = len(records)
total_size_kb = round(sum(r["size_kb"] for r in records), 2)
total_size_mb = round(total_size_kb / 1024, 2)
# Per-extension aggregation
ext_summary = {}
for r in records:
ext = r["extension"]
if ext not in ext_summary:
ext_summary[ext] = {"extension": ext, "file_count": 0, "total_size_kb": 0.0}
ext_summary[ext]["file_count"] += 1
ext_summary[ext]["total_size_kb"] += r["size_kb"]
for v in ext_summary.values():
v["total_size_kb"] = round(v["total_size_kb"], 2)
v["total_size_mb"] = round(v["total_size_kb"] / 1024, 2)
ext_summary_rows = sorted(ext_summary.values(), key=lambda x: x["file_count"], reverse=True)
# Strip size from output records when detail is off
if not include_detail:
output_records = [{"filename": r["filename"], "relative_path": r["relative_path"]} for r in records]
else:
output_records = records
# Timestamped filename
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
os.makedirs(output_dir, exist_ok=True)
try:
if output_format == "xlsx":
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter
output_path = os.path.join(output_dir, f"filelist_{timestamp}.xlsx")
wb = openpyxl.Workbook()
# ── Sheet 1: File list ──
ws1 = wb.active
ws1.title = "File list"
headers = list(output_records[0].keys())
ws1.append(headers)
for cell in ws1[1]:
cell.font = Font(bold=True)
cell.fill = PatternFill("solid", fgColor="D9E1F2")
for rec in output_records:
ws1.append([rec[h] for h in headers])
# Summary row
ws1.append([])
summary_row_idx = ws1.max_row + 1
ws1.append(["Total", "", f"Files: {total_count} Total size: {total_size_kb} KB ({total_size_mb} MB)"])
for cell in ws1[ws1.max_row]:
cell.font = Font(bold=True)
cell.fill = PatternFill("solid", fgColor="FFF2CC")
# Auto-adjust column widths
for col in ws1.columns:
max_len = max((len(str(cell.value or "")) for cell in col), default=10)
ws1.column_dimensions[get_column_letter(col[0].column)].width = min(max_len + 4, 60)
# ── Sheet 2: Per-extension summary ──
ws2 = wb.create_sheet("Per-extension summary")
ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
ws2.append(ext_headers)
for cell in ws2[1]:
cell.font = Font(bold=True)
cell.fill = PatternFill("solid", fgColor="D9E1F2")
for row in ext_summary_rows:
ws2.append([row[h] for h in ext_headers])
# Total row
ws2.append([])
ws2.append(["Total", total_count, total_size_kb, total_size_mb])
for cell in ws2[ws2.max_row]:
cell.font = Font(bold=True)
cell.fill = PatternFill("solid", fgColor="FFF2CC")
for col in ws2.columns:
max_len = max((len(str(cell.value or "")) for cell in col), default=10)
ws2.column_dimensions[get_column_letter(col[0].column)].width = min(max_len + 4, 40)
wb.save(output_path)
elif output_format == "csv":
import csv
output_path = os.path.join(output_dir, f"filelist_{timestamp}.csv")
headers = list(output_records[0].keys())
with open(output_path, "w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=headers)
writer.writeheader()
writer.writerows(output_records)
# Summary rows
f.write("\n")
f.write(f"Total files,{total_count}\n")
f.write(f"Total size (KB),{total_size_kb}\n")
f.write(f"Total size (MB),{total_size_mb}\n")
f.write("\nPer-extension summary\n")
ext_writer = csv.DictWriter(f, fieldnames=["extension", "file_count", "total_size_kb", "total_size_mb"])
ext_writer.writeheader()
ext_writer.writerows(ext_summary_rows)
f.write(f"Total,{total_count},{total_size_kb},{total_size_mb}\n")
elif output_format == "txt":
output_path = os.path.join(output_dir, f"filelist_{timestamp}.txt")
headers = list(output_records[0].keys())
lines = []
lines.append("=" * 60)
lines.append("File list")
lines.append("=" * 60)
lines.append(f"Target folder : {target_dir}")
lines.append(f"Generated : {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
lines.append(f"File count : {total_count}")
lines.append(f"Total size : {total_size_kb} KB ({total_size_mb} MB)")
lines.append("=" * 60)
lines.append("")
# Compute column widths for padding
col_widths = {h: max(len(h), max(len(str(r[h])) for r in output_records)) for h in headers}
header_line = " ".join(h.ljust(col_widths[h]) for h in headers)
lines.append(header_line)
lines.append("-" * len(header_line))
for rec in output_records:
lines.append(" ".join(str(rec[h]).ljust(col_widths[h]) for h in headers))
lines.append("")
lines.append("=" * 60)
lines.append("Per-extension summary")
lines.append("=" * 60)
ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
ext_col_widths = {h: max(len(h), max(len(str(r[h])) for r in ext_summary_rows)) for h in ext_headers}
ext_header_line = " ".join(h.ljust(ext_col_widths[h]) for h in ext_headers)
lines.append(ext_header_line)
lines.append("-" * len(ext_header_line))
for row in ext_summary_rows:
lines.append(" ".join(str(row[h]).ljust(ext_col_widths[h]) for h in ext_headers))
lines.append("-" * len(ext_header_line))
lines.append(" ".join(str(v).ljust(ext_col_widths[h]) for h, v in zip(ext_headers, ["Total", total_count, total_size_kb, total_size_mb])))
with open(output_path, "w", encoding="utf-8") as f:
f.write("\n".join(lines))
elif output_format == "md":
output_path = os.path.join(output_dir, f"filelist_{timestamp}.md")
headers = list(output_records[0].keys())
lines = []
lines.append("# File list")
lines.append("")
lines.append(f"Target folder: `{target_dir}` ")
lines.append(f"Generated: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')} ")
lines.append(f"Files: **{total_count}** Total size: **{total_size_kb} KB ({total_size_mb} MB)** ")
lines.append("")
# File list table
lines.append("## File list")
lines.append("")
lines.append("| " + " | ".join(headers) + " |")
lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
for rec in output_records:
lines.append("| " + " | ".join(str(rec[h]) for h in headers) + " |")
lines.append("")
# Per-extension summary table
lines.append("## Per-extension summary")
lines.append("")
ext_headers = ["extension", "file_count", "total_size_kb", "total_size_mb"]
lines.append("| " + " | ".join(ext_headers) + " |")
lines.append("| " + " | ".join(["---"] * len(ext_headers)) + " |")
for row in ext_summary_rows:
lines.append("| " + " | ".join(str(row[h]) for h in ext_headers) + " |")
lines.append(f"| **Total** | **{total_count}** | **{total_size_kb}** | **{total_size_mb}** |")
with open(output_path, "w", encoding="utf-8") as f:
f.write("\n".join(lines))
else:
print(f"Unsupported output format: {output_format}", file=sys.stderr)
sys.exit(1)
print(f"Output completed: {output_path}")
except Exception as e:
print(f"Output error: {e}", file=sys.stderr)
sys.exit(1)
print("Done") After importing standard-library modules (sys / json / os / datetime), the script reads the JSON config passed in sys.argv[1] into an inputs dictionary. Note that Pybes delivers checkbox values as the strings "true" / "false" — the script explicitly compares to "true" to convert them into Python booleans.
collect_files recursively walks the folder tree, appending each file's name, relative path, extension, size, and modified time to records. A PermissionError on a folder is caught at the os.scandir call, and per-entry exceptions are swallowed and logged so one bad file does not abort the entire walk.
Computes total count and total size (KB / MB), plus a per-extension summary of { extension, count, total size }. The summary dictionary is built first and then sorted() — relying on the Python 3.7+ guarantee that dictionaries preserve insertion order.
Dispatches on output_format for xlsx / csv / txt / md. xlsx uses openpyxl with two sheets (file list + extension summary); csv uses csv.DictWriter; txt builds a padded text table via ljust; md writes a Markdown table. The whole block is wrapped in try / except so output-folder permission errors surface in the log rather than aborting silently.
How it works
Using os.scandir for fast traversal
os.scandir is faster than os.listdir because each directory entry exposes stat (size, modified time) without an extra system call. The collect_files function calls itself recursively to walk the entire tree while collecting each file's metadata in a single pass.
Per-extension aggregation via a dictionary
An ext_summary dictionary accumulates { extension → count / total size } as files are discovered, then sorted() orders the result by count descending. Rounding is applied once at the end to avoid floating-point drift during accumulation.
Branching output by format
xlsx uses openpyxl with two sheets (file list + extension summary); csv uses csv.DictWriter; txt builds a text table via ljust padding; md writes a Markdown table. Every output includes a YYYYMMDD_HHMMSS timestamp in the filename, so repeated runs never overwrite prior results.
Customization
Filter by specific extensions
Add a one-line filter such as if ext not in (".xlsx", ".pdf"): continue just before records.append(...). To exclude extensions instead, flip the condition: if ext in (".tmp", ".log"): continue.
Sort by modified time or size
After collect_files(target_dir), add records.sort(key=lambda r: r["更新日時"], reverse=True) to sort newest first. Swap the key to r["サイズ(KB)"] to sort by size.
Add a regex filename filter
Add import re at the top, build pattern = re.compile(r"^invoice_.*"), then add if not pattern.match(entry.name): continue. Handy for constraints like "only files starting with invoice_" without restructuring the rest of the script.
Troubleshooting
PermissionError is raised and the script aborts
collect_files already catches PermissionError from os.scandir and skips the folder after logging it to sys.stderr. If the script still aborts, the output folder itself is likely unwritable — point it to a writable location such as C:\Users\<you>\Documents.
PermissionError: [Errno 13] when saving xlsx
A previous output file is likely open in Excel, which holds a write lock. Close the file or choose a different output folder and rerun. The timestamped filename prevents collisions unless runs happen within the same second.
Japanese characters are garbled when the CSV is opened in Excel
The CSV is UTF-8 encoded but Excel assumes Shift_JIS. Either import via Data → From Text/CSV with UTF-8 selected explicitly, or change encoding="utf-8" to encoding="utf-8-sig" (adds a BOM) so Excel opens it correctly by default.
FAQ
Does this work on folders with tens of thousands of files?
Yes. os.scandir is memory-efficient, so ~50,000 files typically finish within a minute. That said, the script holds records entirely in memory — for folders exceeding a million files, refactor to streaming writes as you walk the tree.
Are hidden files and .DS_Store included?
Yes. os.scandir returns every entry the OS exposes, including Windows hidden-attribute files and macOS .DS_Store. Add if entry.name.startswith("."): continue inside collect_files if you want to exclude dotfiles.
Why openpyxl instead of pandas?
Because we want to apply cell styles (fill colors, bold headers). pandas.to_excel works for plain data, but serious formatting eventually requires dropping into openpyxl anyway — so using it directly keeps the code shorter. Pair with pandas only if you need DataFrame-style transforms on the rows.
Import the .pybes file into Pybes and the script — along with its config fields — loads automatically.