pdfco

Connection skill for the PDF.co MCP server. Use it for HTML-to-PDF conversion, document manipulation, and best practices for generating reliable PDFs.

SKILL.md
--- frontmatter
name: pdfco
description: Connection skill for PDF.co MCP server. Provides guidance on HTML-to-PDF conversion, document manipulation, and best practices for reliable PDF generation.
metadata:
  version: 1.0.0
  category: development
  tags:
    - pdf
    - documents
    - conversion
    - html
    - mcp
  author:
    name: NimbleBrain
    url: https://www.nimblebrain.ai

PDF.co Integration

This skill provides behavioral guidance for interacting with the PDF.co MCP server.

Tool Overview

| Tool | Purpose | When to Use |
| --- | --- | --- |
| html_to_pdf | Convert HTML to PDF | Reports, quotes, invoices, certificates |
| pdf_to_text | Extract text from PDF | Document analysis, content extraction |
| merge_pdfs | Combine multiple PDFs | Consolidating documents |
| split_pdf | Extract pages from PDF | Breaking apart documents |
| add_watermark | Add watermark to PDF | Branding, draft marking |

html_to_pdf (Primary Tool)

Converts HTML content to a downloadable PDF document.

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| html | Yes | - | HTML content (full document or fragment) |
| name | No | "document.pdf" | Output filename |
| margins | No | "10mm" | Page margins (CSS format) |
| paperSize | No | "Letter" | Paper size: Letter, A4, Legal, Tabloid |
| orientation | No | "Portrait" | Portrait or Landscape |
| header | No | - | HTML for page header |
| footer | No | - | HTML for page footer |
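
As a minimal sketch, the parameters above could be assembled into an arguments dictionary before the tool is called. The helper name `build_html_to_pdf_args` is hypothetical, and how the dictionary reaches the server depends on your MCP client; only the parameter names, defaults, and allowed values come from the table.

```python
from typing import Optional

# Allowed values as listed in the parameter table.
ALLOWED_PAPER_SIZES = {"Letter", "A4", "Legal", "Tabloid"}
ALLOWED_ORIENTATIONS = {"Portrait", "Landscape"}


def build_html_to_pdf_args(
    html: str,
    name: str = "document.pdf",
    margins: str = "10mm",
    paper_size: str = "Letter",
    orientation: str = "Portrait",
    header: Optional[str] = None,
    footer: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate inputs and return html_to_pdf arguments."""
    if not html.strip():
        raise ValueError("html is required and must not be empty")
    if paper_size not in ALLOWED_PAPER_SIZES:
        raise ValueError(f"unsupported paperSize: {paper_size!r}")
    if orientation not in ALLOWED_ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {orientation!r}")
    args = {
        "html": html,
        "name": name,
        "margins": margins,
        "paperSize": paper_size,
        "orientation": orientation,
    }
    # Optional parameters are included only when the caller provides them.
    if header is not None:
        args["header"] = header
    if footer is not None:
        args["footer"] = footer
    return args
```

Validating on the client side is a design choice here, not a server requirement: it catches typos such as an unsupported paper size before a request is made.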

HTML Best Practices

CRITICAL: Keep all CSS inside the document, either in a <style> block in the head or in inline style attributes. External stylesheets will not load.

Reliable HTML structure:

```html
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; padding: 20px; }
    h1 { color: #333; border-bottom: 2px solid #333; padding-bottom: 10px; }
    table { width: 100%; border-collapse: collapse; margin: 20px 0; }
    th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
    th { background-color: #f5f5f5; font-weight: bold; }
    .total { font-weight: bold; background-color: #f9f9f9; }
  </style>
</head>
<body>
  <h1>Document Title</h1>
  <!-- Content here -->
</body>
</html>
```
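
When the content arrives as a fragment, one option is to wrap it in the structure above before conversion. The following is a rough sketch; `wrap_fragment` and `BASE_STYLE` are hypothetical names, and the stylesheet is shortened for brevity.

```python
from html import escape

# A shortened version of the embedded stylesheet from the structure above.
BASE_STYLE = "body { font-family: Arial, sans-serif; padding: 20px; }"


def wrap_fragment(body_html: str, title: str, css: str = BASE_STYLE) -> str:
    """Hypothetical helper: wrap an HTML fragment in a full document.

    The CSS is embedded in a <style> block so that no external
    stylesheet has to be fetched. The title is escaped; body_html is
    inserted as-is and is assumed to be trusted markup.
    """
    return (
        "<html>\n<head>\n"
        f"  <style>\n    {css}\n  </style>\n"
        "</head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  {body_html}\n"
        "</body>\n</html>"
    )
```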

What Works Well

  • Tables - Render reliably for invoices, quotes, data
  • Basic CSS - Colors, fonts, borders, padding, margins
  • Images - Use base64 data URLs (most reliable) or absolute URLs
  • Web-safe fonts - Arial, Helvetica, Times New Roman, Georgia
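
For the image advice above, a base64 data URL can be built with the Python standard library. This is a small sketch; `to_data_url` and `img_tag` are hypothetical helper names.

```python
import base64
from html import escape


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(image_bytes: bytes, alt: str, mime_type: str = "image/png") -> str:
    """Build an <img> tag that embeds the image instead of linking to it."""
    src = to_data_url(image_bytes, mime_type)
    return f'<img src="{src}" alt="{escape(alt, quote=True)}">'
```

Embedding increases the size of the HTML, so this suits small images such as logos.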

What May Have Issues

  • Flexbox/Grid - Complex layouts may not render as expected
  • External resources - Fonts, images from external URLs may fail
  • JavaScript - Not executed
  • CSS variables - May not be supported
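
The items above can be checked before conversion with a few pattern searches. The sketch below is a heuristic, not a parser: the patterns and the name `find_risky_features` are illustrative assumptions and will miss some cases.

```python
import re

# Heuristic patterns for the features listed above (illustrative only).
RISKY_PATTERNS = {
    "external stylesheet": re.compile(r"<link\b[^>]*rel=[\"']?stylesheet", re.I),
    "script (not executed)": re.compile(r"<script\b", re.I),
    "CSS variable": re.compile(r"var\(\s*--"),
    "flexbox/grid layout": re.compile(
        r"display\s*:\s*(?:inline-)?(?:flex|grid)", re.I
    ),
}


def find_risky_features(html: str) -> list[str]:
    """Return a label for every risky feature found in the HTML."""
    return [
        label for label, pattern in RISKY_PATTERNS.items() if pattern.search(html)
    ]
```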

Document Type Patterns

For Quotes/Invoices:

```html
<table>
  <thead>
    <tr><th>Item</th><th>Qty</th><th>Price</th><th>Total</th></tr>
  </thead>
  <tbody>
    <tr><td>Product A</td><td>2</td><td>$50</td><td>$100</td></tr>
  </tbody>
  <tfoot>
    <tr class="total"><td colspan="3">Total</td><td>$100</td></tr>
  </tfoot>
</table>
```
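
A table like this can be generated from line items rather than written by hand. The sketch below assumes items arrive as (description, quantity, unit price) tuples and that amounts are in dollars; `render_invoice_table` is a hypothetical name.

```python
from decimal import Decimal
from html import escape


def render_invoice_table(items) -> str:
    """Render (description, quantity, unit_price) tuples as an invoice table.

    Decimal is used so that totals are not affected by float rounding.
    """
    rows = []
    grand_total = Decimal("0")
    for description, qty, unit_price in items:
        price = Decimal(str(unit_price))
        line_total = Decimal(qty) * price
        grand_total += line_total
        rows.append(
            f"    <tr><td>{escape(description)}</td><td>{qty}</td>"
            f"<td>${price:.2f}</td><td>${line_total:.2f}</td></tr>"
        )
    return (
        "<table>\n  <thead>\n"
        "    <tr><th>Item</th><th>Qty</th><th>Price</th><th>Total</th></tr>\n"
        "  </thead>\n  <tbody>\n"
        + "\n".join(rows)
        + "\n  </tbody>\n  <tfoot>\n"
        f'    <tr class="total"><td colspan="3">Total</td>'
        f"<td>${grand_total:.2f}</td></tr>\n"
        "  </tfoot>\n</table>"
    )
```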

For Reports:

```html
<h1>Monthly Report</h1>
<h2>Summary</h2>
<p>Key findings...</p>
<h2>Data</h2>
<table>...</table>
<h2>Recommendations</h2>
<ul><li>Action item 1</li></ul>
```

For Certificates:

```html
<div style="text-align: center; padding: 40px;">
  <h1 style="font-size: 36px;">Certificate of Completion</h1>
  <p style="font-size: 24px; margin: 40px 0;">This certifies that</p>
  <p style="font-size: 32px; font-weight: bold;">John Doe</p>
  <p style="margin-top: 40px;">has successfully completed the program.</p>
</div>
```

pdf_to_text

Extracts text content from PDF documents.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | URL to the PDF file |
| pages | No | Page range: "1-3", "1,3,5", or "all" |

Best For

  • Text-based PDFs (not scanned images)
  • Extracting content for analysis
  • Converting PDF content to editable text

Limitations

  • Scanned documents need OCR (use pdf_ocr instead, if the server exposes it; it is not listed in the Tool Overview)
  • Formatting is lost (tables become plain text)
  • Complex layouts may have jumbled text order
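
One way to notice the scanned-document case is to check whether extraction returned almost no visible text. The sketch below is only a heuristic; the name `needs_ocr` and the 20-character threshold are assumptions and not part of the server.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF is a scanned image from its extracted text.

    Whitespace is ignored. A result with fewer than min_chars visible
    characters is treated as "probably scanned".
    """
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

A short but valid document, such as a one-line note, would be misclassified, so treat a positive result as a prompt to inspect the file rather than as proof.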

merge_pdfs

Combines multiple PDF files into a single document.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| urls | Yes | Array of PDF URLs to merge |
| name | No | Output filename |

Usage Notes

  • PDFs merge in the order provided
  • All page sizes are preserved
  • Bookmarks may not transfer
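
Because merge order follows the order of the array, it can help to build the arguments in one place. In this sketch, `build_merge_args` is a hypothetical helper, and the checks for at least two URLs and an http(s) scheme are assumptions rather than documented server rules.

```python
from typing import Optional
from urllib.parse import urlparse


def build_merge_args(urls: list[str], name: Optional[str] = None) -> dict:
    """Hypothetical helper: validate URLs and return merge_pdfs arguments."""
    if len(urls) < 2:
        # Assumption: merging fewer than two files is a caller mistake.
        raise ValueError("provide at least two PDF URLs to merge")
    for url in urls:
        parsed = urlparse(url)
        # Assumption: the server fetches files over http or https.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    args = {"urls": list(urls)}  # copy so the caller's order is preserved
    if name:
        args["name"] = name
    return args
```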

split_pdf

Extracts specific pages from a PDF.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | URL to the PDF file |
| pages | Yes | Pages to extract: "1-3", "1,3,5", "1-3,7-9" |
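
The page formats above can be validated locally before a request is made. The sketch below expands a specification into page numbers; `parse_pages` is a hypothetical name, pages are assumed to be numbered from 1, and the "all" keyword accepted by pdf_to_text is not handled.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-3,7-9" into a list of page numbers.

    Raises ValueError for malformed input, pages below 1, or reversed ranges.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    return pages
```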

add_watermark

Adds text watermark to PDF pages.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | URL to the PDF file |
| text | Yes | Watermark text (e.g., "DRAFT", "CONFIDENTIAL") |
| pages | No | Which pages (default: all) |
| opacity | No | 0-1, lower = more transparent |
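
A minimal sketch of assembling these arguments, with the opacity range from the table enforced on the client side. `build_watermark_args` is a hypothetical helper; no default opacity is set because the table does not state one.

```python
from typing import Optional


def build_watermark_args(
    url: str,
    text: str,
    pages: Optional[str] = None,
    opacity: Optional[float] = None,
) -> dict:
    """Hypothetical helper: validate inputs and return add_watermark arguments."""
    if not text.strip():
        raise ValueError("watermark text must not be empty")
    args = {"url": url, "text": text}
    if pages is not None:
        args["pages"] = pages
    if opacity is not None:
        # The table gives the valid range as 0-1 (lower = more transparent).
        if not 0 <= opacity <= 1:
            raise ValueError("opacity must be between 0 and 1")
        args["opacity"] = opacity
    return args
```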

Error Recovery

"HTML parsing failed"

  • Check for unclosed tags
  • Ensure valid HTML structure
  • Remove problematic CSS
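
The unclosed-tag check can be approximated with Python's standard `html.parser` module. The sketch below is deliberately strict: it also flags tags that HTML allows to be left open (such as `<p>` or `<li>`), and the names `TagBalanceChecker` and `find_unbalanced_tags` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_unbalanced_tags(html: str) -> list[str]:
    """Return a list of tag problems; an empty list means balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]
```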

"Timeout"

  • HTML may be too complex
  • Simplify styles and structure
  • Split into multiple documents

"Invalid URL"

  • For pdf_to_text, merge_pdfs, and split_pdf: ensure the URL is accessible
  • Use publicly accessible URLs or presigned URLs

Best Practices

  1. Start simple - Basic HTML first, add styling incrementally
  2. Prefer tables - Use tables rather than divs for data layouts
  3. Inline everything - CSS, small images as base64
  4. Set paper size - Match intended print format
  5. Use margins - Prevent content from touching edges