Markdown Extraction Support
We've just released Markdown extraction in CaptureKit, a feature built on user feedback. It extends CaptureKit's content extraction so you can retrieve clean, structured text from web pages.
📝 Markdown Extraction (in /content API)
You can now extract the Markdown representation of web pages using the /content endpoint. This makes it easier to work with the textual content of web pages in a format that's both human-readable and machine-processable.
Example Request
GET https://api.capturekit.dev/content?access_key=<your-access-key>&url=https://capturekit.dev&include_markdown=true
Example Response
{
  "success": true,
  "data": {
    "metadata": { ... },
    "links": { ... },
    "html": "<html><body><h1>Hello, world!</h1></body></html>",
    "markdown": "# Hello, world!"
  }
}
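As a rough illustration of consuming this response, the Python sketch below pulls the Markdown string out of a JSON body shaped like the example above. The function name extract_markdown and its error handling are assumptions made for illustration, not part of CaptureKit; only the success, data, and markdown fields come from the example.

```python
import json


def extract_markdown(response_text: str) -> str:
    """Return the Markdown string from a /content JSON response body.

    Hypothetical helper: it relies only on the "success", "data", and
    "markdown" fields shown in the example response.
    """
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CaptureKit reported an unsuccessful request")
    markdown = payload.get("data", {}).get("markdown")
    if markdown is None:
        # The field is only expected when include_markdown=true was sent.
        raise KeyError("no 'markdown' field; was include_markdown=true set?")
    return markdown


# A trimmed copy of the example response, used as sample input.
sample = (
    '{"success": true, "data": {"html": "<h1>Hello, world!</h1>", '
    '"markdown": "# Hello, world!"}}'
)
print(extract_markdown(sample))
```

Raising on a missing markdown field, rather than returning an empty string, makes a forgotten include_markdown=true easier to spot.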
Parameters
- url (string, required): URL of the web page to extract
- access_key (string, required): Your API key
- include_markdown (boolean, optional): Set to true to include Markdown data (defaults to false)
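The parameters above can be assembled into a request URL with a few lines of Python. This is a minimal sketch: build_content_url is a hypothetical helper, and it assumes the API accepts a percent-encoded url value and the lowercase literal true, as the example request suggests.

```python
from urllib.parse import urlencode

# Base URL taken from the example request.
CONTENT_ENDPOINT = "https://api.capturekit.dev/content"


def build_content_url(access_key: str, url: str, include_markdown: bool = False) -> str:
    """Return a /content request URL with percent-encoded query parameters."""
    params = {"access_key": access_key, "url": url}
    if include_markdown:
        # Assumption: the API expects the lowercase literal "true".
        params["include_markdown"] = "true"
    # urlencode escapes reserved characters such as ":" and "/" in the target URL.
    return f"{CONTENT_ENDPOINT}?{urlencode(params)}"


print(build_content_url("<your-access-key>", "https://capturekit.dev", include_markdown=True))
```

Encoding the target URL matters most when it carries its own query string, since an unescaped & would otherwise be read as the start of a new /content parameter.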
Why Markdown?
Markdown provides several advantages over raw HTML:
- Readability: Markdown is cleaner and easier to read than HTML
- Simplicity: It removes unnecessary styling and formatting
- Portability: Easy to use in various applications and platforms
- Text Processing: Ideal for content analysis, summarization, and AI processing
Use Cases
- Content Management: Import web content directly into your CMS
- AI Processing: Feed web content to LLMs and other AI systems in a clean format
- Documentation: Extract documentation from websites for offline use
- Knowledge Bases: Build internal knowledge repositories from web content
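For the AI processing and knowledge base cases, one common preparation step is to split the extracted Markdown into sections before indexing it or passing it to a model. The sketch below is one simple way to do that and is not something CaptureKit provides; split_by_heading is a hypothetical helper that recognizes only ATX-style headings (lines starting with one to six # characters).

```python
import re

# Matches ATX headings such as "# Title" or "### Section" at the start of a line.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs.

    Text that appears before the first heading is returned with an
    empty heading string.
    """
    sections = []
    title, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the section collected so far before starting a new one.
            if title or body:
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    if title or body:
        sections.append((title, "\n".join(body).strip()))
    return sections


for heading, text in split_by_heading("# Hello, world!\nWelcome.\n## Details\nMore text."):
    print(repr(heading), repr(text))
```

This sketch does not track fenced code blocks, so a line such as "# comment" inside a code sample would be treated as a heading; a full Markdown parser is the safer choice for pages that contain code.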
Final Notes
This feature came directly from user feedback, and we're committed to building CaptureKit around your real-world needs.
Have ideas for more features? Let us know!
Thanks for being part of the journey!