@push.rocks/smartrobots

A module for parsing and making sense of robots.txt files.

Install

To install @push.rocks/smartrobots, make sure Node.js is installed on your system, then run the following command in your terminal:

npm install @push.rocks/smartrobots --save

This command downloads @push.rocks/smartrobots and its dependencies into your project's node_modules directory and records the package in your package.json.

Usage

This tutorial guides you through using @push.rocks/smartrobots, a TypeScript module for interpreting the robots.txt files of websites. Working through the scenarios below, you'll learn how to parse robots.txt files and work with the data they provide.

Prerequisites

Before diving into the examples, ensure you are familiar with basic TypeScript syntax and concepts. You also need a basic understanding of what robots.txt is and its significance in web development.

Basic Setup

First, import Smartrobots from the @push.rocks/smartrobots package in your TypeScript file:

import { Smartrobots } from '@push.rocks/smartrobots';

Instantiate the Smartrobots class to get started:

const mySmartrobots = new Smartrobots();

Parsing robots.txt from a URL

One common use case is to parse the robots.txt file directly from a website. The Smartrobots class provides an easy-to-use method to achieve this, as demonstrated below:

async function parseRobotsFromUrl() {
  const url = 'https://example.com/robots.txt'; // Replace with the URL to the desired robots.txt
  try {
    const parsedData = await mySmartrobots.parseRobotsTxtFromUrl(url);
    console.log('Parsed robots.txt data:', parsedData);
  } catch (error) {
    console.error('Error parsing robots.txt from URL:', error);
  }
}

parseRobotsFromUrl();

This function asynchronously fetches the robots.txt file from the specified URL and logs the parsed content to the console.
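
Once the file is parsed, a typical next step is to iterate over the extracted sitemap URLs. The snippet below is a minimal sketch; it assumes the returned object exposes a sitemaps array of strings, matching the sample output shown later in this tutorial:

async function listSitemaps() {
  const parsedData = await mySmartrobots.parseRobotsTxtFromUrl('https://example.com/robots.txt');
  // Assumes the parsed result contains a "sitemaps" string array
  for (const sitemapUrl of parsedData.sitemaps) {
    console.log('Found sitemap:', sitemapUrl);
    // ...fetch and process each sitemap here
  }
}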

Parsing a robots.txt String

If you already have the contents of a robots.txt file as a string, you can parse it directly using the parseRobotsTxt method. Here's how:

async function parseRobotsFromString(robotsTxtString: string) {
  try {
    const parsedData = await mySmartrobots.parseRobotsTxt(robotsTxtString);
    console.log('Parsed robots.txt data:', parsedData);
  } catch (error) {
    console.error('Error parsing robots.txt string:', error);
  }
}

// Example robots.txt string
const robotsTxtString = `
User-agent: *
Disallow: /secret-page
Sitemap: https://example.com/sitemap.xml
`;

parseRobotsFromString(robotsTxtString);

This function takes a string representation of a robots.txt file, parses it, and logs the results. The example string contains a User-agent directive, a Disallow rule, and a Sitemap URL.

Understanding the Parsed Data

The parsed data from robots.txt is returned as an object. In its current implementation, @push.rocks/smartrobots focuses on extracting sitemap URLs. Here's a sample output from parsing the example robots.txt string:

{
  "sitemaps": ["https://example.com/sitemap.xml"]
}
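
In TypeScript terms, this result can be described roughly with the interface below. Note that this is an illustrative shape inferred from the sample output, not a type exported by the module:

interface IParsedRobotsTxt {
  // all URLs found in Sitemap: directives
  sitemaps: string[];
}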

You can extend the parsing logic based on your requirements to handle more directives from robots.txt.
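
For example, if you also need the Disallow rules, you could post-process the raw robots.txt string yourself. The helper below is a hypothetical sketch, not part of the module's API:

function extractDisallows(robotsTxt: string): string[] {
  return robotsTxt
    .split('\n')
    .map((line) => line.trim())
    // keep only Disallow: lines, case-insensitively
    .filter((line) => line.toLowerCase().startsWith('disallow:'))
    // strip the directive name and surrounding whitespace
    .map((line) => line.slice('disallow:'.length).trim())
    // drop empty rules ("Disallow:" with no path permits everything)
    .filter((path) => path.length > 0);
}

// Using the example string from above:
console.log(extractDisallows(robotsTxtString)); // ['/secret-page']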

Conclusion

@push.rocks/smartrobots provides a straightforward and efficient way to interpret robots.txt files in TypeScript projects. Whether you're fetching and parsing robots.txt from a URL or working with its contents as a string, this module simplifies the process, letting you focus on using the data rather than on the intricacies of parsing.

Remember: robots.txt files are publicly accessible and should be used responsibly, following web standards and etiquette.

For more advanced use cases, consider contributing to or extending the functionality of @push.rocks/smartrobots to cover a broader range of directives and scenarios.

License and Legal Information

This repository contains open-source code that is licensed under the MIT License. A copy of the MIT License can be found in the license file within this repository.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH and are not included within the scope of the MIT license granted herein. Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines, and any usage must be approved in writing by Task Venture Capital GmbH.

Company Information

Task Venture Capital GmbH
Registered at District Court Bremen, HRB 35230 HB, Germany

For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.