id: io-file-source
title: File source connector
sidebar_label: "File source connector"

:::note

You can download all the Pulsar connectors from the download page.

:::

The File source connector pulls messages from files in directories and persists the messages to Pulsar topics.

Configuration

The configuration of the file source connector has the following properties.

Property

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| inputDirectory | String | true | No default value | The input directory to pull files from. |
| recurse | Boolean | false | true | Whether to pull files from subdirectories. |
| keepFile | Boolean | false | false | If set to true, the file is not deleted after it is processed, which means the file can be picked up repeatedly. |
| fileFilter | String | false | `[^\.].*` | Only files whose names match the given regular expression are picked up. |
| pathFilter | String | false | NULL | If recurse is set to true, only subdirectories whose paths match the given regular expression are scanned. |
| minimumFileAge | Integer | false | 0 | The minimum age a file must reach before it can be processed. Any file younger than minimumFileAge (according to the last modification date) is ignored. |
| maximumFileAge | Long | false | Long.MAX_VALUE | The maximum age a file can have and still be processed. Any file older than maximumFileAge (according to the last modification date) is ignored. |
| minimumSize | Integer | false | 1 | The minimum size (in bytes) a file must have to be processed. |
| maximumSize | Double | false | Double.MAX_VALUE | The maximum size (in bytes) a file can have and still be processed. |
| ignoreHiddenFiles | Boolean | false | true | Whether hidden files should be ignored. |
| pollingInterval | Long | false | 10000L | How long to wait before performing a directory listing. |
| numWorkers | Integer | false | 1 | The number of worker threads that process files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 causes the data from multiple files to be mixed in the target topic. |
| processedFileSuffix | String | false | NULL | If set, a processed file is renamed with this suffix instead of being deleted. This setting only works when the keepFile property is false. |
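
To make the interaction of the filter properties concrete, the following Python sketch shows one way the name, age, size, and hidden-file checks could combine into a single decision. It is an illustration only, not the connector's actual implementation: the `FilterConfig` class and the `is_eligible` function are hypothetical names, the age unit is assumed to be milliseconds, "hidden" is assumed to mean a name starting with a dot, and Python's `re` module stands in for Java regular expressions, which differ in some details.

```python
import re
import sys
from dataclasses import dataclass


@dataclass
class FilterConfig:
    """Hypothetical holder for the filter properties; defaults mirror the table."""
    file_filter: str = r"[^\.].*"
    minimum_file_age: int = 0                  # assumed unit: milliseconds
    maximum_file_age: int = 2**63 - 1          # Long.MAX_VALUE
    minimum_size: int = 1                      # bytes
    maximum_size: float = sys.float_info.max   # Double.MAX_VALUE
    ignore_hidden_files: bool = True


def is_eligible(name: str, size_bytes: int, age_ms: int, cfg: FilterConfig) -> bool:
    """Return True if a file with these attributes passes every filter."""
    # Assumption: a hidden file is one whose name starts with a dot.
    if cfg.ignore_hidden_files and name.startswith("."):
        return False
    # The whole file name must match the regular expression.
    if re.fullmatch(cfg.file_filter, name) is None:
        return False
    # Files that are too young or too old are ignored.
    if not (cfg.minimum_file_age <= age_ms <= cfg.maximum_file_age):
        return False
    # Files that are too small or too large are ignored.
    if not (cfg.minimum_size <= size_bytes <= cfg.maximum_size):
        return False
    return True


cfg = FilterConfig(minimum_file_age=1000)
print(is_eligible("test.txt", 13, 5000, cfg))       # True
print(is_eligible(".test.txt.swp", 13, 5000, cfg))  # False: hidden file
print(is_eligible("test.txt", 13, 200, cfg))        # False: younger than minimumFileAge
```

With the defaults, an empty file is skipped because minimumSize is 1, and a dot-prefixed file is rejected by both ignoreHiddenFiles and the default fileFilter.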

Example

Before using the File source connector, you need to create a configuration file through one of the following methods.

  • JSON

    {
       "configs": {
          "inputDirectory": "/Users/david",
          "recurse": true,
          "keepFile": true,
          "fileFilter": "[^\\.].*",
          "pathFilter": ".*",
          "minimumFileAge": 0,
          "maximumFileAge": 9999999999,
          "minimumSize": 1,
          "maximumSize": 5000000,
          "ignoreHiddenFiles": true,
          "pollingInterval": 5000,
          "numWorkers": 1,
          "processedFileSuffix": ".processed_done"
       }
    }
    
  • YAML

    configs:
        inputDirectory: "/Users/david"
        recurse: true
        keepFile: true
        fileFilter: "[^\\.].*"
        pathFilter: ".*"
        minimumFileAge: 0
        maximumFileAge: 9999999999
        minimumSize: 1
        maximumSize: 5000000
        ignoreHiddenFiles: true
        pollingInterval: 5000
        numWorkers: 1
        processedFileSuffix: ".processed_done"
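
A configuration like the ones above can be sanity-checked before it is handed to the connector. The following Python sketch is one possible pre-flight check, not part of Pulsar: `check_file_source_config` is a hypothetical helper, the rule that a minimum must not exceed its maximum is an assumption, and Python's `re` module is used as a stand-in for Java's regular expression engine (both reject a bare `*`, for example, but they are not identical).

```python
import re


def check_file_source_config(configs: dict) -> list:
    """Return a list of human-readable problems found in the given configs."""
    problems = []

    # inputDirectory is the only required property.
    if not configs.get("inputDirectory"):
        problems.append("inputDirectory is required")

    # Both filters are regular expressions, so they must at least compile.
    for key in ("fileFilter", "pathFilter"):
        pattern = configs.get(key)
        if pattern is not None:
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"{key} is not a valid regular expression: {exc}")

    # Assumption: a minimum larger than its maximum would match no file.
    for low, high in (("minimumFileAge", "maximumFileAge"),
                      ("minimumSize", "maximumSize")):
        if low in configs and high in configs and configs[low] > configs[high]:
            problems.append(f"{low} is greater than {high}")

    # processedFileSuffix only takes effect when keepFile is false.
    if configs.get("keepFile") and configs.get("processedFileSuffix"):
        problems.append("processedFileSuffix has no effect while keepFile is true")

    return problems


for problem in check_file_source_config({
    "inputDirectory": "/opt",
    "keepFile": True,
    "processedFileSuffix": ".done",
}):
    print(problem)
```

An empty list means none of these checks found a problem. It does not guarantee that the connector accepts the configuration.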
    

Usage

Here is an example of using the File source connector.

  1. Pull a Pulsar image.

    docker pull apachepulsar/pulsar:{version}
    
  2. Start Pulsar standalone.

    docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --name pulsar-standalone apachepulsar/pulsar:{version} bin/pulsar standalone
    
  3. Create a configuration file file-connector.yaml.

    configs:
        inputDirectory: "/opt"
    
  4. Copy the configuration file file-connector.yaml to the container.

    docker cp connectors/file-connector.yaml pulsar-standalone:/pulsar/
    
  5. Download the File source connector.

    curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/pulsar/pulsar-{version}/connectors/pulsar-io-file-{version}.nar
    
  6. Copy the NAR file to the connectors folder in the container, then restart the container.

    docker cp pulsar-io-file-{version}.nar pulsar-standalone:/pulsar/connectors/
    docker restart pulsar-standalone
    
  7. Start the File source connector.

    docker exec -it pulsar-standalone /bin/bash
    
    ./bin/pulsar-admin sources localrun \
       --archive /pulsar/connectors/pulsar-io-file-{version}.nar \
       --name file-test \
       --destination-topic-name pulsar-file-test \
       --source-config-file /pulsar/file-connector.yaml
    
  8. Start a consumer.

    ./bin/pulsar-client consume -s file-test -n 0 pulsar-file-test
    
  9. In the container, write a message to the file test.txt in the input directory.

    echo "hello world!" > /opt/test.txt
    

    The following information appears in the consumer terminal window.

    ----- got message -----
    hello world!