exporters.writers package

Submodules

exporters.writers.base_writer module

class exporters.writers.base_writer.BaseWriter(options, metadata, *args, **kwargs)[source]

Bases: exporters.pipeline.base_pipeline_item.BasePipelineItem

This class receives a batch of items and writes it where needed.

close()[source]

Close all buffers and clean up all temporary files.

finish_writing()[source]

This method is a hook for operations to be done after everything has been written (e.g. consistency checks, writing a checkpoint).

The default implementation calls self._check_write_consistency if option check_consistency is True.

flush()[source]

Ensure all remaining buffers are written.

get_all_metadata(module='writer')[source]
get_metadata(key, module='writer')[source]
grouping_info
hash_algorithm = None
increment_written_items()[source]
set_metadata(key, value, module='writer')[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
update_metadata(data, module='writer')[source]
write(path, key)[source]

Receives the path to a buffer file and its group key, and writes the file's contents to the configured destination.

Should be implemented in derived classes.

It is called when it is time to flush a buffer, usually by the write_batch() or flush() methods.

write_batch(batch)[source]

Buffer a batch of items to be written and update internal counters.

Calling this method doesn’t guarantee that all items have been written. To ensure everything has been written you need to call flush().
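
To make the flow above concrete, here is a minimal, hypothetical sketch of a custom writer. It assumes the read_option() helper inherited from BasePipelineItem; the MyLogWriter name and the target_path option are invented for illustration only.

    from exporters.writers.base_writer import BaseWriter


    class MyLogWriter(BaseWriter):
        """Hypothetical writer that appends every flushed buffer file
        to a single local file given by the invented target_path option."""

        # Reuse the common options and add a hypothetical one.
        supported_options = dict(BaseWriter.supported_options)
        supported_options['target_path'] = {'type': basestring}

        def write(self, path, key):
            # path points to a buffer file that is ready to be shipped;
            # key is the grouping key of the items it contains.
            with open(path, 'rb') as src:
                with open(self.read_option('target_path'), 'ab') as dst:
                    dst.write(src.read())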

exception exporters.writers.base_writer.InconsistentWriteState[source]

Bases: exceptions.Exception

This exception is thrown when the write state is inconsistent with the expected final state.

exception exporters.writers.base_writer.ItemsLimitReached[source]

Bases: exceptions.Exception

This exception is thrown when the desired number of items has been reached.

exporters.writers.console_writer module

class exporters.writers.console_writer.ConsoleWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

A writer intended for testing purposes: it prints every item to the console.

It has no other options.

close()[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]
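
For reference, the writer section of an export configuration using this writer could look like the following minimal sketch (the surrounding name/options layout is assumed here):

    writer_config = {
        "name": "exporters.writers.console_writer.ConsoleWriter",
        "options": {}
    }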

exporters.writers.fs_writer module

class exporters.writers.fs_writer.FSWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to files on the local file system. It is a file-based writer, so it has the filebase option available.

  • filebase (str)
    Path to store the exported files
get_file_suffix(path, prefix)[source]

Gets a valid filename

supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
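
A minimal configuration sketch for this writer; the output path is a placeholder:

    writer_config = {
        "name": "exporters.writers.fs_writer.FSWriter",
        "options": {
            "filebase": "/tmp/exports/items"  # placeholder output path
        }
    }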

exporters.writers.ftp_writer module

class exporters.writers.ftp_writer.FTPWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an FTP server. It is a file-based writer, so it has the filebase option available.

  • host (str)
    FTP server IP address
  • port (int)
    FTP port
  • ftp_user (str)
    FTP user
  • ftp_password (str)
    FTP password
  • filebase (str)
    Path to store the exported files
build_ftp_instance()[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 21, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'ftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_USER'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'ftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_PASSWORD'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
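
A minimal configuration sketch for this writer; host, credentials and path are placeholders (user and password can also come from the environment fallbacks listed in supported_options):

    writer_config = {
        "name": "exporters.writers.ftp_writer.FTPWriter",
        "options": {
            "host": "ftp.example.com",        # placeholder
            "port": 21,
            "ftp_user": "exporter",           # or EXPORTERS_FTP_USER
            "ftp_password": "<PASSWORD>",     # or EXPORTERS_FTP_PASSWORD
            "filebase": "exports/items"       # placeholder path
        }
    }
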
exception exporters.writers.ftp_writer.FtpCreateDirsException[source]

Bases: exceptions.Exception

exporters.writers.s3_writer module

class exporters.writers.s3_writer.S3Writer(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an S3 bucket. It is a file-based writer, so it has the filebase option available.

  • bucket (str)
    Name of the bucket to write the items to.
  • aws_access_key_id (str)
    Public access key to the S3 bucket.
  • aws_secret_access_key (str)
    Secret access key to the S3 bucket.
  • filebase (str)
    Base path to store the items in the bucket.
  • aws_region (str)
    AWS region to connect to.
  • save_metadata (bool)
    Save key’s items count as metadata. Default: True
close()[source]

Called to clean up any temporary files created during the process.

get_file_suffix(path, prefix)[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'aws_region': {'default': None, 'type': (<type 'basestring'>,)}, 'save_metadata': {'default': True, 'required': False, 'type': <type 'bool'>}, 'host': {'default': None, 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'aws_access_key_id': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_LOGIN'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'aws_secret_access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_SECRET'}, 'save_pointer': {'default': None, 'type': (<type 'basestring'>,)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
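
A minimal configuration sketch for this writer; the bucket, credentials and path are placeholders (credentials can also come from the environment fallbacks listed in supported_options):

    writer_config = {
        "name": "exporters.writers.s3_writer.S3Writer",
        "options": {
            "bucket": "my-export-bucket",                   # placeholder
            "aws_access_key_id": "<AWS_ACCESS_KEY_ID>",     # or EXPORTERS_S3WRITER_AWS_LOGIN
            "aws_secret_access_key": "<AWS_SECRET_KEY>",    # or EXPORTERS_S3WRITER_AWS_SECRET
            "aws_region": "us-east-1",                      # placeholder
            "filebase": "exports/items"                     # placeholder path
        }
    }
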
exporters.writers.s3_writer.multipart_upload(*args, **kwds)[source]
exporters.writers.s3_writer.should_use_multipart_upload(path, bucket)[source]

exporters.writers.sftp_writer module

class exporters.writers.sftp_writer.SFTPWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an SFTP server. It is a file-based writer, so it has the filebase option available.

  • host (str)
    SFTP server IP address
  • port (int)
    SFTP port
  • sftp_user (str)
    SFTP user
  • sftp_password (str)
    SFTP password
  • filebase (str)
    Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 22, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'sftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_PASSWORD'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'sftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_USER'}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
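
The configuration mirrors the FTP writer; a minimal sketch with placeholder values:

    writer_config = {
        "name": "exporters.writers.sftp_writer.SFTPWriter",
        "options": {
            "host": "sftp.example.com",       # placeholder
            "port": 22,
            "sftp_user": "exporter",          # or EXPORTERS_SFTP_USER
            "sftp_password": "<PASSWORD>",    # or EXPORTERS_SFTP_PASSWORD
            "filebase": "exports/items"       # placeholder path
        }
    }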

exporters.contrib.writers.odo module

Module contents

class exporters.writers.ConsoleWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

A writer intended for testing purposes: it prints every item to the console.

It has no other options.

close()[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]
class exporters.writers.FSWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to files on the local file system. It is a file-based writer, so it has the filebase option available.

  • filebase (str)
    Path to store the exported files
get_file_suffix(path, prefix)[source]

Gets a valid filename

supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
class exporters.writers.FTPWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an FTP server. It is a file-based writer, so it has the filebase option available.

  • host (str)
    FTP server IP address
  • port (int)
    FTP port
  • ftp_user (str)
    FTP user
  • ftp_password (str)
    FTP password
  • filebase (str)
    Path to store the exported files
build_ftp_instance()[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 21, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'ftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_USER'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'ftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_PASSWORD'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
class exporters.writers.SFTPWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an SFTP server. It is a file-based writer, so it has the filebase option available.

  • host (str)
    SFTP server IP address
  • port (int)
    SFTP port
  • sftp_user (str)
    SFTP user
  • sftp_password (str)
    SFTP password
  • filebase (str)
    Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 22, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'sftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_PASSWORD'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'sftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_USER'}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
class exporters.writers.S3Writer(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to an S3 bucket. It is a file-based writer, so it has the filebase option available.

  • bucket (str)
    Name of the bucket to write the items to.
  • aws_access_key_id (str)
    Public access key to the S3 bucket.
  • aws_secret_access_key (str)
    Secret access key to the S3 bucket.
  • filebase (str)
    Base path to store the items in the bucket.
  • aws_region (str)
    AWS region to connect to.
  • save_metadata (bool)
    Save key’s items count as metadata. Default: True
close()[source]

Called to clean up any temporary files created during the process.

get_file_suffix(path, prefix)[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'aws_region': {'default': None, 'type': (<type 'basestring'>,)}, 'save_metadata': {'default': True, 'required': False, 'type': <type 'bool'>}, 'host': {'default': None, 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'aws_access_key_id': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_LOGIN'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'aws_secret_access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_SECRET'}, 'save_pointer': {'default': None, 'type': (<type 'basestring'>,)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
class exporters.writers.MailWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

Sends emails with item files attached.

  • emails (list)
    Email addresses where data will be sent
  • subject (str)
    Subject of the email
  • from (str)
    Sender of the email
  • max_mails_sent (int)
    Maximum number of emails that will be sent
send_mail(*args, **kw)[source]
supported_options = {'access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_AWS_ACCESS_KEY'}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'from': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_FROM'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'file_name': {'default': None, 'type': (<type 'basestring'>,)}, 'max_mails_sent': {'default': 5, 'type': (<type 'int'>, <type 'long'>)}, 'emails': {'type': <class 'exporters.utils.list[unicode]'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'secret_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_AWS_SECRET_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}, 'subject': {'type': (<type 'basestring'>,)}}
write(dump_path, group_key=None, file_name=None)[source]
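
A minimal configuration sketch for this writer; addresses and keys are placeholders (option names follow supported_options above):

    writer_config = {
        "name": "exporters.writers.MailWriter",
        "options": {
            "emails": ["data-team@example.com"],   # placeholder recipients
            "subject": "Daily export",
            "from": "exports@example.com",         # or EXPORTERS_MAIL_FROM
            "max_mails_sent": 5,
            "access_key": "<AWS_ACCESS_KEY>",      # or EXPORTERS_MAIL_AWS_ACCESS_KEY
            "secret_key": "<AWS_SECRET_KEY>"       # or EXPORTERS_MAIL_AWS_SECRET_KEY
        }
    }
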
class exporters.writers.CloudSearchWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

This writer stores items in the Amazon Web Services CloudSearch service (https://aws.amazon.com/es/cloudsearch/).

supported_options = {'access_key': {'default': None, 'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_CLOUDSEARCH_ACCESS_KEY'}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'id_field': {'default': '_key', 'type': (<type 'basestring'>,), 'help': 'Field to use as identifier'}, 'endpoint_url': {'type': (<type 'basestring'>,), 'help': 'Document Endpoint (e.g.: http://doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com)'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'secret_key': {'default': None, 'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_CLOUDSEARCH_SECRET_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
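
A minimal configuration sketch for this writer; the endpoint below is the placeholder from the option help, and the keys can also come from the environment fallbacks:

    writer_config = {
        "name": "exporters.writers.CloudSearchWriter",
        "options": {
            "endpoint_url": "http://doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com",
            "id_field": "_key",
            "access_key": "<AWS_ACCESS_KEY>",   # or EXPORTERS_CLOUDSEARCH_ACCESS_KEY
            "secret_key": "<AWS_SECRET_KEY>"    # or EXPORTERS_CLOUDSEARCH_SECRET_KEY
        }
    }
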
class exporters.writers.ReduceWriter(*args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

This writer allows exporters to aggregate item data and print the results.

  • code (str)
    Python code defining a reduce_function(item, accumulator=None)
reduced_result
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'code': {'type': (<type 'basestring'>,), 'help': 'Python code defining a reduce_function(item, accumulator=None)'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'source_path': {'default': None, 'type': (<type 'basestring'>,), 'help': 'Source path, useful for debugging/inspecting tools'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]
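
A minimal configuration sketch for this writer; the reduce function below simply counts items and is purely illustrative:

    writer_config = {
        "name": "exporters.writers.ReduceWriter",
        "options": {
            "code": (
                "def reduce_function(item, accumulator=None):\n"
                "    return (accumulator or 0) + 1"
            )
        }
    }
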
class exporters.writers.HubstorageReduceWriter(*args, **kwargs)[source]

Bases: exporters.writers.reduce_writer.ReduceWriter

This writer allows exporters to aggregate item data and push the results into Scrapinghub Hubstorage collections.

  • code (str)
    Python code defining a reduce_function(item, accumulator=None)
  • collection_url (str)
    Hubstorage Collection URL
  • key (str)
    Element key where to push the accumulated result
  • apikey (str)
    Hubstorage API key
finish_writing()[source]
get_result(**extra)[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'code': {'type': (<type 'basestring'>,), 'help': 'Python code defining a reduce_function(item, accumulator=None)'}, 'key': {'type': (<type 'basestring'>,), 'help': 'Element key where to push the accumulated result'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'collection_url': {'type': (<type 'basestring'>,), 'help': 'Hubstorage Collection URL'}, 'apikey': {'type': (<type 'basestring'>,), 'help': 'Hubstorage API key'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'source_path': {'default': None, 'type': (<type 'basestring'>,), 'help': 'Source path, useful for debugging/inspecting tools'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]
class exporters.writers.AggregationStatsWriter(*args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

This writer keeps track of key occurrences in dataset items. It provides the count and percentage of every key appearing in the dataset.

It has no other options.

close()[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]

Receives the batch and writes it. This method is usually called from a manager.

class exporters.writers.AzureBlobWriter(*args, **kw)[source]

Bases: exporters.writers.base_writer.BaseWriter

Writes items to Azure Blob Storage containers.

  • account_name (str)
    Public access name of the Azure account.
  • account_key (str)
    Public access key to the Azure account.
  • container (str)
    Blob container name.
VALID_CONTAINER_NAME_RE = '[a-zA-Z0-9-]{3,63}'
hash_algorithm = 'md5'
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'container': {'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'account_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_KEY'}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'account_name': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_NAME'}}
write(dump_path, group_key=None)[source]
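
A minimal configuration sketch for this writer; account values are placeholders and the container name must match VALID_CONTAINER_NAME_RE:

    writer_config = {
        "name": "exporters.writers.AzureBlobWriter",
        "options": {
            "account_name": "<AZURE_ACCOUNT_NAME>",  # or EXPORTERS_AZUREWRITER_NAME
            "account_key": "<AZURE_ACCOUNT_KEY>",    # or EXPORTERS_AZUREWRITER_KEY
            "container": "exported-items"            # must match [a-zA-Z0-9-]{3,63}
        }
    }
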
class exporters.writers.AzureFileWriter(options, meta, *args, **kw)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to Azure file shares. It is a file-based writer, so it has the filebase option available.

  • account_name (str)
    Public access name of the Azure account.
  • account_key (str)
    Public access key to the Azure account.
  • share (str)
    File share name.
  • filebase (str)
    Base path to store the items in the share.
get_file_suffix(path, prefix)[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'share': {'type': (<type 'basestring'>,)}, 'account_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'account_name': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_NAME'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
class exporters.writers.DropboxWriter(*args, **kw)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to a Dropbox folder. It is a file-based writer, so it has the filebase option available.

  • access_token (str)
    OAuth access token for the Dropbox API.
  • filebase (str)
    Base path to store the items in the folder.
get_file_suffix(path, prefix)[source]
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'access_token': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_DROPBOXWRITER_TOKEN'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=False)[source]
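
A minimal configuration sketch for this writer; the token and path are placeholders:

    writer_config = {
        "name": "exporters.writers.DropboxWriter",
        "options": {
            "access_token": "<DROPBOX_OAUTH_TOKEN>",  # or EXPORTERS_DROPBOXWRITER_TOKEN
            "filebase": "/exports/items"              # placeholder path
        }
    }
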
class exporters.writers.GDriveWriter(*args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to a Google Drive account. It is a file-based writer, so it has the filebase option available.

  • client_secret (object)
    JSON object containing the client secrets (client-secret.json) file obtained when creating the Google Drive API key.
  • credentials (object)
    JSON object containing credentials, obtained by authenticating the application with the bin/get_gdrive_credentials.py script.
  • filebase (str)
    Path to store the exported files
get_file_suffix(path, prefix)[source]

Gets a valid filename

supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'credentials': {'type': <type 'object'>}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'client_secret': {'type': <type 'object'>}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(*args, **kw)[source]
class exporters.writers.GStorageWriter(options, *args, **kwargs)[source]

Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter

Writes items to Google Storage buckets. It is a file-based writer, so it has the filebase option available.

  • filebase (str)
    Path to store the exported files
  • project (str)
    Valid project name
  • bucket (str)
    Google Storage bucket name
  • credentials (str or dict)
    Object with valid Google credentials. It can also be provided via the environment variable EXPORTERS_GSTORAGE_CREDS_RESOURCE, which should reference a credentials JSON file installed with setuptools, in the form “package_name:file_path”.
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'credentials': {'type': (<type 'dict'>, <type 'basestring'>), 'env_fallback': 'EXPORTERS_GSTORAGE_CREDS_RESOURCE'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'project': {'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write(dump_path, group_key=None, file_name=None)[source]
write_stream(stream, file_obj)[source]
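
A minimal configuration sketch for this writer; project, bucket and path are placeholders, and the credentials value shows the “package_name:file_path” resource form described above:

    writer_config = {
        "name": "exporters.writers.GStorageWriter",
        "options": {
            "project": "my-project",                             # placeholder
            "bucket": "my-export-bucket",                        # placeholder
            "credentials": "my_package:secrets/gstorage.json",   # or an inline credentials dict
            "filebase": "exports/items"                          # placeholder path
        }
    }
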
class exporters.writers.HubstorageWriter(*args, **kwargs)[source]

Bases: exporters.writers.base_writer.BaseWriter

This writer sends items to a Scrapinghub Hubstorage collection.

  • apikey (str)
    API key with access to the project where the items are being generated.
  • project_id (str)
    Id of the project.
  • collection_name (str)
    Name of the collection of items.
  • key_field (str)
    Record field which should be used as Hubstorage item key
flush()[source]
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'apikey': {'type': (<type 'basestring'>,), 'help': 'Hubstorage API key', 'env_fallback': 'EXPORTERS_HS_APIKEY'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'collection_name': {'type': (<type 'basestring'>,), 'help': 'Name of the collection of items'}, 'key_field': {'default': '_key', 'type': (<type 'basestring'>,), 'help': 'Record field which should be used as Hubstorage item key'}, 'project_id': {'type': (<type 'basestring'>,), 'help': 'Id of the project'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}
write_batch(batch)[source]
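
A minimal configuration sketch for this writer; the API key and project id are placeholders (the key can also come from EXPORTERS_HS_APIKEY):

    writer_config = {
        "name": "exporters.writers.HubstorageWriter",
        "options": {
            "apikey": "<SCRAPINGHUB_APIKEY>",   # or EXPORTERS_HS_APIKEY
            "project_id": "12345",              # placeholder
            "collection_name": "exported_items",
            "key_field": "_key"
        }
    }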