exporters.writers package¶
Submodules¶
exporters.writers.base_writer module¶
class exporters.writers.base_writer.BaseWriter(options, metadata, *args, **kwargs)[source]¶ Bases: exporters.pipeline.base_pipeline_item.BasePipelineItem
This class receives a batch of items and writes it where needed.
finish_writing()[source]¶ This method is a hook for operations to be performed after everything has been written (e.g. consistency checks, writing a checkpoint).
The default implementation calls self._check_write_consistency if the check_consistency option is True.
grouping_info¶
hash_algorithm = None¶
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
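These common options apply to every BaseWriter subclass. As an illustrative sketch (the surrounding configuration schema and the use of ConsoleWriter here are assumptions, not taken from this page), a writer is selected by its dotted path and tuned through these options:

```python
# Hypothetical writer configuration block using only the common options
# documented above; the values shown are the documented defaults.
writer_config = {
    "name": "exporters.writers.console_writer.ConsoleWriter",
    "options": {
        "compression": "gz",                  # gzip-compress write buffers
        "items_per_buffer_write": 500000,     # flush buffer after this many items...
        "size_per_buffer_write": 4000000000,  # ...or after ~4 GB of data
        "items_limit": 0,                     # 0 means no limit on exported items
        "check_consistency": False,           # run _check_write_consistency when done
    },
}
```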
-
exporters.writers.console_writer module¶
class exporters.writers.console_writer.ConsoleWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
A writer intended for testing purposes: it prints every item to the console.
It has no other options.
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
-
exporters.writers.fs_writer module¶
class exporters.writers.fs_writer.FSWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to files on the local file system. It is a file-based writer, so the filebase option is available.
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
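A minimal FSWriter configuration sketch: only filebase is required, everything else falls back to the defaults in supported_options (the path shown is illustrative):

```python
# Hypothetical FSWriter configuration; "filebase" is the only required option.
fs_writer = {
    "name": "exporters.writers.fs_writer.FSWriter",
    "options": {
        "filebase": "/tmp/exports/my_dataset_",  # illustrative local path prefix
        "generate_md5": True,                    # also write an md5 checksum file
    },
}
```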
exporters.writers.ftp_writer module¶
class exporters.writers.ftp_writer.FTPWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an FTP server. It is a file-based writer, so the filebase option is available.
- host (str)
- FTP server IP or hostname
- port (int)
- FTP port
- ftp_user (str)
- FTP user
- ftp_password (str)
- FTP password
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 21, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'ftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_USER'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'ftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_PASSWORD'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
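An FTPWriter configuration sketch (host and paths are illustrative). Per the env_fallback entries above, credentials may also come from the EXPORTERS_FTP_USER and EXPORTERS_FTP_PASSWORD environment variables:

```python
import os

# Hypothetical FTPWriter configuration; credentials fall back to the
# documented environment variables when not set explicitly.
ftp_writer = {
    "name": "exporters.writers.ftp_writer.FTPWriter",
    "options": {
        "host": "ftp.example.com",  # illustrative hostname
        "port": 21,                 # documented default
        "ftp_user": os.environ.get("EXPORTERS_FTP_USER", "anonymous"),
        "ftp_password": os.environ.get("EXPORTERS_FTP_PASSWORD", ""),
        "filebase": "exports/run1_",
    },
}
```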
exporters.writers.s3_writer module¶
class exporters.writers.s3_writer.S3Writer(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an S3 bucket. It is a file-based writer, so the filebase option is available.
- bucket (str)
- Name of the bucket to write the items to.
- aws_access_key_id (str)
- Public access key for the S3 bucket.
- aws_secret_access_key (str)
- Secret access key for the S3 bucket.
- filebase (str)
- Base path to store the items in the bucket.
- aws_region (str)
- AWS region to connect to.
- save_metadata (bool)
- Save each key's item count as metadata. Default: True
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'aws_region': {'default': None, 'type': (<type 'basestring'>,)}, 'save_metadata': {'default': True, 'required': False, 'type': <type 'bool'>}, 'host': {'default': None, 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'aws_access_key_id': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_LOGIN'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'aws_secret_access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_SECRET'}, 'save_pointer': {'default': None, 'type': (<type 'basestring'>,)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
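An S3Writer configuration sketch (bucket name and paths are illustrative; the key values are placeholders). Per the env_fallback entries above, credentials may also come from EXPORTERS_S3WRITER_AWS_LOGIN and EXPORTERS_S3WRITER_AWS_SECRET:

```python
# Hypothetical S3Writer configuration; aws_region and save_pointer are
# optional and default to None per supported_options above.
s3_writer = {
    "name": "exporters.writers.s3_writer.S3Writer",
    "options": {
        "bucket": "my-export-bucket",    # illustrative bucket name
        "aws_access_key_id": "AKIA...",  # placeholder, not a real key
        "aws_secret_access_key": "...",  # placeholder
        "filebase": "exports/run1_",     # base path inside the bucket
        "save_metadata": True,           # store item counts as key metadata
    },
}
```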
exporters.writers.sftp_writer module¶
class exporters.writers.sftp_writer.SFTPWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an SFTP server. It is a file-based writer, so the filebase option is available.
- host (str)
- SFTP server IP or hostname
- port (int)
- SFTP port
- sftp_user (str)
- SFTP user
- sftp_password (str)
- SFTP password
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 22, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'sftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_PASSWORD'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'sftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_USER'}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
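An SFTPWriter configuration sketch, mirroring the FTP case but on the documented default port 22 (host and paths are illustrative; credentials may also come from EXPORTERS_SFTP_USER / EXPORTERS_SFTP_PASSWORD):

```python
import os

# Hypothetical SFTPWriter configuration with environment-variable fallbacks
# matching the env_fallback entries documented above.
sftp_writer = {
    "name": "exporters.writers.sftp_writer.SFTPWriter",
    "options": {
        "host": "sftp.example.com",  # illustrative hostname
        "port": 22,                  # documented default
        "sftp_user": os.environ.get("EXPORTERS_SFTP_USER", "exporter"),
        "sftp_password": os.environ.get("EXPORTERS_SFTP_PASSWORD", ""),
        "filebase": "exports/run1_",
    },
}
```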
exporters.contrib.writers.odo module¶
Module contents¶
class exporters.writers.ConsoleWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
A writer intended for testing purposes: it prints every item to the console.
It has no other options.
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.FSWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to files on the local file system. It is a file-based writer, so the filebase option is available.
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.FTPWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an FTP server. It is a file-based writer, so the filebase option is available.
- host (str)
- FTP server IP or hostname
- port (int)
- FTP port
- ftp_user (str)
- FTP user
- ftp_password (str)
- FTP password
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 21, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'ftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_USER'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'ftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_FTP_PASSWORD'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.SFTPWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an SFTP server. It is a file-based writer, so the filebase option is available.
- host (str)
- SFTP server IP or hostname
- port (int)
- SFTP port
- sftp_user (str)
- SFTP user
- sftp_password (str)
- SFTP password
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'host': {'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'port': {'default': 22, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'sftp_password': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_PASSWORD'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'sftp_user': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_SFTP_USER'}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.S3Writer(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to an S3 bucket. It is a file-based writer, so the filebase option is available.
- bucket (str)
- Name of the bucket to write the items to.
- aws_access_key_id (str)
- Public access key for the S3 bucket.
- aws_secret_access_key (str)
- Secret access key for the S3 bucket.
- filebase (str)
- Base path to store the items in the bucket.
- aws_region (str)
- AWS region to connect to.
- save_metadata (bool)
- Save each key's item count as metadata. Default: True
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'aws_region': {'default': None, 'type': (<type 'basestring'>,)}, 'save_metadata': {'default': True, 'required': False, 'type': <type 'bool'>}, 'host': {'default': None, 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'aws_access_key_id': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_LOGIN'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'aws_secret_access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_S3WRITER_AWS_SECRET'}, 'save_pointer': {'default': None, 'type': (<type 'basestring'>,)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.MailWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
Sends emails with item files attached.
- emails (list)
- Email addresses where data will be sent
- subject (str)
- Subject of the email
- from (str)
- Sender of the email
- max_mails_sent (int)
- Maximum number of emails that will be sent
supported_options = {'access_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_AWS_ACCESS_KEY'}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'from': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_FROM'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'file_name': {'default': None, 'type': (<type 'basestring'>,)}, 'max_mails_sent': {'default': 5, 'type': (<type 'int'>, <type 'long'>)}, 'emails': {'type': <class 'exporters.utils.list[unicode]'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'secret_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_MAIL_AWS_SECRET_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}, 'subject': {'type': (<type 'basestring'>,)}}¶
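A MailWriter configuration sketch (the module path and all addresses are illustrative assumptions; the sending credentials can fall back to EXPORTERS_MAIL_AWS_ACCESS_KEY / EXPORTERS_MAIL_AWS_SECRET_KEY per supported_options above):

```python
# Hypothetical MailWriter configuration. Note that supported_options above
# takes "emails" as a list, and max_mails_sent defaults to 5.
mail_writer = {
    "name": "exporters.writers.mail_writer.MailWriter",  # module path assumed
    "options": {
        "emails": ["data-team@example.com"],  # illustrative recipients
        "subject": "Nightly export",
        "from": "exports@example.com",        # illustrative sender
        "max_mails_sent": 5,                  # documented default cap
    },
}
```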
class exporters.writers.CloudSearchWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
This writer stores items in the Amazon CloudSearch service (https://aws.amazon.com/es/cloudsearch/)
- endpoint_url
- Document endpoint (e.g. http://doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com)
- id_field
- Field to use as identifier
- access_key
- Public access key for the CloudSearch service.
- secret_key
- Secret access key for the CloudSearch service.
supported_options = {'access_key': {'default': None, 'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_CLOUDSEARCH_ACCESS_KEY'}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'id_field': {'default': '_key', 'type': (<type 'basestring'>,), 'help': 'Field to use as identifier'}, 'endpoint_url': {'type': (<type 'basestring'>,), 'help': 'Document Endpoint (e.g.: http://doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com)'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'secret_key': {'default': None, 'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_CLOUDSEARCH_SECRET_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
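A CloudSearchWriter configuration sketch (the module path is assumed; the endpoint is the example from the option help text, and id_field defaults to "_key"):

```python
# Hypothetical CloudSearchWriter configuration; access_key / secret_key are
# omitted here, so they would fall back to EXPORTERS_CLOUDSEARCH_ACCESS_KEY
# and EXPORTERS_CLOUDSEARCH_SECRET_KEY per supported_options above.
cloudsearch_writer = {
    "name": "exporters.writers.cloudsearch_writer.CloudSearchWriter",  # module path assumed
    "options": {
        "endpoint_url": "http://doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com",
        "id_field": "_key",  # documented default
    },
}
```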
class exporters.writers.ReduceWriter(*args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
This writer lets exporters aggregate item data and print the results.
- code (str)
- Python code defining a reduce_function(item, accumulator=None)
reduced_result¶
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'code': {'type': (<type 'basestring'>,), 'help': 'Python code defining a reduce_function(item, accumulator=None)'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'source_path': {'default': None, 'type': (<type 'basestring'>,), 'help': 'Source path, useful for debugging/inspecting tools'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
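The code option holds Python source defining reduce_function(item, accumulator=None). A sketch of what such a function might look like (the "price" field is an illustrative assumption):

```python
# Hypothetical reduce code for ReduceWriter: counts items and sums a
# numeric "price" field across the dataset.
REDUCE_CODE = """
def reduce_function(item, accumulator=None):
    acc = accumulator or {'count': 0, 'total_price': 0}
    acc['count'] += 1
    acc['total_price'] += item.get('price', 0)
    return acc
"""

reduce_writer = {
    "name": "exporters.writers.reduce_writer.ReduceWriter",
    "options": {"code": REDUCE_CODE},
}

# Quick local check of the reduce function, outside the exporter pipeline:
namespace = {}
exec(REDUCE_CODE, namespace)
acc = None
for item in [{"price": 2}, {"price": 3}]:
    acc = namespace["reduce_function"](item, acc)
# acc is now {'count': 2, 'total_price': 5}
```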
class exporters.writers.HubstorageReduceWriter(*args, **kwargs)[source]¶ Bases: exporters.writers.reduce_writer.ReduceWriter
This writer lets exporters aggregate item data and push the results into a Scrapinghub Hubstorage collection.
- code (str)
- Python code defining a reduce_function(item, accumulator=None)
- collection_url (str)
- Hubstorage collection URL
- key (str)
- Element key where to push the accumulated result
- apikey (str)
- Hubstorage API key
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'code': {'type': (<type 'basestring'>,), 'help': 'Python code defining a reduce_function(item, accumulator=None)'}, 'key': {'type': (<type 'basestring'>,), 'help': 'Element key where to push the accumulated result'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'collection_url': {'type': (<type 'basestring'>,), 'help': 'Hubstorage Collection URL'}, 'apikey': {'type': (<type 'basestring'>,), 'help': 'Hubstorage API key'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'source_path': {'default': None, 'type': (<type 'basestring'>,), 'help': 'Source path, useful for debugging/inspecting tools'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.AggregationStatsWriter(*args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
This writer keeps track of key occurrences in dataset items. It reports the count and percentage of every key present in the dataset.
It has no other options.
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.AzureBlobWriter(*args, **kw)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
Writes items to Azure Blob Storage containers.
- account_name (str)
- Public access name of the Azure account.
- account_key (str)
- Access key for the Azure account.
- container (str)
- Blob container name.
VALID_CONTAINER_NAME_RE = '[a-zA-Z0-9-]{3,63}'¶
hash_algorithm = 'md5'¶
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'container': {'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'account_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_KEY'}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'account_name': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_NAME'}}¶
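An AzureBlobWriter configuration sketch (module path and names are illustrative assumptions; credentials may also come from EXPORTERS_AZUREWRITER_NAME / EXPORTERS_AZUREWRITER_KEY). The container name should satisfy VALID_CONTAINER_NAME_RE above:

```python
import re

# Hypothetical AzureBlobWriter configuration with placeholder credentials.
azure_blob_writer = {
    "name": "exporters.writers.azure_blob_writer.AzureBlobWriter",  # module path assumed
    "options": {
        "account_name": "myaccount",     # placeholder account name
        "account_key": "...",            # placeholder key
        "container": "nightly-exports",  # must match the container name regex
    },
}

# The container name constraint documented above:
VALID_CONTAINER_NAME_RE = '[a-zA-Z0-9-]{3,63}'
container_ok = re.match(VALID_CONTAINER_NAME_RE, azure_blob_writer["options"]["container"])
```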
class exporters.writers.AzureFileWriter(options, meta, *args, **kw)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to Azure file shares. It is a file-based writer, so the filebase option is available.
- account_name (str)
- Public access name of the Azure account.
- account_key (str)
- Access key for the Azure account.
- share (str)
- File share name.
- filebase (str)
- Base path to store the items in the share.
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'share': {'type': (<type 'basestring'>,)}, 'account_key': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_KEY'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'account_name': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_AZUREWRITER_NAME'}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.DropboxWriter(*args, **kw)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to a Dropbox folder. Options available:
- access_token (str)
- OAuth access token for the Dropbox API.
- filebase (str)
- Base path to store the items in the folder.
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'access_token': {'type': (<type 'basestring'>,), 'env_fallback': 'EXPORTERS_DROPBOXWRITER_TOKEN'}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.GDriveWriter(*args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to a Google Drive account. It is a file-based writer, so the filebase option is available.
- client_secret (object)
- JSON object containing client secrets (client-secret.json) obtained when creating the Google Drive API key.
- credentials (object)
- JSON object containing credentials, obtained by authenticating the application with the bin/get_gdrive_credentials.py script
- filebase (str)
- Path to store the exported files
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'credentials': {'type': <type 'object'>}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'client_secret': {'type': <type 'object'>}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
class exporters.writers.GStorageWriter(options, *args, **kwargs)[source]¶ Bases: exporters.writers.filebase_base_writer.FilebaseBaseWriter
Writes items to Google Storage buckets. It is a file-based writer, so the filebase option is available.
- filebase (str)
- Path to store the exported files
- project (str)
- Valid project name
- bucket (str)
- Google Storage bucket name
- credentials (str or dict)
- Object with valid Google credentials. It can also be set through the env variable EXPORTERS_GSTORAGE_CREDS_RESOURCE, which should reference a credentials JSON file installed with setuptools, in the form "package_name:file_path".
supported_options = {'filebase': {'type': (<type 'basestring'>,)}, 'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'credentials': {'type': (<type 'dict'>, <type 'basestring'>), 'env_fallback': 'EXPORTERS_GSTORAGE_CREDS_RESOURCE'}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'start_file_count': {'default': 0, 'type': <type 'int'>}, 'project': {'type': (<type 'basestring'>,)}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'bucket': {'type': (<type 'basestring'>,)}, 'generate_md5': {'default': False, 'type': <type 'bool'>}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
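A GStorageWriter configuration sketch (module path, project, and bucket names are illustrative assumptions). Per supported_options above, credentials accepts either a dict of Google credentials or a "package_name:file_path" resource string:

```python
# Hypothetical GStorageWriter configuration using the setuptools resource
# form for credentials ("package_name:file_path").
gstorage_writer = {
    "name": "exporters.writers.gstorage_writer.GStorageWriter",  # module path assumed
    "options": {
        "project": "my-gcp-project",    # placeholder project name
        "bucket": "my-export-bucket",   # placeholder bucket name
        "filebase": "exports/run1_",
        "credentials": "my_package:creds/gstorage.json",  # placeholder resource string
    },
}
```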
class exporters.writers.HubstorageWriter(*args, **kwargs)[source]¶ Bases: exporters.writers.base_writer.BaseWriter
This writer sends items to a Scrapinghub Hubstorage collection.
- apikey (str)
- API key with access to the project where the items are being generated.
- project_id (str)
- Id of the project.
- collection_name (str)
- Name of the collection of items.
- key_field (str)
- Record field which should be used as the Hubstorage item key
supported_options = {'write_buffer': {'default': 'exporters.write_buffers.base.WriteBuffer', 'type': (<type 'basestring'>,)}, 'apikey': {'type': (<type 'basestring'>,), 'help': 'Hubstorage API key', 'env_fallback': 'EXPORTERS_HS_APIKEY'}, 'compression': {'default': 'gz', 'type': (<type 'basestring'>,)}, 'items_per_buffer_write': {'default': 500000, 'type': (<type 'int'>, <type 'long'>)}, 'collection_name': {'type': (<type 'basestring'>,), 'help': 'Name of the collection of items'}, 'key_field': {'default': '_key', 'type': (<type 'basestring'>,), 'help': 'Record field which should be used as Hubstorage item key'}, 'project_id': {'type': (<type 'basestring'>,), 'help': 'Id of the project'}, 'size_per_buffer_write': {'default': 4000000000, 'type': (<type 'int'>, <type 'long'>)}, 'items_limit': {'default': 0, 'type': (<type 'int'>, <type 'long'>)}, 'check_consistency': {'default': False, 'type': <type 'bool'>}, 'write_buffer_options': {'default': {}, 'type': <type 'dict'>}}¶
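A HubstorageWriter configuration sketch (module path and identifiers are illustrative placeholders; the API key can fall back to the EXPORTERS_HS_APIKEY environment variable per supported_options above):

```python
# Hypothetical HubstorageWriter configuration; key_field defaults to "_key".
hubstorage_writer = {
    "name": "exporters.writers.hubstorage_writer.HubstorageWriter",  # module path assumed
    "options": {
        "apikey": "0123456789abcdef",        # placeholder API key
        "project_id": "12345",               # placeholder project id
        "collection_name": "exported_items",  # placeholder collection name
        "key_field": "_key",                 # documented default
    },
}
```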