Friday, May 12, 2017

Amazon Athena: Key highlights on Amazon Athena


Amazon Athena is a serverless interactive query service that is used to analyze data in Amazon S3 using standard SQL.

  • Amazon Athena applies schema-on-read.
  • Amazon Athena can be used to analyze structured, semi-structured and unstructured datasets.
  • Athena integrates with Amazon QuickSight for easy visualization.
  • Amazon Athena uses Presto, a distributed SQL engine to execute the queries.
  • Apache Hive DDL is used in Athena to define tables.
  • Amazon Athena can be accessed in either of two ways.
    • AWS Management Console
    • JDBC Connection
  • A user requires User permissions and Amazon S3 permissions to run queries.
    • Athena User permissions: Amazon IAM provides two managed policies for Athena: AmazonAthenaFullAccess and AWSQuicksightAthenaAccess. These policies can be attached to the user profile to get required permissions.
    • Athena S3 permissions: To access data from a particular S3 location, a Athena user needs appropriate permissions on S3 buckets.
  • Amazon Athena supports encryption.
  • If dataset is encrypted on Amazon S3, a table DDL can have TBLPROPERTIES('has_encrypted_data'='true') to inform Athena that data to read is encrypted. 
  • Amazon Athena stores the query result sets by default in S3 staging directory.
  • Any settings defined to store encrypted results, will apply to all tables and queries. Configuring settings for individual databases, tables or queries is not possible. 
  • To encrypt query results stored in Amazon S3 using the console, provide the required details in Settings Tab.
  • Tables can be created in Athena using
    • AWS console
    • Using JDBC Driver
    • using Athena create table wizard.
  • Databases and Tables are simply logical objects pointing to actual files on Amazon S3.
  • Athena catalog stores the metadata about the databases and tables.
  • Athena doesn't support
    • CREATE TABLE AS SELECT
    • Transaction based operations.
    • ALTER INDEX
    • ALTER TABLE .. ARCHIVE PARTITION
    • ALTER TABLE ... CLUSTERED BY ..
    • ALTER TABLE .. EXCHANGE PARTITION
    • ALTER TABLE .. NOT CLUSTERED
    • ALTER TABLE ... NOT SKEWED
    • ALTER TABLE .. RENAME TO
    • COMMIT
    • CREATE INDEX
    • CREATE ROLE
    • CREATE TABLE LIKE
    • CREATE VIEW
    • DESCRIBE DATABASE
    • INSERT INTO 
    • ROLLBACK
    • EXPORT/IMPORT TABLE
    • DELETE FROM
    • ALTER TABLE .. ADD COLUMNS
    • ALTER TABLE..REPLACE COLUMNS
    • ALTER TABLE .. CHANGE COLUMNS
    • ALTER TABLE ..TOUCH
    • UDF's are not supported
    • Stored procedures
  • Athena limitations:
    • Operations that change table stats like create,update ,delete tables are ACID compliant.
    • All tables are EXTERNAL.
    • Table names are case-insensitive
    • Table names allow only underscore character, cannot contain any other special character.
  • Advantages of accessing Athena using JDBC Driver
    • Using driver Athena can connect to third party applications such as SQL Workbench.
    • We can run queries programmatically against Athena. 
  • JDBC Driver Options:
    • s3_staging_dir
    • query_results_encryption_option
    • query_results_aws_kms_key
    • aws_credentials_provider_class
    • aws_credentails_provider_arguments
    • max_error_retries
    • connection_timeout
    • socket_timeout
    • retry_base_delay
    • retry_max_backoff_time
    • log_path
    • log_level
  • JDBC commands:
    • connection.createStatement(); 
    • statement.executeQuery("<query>")
  • AWS CloudTrail is a service that records AWS API calls and events for AWS accounts. CloudTrail generates encrypted (*.gzip) logfiles and stores them in Amazon S3 in JSON format.
  • CloudTrail SerDe is used by Athena to read log files generated by CloudTrail in JSON format.
  • Compression Formats supported:
    • Snappy
    • Zlib
    • GZIP
    • LZO

No comments:

Post a Comment

Amazon Athena: Key highlights on Amazon Athena

Amazon Athena is a serverless interactive query service that is used to analyze data in Amazon S3 using standard SQL. Amazon Athena app...