Amazon Athena is a serverless interactive query service that is used to analyze data in Amazon S3 using standard SQL.
- Amazon Athena applies schema-on-read.
- Amazon Athena can be used to analyze structured, semi-structured and unstructured datasets.
- Athena integrates with Amazon QuickSight for easy visualization.
- Amazon Athena uses Presto, a distributed SQL engine to execute the queries.
- Apache Hive DDL is used in Athena to define tables.
- Amazon Athena can be accessed in either of two ways.
- AWS Management Console
- JDBC Connection
- A user requires User permissions and Amazon S3 permissions to run queries.
- Athena User permissions: Amazon IAM provides two managed policies for Athena: AmazonAthenaFullAccess and AWSQuicksightAthenaAccess. These policies can be attached to the user profile to get required permissions.
- Athena S3 permissions: To access data from a particular S3 location, a Athena user needs appropriate permissions on S3 buckets.
- Amazon Athena supports encryption.
- If dataset is encrypted on Amazon S3, a table DDL can have TBLPROPERTIES('has_encrypted_data'='true') to inform Athena that data to read is encrypted.
- Amazon Athena stores the query result sets by default in S3 staging directory.
- Any settings defined to store encrypted results, will apply to all tables and queries. Configuring settings for individual databases, tables or queries is not possible.
- To encrypt query results stored in Amazon S3 using the console, provide the required details in Settings Tab.
- Tables can be created in Athena using
- AWS console
- Using JDBC Driver
- using Athena create table wizard.
- Databases and Tables are simply logical objects pointing to actual files on Amazon S3.
- Athena catalog stores the metadata about the databases and tables.
- Athena doesn't support
- CREATE TABLE AS SELECT
- Transaction based operations.
- ALTER INDEX
- ALTER TABLE .. ARCHIVE PARTITION
- ALTER TABLE ... CLUSTERED BY ..
- ALTER TABLE .. EXCHANGE PARTITION
- ALTER TABLE .. NOT CLUSTERED
- ALTER TABLE ... NOT SKEWED
- ALTER TABLE .. RENAME TO
- COMMIT
- CREATE INDEX
- CREATE ROLE
- CREATE TABLE LIKE
- CREATE VIEW
- DESCRIBE DATABASE
- INSERT INTO
- ROLLBACK
- EXPORT/IMPORT TABLE
- DELETE FROM
- ALTER TABLE .. ADD COLUMNS
- ALTER TABLE..REPLACE COLUMNS
- ALTER TABLE .. CHANGE COLUMNS
- ALTER TABLE ..TOUCH
- UDF's are not supported
- Stored procedures
- Athena limitations:
- Operations that change table stats like create,update ,delete tables are ACID compliant.
- All tables are EXTERNAL.
- Table names are case-insensitive
- Table names allow only underscore character, cannot contain any other special character.
- Advantages of accessing Athena using JDBC Driver
- Using driver Athena can connect to third party applications such as SQL Workbench.
- We can run queries programmatically against Athena.
- JDBC Driver Options:
- s3_staging_dir
- query_results_encryption_option
- query_results_aws_kms_key
- aws_credentials_provider_class
- aws_credentails_provider_arguments
- max_error_retries
- connection_timeout
- socket_timeout
- retry_base_delay
- retry_max_backoff_time
- log_path
- log_level
- JDBC commands:
- connection.createStatement();
- statement.executeQuery("<query>")
- AWS CloudTrail is a service that records AWS API calls and events for AWS accounts. CloudTrail generates encrypted (*.gzip) logfiles and stores them in Amazon S3 in JSON format.
- CloudTrail SerDe is used by Athena to read log files generated by CloudTrail in JSON format.
- Compression Formats supported:
- Snappy
- Zlib
- GZIP
- LZO
No comments:
Post a Comment