What Is AWS Athena: 9 Features, Pricing [Athena Tutorial]

What Is AWS Athena? Are you looking for an effective data analysis solution to improve your business decision-making? Amazon Athena may be a good fit for your organization.

Are you planning to use Amazon Athena? This guide will walk you through:

What Is AWS Athena
How Amazon Athena Works
Features of Amazon Athena
Things to Consider Before Investing in Amazon Athena

What Is AWS Athena?

AWS Athena is an interactive query service that lets you analyze data stored in Amazon S3 (Simple Storage Service) using standard SQL (Structured Query Language).

It was designed primarily to simplify the analysis of data stored in S3.

To get started, sign in to the AWS Management Console, point Athena at your data in S3, and run standard SQL queries.

Results for many queries come back within seconds.

Athena is also serverless and scales automatically.

Because it is serverless, there is no infrastructure to set up or maintain.

Because it scales automatically, you can run multiple queries concurrently and still get results quickly, even with large datasets and complex queries.

As for pricing, you're charged only for the queries you run, based on the amount of data each query scans. For businesses that already store their data in S3, this can make Athena an inexpensive service.

Now that you have a basic idea of what AWS Athena is, let’s look at how it works:

How Does AWS Athena Work?

Suppose your data is stored in S3 as objects. This data can be in a variety of semi-structured or structured formats, such as columnar files like Apache ORC or Apache Parquet, plaintext files like JSON and CSV, and AWS service and application logs.

Once your data is in place, you create a table in Athena.

The table definition specifies where the data is located in S3 and a schema describing how the data is organized.
The table itself doesn't hold any data; it only points to the data in S3. Tables are defined using Apache Hive DDL, which resembles SQL.
Finally, you write a query in standard ANSI SQL and run it. The query is distributed across thousands of cores in an AWS-managed compute pool and runs in parallel.
The cores read the data directly from S3, execute the query, and return the results.
To deliver all this, Athena relies on Presto, an open-source distributed SQL engine designed for petabyte-scale queries. (Newer Athena engine versions are based on Trino, which was forked from Presto.)
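To make the table-definition step concrete, here is a minimal Python sketch that assembles a Hive-style CREATE EXTERNAL TABLE statement of the kind Athena accepts. The helper name create_table_ddl, the table web_logs, its columns, and the bucket s3://example-bucket/web-logs/ are all hypothetical. The function only builds a string; it makes no AWS calls.

```python
def create_table_ddl(table, columns, s3_location, stored_as="PARQUET"):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Athena.

    columns: list of (name, hive_type) pairs, e.g. [("url", "string")].
    The statement records only the schema and the S3 location; no data is copied.
    Hypothetical helper: inputs are not escaped, so pass trusted values only.
    """
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    fmt = stored_as.upper()
    if fmt == "CSV":
        # Plain-text CSV files: tell the Hive SerDe how fields are separated.
        storage = "ROW FORMAT DELIMITED\nFIELDS TERMINATED BY ','\nSTORED AS TEXTFILE"
    else:
        # Columnar formats such as PARQUET or ORC carry their own structure.
        storage = f"STORED AS {fmt}"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"{storage}\n"
        f"LOCATION '{s3_location}'"
    )


print(create_table_ddl(
    "web_logs",
    [("request_time", "timestamp"), ("url", "string"), ("status", "int")],
    "s3://example-bucket/web-logs/",
))
```

The printed statement could be pasted into the Athena query editor or sent as the query string of a StartQueryExecution call. Because the table is EXTERNAL, dropping it later removes only the definition, not the files in S3.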

You can use AWS Athena in the following three ways:

Through the asynchronous API in the AWS SDKs. When you start a query, you get back a unique query execution ID, which you then use to track its progress and retrieve the results.
Through the AWS Management Console, which includes a built-in SQL editor (the query editor) for creating tables and schemas and running queries. Results are displayed inline, which makes the console a good place to start.
Through JDBC and ODBC drivers, which let you connect Athena to your preferred SQL client, library, or database application.

Athena also keeps a history of the queries you've run and saves each query's results as CSV files in S3, so the results can be used as one component of a larger data pipeline.
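The asynchronous pattern described above (start a query, receive an ID, poll, then fetch results) can be sketched in Python as follows. This is a minimal sketch, not an official client: it assumes client behaves like boto3.client("athena") and uses that client's start_query_execution, get_query_execution, and get_query_results methods. The function name run_athena_query and the polling interval are illustrative choices.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, poll until it finishes, and return its rows.

    `client` is assumed to behave like boto3.client("athena").
    `output_location` is the S3 prefix where Athena writes the CSV results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]  # unique ID used to track this query

    # Poll the query's status until it reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)  # still QUEUED or RUNNING

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Reads only the first page of results; large result sets need pagination.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]  # None for NULL values
        for row in results["ResultSet"]["Rows"]
    ]
```

In real use you would pass boto3.client("athena") as the client, along with your own database name and an S3 results prefix. For SELECT queries, the first returned row holds the column names. Accepting the client as a parameter keeps the function easy to test with a stub instead of a live AWS account.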

Let’s now move on to the features of AWS Athena:

9 Features of AWS Athena

Some of the features of AWS Athena include:

1. Speedy Performance:

Because Athena runs queries in parallel, many queries return results in seconds, even on large datasets.

2. Zero Administration:

There is nothing to provision beforehand or manage afterward, and there are no operating systems, server fleets, or instances to maintain.

3. Machine Learning:

Anyone with SQL experience can use Athena for machine learning inference: Athena queries can call models deployed on Amazon SageMaker directly from SQL.

4. Serverless:

As mentioned earlier, because Athena is serverless, you don't need to manage any infrastructure. The underlying resources and configuration adapt to your number of users and the size of your datasets.

5. Federated Query:

Athena lets you run SQL queries across data stored in relational, non-relational, object, and custom data sources, using data source connectors that run on AWS Lambda.

6. Auto-Scaling:

AWS handles scaling for you, so there's no need to write scripts or set up alarms to scale up or down. Weekend lulls and sudden bursts of traffic are handled the same way.

7. Simple Querying:

As discussed earlier, Athena uses Presto, a distributed SQL query engine designed for low-latency, interactive queries. Athena is also compatible with a diverse range of data formats, including Parquet, CSV, Avro, JSON, and ORC.

8. Highly Secure:

To provide strong security, Athena uses ACLs (access control lists), Amazon S3 bucket policies, and AWS IAM (Identity and Access Management) policies.

9. Pay per Query:

While using AWS Athena, you only pay for the queries performed.

Things to Consider When Signing Up for AWS Athena

If you’re planning to choose AWS Athena for your business functions, be sure to consider the following aspects:

The Limitations

Knowing about the limitations of AWS Athena should put you in a much better position to make a choice.
For example, keep in mind that Athena doesn't support Hive transactional tables or user-defined stored procedures, and user-defined functions must be implemented separately with AWS Lambda.
On top of that, Amazon imposes service quotas on queries. For example, there is a limit on how many queries an account can run concurrently; in Athena's early versions, the default was five concurrent queries.
Catalog limits also apply. Early versions of Athena allowed up to 100 databases per account and up to 100 tables per database; these limits are much higher with the AWS Glue Data Catalog, so check the current quotas before planning a workload.
Athena can query data stored in an S3 bucket in a different region from the one where the query runs, but Athena itself is not available in every AWS region.

So you can only run Athena in the regions where it is supported.

User-Friendliness of the Interface

Athena is used through the AWS Management Console, which offers a simple, easy-to-understand interface.
However, users need a basic understanding of SQL to make the most of it.
Navigating the menu structure is fairly simple, with four primary tabs: the built-in Query Editor, Catalog Manager, Saved Queries, and History.
If you already have experience writing SQL queries, you won't need any special training to work with the tool.
The console also gives you easy access to the Amazon Athena documentation and the service's other key functions.

Can You Integrate It with a BI Tool?

Amazon positions Athena as a way to generate result sets with SQL queries.
For analysis and reporting, however, you can also use that data in leading business intelligence (BI) tools, as long as the BI tool can connect to Athena.
Fortunately, Athena supports this. Amazon offers JDBC and ODBC drivers that let you connect Athena to BI software such as Microsoft Power BI.
Amazon QuickSight is another popular BI tool that works with Athena. Other examples include Looker and Tableau.

Performance and Speed

You can run Athena queries on S3 data quickly and simply, without setting up servers, defining clusters, or doing the housekeeping that other query systems often require.

Because Athena reads data from S3, which stores it redundantly, it benefits from S3's high availability and durability.

Athena also uses compute resources spread across multiple, separate Availability Zones.

Still, users should stay up to date on performance best practices; for example, storing data in Apache Parquet and partitioning it can help you realize the full potential of your use case.

Data Formats

We’ve already highlighted the compatibility of AWS Athena with a wide range of data formats.

But there are additional aspects to understand in this regard.

Amazon recommends converting data to a columnar storage format such as Apache Parquet.
Because Athena separates storage from compute and reads data from S3 for every query, the way your data is stored directly affects performance, so your team should have a strong grasp of this optimization.
Using a compressed, columnar format further improves performance while also reducing storage and query costs.
Amazon also advises partitioning data, which reduces the amount of data each query scans and improves query performance.
To make your data format and file structure more efficient, you can also transform your data with AWS Glue or Amazon EMR.
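Why partitioning reduces the data scanned can be shown with a small Python model of partition pruning. This is a simplified sketch, not how Athena is implemented internally: it assumes a hypothetical dataset laid out in Hive-style folders such as s3://example-bucket/logs/year=2024/month=1/, with 24 monthly partitions of 10 GB each. A filter on the partition columns lets the engine skip every folder that cannot match.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass(frozen=True)
class Partition:
    """One Hive-style partition folder, e.g. .../year=2024/month=1/ (hypothetical)."""
    year: int
    month: int
    size_bytes: int


def bytes_scanned(partitions, year=None, month=None):
    """Sum the sizes of partitions that match the filter (simplified pruning).

    A filter value of None means the query has no condition on that column,
    so every partition must be read for it.
    """
    total = 0
    for p in partitions:
        if year is not None and p.year != year:
            continue  # pruned: folder skipped entirely
        if month is not None and p.month != month:
            continue
        total += p.size_bytes
    return total


# Hypothetical dataset: two years of monthly partitions, 10 GB each.
partitions = [Partition(y, m, 10 * GB) for y in (2023, 2024) for m in range(1, 13)]

full = bytes_scanned(partitions)                        # no partition filter
pruned = bytes_scanned(partitions, year=2024, month=1)  # WHERE year = 2024 AND month = 1
print(full // GB, pruned // GB)  # 240 10
```

In this model the filtered query reads one twenty-fourth of the data. Since Athena bills by data scanned, a reduction like this lowers the query cost by the same proportion.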

Amazon Athena Pricing

With Athena, you're charged based on the amount of data your queries scan. Storing your data and query results in S3 is billed separately.

For stored data, regular S3 rates apply.

Each terabyte of data scanned costs $5. Each query is billed for a minimum of 10 MB, and the amount scanned is rounded up to the nearest megabyte.

Again, keeping data in columnar formats, using compressed data files, and deleting old results from time to time should help keep costs low.

You can also minimize query bills and speed up queries by formatting data in Apache Parquet.
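To make these billing rules concrete, here is a minimal Python sketch that estimates the charge for one query from the bytes it scans. It assumes the $5 per TB rate quoted above (actual prices can vary by region) and binary units (1 MB = 1,048,576 bytes, 1 TB = 1,048,576 MB). The helper name query_cost_usd and the 100 GB and 10 GB scan sizes are hypothetical; this is not an official AWS calculator.

```python
import math

PRICE_PER_TB_USD = 5.00   # rate quoted in this article; may differ by region
MB = 1024 ** 2            # assumption: binary megabytes
GB = 1024 * MB
MB_PER_TB = 1024 * 1024   # assumption: 1 TB = 1,048,576 MB
MIN_MB_PER_QUERY = 10     # each query is billed for at least 10 MB


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the Athena charge for a single query."""
    billed_mb = math.ceil(bytes_scanned / MB)        # round up to a whole MB
    billed_mb = max(MIN_MB_PER_QUERY, billed_mb)     # apply the 10 MB minimum
    return billed_mb / MB_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: the same query over uncompressed CSV versus
# compressed Parquet, where only the needed columns are read.
print(query_cost_usd(100 * GB))  # 0.48828125
print(query_cost_usd(10 * GB))   # 0.048828125
```

Because every query pays for at least 10 MB, many tiny queries cost more than their raw scan size suggests; for large scans, the cost scales directly with the bytes read, which is why columnar formats, compression, and partitioning pay off.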

AWS Athena vs Redshift

Let’s look at the differences between these two widely used AWS tools.

Athena	Redshift
Athena is a serverless data analysis tool that queries data directly in S3 buckets.	Redshift is also a data analysis tool, but you have to load data into its database first. It traditionally runs on provisioned clusters, though Redshift Serverless is now also available, and Redshift Spectrum can query S3 data without loading it.
Athena is mainly used to query data in S3, including unstructured and semi-structured data that isn't loaded into a relational database.	Because it is a database, you mainly query structured data that has been loaded into tables.
Just write the query and run it; it's that simple.	Setting up the database and loading data takes time.
Athena has a simple pricing structure: $5 per TB of data scanned.	Redshift's price depends on the underlying hardware, plus storage and, for Spectrum, data scanned.
Performs well for querying data where it sits in S3.	Performs very well on complex queries; joins are much faster than in Athena.

AWS Athena Role

Athena's role can be divided into two broad categories.

The first is data analysis through queries: Athena can query structured and unstructured data stored in S3 buckets.
The second is serving as a data source: it can be integrated with BI tools such as Amazon QuickSight and with other SQL clients over a JDBC connection.

AWS Athena vs AWS Glue

Let’s look at the differences between these two tools.

Athena	Glue
Serverless; used to query data in S3. It uses Presto as its query engine.	Also serverless, but it's an ETL tool used for processing and transforming data.
Athena can use the schemas that a Glue Crawler discovers in S3 buckets for querying.	The Glue Crawler is used for schema discovery.
It can query structured as well as unstructured data in S3, in formats such as Avro, Parquet, and JSON.	It supports many file and data formats, such as Avro, Parquet, and JSON, for ETL purposes.
Pricing is simple: $5 per TB of data scanned.	Pricing depends on the data processing capacity used and how long jobs run.

AWS Athena Use Cases

There are various cases where Athena can be used. Let’s check out a few of them.

Ad hoc querying – Data analysts can query structured and unstructured data directly in S3 on the fly using SQL. There's no need to load the data anywhere first; queries run from Athena directly against the S3 bucket.
Data source for visualization – Athena can also serve as a data source for visualization tools such as Amazon QuickSight and other BI tools that support JDBC connectivity.

FAQs:

Q: Where can I find the AWS Athena documentation?

We have tried to cover the most important topics related to Athena in this guide. If you would like to read more, refer to the Amazon Athena User Guide in the AWS documentation.

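Q: Where does AWS Athena get its schema?

Athena doesn't keep its own separate schema store; it uses the AWS Glue Data Catalog to store table definitions. You can populate the catalog by running a Glue Crawler over your data before querying it, or by defining tables yourself with CREATE TABLE statements. Athena then uses these definitions to process SQL queries.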

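Q: Can AWS Athena read HDF5 files?

No. Athena doesn't support the HDF5 format. HDF5 is a hierarchical binary format often used for large scientific and geospatial datasets, and its contents can't be queried with a simple SQL statement. To analyze such data with Athena, you would first need to convert it to a supported format such as Parquet.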

Conclusion

This Amazon Athena tutorial has covered:

What Is AWS Athena
How Amazon Athena Works
Features of Amazon Athena
Things to Consider before Investing in Amazon Athena

With Athena, you can analyze your business data using standard SQL.

We recommend staying up to date with Amazon Athena tutorials and documentation, and we hope this guide helps you make the most of Athena and improve your organization’s decision-making.

Steve

I am an Amazon Web Services professional with more than 11 years of experience in AWS and other technologies. I work extensively with AWS tools such as S3, Lambda, API, Kinesis, Load Balancers, EKS, and ECS. As a Solution Architect and Technology Lead, I architect and implement solutions for clients around the world, especially in countries such as the United States, Canada, the United Kingdom, Australia, and New Zealand. Check out the complete profile on About us.