Executive Summary (TLDR)

This project analyzed 447,505 Data Analyst job postings (2023–2025) to identify the most valuable skills and job market trends.

Key findings:

SQL is the most essential skill, appearing in nearly half of all postings and dominating skill combinations.
Python is the strongest differentiator, appearing in nearly 29% of postings and frequently associated with higher-paying roles.
Power BI and Tableau are baseline expectations, with Power BI showing the strongest growth trend.
Cloud and data engineering skills offer the highest salary premiums, despite lower overall demand.
Salary transparency is extremely limited, with only 5.39% of postings disclosing compensation.
The job market is volatile, declining sharply from 2023 to 2024 before partial recovery in 2025.
Career takeaway - the most valuable skill combination is:
```
  SQL + Python + Visualization tool
```

Introduction

This project uses SQL and Python to analyze the data job market, focusing on the Data Analyst (DA) role.

The goal is to identify the most in-demand and highest-paying skills for Data Analysts, based on job postings collected from January 2023 to December 2025 across various countries, and to help job seekers decide where to focus their job hunt.

Analytical Framework

This project follows a structured analytical workflow:

Data Preparation – Filter job postings to Data Analyst roles and clean salary fields.
Exploratory Analysis – Examine salary distributions, job demand and geographic patterns.
Skill Demand Analysis – Identify the most requested technical skills and how their demand evolves over time.
Salary & Skill Relationship – Evaluate how specific skills influence salary levels.
Skill Combo Analysis – Analyze common skill combinations required in job postings.
Statistical Validation – Use Mann–Whitney U tests and effect size to confirm whether observed salary differences are statistically meaningful.
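
As an illustration of the statistical validation step, here is a minimal sketch of the Mann–Whitney U statistic and a rank-biserial effect size in plain Python. The function names and the toy salaries are hypothetical and are not taken from the project files; the sketch does not compute a p-value, for which a library routine such as scipy.stats.mannwhitneyu would normally be used.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, rank-biserial effect size) for two samples."""
    combined = list(chain(group_a, group_b))
    ranks = average_ranks(combined)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    # +1 means every salary in A exceeds every salary in B; -1 is the reverse.
    effect = 2 * u_a / (n_a * n_b) - 1
    return u_a, effect


# Hypothetical salaries for postings with and without a given skill.
with_skill = [105000, 110000, 120000, 98000]
without_skill = [85000, 90000, 98000, 92000]
u_stat, effect_size = mann_whitney_u(with_skill, without_skill)
print(u_stat, effect_size)
```

The effect size is reported alongside U because, with tens of thousands of postings, even a small salary gap can be statistically detectable without being practically meaningful.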

Background

The data was collected from: Data_Jobs_SQL_Tutorial

For this project, our goal is to answer the following questions that are grouped by themes:

Limitations of the data

There are a few data considerations that must be acknowledged because they can impact the interpretation of the results:

Remote work - the data does not specify the type of arrangement. It is not possible to know whether a role is fully remote or hybrid, or whether the arrangement is permanent or temporary. As a result, comparisons should not be interpreted as definitive.
Skill tagging - the skills requirements on job postings may be incomplete. This means that if a job posting requires fewer skills this could be due to incompleteness rather than simply a lower skill requirement.
Title of the jobs - throughout the analysis role grouping (ex. Data Analyst = Senior Data Analyst + Junior Data Analyst) was performed in order to improve comparability, but it is acknowledged that this masks differences in seniority and specialization which in turn may influence compensation levels.
Currency information - the dataset does not specify salary currency. Since the majority of salary-disclosing DA postings originate from the United States, salaries are treated as USD values for analysis purposes.

Tools Used

SQL (PostgreSQL): the language and the database management system to interact with the database.
Python: to reproduce the SQL queries in equivalent code (without using psycopg2.connect) and to build the graphs.
Visual Studio Code: to execute SQL queries.
Git & GitHub: for version control and to publish this project.

Analysis

The analysis will follow our questions layout.
For each question, a small intro will be provided to help understand the goal, along with the SQL code (Python code will be in separate files so a link will be provided), a table with the results (when appropriate), graph(s) for easy visualization and the main insights.

Python code for all graphs

1. What are the top-paying data analyst jobs?

For this question, we will focus on all Data Analyst roles using the normalized job_title_short field, excluding job postings that do not disclose the annual salary, leaving us with 24,105 job posts. We will also consider any location and any type of work arrangement to capture the full upper range of posted compensation.

with median_cte as(
    SELECT 
        percentile_cont(0.5) WITHIN GROUP (ORDER BY salary_year_avg) AS median_salary
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
    AND salary_year_avg IS NOT NULL
)

SELECT
	job_id,
	job_title,
	job_location,
	job_schedule_type,
	salary_year_avg,
    median_salary,
	job_posted_date,
	job_work_from_home as remote_work,
	name AS company_name
FROM
	job_postings_fact
LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id
CROSS JOIN median_cte
WHERE
	job_title_short = 'Data Analyst'
	AND salary_year_avg IS NOT NULL
ORDER BY
	salary_year_avg DESC 
LIMIT 10

SQL code

Python code

Top 10 paying DA jobs

job_id	job_title	job_location	job_schedule_type	salary_year_avg	median_salary	job_posted_date	remote_work	company_name
1718590	Business Data Analyst Leader	Jefferson City, MO	Full-time	776000.0	94200	2025-10-21 12:01:26	false	beBeeAnalytical
1697728	Healthcare Innovator	Wayne, PA	Full-time	662500.0	94200	2025-09-29 15:01:10	false	beBeeLeadership
142665	Data Analyst	Anywhere	Full-time	650000.0	94200	2023-02-20 15:13:44	true	Mantys
1722555	GIS Data Analyst	Davie, FL	Full-time	620060.0	94200	2025-10-24 21:01:30	false	beBeeGeography
1512967	Data Analyst (Microsoft Dynamics 365 ERP)	South Africa	Full-time	570000.0	94200	2025-04-12 07:05:09	false	The Legends Agency
1757882	Data Analyst (Microsoft Dynamics 365 ERP)	South Africa	Full-time	570000.0	94200	2025-12-01 14:10:22	false	The Legends Agency
1774123	Data Analyst (Microsoft Dynamics 365 ERP)	South Africa	Full-time	570000.0	94200	2025-12-20 14:05:36	false	The Legends Agency
1738330	Data Analyst with digital advertising/marketing analytics: 25-06587 (No C2C)	Menlo Park, CA	Full-time	569500.0	94200	2025-11-08 09:00:30	false	Akraya Inc
1784303	Data Analyst with digital advertising/marketing analytics: 25-06587 (No C2C)	Menlo Park, CA	Full-time	569500.0	94200	2025-12-31 09:00:54	false	Akraya Inc
1762985	Data Analyst with digital advertising/marketing analytics: 25-06587 (No C2C)	Menlo Park, CA	Full-time	569500.0	94200	2025-12-08 09:00:35	false	Akraya Inc

Here, the overall Data Analyst median salary is included in each row to provide perspective.

Top 10 paying DA jobs - some bars represent multiple postings with identical role names and salaries

Handling Extreme Salary Outliers:

Initial exploration revealed several extremely high salaries (up to $776k), far above the typical Data Analyst compensation range.

SELECT
    PERCENTILE_CONT(0.5) 
    WITHIN GROUP (ORDER BY salary_year_avg) AS median_trimmed
FROM job_postings_fact
WHERE job_title_short = 'Data Analyst'
AND salary_year_avg IS NOT NULL
AND salary_year_avg <= (
    SELECT percentile_cont(0.99)
    WITHIN GROUP (ORDER BY salary_year_avg)
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
    AND salary_year_avg IS NOT NULL
)

SQL code

Python code

To evaluate the impact of these values, salaries above the 99th percentile were treated as extreme outliers.
In this dataset, the 99th percentile corresponds to approximately $215,500.

The median salary calculated from the full dataset is $94,200, while the median after removing the top 1% of salaries is $92,750.

Because the difference between these values is minimal ($1,450), the overall conclusions of the analysis are not significantly affected by extreme salary outliers.

For this reason, the analysis continues using the full dataset while relying primarily on median values, which are more robust to extreme observations.
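
The outlier check above can be sketched in Python as follows. This is a minimal illustration, assuming salaries are available as a plain list; `percentile_cont` here is a hypothetical helper that mimics the linear interpolation of PostgreSQL's function of the same name, and the sample values are made up.

```python
import math
from statistics import median


def percentile_cont(values, fraction):
    """Linear-interpolation percentile, mirroring PostgreSQL's percentile_cont."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # 0-based fractional position
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def median_with_and_without_top(values, cutoff_fraction=0.99):
    """Return (full median, median at or below the cutoff percentile, cutoff)."""
    cutoff = percentile_cont(values, cutoff_fraction)
    trimmed = [v for v in values if v <= cutoff]
    return median(values), median(trimmed), cutoff


# Hypothetical salaries with one extreme posting.
salaries = [60000, 75000, 90000, 94000, 95000,
            100000, 110000, 120000, 130000, 776000]
full, trimmed, cutoff = median_with_and_without_top(salaries)
print(full, trimmed, round(cutoff))
```

A small gap between the two medians, as in the $94,200 vs $92,750 result reported above, is the signal that the median is not being driven by the extreme postings.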

Insights:

Extreme salaries reflect outliers - The top 10 postings range from about $570k to $776k USD, against an overall median Data Analyst salary of about $94k USD. These postings are therefore extreme outliers or incorrectly reported salaries, not typical Data Analyst compensation;
High-salary postings are very concentrated - These postings come from a small number of companies, several of which appear multiple times (e.g. The Legends Agency and Akraya Inc, with three postings each, plus three different beBee-prefixed company names). This concentration implies that a limited set of employers, or potentially job aggregators, disproportionately influences the upper tail of the distribution;
Limited remote presence among top earners - Remote work is limited (only 1 of the 10 top-paying jobs), indicating that extreme salary listings in this dataset are more commonly associated with on-site roles, though this conclusion rests on a very small sample, which limits its strength;
Top-paying jobs are all full-time roles - All of the highest-paying positions are classified as full-time, suggesting that, in this sample, extreme salary levels are tied to full-time employment;
Job title diversity is shown, suggesting some specialization (e.g. leadership oriented, domain or platform specific roles).
Expanding the sample reinforces the pattern - If, instead, we had analyzed the top 20 job posts, the same high-level patterns remain. The additional roles introduce a small number of remote and contractor positions, but do not change conclusions regarding salary extremity, employer concentration or employment structure.
Takeaway - The highest salary postings can be considered as outliers driven by a small group of employers and specialized roles. They should not be interpreted as representative benchmarks for the broader Data Analyst market.

2. What skills are required for top-paying jobs?

Here we want to find which skills appear most frequently in the upper salary tier (the top 10% of postings by annual salary), shifting the focus from individual job listings (the top 10 jobs) to a broader group. The 10% cut-off captures a wider tier and reduces the influence of a few extreme postings. Finding the most frequent skills within this group shows the skill set typically associated with high-paying Data Analyst roles in this dataset, and gives guidance to job seekers competing for those roles: "If I look only at elite paying roles, what skills do those jobs tend to require?"

WITH top_paying_jobs AS (
    SELECT
        job_id
    FROM
        job_postings_fact
    WHERE
        job_title_short = 'Data Analyst'
        AND salary_year_avg IS NOT NULL
        AND salary_year_avg >= (
            SELECT
                percentile_cont(0.9)
                WITHIN GROUP (ORDER BY salary_year_avg)
            FROM
                job_postings_fact
            WHERE
                job_title_short = 'Data Analyst'
                AND salary_year_avg IS NOT NULL
        )
),

top_job_count AS (
    SELECT
        COUNT(*) AS total_top_jobs
    FROM
        top_paying_jobs
)

SELECT
    sd.skills,
    COUNT(DISTINCT tp.job_id) AS job_count,
    ROUND(COUNT(DISTINCT tp.job_id)::numeric / tjc.total_top_jobs * 100,2) AS pct_of_top_paying_jobs
FROM
    top_paying_jobs tp
JOIN skills_job_dim sj ON tp.job_id = sj.job_id
JOIN skills_dim sd ON sj.skill_id = sd.skill_id
CROSS JOIN top_job_count tjc
GROUP BY
    sd.skills,
    tjc.total_top_jobs
HAVING
    COUNT(DISTINCT tp.job_id) > 10
ORDER BY
    pct_of_top_paying_jobs DESC;

SQL code

Python code
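
For readers who prefer Python, the top-decile logic of the query above can be sketched with the standard library. This is an illustrative sketch only, not the project's linked Python file; the tuple layout and sample postings are assumptions. `statistics.quantiles` with `method="inclusive"` interpolates linearly, which matches how `percentile_cont(0.9)` behaves.

```python
from collections import Counter
from statistics import quantiles


def top_decile_skill_share(postings):
    """postings: list of (job_id, salary, skills) tuples with a known salary.

    Returns {skill: percent of top-decile postings that list the skill}.
    """
    salaries = [salary for _, salary, _ in postings]
    # The last of the nine decile cut points is the 90th percentile.
    cutoff = quantiles(salaries, n=10, method="inclusive")[-1]
    top = [skills for _, salary, skills in postings if salary >= cutoff]
    # set(skills) counts each posting once per skill, like COUNT(DISTINCT job_id).
    counts = Counter(skill for skills in top for skill in set(skills))
    return {skill: round(100 * n / len(top), 2) for skill, n in counts.items()}


# Hypothetical postings; the two highest salaries are tied on purpose.
sample = [
    (1, 60000, ["excel"]),
    (2, 70000, ["excel", "sql"]),
    (3, 80000, ["sql"]),
    (4, 85000, ["sql", "tableau"]),
    (5, 90000, ["sql", "excel"]),
    (6, 95000, ["sql", "python"]),
    (7, 100000, ["python", "tableau"]),
    (8, 110000, ["sql", "power bi"]),
    (9, 150000, ["sql", "python", "snowflake"]),
    (10, 150000, ["python", "sql"]),
]
print(top_decile_skill_share(sample))
```

Because the filter uses `>=`, postings tied at the cut-off are all kept, so the top tier can hold slightly more than 10% of the postings.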

Top 13 Skills (above 5%):

Skill	Job Count	% of Top-Paying Jobs
SQL	1237	51.18%
Python	1037	42.90%
Tableau	673	27.84%
R	475	19.65%
Power BI	379	15.68%
Excel	357	14.77%
Snowflake	337	13.94%
AWS	232	9.60%
SAS	202	8.36%
Spark	186	7.70%
Azure	185	7.65%
Looker	141	5.83%
Databricks	126	5.21%

Full results available as json in the query tab

Insights:

SQL and Python form the high-salary technical core - SQL (51%) and Python (43%) dominate top-paying roles, indicating that these are the foundational skills for higher paying positions;
Visualization tools act as complementary enhancers - Visualization tools (Tableau and Power BI) also appear frequently but at lower rates, so they should be viewed as complementary skills rather than drivers of salary premiums;
Cloud skills are present but not universal - Cloud and other data tools (e.g. AWS, Azure) each appear in fewer than 10% of top-paying postings, showing that while they may add value, they are not consistently required across high-paying Data Analyst roles;
Takeaway - Higher-paying Data Analyst roles seem to require strong SQL and Python foundations, coupled with complementary tools. High-paying roles typically require a combination of skills rather than a single competency.

3. What skills are most in demand for data analysts?

Here, we want to find which are the most frequently requested skills (absolute and percentage) across all Data Analyst job postings.

WITH da_jobs AS (
    SELECT job_id
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
),

skill_demand AS (
    SELECT
        sd.skills,
        COUNT(sjd.job_id) AS demand_count
    FROM skills_job_dim sjd
    JOIN da_jobs dj ON sjd.job_id = dj.job_id
    JOIN skills_dim sd ON sjd.skill_id = sd.skill_id
    GROUP BY sd.skills
    --HAVING COUNT(sjd.job_id)>10
)

SELECT
    skills,
    demand_count,
    ROUND(
        demand_count * 100.0 / (SELECT COUNT(*) FROM da_jobs),
        2
    ) AS demand_percentage
FROM skill_demand
ORDER BY demand_count DESC;

SQL code

Python code
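
The demand percentage can also be sketched in Python. The version below counts each posting at most once per skill name, which matters if the same job and skill name can appear more than once in the link table (for example, if one skill name is stored under two skill IDs). Whether the source tables contain such repeats is an assumption; the rows below are hypothetical.

```python
from collections import defaultdict


def demand_share(job_skill_rows, total_jobs):
    """job_skill_rows: iterable of (job_id, skill_name) pairs, possibly repeated.

    Returns {skill: percent of all postings}, sorted from most to least demanded.
    """
    jobs_per_skill = defaultdict(set)
    for job_id, skill in job_skill_rows:
        jobs_per_skill[skill].add(job_id)  # a set keeps each job once per skill
    shares = {
        skill: round(100 * len(jobs) / total_jobs, 2)
        for skill, jobs in jobs_per_skill.items()
    }
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical link rows for 4 postings; job 1 lists "sas" twice.
rows = [(1, "sql"), (2, "sql"), (3, "sql"), (1, "sas"), (1, "sas"), (2, "excel")]
print(demand_share(rows, total_jobs=4))
```

A plain row count would report "sas" at 50% here, while the distinct count gives 25%, so the two approaches can disagree by a factor of two for a duplicated skill.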

Skills	Demand Count	Demand Percentage
sql	198761	44.42%
excel	144995	32.40%
python	128946	28.81%
tableau	99062	22.14%
power bi	94631	21.15%
r	63448	14.18%
sas	53930	12.05%
powerpoint	27323	6.11%
word	27292	6.10%
azure	24019	5.37%
sap	22436	5.01%

Table for skills with demand higher than 5% to highlight relevant demand and maintain readability

Jump back to Remote Skills Graph (Q10) - use this if you came from Q10

Insights

SQL is the clear winner - Appearing in 44% of the postings and leading by a wide margin, SQL clearly represents the core technical expectation for Data Analysts.
Excel retains structural relevance despite technological evolution - Excel, coming in second with 32%, is still relevant, even with the rise of tools or programming languages like Python, showing that spreadsheet analysis still has a place in a Data Analyst role.
Python is a differentiator rather than a baseline requirement - Python appears in 29% of all postings, but its share rises to 43% among top-paying roles (Q2), which makes it less an entry-level requirement and more a differentiator.
Core vs extension skills - Taken together, SQL and Excel represent the basic technical expectations for Data Analysts, while Python acts as an extension that raises earning potential rather than a requirement.
BI tools reflect widespread dashboard expectations - Tableau (22%) and Power BI (21%) are widely requested, signalling a general expectation of dashboard skills. Their near-identical demand suggests no clear market preference.
R and SAS show persistence - R (14%) and SAS (12%) remain in demand, likely concentrated in specific sectors, though at half the frequency of Python or less, which suggests that Python has become the dominant programming language for Data Analysts. Note that Q4, which counts distinct job postings per skill, reports SAS at 6.03% (26,965 postings, exactly half of the 53,930 counted here), so the SAS figure in this table appears to be double-counted.
Communication tools signal general professional expectations - Communication and presentation software skills (e.g. Word and PowerPoint) are also present in some postings, but at much lower frequency. Though still relevant for presentation or storytelling purposes, this likely reflects general office expectations rather than specialized data skills.
Takeaway - The Data Analyst role is anchored by SQL and Excel as baseline expectations, while Python and advanced tools create differentiation.

4. Which skills are associated with higher salaries?

In this question, we want to analyze how salaries differ by skill, in other words, what skill tends to pay more. Unlike Q2, where we began with the highest paying jobs and examined which skills they had in common, here we start with the individual skills and evaluate how they perform across the entire Data Analyst job market.
The goal is to determine whether a particular skill is associated with a salary uplift, i.e., how does that skill influence expected earnings compared to the market average. For this, we will include some of the results from previous questions (i.e. Market Demand, Average and Median Salaries) and calculate the salary uplift compared to the Data Analyst average.

WITH total_jobs AS (
    SELECT COUNT(DISTINCT job_id) AS total_count
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
),

salary_baseline AS (
    SELECT AVG(salary_year_avg) AS overall_avg
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
    AND salary_year_avg IS NOT NULL
),

skill_demand AS (
    SELECT
        s.skill_id,
        s.skills,
        COUNT(DISTINCT j.job_id) AS demand_count
    FROM job_postings_fact j
    JOIN skills_job_dim sj ON j.job_id = sj.job_id
    JOIN skills_dim s ON sj.skill_id = s.skill_id
    WHERE j.job_title_short = 'Data Analyst'
    GROUP BY s.skill_id, s.skills
),

skill_salary AS (
    SELECT
        s.skill_id,
        COUNT(j.job_id) AS salary_count,
        AVG(j.salary_year_avg) AS avg_salary,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary_year_avg) AS median_salary
    FROM job_postings_fact j
    JOIN skills_job_dim sj ON j.job_id = sj.job_id
    JOIN skills_dim s ON sj.skill_id = s.skill_id
    WHERE j.job_title_short = 'Data Analyst'
    AND j.salary_year_avg IS NOT NULL
    GROUP BY s.skill_id
    HAVING COUNT(j.job_id) >= 30
    -- Minimum of 30 salary observations per skill: a common rule of thumb (motivated by the Central Limit Theorem) for a stable average
)

SELECT
    sd.skills,
    sd.demand_count,
    ROUND(sd.demand_count * 100.0 / (SELECT total_count FROM total_jobs), 2) AS demand_percent,
    ss.salary_count,
    ROUND(ss.avg_salary,0) AS avg_salary,
    ss.median_salary,
    ROUND(ss.avg_salary - (SELECT overall_avg FROM salary_baseline),0) AS salary_uplift
FROM skill_demand sd
JOIN skill_salary ss ON sd.skill_id = ss.skill_id
ORDER BY salary_uplift DESC;

SQL code

Python code

Top 15 skills ranked by salary uplift:

Rank	Skill	Demand Count	Demand %	Salary Count	Avg Salary ($)	Median Salary ($)	Salary Uplift ($)
1	Terraform	713	0.16%	32	153,772	143,500	+56,864
2	DynamoDB	504	0.11%	31	142,462	125,000	+45,554
3	Elasticsearch	854	0.19%	41	132,741	111,175	+35,833
4	GraphQL	312	0.07%	52	132,216	141,420	+35,308
5	Neo4j	419	0.09%	40	130,493	108,087.5	+33,585
6	Splunk	1,435	0.32%	83	129,743	125,000	+32,835
7	Kafka	2,495	0.56%	170	127,069	102,114	+30,161
8	GDPR	4,851	1.08%	137	126,682	113,550	+29,773
9	Seaborn	1,045	0.23%	52	125,774	117,500	+28,866
10	Atlassian	796	0.18%	45	125,280	135,000	+28,372
11	Scala	3,933	0.88%	204	124,609	105,000	+27,701
12	Scikit-learn	2,273	0.51%	152	122,394	105,000	+25,486
13	Snowflake	15,249	3.41%	1,069	122,329	113,550	+25,421
14	Perl	1,206	0.27%	51	120,663	118,269	+23,754
15	Trello	308	0.07%	35	120,325	145,500	+23,416

Salary uplift = average salary of particular skill - overall average salary

Top 15 skills by demand:

Rank	Skill	Demand Count	Demand %	Avg Salary	Median Salary	Salary Uplift
1	SQL	198,761	44.42%	$101,184	$96,350	+4,276
2	Excel	144,995	32.40%	$88,385	$85,874	-8,524
3	Python	128,946	28.81%	$105,238	$100,000	+8,329
4	Tableau	99,062	22.14%	$100,835	$95,000	+3,927
5	Power BI	94,631	21.15%	$96,630	$92,500	-278
6	R	63,448	14.18%	$102,417	$96,500	+5,509
7	PowerPoint	27,323	6.11%	$88,123	$85,874	-8,785
8	Word	27,292	6.10%	$81,106	$77,500	-15,802
9	SAS	26,965	6.03%	$99,133	$93,400	+2,225
10	Azure	24,019	5.37%	$109,286	$100,000	+12,377
11	SAP	22,436	5.01%	$94,185	$90,000	-2,724
12	AWS	20,804	4.65%	$112,622	$102,500	+15,713
13	Oracle	20,400	4.56%	$101,797	$99,764	+4,888
14	SQL Server	16,503	3.69%	$98,028	$95,000	+1,120
15	Looker	15,923	3.56%	$106,933	$100,000	+10,025

Salary uplift = average salary of particular skill - overall average salary
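
The uplift definition can be checked against the tables: every row implies the same baseline of roughly $96,908 (for example, 153,772 - 56,864 for Terraform, or 101,184 - 4,276 for SQL). The sketch below shows the same calculation in Python, including the minimum-sample filter. The function name, argument layout and toy salaries are hypothetical.

```python
from statistics import mean


def salary_uplift(salaries_by_skill, all_salaries, min_count=30):
    """Return {skill: average salary minus overall average}, highest first.

    Skills with fewer than min_count salary observations are skipped,
    mirroring the HAVING COUNT(...) >= 30 filter in the query.
    """
    baseline = mean(all_salaries)
    uplift = {}
    for skill, salaries in salaries_by_skill.items():
        if len(salaries) < min_count:
            continue  # too few salary observations to report
        uplift[skill] = round(mean(salaries) - baseline)
    return dict(sorted(uplift.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical data; min_count is lowered so the toy example produces output.
all_salaries = [80000, 90000, 100000, 110000]
by_skill = {
    "python": [100000, 110000],
    "excel": [80000, 90000],
    "rare_tool": [110000],
}
print(salary_uplift(by_skill, all_salaries, min_count=2))
```

Note that the uplift is based on averages, so a few extreme postings can move it more than they would move a median-based comparison.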

Both rankings (by salary uplift and by demand) are presented for completeness. However, since demand distribution was already analyzed in Q3, the primary visualization focus in this section is salary uplift and its relationship with demand.

Insights

Specialized skills drive salary premiums: Infrastructure, cloud and data movement tools (e.g. Terraform, DynamoDB and Kafka), though showing lower overall demand percentages, are associated with the largest salary premiums - scarcity appears strongly associated with positive salary uplifts. These skills overlap with Data Engineering and/or Machine Learning fields.
Baseline skills are expected, not rewarded: High-demand skills for the role (e.g. SQL, Excel, Python, Tableau, Power BI) cluster near the overall salary baseline, so they do not necessarily translate into a high salary premium, perhaps because they are simply expected for the role.
Low differentiation skills: On the lower end, widely used tools such as Outlook, Word, SPSS, PowerPoint and Excel show a negative salary uplift. While some of these skills are common to the Data Analyst role, their prevalence likely reduces their differentiation value.
Takeaway: While baseline skills are essential for employability, if you are looking to maximize salary, learning rare and specialized skills may create differentiation beyond the baseline skills.

Bonus: While the previous analysis identified skills associated with higher salaries, it is also informative to examine skills that are widely requested but associated with lower salaries than the average (negative salary uplift):

Skill	Demand Count	Demand %	Salary Count	Avg Salary ($)	Median Salary ($)	Salary Uplift ($)
excel	144,995	32.40%	8,716	88,385	85,873.50	-8,524
power bi	94,631	21.15%	4,620	96,630	92,500	-278
powerpoint	27,323	6.11%	1,725	88,123	85,873.50	-8,785
word	27,292	6.10%	1,784	81,106	77,500	-15,802
sap	22,436	5.01%	747	94,185	90,000	-2,724

5. What are the most optimal skills (high demand and high pay) to learn for a data analyst?

In the previous sections, we examined skill demand (Q3) and respective salary (Q4) separately, but to get a better career strategy perspective we need to combine both high demand and high salary outcomes.
In this section we identify the most optimal skills for Data Analysts by evaluating each skill across both dimensions simultaneously. Skills were classified using the median values for the demand (0.86%) and for the salary uplift ($7,862).

WITH da_jobs AS (
    SELECT
        job_postings_fact.job_id,
        job_postings_fact.salary_year_avg
    FROM job_postings_fact
    WHERE job_postings_fact.job_title_short = 'Data Analyst'
),

da_job_counts AS (
    SELECT 
        COUNT(*)::numeric AS total_da_jobs
    FROM da_jobs
),

overall_salary AS (
    SELECT 
        AVG(da_jobs.salary_year_avg)::numeric AS overall_avg_salary
    FROM da_jobs
    WHERE da_jobs.salary_year_avg IS NOT NULL
),

skill_demand AS (
    SELECT
        skills_dim.skill_id,
        skills_dim.skills,
        COUNT(DISTINCT skills_job_dim.job_id)::numeric AS demand_count,
        ROUND((COUNT(DISTINCT skills_job_dim.job_id)::numeric / da_job_counts.total_da_jobs) * 100, 2) AS demand_percent
    FROM skills_job_dim
    JOIN skills_dim
        ON skills_job_dim.skill_id = skills_dim.skill_id
    JOIN da_jobs
        ON skills_job_dim.job_id = da_jobs.job_id
    CROSS JOIN da_job_counts
    --we just need to join the total which is the same for every row
    GROUP BY
        skills_dim.skill_id,
        skills_dim.skills,
        da_job_counts.total_da_jobs
),

skill_salary AS (
    SELECT
        skills_dim.skill_id,
        skills_dim.skills,
        COUNT(da_jobs.salary_year_avg)::int AS salary_count,
        AVG(da_jobs.salary_year_avg)::numeric AS avg_salary,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY da_jobs.salary_year_avg)::numeric AS median_salary
    FROM skills_job_dim
    JOIN skills_dim
        ON skills_job_dim.skill_id = skills_dim.skill_id
    JOIN da_jobs
        ON skills_job_dim.job_id = da_jobs.job_id
    WHERE da_jobs.salary_year_avg IS NOT NULL
    GROUP BY
        skills_dim.skill_id,
        skills_dim.skills
    HAVING COUNT(da_jobs.salary_year_avg) >= 30
),

skill_metrics AS (
    SELECT
        skill_demand.skill_id,
        skill_demand.skills,
        skill_demand.demand_count,
        skill_demand.demand_percent,
        skill_salary.salary_count,
        skill_salary.avg_salary,
        skill_salary.median_salary,
        (skill_salary.avg_salary - overall_salary.overall_avg_salary) AS salary_uplift
    FROM skill_demand
    JOIN skill_salary
        ON skill_demand.skill_id = skill_salary.skill_id
    CROSS JOIN overall_salary
    WHERE skill_salary.salary_count >= 30
),

thresholds AS (
    SELECT
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY skill_metrics.demand_percent) AS demand_median,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY skill_metrics.salary_uplift) AS uplift_median
        -- the medians because it splits the data into two halves. half above and half below the median.
    FROM skill_metrics
),

quadrants AS (
    SELECT
        skill_metrics.skill_id,
        skill_metrics.skills,
        skill_metrics.demand_count,
        skill_metrics.demand_percent,
        skill_metrics.salary_count,
        skill_metrics.avg_salary,
        skill_metrics.median_salary,
        skill_metrics.salary_uplift,
        thresholds.demand_median,
        thresholds.uplift_median,
        -- in here I really just needed the skills column and the case. but included all for "debugging".
        CASE
            WHEN skill_metrics.demand_percent >= thresholds.demand_median
                 AND skill_metrics.salary_uplift >= thresholds.uplift_median
                THEN 'Q1 - Optimal (High demand, High uplift)'
            WHEN skill_metrics.demand_percent < thresholds.demand_median
                 AND skill_metrics.salary_uplift >= thresholds.uplift_median
                THEN 'Q2 - Niche premium (Low demand, High uplift)'
            WHEN skill_metrics.demand_percent < thresholds.demand_median
                 AND skill_metrics.salary_uplift < thresholds.uplift_median
                THEN 'Q3 - Low value (Low demand, Low uplift)'
            ELSE
                'Q4 - Baseline (High demand, Low uplift)'
        END AS quadrant
    FROM skill_metrics
    CROSS JOIN thresholds
)

SELECT
    quadrants.skills,
    quadrants.demand_percent,
    quadrants.salary_uplift,
    quadrants.salary_count,
    quadrants.avg_salary,
    quadrants.median_salary,
    quadrants.quadrant
FROM quadrants
WHERE quadrants.quadrant = 'Q1 - Optimal (High demand, High uplift)'
ORDER BY
    quadrants.salary_uplift DESC,
    quadrants.demand_percent DESC

SQL code

Python code
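
The quadrant rule in the CASE expression can be sketched as a small Python function. The thresholds (0.86% demand, $7,862 uplift) and the example skills are taken from this document; the function name is hypothetical.

```python
def classify_quadrant(demand_percent, salary_uplift, demand_median, uplift_median):
    """Assign a skill to one of the four demand/uplift quadrants."""
    high_demand = demand_percent >= demand_median
    high_uplift = salary_uplift >= uplift_median
    if high_demand and high_uplift:
        return "Q1 - Optimal (High demand, High uplift)"
    if high_uplift:
        return "Q2 - Niche premium (Low demand, High uplift)"
    if not high_demand:
        return "Q3 - Low value (Low demand, Low uplift)"
    return "Q4 - Baseline (High demand, Low uplift)"


DEMAND_MEDIAN = 0.86   # median demand percent reported in this section
UPLIFT_MEDIAN = 7862   # median salary uplift reported in this section

# (demand %, salary uplift) pairs copied from the Q4 tables.
examples = {
    "python": (28.81, 8329),
    "terraform": (0.16, 56864),
    "excel": (32.40, -8524),
    "sql": (44.42, 4276),
}
for skill, (demand, uplift) in examples.items():
    print(skill, "->", classify_quadrant(demand, uplift, DEMAND_MEDIAN, UPLIFT_MEDIAN))
```

Under these thresholds SQL lands in the baseline quadrant despite its positive uplift, because its $4,276 uplift is below the median uplift across skills.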

Insights

Cloud and Data Engineering tools dominate optimal skills - Tools like Snowflake, Spark, AWS and Azure, to name a few, show salary uplifts ranging from approximately $10k to $25k above the Data Analyst average, while also maintaining a slightly above median demand.
Python stands out as the most strategically optimal skill - Python combines very high demand (28.8%) with a positive salary uplift (about $8,300), making it one of the strongest skills for both employability and earning potential without being a narrowly specialized skill.
Programming and data processing libraries provide strong salary differentiation - Tools such as Pandas, NumPy, Spark and Hadoop which support large data processing seem to be associated with higher compensation.
General Data Protection Regulation (GDPR) knowledge commands premium salaries - GDPR shows the highest salary uplift among the high-demand skills ($29,773; niche skills such as Terraform rank higher in Q4), indicating that regulatory and data governance expertise is associated with significantly higher compensation.
Takeaway - The most optimal skills for Data Analysts combine core analytical skills (e.g. Python, Pandas) with modern data infrastructure and cloud tools.

6. How does salary vary by job schedule type (full-time, contract, etc.)?

After performing a skills analysis, we are now shifting to the employment structure and how it can influence salary outcomes.
In this question, we want to examine how Data Analyst salaries vary by job schedule type to determine whether certain employment arrangements offer higher compensation.
This dataset includes mixed categories (e.g. "Full-Time and Part-Time", "Full-time and Temp work", etc), which likely reflects inconsistent reporting rather than distinct job categories. For analysis purposes and to avoid distortion, we will retain the main categories (Full-time, Part-time, Contractor, Temp Work and Internship) and group the remaining labels as "Mixed/Other".

SELECT
    CASE
        WHEN job_schedule_type = 'Full-time' THEN 'Full-time'
        WHEN job_schedule_type = 'Contractor' THEN 'Contractor'
        WHEN job_schedule_type = 'Part-time' THEN 'Part-time'
        WHEN job_schedule_type = 'Temp work' THEN 'Temp work'
        WHEN job_schedule_type = 'Internship' THEN 'Internship'
        ELSE 'Mixed / Other'
    END AS schedule_group,
    COUNT(*) AS salary_count,
    ROUND(AVG(salary_year_avg), 0) AS avg_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY job_postings_fact.salary_year_avg) AS median_salary
FROM job_postings_fact
WHERE job_title_short = 'Data Analyst'
  AND salary_year_avg IS NOT NULL
GROUP BY schedule_group
HAVING COUNT(*) > 10
-- Excluding schedule groups with 10 or fewer postings to avoid averages driven by very small samples
ORDER BY avg_salary DESC;

SQL code

Python code
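
The grouping rule from the CASE expression can be sketched in Python as follows. This is an illustration under assumptions: postings are given as (schedule type, salary) pairs, the function name is hypothetical, and the sample rows are made up. The default `min_count=11` mirrors `HAVING COUNT(*) > 10`.

```python
from collections import defaultdict
from statistics import median

MAIN_TYPES = {"Full-time", "Contractor", "Part-time", "Temp work", "Internship"}


def median_salary_by_schedule(postings, min_count=11):
    """postings: iterable of (job_schedule_type, salary_year_avg) pairs.

    Any label outside MAIN_TYPES (including a missing label) falls into
    'Mixed / Other', as in the ELSE branch of the SQL CASE expression.
    """
    groups = defaultdict(list)
    for schedule_type, salary in postings:
        if salary is None:
            continue  # keep only postings that disclose a salary
        group = schedule_type if schedule_type in MAIN_TYPES else "Mixed / Other"
        groups[group].append(salary)
    return {g: median(v) for g, v in groups.items() if len(v) >= min_count}


# Hypothetical rows; min_count is lowered so the toy example produces output.
sample = [
    ("Full-time", 95000),
    ("Full-time", 105000),
    ("Full-time and Part-time", 80000),
    (None, 70000),
    ("Contractor", None),
    ("Part-time", 65000),
]
print(median_salary_by_schedule(sample, min_count=1))
```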

Schedule Group	Job Count	Avg Salary	Salary Median
Full-time	21994	98507	95000
Contractor	462	88832	87500
Mixed / Other	906	82632	81140
Temp work	39	82372	75000
Internship	177	72155	67000
Part-time	527	71210	65000

Insights:

Full-time roles dominate both pay and volume - Full-time postings represent the majority of salary reported Data Analyst roles in this dataset and have the highest average and median salary amongst schedule types.
Contractor Data Analyst roles do not show a salary premium - Contractor salaries are below full-time. This can be somewhat surprising, but can also mean that contractor postings may include junior roles and inconsistent salary reporting methods (hourly to annual conversions and missing benefits).
Temporary, Internship and Part-time roles pay the least - As expected, these schedule types offer lower compensation, likely reflecting reduced hours, lower experience requirements, or short contract durations. Temp work has very few postings (39), so its salary estimates should be treated cautiously.

7. How does salary differ between remote vs on-site roles? (job_work_from_home = TRUE/FALSE)

Work mode is also a key factor that can influence salary expectations, so in this section we will compare salaries between Data Analyst roles that offer remote work and those that require on-site presence. This will help determine if a particular work mode is associated with some kind of salary differentiation or if it has no impact.

SELECT
    job_work_from_home,
    COUNT(*) AS salary_count,
    ROUND(AVG(salary_year_avg), 0) AS avg_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY job_postings_fact.salary_year_avg) AS median_salary
FROM job_postings_fact
WHERE job_title_short='Data Analyst'
AND salary_year_avg IS NOT NULL
GROUP BY job_work_from_home;

SQL code

Python code

Work Type	Salary Count	Average Salary	Median Salary
On-site	22,828	96,927	94,455
Remote	1,277	96,582	89,225

Insights

On-site roles dominate salary job postings - In this dataset, on-site positions account for 22,828 postings (aprox. 95%) with salary information, compared to only 1,277 remote postings. Though, this may reflect reporting practices rather than overall market availability.
No meaningful salary difference between remote and on-site roles - The average salaries for remote and on-site roles are nearly identical (€96k). However, the lower median salary for remote roles suggests the presence of some higher paying remote outliers that increase the average, while most remote roles cluster at slightly lower salary levels.
Takeaway - Remote work does not appear to offer a salary (dis)advantage in this dataset.
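The gap between average and median described above can be illustrated with a minimal sketch. The salaries below are hypothetical, chosen only to show how a few high values pull the mean up while barely moving the median; they are not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical annual salaries (USD), for illustration only.
typical_remote = [85_000, 87_000, 88_000, 89_000, 90_000, 91_000, 92_000]
outliers = [180_000, 210_000]

without_outliers = typical_remote
with_outliers = typical_remote + outliers

for label, salaries in [("without outliers", without_outliers),
                        ("with outliers", with_outliers)]:
    # The mean reacts strongly to the two extreme values; the median barely moves.
    print(f"{label}: mean={mean(salaries):,.0f}, median={median(salaries):,.0f}")
```

This is why the sections here read both measures together and lean on the median when the two diverge.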

8. How do degree vs no-degree roles compare in pay?

Let's analyze whether there is any salary difference between jobs that mention a degree requirement and jobs that do not.

SELECT
    CASE 
        WHEN job_no_degree_mention=TRUE THEN 'No degree mentioned'
        ELSE 'Degree mentioned'
    END as degree_requirement,
    COUNT(*) AS salary_count,
    ROUND(AVG(salary_year_avg), 0) AS avg_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY job_postings_fact.salary_year_avg) AS median_salary
FROM job_postings_fact
WHERE job_title_short='Data Analyst'
AND salary_year_avg IS NOT NULL
GROUP BY job_no_degree_mention;

SQL code

Python code

Degree Requirement	Salary Count	Average Salary	Median Salary
Degree mentioned	16,828	97,062	92,513.75
No degree mentioned	7,277	96,554	95,000

Insights

Degree requirements do not meaningfully change average salary - Average salaries for roles that mention a degree ($97.1k) and those that do not ($96.6k) are nearly identical, suggesting that degrees are not strongly associated with higher compensation in this dataset.
Roles without degree requirement show a higher median salary - Median salary is higher for roles that do not mention a degree requirement ($95k vs $92.5k), which is interesting. One reason may be that Data Analyst roles possibly prioritize skills and experience over formal education.
Roles mentioning a degree requirement dominate the dataset - Of the 24,105 postings that report a salary, roughly 70% mention a degree requirement, showing that degrees remain a common request.
Takeaway - A formal degree appears to be a common hiring requirement but does not provide a clear salary advantage. Again, skills and experience may play a more important role in determining compensation.

9. Combo analysis of both remote status and degree requirement

After analyzing remote work and degree requirements separately, we now examine how these two factors interact. This will allow us to determine whether remote Data Analyst roles without degree requirements offer comparable compensation or if education plays a larger role in remote job salaries.

SELECT
    CASE
        WHEN job_work_from_home = TRUE THEN 'Remote'
        ELSE 'On-site'
    END AS work_type,
    CASE
        WHEN job_no_degree_mention = TRUE THEN 'No degree mentioned'
        ELSE 'Degree mentioned'
    END AS degree_requirement,
    COUNT(salary_year_avg) AS salary_count,
    ROUND(AVG(salary_year_avg), 0) AS avg_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary_year_avg)::int AS median_salary
FROM job_postings_fact
WHERE job_title_short = 'Data Analyst'
AND salary_year_avg IS NOT NULL
GROUP BY
    work_type,
    degree_requirement
ORDER BY avg_salary DESC;

SQL code

Python code

Work Type	Degree Requirement	Salary Count	Average Salary	Median Salary
Remote	No degree mentioned	255	102,179	91,200
On-site	Degree mentioned	15,806	97,183	93,137
On-site	No degree mentioned	7,022	96,350	95,000
Remote	Degree mentioned	1,022	95,186	87,000

Insights

Remote roles without degree requirements show the highest average salary - Remote Data Analyst roles that do not mention degree requirement have the highest average salary ($102k). However, the median salary ($91k) is lower than both on-site groups, indicating that the higher average is likely driven by a smaller number of very high paying outliers rather than consistently higher pay.
Median salaries suggest more stable compensation for on-site roles - On-site roles, particularly those without degree requirements, show the highest median salary ($95k), suggesting that these positions offer more consistently higher pay across the broader group of jobs.
Remote roles that mention degrees have the lowest typical salaries - These roles have both the lowest average and lowest median salaries in this dataset, suggesting that degree requirements alone do not translate into higher compensation in remote settings.
Major difference in sample sizes - Remote roles without degree requirements represent a much smaller sample (255 postings) compared to on-site roles. This smaller sample size makes the average more sensitive to extreme values and reinforces the importance of interpreting both average and median salaries together.
Takeaway - While some remote roles without degree requirements offer very high salaries, the median suggests that on-site roles provide more consistently high compensation and are far more common. This reinforces the broader pattern observed in earlier sections (skills and experience appear to matter more than formal degree requirements).

10. What skills are most common in remote Data Analyst roles?

In this section, we identify the top 10 skills requested in remote Data Analyst job postings in this dataset.
The goal is to help job seekers understand which skills are most commonly associated with remote opportunities and prioritize learning accordingly.

with remote_jobs as (
    SELECT job_id
    FROM job_postings_fact
    WHERE job_title_short='Data Analyst'
    AND job_work_from_home=TRUE
),

total_remote as (
    SELECT count(*) as total_jobs
    FROM remote_jobs
)

SELECT
    skills_dim.skills,
    COUNT(DISTINCT skills_job_dim.job_id) as demand_count,
    ROUND(COUNT(DISTINCT skills_job_dim.job_id) * 100.0 / total_remote.total_jobs,2) as demand_percent
FROM remote_jobs
JOIN skills_job_dim ON remote_jobs.job_id=skills_job_dim.job_id
JOIN skills_dim ON skills_dim.skill_id=skills_job_dim.skill_id
CROSS JOIN total_remote
GROUP BY
    skills_dim.skills,
    total_remote.total_jobs
ORDER BY
    demand_count DESC
LIMIT 10;

SQL code

Python code

Skill	Demand Count	Demand Percent (%)
SQL	16082	55.77
Python	10386	36.01
Excel	9849	34.15
Tableau	8704	30.18
Power BI	6300	21.85
R	4894	16.97
Looker	2307	8.00
Go	1858	6.44
SAS	1787	6.20
AWS	1764	6.12

Jump to Demand Graph (Q3) if you wish to compare. Compared with Q3, the remote top 10 replaces office productivity tools (Word and PowerPoint) with more technical tools (AWS, Looker and Go).

Insights

SQL remains the dominant skill for remote Data Analyst roles - SQL appears in the largest number of remote job postings, reinforcing its role as the foundational skill required for Data Analysts regardless of work environment.
Python becomes more prominent in remote postings - Python ranks second and is relatively more common than in the overall market (Q3), suggesting greater emphasis on programming and automation capabilities.
Visualization skills remain essential - Tableau and Power BI rank among the most demanded skills, confirming that communicating insights remains a core responsibility even in remote settings.
Takeaway - Core Data Analyst skill requirements (SQL, Python and BI tools) remain largely consistent, but the remote top 10 includes more technical tools (e.g. AWS, Looker and Go). To maximize opportunities for remote Data Analyst roles, candidates should prioritize the core skills while developing complementary cloud and technical skills.

11. Which skills are most common in no-degree roles?

In this section, we identify the top 10 skills for job postings where no degree is explicitly mentioned.
The goal is to understand which skills may help candidates access Data Analyst roles without formal academic qualifications.

with no_degree_jobs as (
    SELECT job_id
    FROM job_postings_fact
    WHERE job_title_short='Data Analyst'
    AND job_no_degree_mention=TRUE
),

no_degree_job_counts as (
    SELECT
        COUNT(*) as total_no_degree_jobs
    FROM no_degree_jobs
)

SELECT
    skills_dim.skills,
    COUNT(DISTINCT skills_job_dim.job_id) as demand_count,
    ROUND((COUNT(DISTINCT skills_job_dim.job_id)*100.0 / no_degree_job_counts.total_no_degree_jobs),2) as demand_percent
FROM no_degree_jobs
JOIN skills_job_dim ON no_degree_jobs.job_id=skills_job_dim.job_id
JOIN skills_dim ON skills_job_dim.skill_id=skills_dim.skill_id
CROSS JOIN no_degree_job_counts
GROUP BY
    skills_dim.skills,
    no_degree_job_counts.total_no_degree_jobs
ORDER BY demand_count DESC
LIMIT 10;

SQL code

Python code

Skill	Demand Count	Demand Percent (%)
SQL	70123	38.45
Excel	46069	25.26
Python	39722	21.78
Power BI	32208	17.66
Tableau	29633	16.25
R	15256	8.36
Azure	8903	4.88
SAP	7773	4.26
AWS	7758	4.25
Looker	6900	3.78

Insights

A substantial portion of Data Analyst jobs do not explicitly mention a degree requirement - In this dataset, 182,380 out of 447,505 job postings (40.8%) do not mention a degree requirement. This indicates that while degrees remain common, a significant share of the market may be accessible based on skills and experience rather than formal education alone.
Core Data Analyst skills remain essential - SQL, Excel and Python remain the three most demanded skills in no-degree postings, appearing in 38.5%, 25.3% and 21.8% of postings respectively. This confirms employers prioritize technical proficiency in core analysis tools regardless of degree requirements.
Visualization tools continue to be critical - Power BI and Tableau rank among the top five skills, appearing in 17.7% and 16.3% of no-degree postings respectively.
Cloud and enterprise platform skills are also an advantage - Azure and AWS (cloud), SAP (ERP) and Looker (BI) all appear in the top 10.
Takeaway - Although 40.8% of postings do not explicitly mention a degree requirement, the majority still do, indicating that formal education remains an important qualification. However, the similarity in skill requirements confirms that strong technical skills are essential in both cases and may help candidates access opportunities even without a degree.

12. Which countries offer the highest-paying data analyst roles?

In this section, we analyze average salaries by country to identify which locations offer the highest-paying Data Analyst roles. This helps job seekers understand how geographic location may influence earning potential.

SELECT
    job_country,
    COUNT(*) AS job_count,
    ROUND(AVG(salary_year_avg), 0) AS avg_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary_year_avg)::int as median_salary,
    ROUND(MIN(salary_year_avg),0) as min_salary,
    ROUND(MAX(salary_year_avg),0) as max_salary
FROM job_postings_fact
WHERE job_title_short='Data Analyst'
AND salary_year_avg IS NOT NULL
AND job_country IS NOT NULL
AND job_country NOT IN ('Anywhere','Remote')
GROUP BY job_country
HAVING COUNT(*) >= 30
ORDER BY avg_salary DESC
LIMIT 10;

SQL code

Python code

Country	Job Count	Avg Salary	Median Salary	Min Salary	Max Salary
Singapore	49	111707	105000	45000	227500
Canada	144	106541	98500	44400	385000
South Africa	63	100511	75000	25500	570000
Germany	66	99231	97825	43200	210000
United States	21433	97905	95000	16000	776000
United Arab Emirates	245	97597	95000	16800	233500
Guam	35	94897	83200	44330	296910
Portugal	43	93814	89100	51014	165000
Sudan	187	92865	90000	40000	255830
Pakistan	64	92860	100000	15600	219781

*Countries with relatively small sample sizes should be interpreted cautiously, because a few extreme salaries can strongly influence the average.

The measure chosen to plot is the median salary since our data clearly has outliers.

Insights

Several countries show high median salaries, though smaller sample sizes require cautious interpretation - Countries such as Singapore ($105K median, 49 postings) rank among the highest paying, but their small samples reduce statistical reliability, making them worth watching but not conclusive.
The US provides the most statistically reliable benchmark - With over 21,000 postings and a median salary of $95K, closely aligned with the average salary, the US offers the most stable and representative salary distribution in the dataset. Its range of $16K–$776K is extreme, signalling wide variation.
Germany shows a highly consistent salary distribution - Its average ($99K) and median ($97.8K) are nearly identical and its salary range is one of the narrowest in the list (only Portugal's is narrower), suggesting a well-balanced distribution with less influence from extreme outliers than other countries in the ranking.
Median salary provides a more accurate comparison than average salary in volatile markets - Countries such as South Africa show a large difference between average ($100K) and median ($75K) salaries, indicating that a smaller number of high paying roles may inflate the average. This highlights the importance of using median salary to better reflect typical compensation.
Surprising presence of some countries - The presence of certain countries with high reported salaries may reflect specific job characteristics, including remote roles or employer reporting practices. This cannot be confirmed directly from the dataset and highlights the importance of cautious interpretation.
Takeaway - Median salary and sample size should always be read together. High averages in small or volatile markets can mislead. The most representative markets for a Data Analyst (in terms of pay, reliability and opportunity) are the US, Germany and UAE. Singapore and Canada are promising but need more data to confirm.
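Reading median and sample size together can be sketched in plain Python. The function name `summarize_by_country`, its `min_count` parameter and the rows below are hypothetical and only mirror the idea of the `HAVING COUNT(*) >= 30` filter in the query above; the threshold is lowered in the example purely so that the tiny sample passes.

```python
from collections import defaultdict
from statistics import mean, median

def summarize_by_country(rows, min_count=30):
    """Group (country, salary) rows and keep countries with at least min_count salaries."""
    salaries = defaultdict(list)
    for country, salary in rows:
        salaries[country].append(salary)
    summary = {
        country: {"n": len(values), "avg": round(mean(values)), "median": median(values)}
        for country, values in salaries.items()
        if len(values) >= min_count
    }
    # Rank by median, which is less sensitive to a few very high salaries.
    return dict(sorted(summary.items(), key=lambda item: item[1]["median"], reverse=True))

# Hypothetical rows, for illustration only.
rows = [("A", 70_000), ("A", 75_000), ("A", 80_000), ("A", 400_000),
        ("B", 95_000), ("B", 97_000), ("B", 99_000), ("B", 101_000),
        ("C", 150_000)]
result = summarize_by_country(rows, min_count=3)

for country, stats in result.items():
    print(country, stats)
```

In this toy data, country A has the highest average only because of one extreme salary, while ranking by median puts B first, and country C is dropped for having too few observations.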

13. Which countries have the most job opportunities for DA roles?

In this section, we want to analyze the geographic distribution of Data Analyst job postings to identify which countries offer the greatest number of opportunities. While salary is important, job availability is equally critical for job seekers, as larger markets may provide more entry points and career flexibility.

SELECT
    job_country,
    job_count,
    ROUND(job_count * 100.0 / SUM(job_count) OVER (), 2) AS job_percent
FROM (
    SELECT
        job_country,
        COUNT(*) AS job_count
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
      AND job_country IS NOT NULL
      AND job_country NOT IN ('Anywhere', 'Remote')
    GROUP BY job_country
) AS country_counts
ORDER BY job_count DESC
LIMIT 10;

SQL code

Python code

Country	Job Count	Job Percent (%)
United States	173,383	38.77%
United Kingdom	32,279	7.22%
France	25,333	5.66%
India	16,896	3.78%
Germany	13,054	2.92%
Singapore	11,771	2.63%
Philippines	11,694	2.61%
Spain	9,891	2.21%
Italy	8,887	1.99%
Belgium	7,605	1.70%

Insights

The US dominates Data Analyst job postings - The United States accounts for 173,383 job postings, representing almost 39% of all Data Analyst roles in the dataset. In other words, roughly 2 out of every 5 job opportunities are located in the US, making it the largest job market in the dataset.
European countries collectively represent a significant share of opportunities - The top 10 includes 6 European countries: UK (7.22%), France (5.66%), Germany (2.92%), Spain (2.21%), Italy (1.99%) and Belgium (1.70%). While individually much smaller than the US, together they represent a meaningful portion (21.7%) of the global Data Analyst job market, highlighting Europe as another major regional hub.
Asia also plays an important role in global job availability - India (3.78%), Singapore (2.63%) and the Philippines (2.61%) rank among the top countries by job postings, collectively representing about 9% of global postings, making Asia an important player as well.
Job opportunities are highly concentrated geographically - The sharp drop from the first ranked, US (38.77%), to the second ranked UK (7.22%) demonstrates that job availability is heavily concentrated in a small number of countries, with the US serving as the clear global center of opportunity with more job postings than the other 9 countries combined.
Takeaway - While Data Analyst opportunities exist globally, they are strongly concentrated in a few major markets, especially the US. Job seekers targeting the largest number of opportunities will find the US to be the most accessible market, while Europe and parts of Asia also offer substantial and regionally diverse opportunities that may align better with specific industries, languages or work cultures. Job seekers may benefit from considering both the volume of opportunities and the regional characteristics of each market.

14. How has demand for Data Analyst roles changed over time?

In this section, we will see how demand for Data Analyst roles has changed over time by examining the job postings per month. This will help understand if the job market is stable, expanding or contracting.
The analysis is conducted at a monthly level to capture detailed demand fluctuations, so each posting date is truncated to the start of its month. A 3-month (trailing quarter) moving average is also applied to smooth the series and reduce the short-term noise common in monthly data (e.g. hiring bursts and seasonal slowdowns). It helps distinguish a one-month spike or drop from a real change in market direction, avoiding exaggerated interpretations of short-term variations.

Monthly

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', job_posted_date)::date AS month,
        --round dates to the start of the respective month
        COUNT(*) AS job_postings
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
    AND job_posted_date IS NOT NULL
    GROUP BY month
),

final AS (
    SELECT
        month,
        job_postings,
        LAG(job_postings) OVER (ORDER BY month) AS prev_month,
        --to get the job_postings value from the previous row
        ROUND(100.0 * (job_postings - LAG(job_postings) OVER (ORDER BY month))
            / NULLIF(LAG(job_postings) OVER (ORDER BY month), 0),2) AS mom_pct_change,
        ROUND(AVG(job_postings) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
            0
        ) AS moving_avg_3mo
    FROM monthly
)

SELECT *
FROM final
ORDER BY month;

SQL code

Python code
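The month-over-month change and trailing 3-month moving average computed in the query above can be sketched in plain Python as follows. The helper names and the monthly counts are hypothetical, used only to show how a single spike affects each measure; this is not the project's own Python code.

```python
def month_over_month(counts):
    """Percent change vs the previous month; None for the first month or a zero base."""
    changes = [None]
    for prev, curr in zip(counts, counts[1:]):
        changes.append(round(100.0 * (curr - prev) / prev, 2) if prev else None)
    return changes

def trailing_average(counts, window=3):
    """Average of the current month and up to window-1 preceding months."""
    averages = []
    for i in range(len(counts)):
        chunk = counts[max(0, i - window + 1): i + 1]
        averages.append(round(sum(chunk) / len(chunk)))
    return averages

# Hypothetical monthly posting counts with a one-month spike (30,000).
monthly_postings = [20_000, 16_000, 15_000, 30_000, 14_000]

mom = month_over_month(monthly_postings)
moving_avg = trailing_average(monthly_postings)

for count, change, avg in zip(monthly_postings, mom, moving_avg):
    print(count, change, avg)
```

The spike month doubles the month-over-month figure, while the moving average shifts far less, which is the smoothing effect the section relies on.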

Insights

Market contraction 2023 - late 2024 - Demand fell from 23,574 (Jan 2023) to 3,691 (Nov 2024), more than an 84% decline, signalling a major downturn in Data Analyst job postings. The 3 month moving average also confirms this was not isolated monthly fluctuations.
Weakening demand through 2023 - A spike in January (23,574) but then postings generally trended down and stabilized in the range 13k-16k, reaching 13,926 by the end of the year. The 3 month moving average also confirms this slow but persistent downward trend.
Continued weakening demand through 2024 - The decline continues in 2024, with an evident drop in Sep-Nov; Nov 2024 is the lowest month in this dataset. The 3-month moving average confirms this is not just one-month noise and, by remaining low, shows that the large increase in Dec was not a full market recovery, which illustrates its value in separating short-term shocks from the underlying direction.
Temporary recovery in early 2025 followed by another decline - In early 2025 (Jan - Feb) postings surged to over 21k, but then declined again through the end of the year. Note that demand in 2025 fell to almost the same levels as in 2024, but at a faster rate.
Takeaway - This dataset only covers 2023-2025, which is insufficient to understand long-term patterns, but the DA job market appears to be very volatile. Also, the start of each year (Jan-Mar) tends to be that year's demand peak, possibly reflecting hiring cycle seasonality.

15. What is the Skill demand over time?

In this section, we analyze how demand for key Data Analyst skills has evolved over time. Rather than looking at raw counts, we will normalize skill demand by the total number of Data Analyst job postings in each month. This ensures that changes in overall job volume do not distort the analysis and allows us to isolate true changes in skill importance. This helps identify whether certain skills are becoming more or less essential in the Data Analyst job market.

A simple example to understand why normalization is important in this section:

Year	Python Jobs	Total Jobs	Raw Trend	Normalized Share
2023	60,000	200,000	baseline	30.00%
2024	40,000	120,000	decreased	33.33%

As you can see, without normalization we could be misled into thinking that Python demand decreased in 2024, when in fact its share of postings increased (from 30% to 33.33%), meaning it became relatively more in demand.
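The arithmetic behind this example can be checked with a few lines of Python, using the illustrative figures from the table above (they are an example, not dataset values):

```python
# Illustrative figures from the example table (not dataset values).
python_jobs = {2023: 60_000, 2024: 40_000}
total_jobs = {2023: 200_000, 2024: 120_000}

# Normalized demand: share of all postings in that year that mention Python.
share = {year: round(100.0 * python_jobs[year] / total_jobs[year], 2)
         for year in python_jobs}

raw_change = python_jobs[2024] - python_jobs[2023]    # change in raw counts
share_change = round(share[2024] - share[2023], 2)    # change in percentage points

print(f"raw change: {raw_change:+,} postings")
print(f"share change: {share_change:+.2f} percentage points")
```

The raw count falls while the share rises, because total postings fell faster than Python postings.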

We will focus on the most common core skills:

SQL
Python
Excel
Power BI
Tableau
R

These represent the foundational technical stack for most Data Analyst roles, as seen in previous sections.

WITH monthly_total AS (
    SELECT
        DATE_TRUNC('month', job_postings_fact.job_posted_date)::date AS month,
        COUNT(DISTINCT job_postings_fact.job_id) AS total_jobs
    FROM job_postings_fact
    WHERE job_postings_fact.job_title_short = 'Data Analyst'
    AND job_postings_fact.job_posted_date IS NOT NULL
    GROUP BY
        DATE_TRUNC('month', job_postings_fact.job_posted_date)::date
),

monthly_skill AS (
    SELECT
        DATE_TRUNC('month', job_postings_fact.job_posted_date)::date AS month,
        skills_dim.skills,
        COUNT(DISTINCT job_postings_fact.job_id) AS skill_jobs
    FROM job_postings_fact
    JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
    JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id
    WHERE job_postings_fact.job_title_short = 'Data Analyst'
    AND job_postings_fact.job_posted_date IS NOT NULL
    AND skills_dim.skills IN (
            'sql',
            'python',
            'excel',
            'power bi',
            'tableau',
            'r'
      )
    GROUP BY
        DATE_TRUNC('month', job_postings_fact.job_posted_date)::date,
        skills_dim.skills
)

SELECT
    monthly_skill.month,
    monthly_skill.skills,
    monthly_skill.skill_jobs,
    monthly_total.total_jobs,
    ROUND(
        100.0 * monthly_skill.skill_jobs / monthly_total.total_jobs,
        2
    ) AS skill_percent
FROM monthly_skill
JOIN monthly_total ON monthly_skill.month = monthly_total.month
ORDER BY
    monthly_skill.month,
    monthly_skill.skills;

SQL code

Python code

Insights:

SQL is still king, but its lead is shrinking - SQL is consistently the most requested skill throughout the entire period, confirming its role as the core foundation of Data Analyst work, like we have seen in previous sections. However, its relative demand declines over time, falling from approx. 49% in early 2023 to around 36% by late 2025. That's not a crisis, but it's a meaningful decline.
Excel shows a steady downward trend - Excel demand declines from approx. 35% in early 2023 to around 25% by late 2025. This consistent reduction suggests that employers may increasingly prioritize more scalable or specialized tools over traditional spreadsheet analysis. However, Excel still remains one of the most commonly requested skills.
Python remains a key differentiator but is volatile - Python demand remains relatively stable between 28-33% during 2023 and 2024, confirming its role as one of the most valuable technical skills for Data Analysts. However, its demand ends 2025 lower, at approx. 23%.
Power BI shows the clearest upward trend - Power BI demand increases from approx. 18% in early 2023 to over 27% in 2025, making it the skill with the strongest upward trend in relative demand. This suggests increasing employer emphasis on dashboarding.
Tableau shows moderate volatility but a slight overall decline - Tableau demand fluctuates between approx. 18-29% but the overall trend shows a slight decline compared to earlier periods. This suggests that, while Tableau remains important, it may be losing ground to competing tools such as Power BI.
R shows the most consistent long-term decline - R declines steadily from approx. 15% in 2023 to below 10% by late 2025, making it the skill with the clearest continuous downward trend. This could be due to more focus being given to other analytical tools such as Python.
Takeaway - The Data Analyst skill landscape shows a clear downward trend for several traditional tools, with a slightly increasing importance of business intelligence skills. Additionally, the gap between skills narrows over time, suggesting employers may increasingly expect Data Analysts to possess a broader and more balanced technical skill set rather than specializing in a single tool.

Note - Even though the numbers were normalized, low volume months (Oct-Nov 2024) can still introduce volatility, producing exaggerated percentage swings.

16. Which skills most frequently appear together? (skill pair analysis)

In this section, we analyze the most common combinations of skills (pairs) that appear together in Data Analyst job postings.
While individual skill demand analysis is useful, employers usually expect candidates to possess multiple complementary skills. Identifying the most frequent pairs therefore provides insight into which tools form the most valuable and commonly expected skill sets in the job market.

WITH filtered_jobs AS (
    SELECT job_id
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
),

skill_pairs AS (
    SELECT
        s1.job_id,
        s1.skill_id AS skill_1,
        s2.skill_id AS skill_2
    FROM skills_job_dim s1
    JOIN skills_job_dim s2
        ON s1.job_id = s2.job_id
       AND s1.skill_id < s2.skill_id
    JOIN filtered_jobs fj
        ON s1.job_id = fj.job_id
)

SELECT
    sd1.skills AS skill_1,
    sd2.skills AS skill_2,
    COUNT(*) AS pair_count
FROM skill_pairs sp
JOIN skills_dim sd1 ON sp.skill_1 = sd1.skill_id
JOIN skills_dim sd2 ON sp.skill_2 = sd2.skill_id
WHERE sd1.skills <> sd2.skills 
GROUP BY sd1.skills, sd2.skills
ORDER BY pair_count DESC
LIMIT 20;

SQL code

Python code

Rank	Skill 1	Skill 2	Pair Count	Percentage
1	sql	python	99,925	22.33%
2	sql	tableau	74,326	16.61%
3	sql	excel	73,664	16.46%
4	sql	power bi	65,036	14.53%
5	python	r	54,871	12.26%
6	python	tableau	51,293	11.46%
7	sql	r	47,361	10.58%
8	excel	power bi	43,635	9.75%
9	tableau	power bi	43,526	9.73%
10	tableau	excel	40,434	9.04%
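The pair counting behind this table can be sketched in plain Python. The postings below are hypothetical, and the sketch orders each pair alphabetically by skill name, whereas the SQL orders by `skill_id`; either way each pair is counted once per posting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical postings mapped to their listed skills (illustration only).
job_skills = {
    1: {"sql", "python", "tableau"},
    2: {"sql", "excel"},
    3: {"sql", "python"},
    4: {"excel", "power bi"},
}

pair_counts = Counter()
for skills in job_skills.values():
    # sorted() gives each pair one canonical order, playing the role of
    # s1.skill_id < s2.skill_id in the SQL self-join.
    for pair in combinations(sorted(skills), 2):
        pair_counts[pair] += 1

total_jobs = len(job_skills)
for (skill_1, skill_2), count in pair_counts.most_common(3):
    # Percentage is relative to all postings, as in the table.
    print(skill_1, skill_2, count, f"{100.0 * count / total_jobs:.2f}%")
```

Using a set per posting guarantees a skill cannot pair with itself and that duplicate skill rows would not inflate the counts.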

Insights:

SQL and Python form the core technical foundation of Data Analyst roles - The most common skill combination is SQL and Python, appearing together in 22.33% of all Data Analyst job postings. This means more than one in five Data Analyst roles require both database querying and programming capabilities, confirming that DAs are expected to combine data extraction and advanced analytical skills.
SQL serves as the central skill - SQL appears in the four most common skill combinations, confirming its role as a foundational skill.
Visualization tools are essential complementary skills - Visualization tools, Tableau and Power BI, frequently appear in top combinations, confirming the importance of communicating insights through dashboards and visual reporting.
Python is the dominant programming language - Python appears in several major combinations confirming its versatility and importance across data analysis, statistical work and visualization integration.
Excel remains highly relevant but increasingly paired with modern BI tools - Excel appears in three of the top 10 combinations, including with Power BI and Tableau, suggesting that while Excel is still a baseline expectation, employers increasingly expect candidates to complement it with more advanced visualization and reporting tools.
R retains a meaningful presence in analytical roles - Python + R ranks 5th with 12.26% of postings, suggesting a notable segment of roles (likely more statistically oriented) still expect both languages. However, R appears less broadly than Python across other combinations, reinforcing Python's dominance as the primary programming language.
Takeaway - The most valuable Data Analyst skill set combines SQL, Python and a BI tool (Tableau or Power BI). Together, these results show that Data Analysts are expected to work across the entire analytics pipeline, from querying and analysis to reporting and visualization, but SQL + Python appearing in more than 22% of all Data Analyst job postings can be considered the most important skill combination in the job market.

17. How many job postings are missing salary information?

In this section, we analyze salary transparency in Data Analyst job postings by measuring how often employers disclose salary information. Salary transparency is an important factor for job seekers, as it improves market clarity and helps candidates make more informed career decisions. By comparing the number and percentage of postings with and without salary information, we can assess how transparent the Data Analyst job market is.

WITH filtered_jobs AS (
    SELECT
        job_id,
        salary_year_avg
    FROM job_postings_fact
    WHERE job_title_short = 'Data Analyst'
),

salary_counts AS (
    SELECT
        COUNT(*) AS total_jobs,
        COUNT(salary_year_avg) AS jobs_with_salary,
        COUNT(*) - COUNT(salary_year_avg) AS jobs_without_salary
    FROM filtered_jobs
)

SELECT
    total_jobs,
    jobs_with_salary,
    jobs_without_salary,
    ROUND(100.0 * jobs_with_salary / total_jobs, 2) AS percent_with_salary,
    ROUND(100.0 * jobs_without_salary / total_jobs, 2) AS percent_without_salary
FROM salary_counts;

SQL code

Python code

Total Jobs	Jobs With Salary	Jobs Without Salary	% With Salary	% Without Salary
447,505	24,105	423,400	5.39%	94.61%

Insights:

Salary transparency is extremely low in Data Analyst job postings - Only 5.39% of postings disclose salary information, while 94.61% do not provide salary data. This means the overwhelming majority of employers do not publicly share compensation details.
Very limited market visibility - With only about 1 in 20 job postings including salary information, candidates have limited ability to benchmark their value or compare offers effectively.
Salary analysis requires filtering - Because such a small percentage of jobs include salary data, all previous salary analyses in this project were based on a limited subset of postings. This is an important condition when interpreting salary insights as this may reflect sampling bias.
Takeaway - Salary transparency remains a major limitation in the Data Analyst job market. Job seekers should expect that most roles will not disclose salary upfront and should be prepared to research compensation independently or discuss it during later hiring stages.

Hypothesis Testing:

In this section, we statistically validate some of the most important findings identified in previous analyses. While earlier sections revealed differences between groups, hypothesis testing allows us to determine whether those differences are statistically significant or just random chance.

This will strengthen the reliability of our conclusions and provide deeper confidence in the insights derived from the data.

Since roughly 89% of salary-disclosed postings (21,433 of 24,105, per Q12) originate from the United States, the hypothesis testing mainly reflects the US Data Analyst job market.

For the tests, the Mann-Whitney U test was chosen because salary data is right-skewed with extreme outliers and our insights rely on median values, making it more appropriate than alternatives such as the t-test.

Effect size (r) is reported alongside p-values because the large sample size (20k+) makes statistical significance alone less informative, as p-values tend to become significant even for small differences.
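The point about large samples can be illustrated with a small, self-contained sketch. It uses synthetic right-skewed data (not the dataset) and a simplified Mann-Whitney U test based on the normal approximation without tie correction, which is an assumption that only holds for continuous data. The function name `mann_whitney` is hypothetical, and the sign convention for r here is chosen so that a positive value means the first group tends to be larger.

```python
import math
import random

def mann_whitney(x, y):
    """Return (U for x, two-sided p via normal approximation, rank-biserial r).

    Assumes continuous data with no ties, so no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(labelled, start=1)
                     if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2          # pairs where x beats y
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value
    r = 2 * u_x / (n1 * n2) - 1                   # positive when x tends to be larger
    return u_x, p_value, r

rng = random.Random(0)
# Synthetic "salaries": group_a is shifted up by roughly 3% relative to group_b.
group_a = [rng.lognormvariate(11.43, 0.4) for _ in range(20_000)]
group_b = [rng.lognormvariate(11.40, 0.4) for _ in range(20_000)]

u, p, r = mann_whitney(group_a, group_b)
print(f"p-value={p:.2e}, rank-biserial r={r:.3f}")
```

With 20,000 observations per group, even this small shift produces a very small p-value while the effect size stays close to zero, which is why both are reported together.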

IMPORTANT LIMITATION: Salary information is available for only about 24k out of 447k jobs (5.4%). As a result, hypothesis testing relies on this subset, which may introduce selection bias. Additionally, salaries may be influenced by factors not controlled for in this analysis, such as experience level, company size or location differences. Therefore, the tests should be interpreted as indicative (for demonstration purposes) rather than fully representative of the entire job market.

Test 1 - Python vs Non-Python salary

Previous analysis, Q3 and Q4, suggested that Python is associated with salary uplifts and overall higher salaries for Data Analyst roles. Q2 also showed it dominates top-paying roles. This test evaluates whether the observed salary difference between roles requiring Python and those that do not is statistically significant.

This is one of the most important findings of the project, as it directly addresses a key career question:

"Does learning Python significantly increase salary for Data Analysts?"

Defining the Hypotheses:

Null Hypothesis H₀: The salary distributions of Python and non-Python DA jobs are the same.

Alternative Hypothesis H₁: The salary distribution of Python jobs is different from that of non-Python jobs.

Test:

import pandas as pd
from scipy.stats import mannwhitneyu

job_postings_fact=pd.read_csv(r'...')
skills_job = pd.read_csv(r"...")
skills_dim = pd.read_csv(r"...")

#Lets filter our data:
df = job_postings_fact[
    (job_postings_fact["job_title_short"] == "Data Analyst") &
    (job_postings_fact["salary_year_avg"].notna())
].copy()

#Let's identify the Python skill_id by filtering just the rows with python in the skills column
#and then taking only the first value of the skill_id column
python_skill_id = skills_dim.loc[skills_dim["skills"] == "python", "skill_id"].iloc[0]

#Lets find the jobs that require Python by creating a job_id list
python_job_ids = skills_job.loc[skills_job["skill_id"] == python_skill_id,"job_id"]

#And now we create a column with trues and falses for python
df["python_required"] = df["job_id"].isin(python_job_ids)

# Split groups
python_salaries = df.loc[df["python_required"], "salary_year_avg"]
non_python_salaries = df.loc[~df["python_required"], "salary_year_avg"]

#Perform the Mann-Whitney test:
u_stat, p_value = mannwhitneyu(
    python_salaries,
    non_python_salaries,
    alternative="two-sided"
)

print("U statistic:", u_stat)
print("p-value:", p_value)

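#And the effect size (rank-biserial correlation)
#formula: |r| = |1 − (2U / (n₁ × n₂))|
#abs() reports the magnitude only, so the value does not depend on which
#group's U statistic SciPy returns; the direction is read from the medians

n1 = len(python_salaries)
n2 = len(non_python_salaries)

rank_biserial = abs(1 - (2 * u_stat) / (n1 * n2))

print(f"n1={n1}\nn2={n2}\nn1xn2={n1*n2}\nu_stat={u_stat}")
print("Rank-biserial effect size:", rank_biserial)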

Python code
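As a small worked illustration of the effect size, the sketch below computes the rank-biserial correlation directly from pairwise comparisons, without SciPy. The function name `rank_biserial` and the sample values are hypothetical. It shows that the sign of r depends on which group is treated as the first sample, while the magnitude stays the same.

```python
def rank_biserial(x, y):
    """Signed rank-biserial correlation computed from pairwise comparisons."""
    # U for the first sample: count pairs where x beats y; ties count as half.
    u_x = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    return 2 * u_x / (len(x) * len(y)) - 1

# Hypothetical salaries, not taken from the dataset.
group_a = [100_000, 110_000, 120_000]
group_b = [80_000, 90_000, 105_000]

r_ab = rank_biserial(group_a, group_b)   # positive: group_a tends to be higher
r_ba = rank_biserial(group_b, group_a)   # same magnitude, opposite sign

print("r (a vs b):", r_ab)
print("r (b vs a):", r_ba)
```

Because only the sign changes when the groups are swapped, reporting the absolute value together with the group medians keeps the interpretation unambiguous.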

Results:

| Metric | Result |
| --- | --- |
| Median Python salary | $100,000 |
| Median non-Python salary | $90,000 |
| Median difference | +$10,000 |
| Mann–Whitney test | p < 0.001 |
| Effect size | r ≈ 0.19 (small-to-moderate) |

This confirms the descriptive findings from earlier sections that Python skills are associated with higher-paying DA roles.
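The test script prints the U statistic, p-value and effect size, but not the group medians reported in the results. A minimal sketch of how they could be obtained is shown here, assuming a DataFrame with the same `python_required` and `salary_year_avg` columns as the test code. The rows below are hypothetical stand-ins, not real postings.

```python
import pandas as pd

# Hypothetical stand-in for the filtered DataFrame built in the test code.
df = pd.DataFrame({
    "python_required": [True, True, True, False, False, False],
    "salary_year_avg": [95_000, 105_000, 120_000, 80_000, 90_000, 110_000],
})

# Median salary of each group, using the same boolean split as the test.
python_median = df.loc[df["python_required"], "salary_year_avg"].median()
non_python_median = df.loc[~df["python_required"], "salary_year_avg"].median()
median_difference = python_median - non_python_median

print("Median Python salary:", python_median)
print("Median non-Python salary:", non_python_median)
print("Median difference:", median_difference)
```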

Conclusion:

Data Analyst roles requiring Python show a median salary of $100,000 compared with $90,000 for roles that do not require Python, representing a $10,000 salary premium.

A Mann–Whitney U test confirms that this difference is statistically significant (p < 0.001). However, because very large samples tend to produce significant p-values even for small differences, the effect size was also calculated (r = 0.19). It indicates a small-to-moderate practical effect: Python is associated with higher-paying roles but is not the sole determinant of salary.

Takeaway: Python is associated with a statistically significant salary premium, though the magnitude is small to moderate.

Test 2 - Remote vs On-site Salary

Earlier analysis (Q7) showed little difference in average salary between remote and on-site roles. This test evaluates whether the small observed difference is statistically significant.

Remote work is a highly discussed topic in the modern job market and this test helps determine whether work location has a measurable impact on compensation.

Defining the Hypotheses:

Null Hypothesis H₀: The salary distributions of remote and on-site DA jobs are the same.

Alternative Hypothesis H₁: The salary distributions of remote and on-site DA jobs are different.

Test:

import pandas as pd
from scipy.stats import mannwhitneyu

#Importing CSV files with pd.read_csv()

#Let's filter our data:
df = job_postings_fact[
    (job_postings_fact["job_title_short"] == "Data Analyst") &
    (job_postings_fact["salary_year_avg"].notna())
].copy()

#Lets get the remote salaries by filtering just rows with remote in the corresponding column
# and getting just the salary column
remote_salaries = df.loc[df["job_work_from_home"] == True, "salary_year_avg"]

#now the onsite salaries:
onsite_salaries = df.loc[df["job_work_from_home"] == False, "salary_year_avg"]

#Perform the Mann-Whitney test:

u_stat, p_value = mannwhitneyu(
    remote_salaries,
    onsite_salaries,
    alternative="two-sided"
)

print("U statistic:", u_stat)
print("p-value:", p_value)

#And the effect size
#formula: r = 1 − (2U / (n₁ × n₂))

n1 = len(remote_salaries)
n2 = len(onsite_salaries)

rank_biserial = abs(1 - (2 * u_stat) / (n1 * n2))

print(f"n1={n1}\nn2={n2}\nn1*n2={n1*n2}\nu_stat={u_stat}")
print("Rank-biserial effect size:", rank_biserial)

Python code

Results:

| Metric | Result |
| --- | --- |
| Median remote salary | $89,225 |
| Median on-site salary | $94,455 |
| Median difference | -$5,230 |
| Mann–Whitney test | p = 0.164 |
| Effect size | r ≈ 0.02 (negligible) |

This confirms the descriptive findings from earlier sections that remote and on-site Data Analyst roles exhibit very similar salary distributions.

Conclusion:

Remote Data Analyst roles show a median salary of $89,225 compared with $94,455 for on-site roles, a difference of approximately $5,230. However, a Mann–Whitney U test indicates that this difference is not statistically significant (p = 0.164, above the 0.05 threshold).

The effect size (r=0.02) is negligible, indicating that the salary distributions of remote and on-site roles are almost identical.

Takeaway: Work location does not appear to meaningfully influence salary levels for Data Analyst roles in this dataset. Compensation appears to be driven more strongly by factors such as skills, experience and role requirements rather than work mode (remote or on-site).

Test 3 - Cloud Skills vs Non-Cloud Salary

Cloud and data engineering skills showed the highest salary premiums in previous analyses. This test evaluates whether salaries for roles requiring cloud-related skills differ significantly from salaries for roles that do not.

This helps confirm whether specialized technical skills are associated with meaningful salary advantages.

Several of the highest paying skills identified in Q4 belong to modern data infrastructure and cloud systems (Snowflake, Kafka, DynamoDB). Additionally, cloud tools such as AWS and Azure appear among the most demanded skills in the dataset while also associated with considerable salary premiums.

To evaluate whether working with these technologies is associated with higher compensation, these tools were grouped into a broader "cloud/data" category for hypothesis testing.

Defining the Hypotheses:

Null Hypothesis H₀: The salary distributions of DA roles requiring cloud skills and those that do not are the same.

Alternative Hypothesis H₁: The salary distributions of DA roles requiring cloud skills and those that do not are different.

Test:

import pandas as pd
from scipy.stats import mannwhitneyu

#Data import pd.read_csv()

#Lets filter our data:
df = job_postings_fact[
    (job_postings_fact["job_title_short"] == "Data Analyst") &
    (job_postings_fact["salary_year_avg"].notna())
].copy()

#Creating a list of cloud skills:
cloud_skills = [
    "aws",
    "azure",
    "snowflake",
    "databricks",
    "spark",
    "kafka",
    "dynamodb",
    "elasticsearch"
]

#We need to get the ids:
cloud_skill_ids = skills_dim.loc[skills_dim["skills"].isin(cloud_skills),"skill_id"]

#And now get the jobs that require those skills:
cloud_job_ids=skills_job.loc[skills_job["skill_id"].isin(cloud_skill_ids), "job_id"]

#Now lets create the grouping column:
df["cloud_required"]=df["job_id"].isin(cloud_job_ids)

#Split the groups with the salaries:
cloud_salaries=df.loc[df["cloud_required"]==True,"salary_year_avg"]
non_cloud_salaries=df.loc[df["cloud_required"]==False,"salary_year_avg"]

#Test:
u_stat, p_value = mannwhitneyu(
    cloud_salaries,
    non_cloud_salaries,
    alternative="two-sided"
)

print("U statistic:", u_stat)
print("p-value:", p_value)


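#And the effect size (rank-biserial correlation)
#formula: |r| = |1 − (2U / (n₁ × n₂))|
#abs() reports the magnitude only, so the value does not depend on which
#group's U statistic SciPy returns; the direction is read from the medians
n1 = len(cloud_salaries)
n2 = len(non_cloud_salaries)

rank_biserial = abs(1 - (2 * u_stat) / (n1 * n2))

print(f"n1={n1}\nn2={n2}\nn1xn2={n1*n2}\nu_stat={u_stat}")
print("Rank-biserial effect size:", rank_biserial)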

Python code

Results:

| Metric | Result |
| --- | --- |
| Median cloud salary | $102,500 |
| Median non-cloud salary | $90,000 |
| Median difference | +$12,500 |
| Mann–Whitney test | p < 0.001 |
| Effect size | r ≈ 0.31 (moderate) |

This reinforces earlier findings that cloud skills are associated with some of the highest-paying roles in the DA job market.

Conclusion:

Cloud DA roles show a median salary of $102,500, compared with $90,000 for roles that do not require these technologies, representing a $12,500 salary premium.

A Mann–Whitney U test confirms that this difference is statistically significant (p < 0.001). The effect size (r = 0.31) indicates a moderate practical effect, suggesting that working with cloud and data infrastructure technologies is associated with meaningfully higher salaries.

Takeaway: Cloud and data platform skills appear to command a meaningful salary premium in the DA job market, indicating that analysts who work closer to modern data infrastructure may have access to higher paying roles.

Tests Summary Table:

| Test | Median Difference | p-value | Effect Size (r) | Result |
| --- | --- | --- | --- | --- |
| Python vs Non-Python | +$10,000 | <0.001 | 0.19 | Significant |
| Remote vs On-site | -$5,230 | 0.164 | 0.02 | Not significant |
| Cloud vs Non-Cloud | +$12,500 | <0.001 | 0.31 | Significant |
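The verbal labels attached to the effect sizes can be made explicit with a small helper. The sketch below assumes Cohen's conventional cut-offs for r (0.1, 0.3, 0.5); the source does not state which thresholds were used, so the function `effect_size_label` and its boundaries are an assumption rather than the project's own rule.

```python
def effect_size_label(r: float) -> str:
    """Map an effect size r to a verbal label (assumed cut-offs: 0.1 / 0.3 / 0.5)."""
    magnitude = abs(r)   # the sign only encodes direction
    if magnitude < 0.1:
        return "negligible"
    if magnitude < 0.3:
        return "small"
    if magnitude < 0.5:
        return "medium"
    return "large"

# Effect sizes reported for the three tests.
for name, r in [("Python", 0.19), ("Remote", 0.02), ("Cloud", 0.31)]:
    print(f"{name}: r = {r} -> {effect_size_label(r)}")
```

Under these assumed cut-offs the three tests fall into the small, negligible and medium bands respectively, which is broadly in line with the wording used in the results.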

Because salary distributions are heavily right-skewed and contain extreme outliers, all three comparisons rely on medians and the rank-based Mann–Whitney U test, with the rank-biserial effect size (r) indicating practical magnitude. This approach provides interpretable effect sizes while reducing the influence of extreme values.

Final Conclusions and Recommendations

This project analyzed 447,505 Data Analyst job postings to identify salary patterns, skill demand, geographic opportunities and job market trends. Using SQL and Python, the analysis explored which skills are most valuable, how demand has evolved over time and what skill combinations provide the strongest career advantage.

The goal was to provide insights and information to help aspiring Data Analysts prioritize their learning and career strategy.

Key findings:

SQL is the foundational skill:

SQL dominates job postings (it appears in nearly half of them) and anchors the top 4 skill combinations. It serves as the core technical requirement for Data Analyst roles; without it, job opportunities are limited.
Python is the strongest differentiator:

Python appears in over one in five postings alongside SQL and is consistently associated with higher-paying roles.
BI and visualization tools are baseline expectations:

Power BI and Tableau appear frequently across job postings, confirming that employers expect analysts to communicate insights effectively. Power BI, in particular, shows the strongest growth trend.
Excel remains important but declining:

Although Excel's relative demand is declining, it continues to be widely expected for data analysis and reporting.
Specialized and cloud skills are associated with higher salaries:

Less common skills, particularly cloud platforms and data engineering tools, show the largest salary premiums (confirmed by hypothesis testing).
Skills matter more than formal education for salary outcomes:

While degree requirements are mentioned in many postings, salary analysis shows no meaningful difference between roles that require a degree and those that do not. This suggests technical skills play a larger role in determining compensation.
Remote work does not provide a salary premium:

Remote and on-site roles show similar salary distributions, indicating that compensation is primarily driven by skills and role requirements rather than work "location".
Job market showed contraction during the analyzed period:

Job postings declined significantly between 2023 and late 2024, followed by a partial recovery in early 2025.
Opportunities are geographically concentrated:

The United States dominates global job volume, making it the largest and most influential Data Analyst job market. Other regions provide opportunities but at significantly lower scale.
Salary transparency is extremely low:

Fewer than 6% of job postings disclose salary information, meaning that most compensation insights are derived from a relatively small job market sample.

Recommendations:

Learn SQL first, as it is the most consistently demanded foundational skill.
Add Python to improve access to higher paying and more advanced roles.
Learn at least one BI tool, preferably Power BI or Tableau.
Consider building familiarity with cloud and data tools to access higher-paying opportunities.
Focus on skills over credentials, as degree requirements did not show a meaningful salary advantage.
Treat salary benchmarks cautiously, since only a small percentage of postings disclose compensation.

What I learned

The analysis goes beyond surface-level aggregations, addressing data quality issues, acknowledging limitations, contextualizing outliers and drawing conclusions that are appropriately qualified by the constraints of the dataset.

This project allowed me to strengthen both my technical and analytical decision-making skills:

SQL:

Joins;
CTEs;
Window Functions;
Time series;
Data cleaning, filtering, aggregation and transformation.

Python:

Pandas;
Matplotlib visualizations and formatting;
Data cleaning, filtering, aggregation and transformation.

Data Analysis Skills ("what does the data show and is it trustworthy?"):

Handling skewed data (Salary data heavily skewed with extreme outliers - Mean vs Median comparisons)
Trend analysis;
Skill demand analysis;
Market analysis;
Demand vs Value analysis;
Analytical thinking (qualify conclusions vs reporting numbers);
Choosing the appropriate statistical validation tools (hypothesis testing and effect size interpretation);
Translating analysis into career insights;
Recognizing data limitations:
- Short period (2023-2025);
- Only 5.39% of jobs include salary data;
- Job postings do not include experience level;
- Some skills appear duplicated in the raw dataset (different id for the same name).
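The duplicated-skill limitation matters when looking up a skill by name. A minimal sketch of one way to handle it is shown here: collect every `skill_id` that carries the name instead of only the first one. The tiny tables and their ids are hypothetical; only the column names follow the ones used in the hypothesis tests.

```python
import pandas as pd

# Hypothetical miniature versions of the skills tables; the ids are made up.
skills_dim = pd.DataFrame({
    "skill_id": [1, 2, 3],
    "skills": ["python", "sql", "python"],   # "python" appears under two ids
})
skills_job = pd.DataFrame({
    "job_id": [10, 11, 12],
    "skill_id": [1, 2, 3],
})

# Take every id that carries the name, rather than only the first one.
python_skill_ids = skills_dim.loc[skills_dim["skills"] == "python", "skill_id"]
python_job_ids = skills_job.loc[skills_job["skill_id"].isin(python_skill_ids), "job_id"]

# For comparison: matching on the first id alone misses job 12.
first_id_only = skills_job.loc[skills_job["skill_id"] == python_skill_ids.iloc[0], "job_id"]

print("All matching ids:", python_job_ids.tolist())
print("First id only:   ", first_id_only.tolist())
```

Whether this changes any reported figure depends on how many duplicated ids exist in the real data, which this sketch does not measure.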

Final conclusion

If you're trying to break into or even advance in Data Analytics, this analysis points to a clear priority order:

SQL - get to a strong level first, as it appears in nearly half of all postings and anchors almost every high-demand skill combination;
Python - as a next step, not because it's universally required, but because it consistently appears in higher-paying roles and is the clearest differentiator between entry-level and mid-level compensation;
BI tool - to complete the core (Power BI is growing faster but Tableau remains widely expected);
Cloud and data engineering skills - beyond the core stack, these carry the largest salary premiums but are niche, so treat them as a second phase, not a starting point.

Candidates with these skills are best positioned to succeed in the job market.

Warning: salary insights here are based on fewer than 6% of postings that actually disclosed compensation. Although statistical tests confirm the observed differences, the figures should be interpreted as indicative estimates rather than precise market benchmarks.

Future improvements:

Experience level analysis - Examining how salary and skill requirements evolve across career stages (not available in the current dataset).
Industry segmentation - Analyzing demand by industry (finance, healthcare, technology, etc.) could reveal sector-specific skill requirements and salary patterns (not available in the current dataset).
Remote vs On-site skill comparison - A direct comparison between remote and on-site postings could highlight which skills are more strongly associated with remote work.
Predictive salary modeling - Building a regression model could estimate salary based on combinations of skills, work arrangement and geographic location.
