<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://coolshades.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://coolshades.github.io/" rel="alternate" type="text/html" /><updated>2026-03-09T16:49:53+00:00</updated><id>https://coolshades.github.io/feed.xml</id><title type="html">Dr U Bhalraam</title><subtitle>About Me!</subtitle><author><name>U Bhalraam</name></author><entry><title type="html">Building the FPR Modeller: A Deep Dive into Data-Driven Pay Analysis</title><link href="https://coolshades.github.io/data%20science/web%20development/Project-2/" rel="alternate" type="text/html" title="Building the FPR Modeller: A Deep Dive into Data-Driven Pay Analysis" /><published>2024-07-14T00:00:00+00:00</published><updated>2024-07-14T00:00:00+00:00</updated><id>https://coolshades.github.io/data%20science/web%20development/Project%202</id><content type="html" xml:base="https://coolshades.github.io/data%20science/web%20development/Project-2/"><![CDATA[<p><em>Disclaimer: This blog post was generated with the assistance of AI technology. While the content has been reviewed for accuracy, some phrasings or structures may reflect AI-generated text.</em></p>

<h2 id="from-concept-to-reality-the-journey-of-creating-the-fpr-modeller">From Concept to Reality: The Journey of Creating the FPR Modeller</h2>

<p>For doctors and the people who negotiate on their behalf, understanding pay progression and pay restoration is crucial. That need led me to build the Full Pay Restoration (FPR) Modeller, a web application that helps medical professionals, union representatives, and policymakers visualize and analyze complex pay scenarios. In this post, I’ll walk through the development process, highlight the key features, and share some of the code behind the tool.</p>

<h3 id="the-spark-why-build-an-fpr-modeller">The Spark: Why Build an FPR Modeller?</h3>

<p>The idea for the FPR Modeller stemmed from the ongoing discussions about pay restoration in the medical field. I realized there was a need for a tool that could:</p>

<ol>
  <li>Visualize pay progression over time</li>
  <li>Calculate the impact of inflation on real pay</li>
  <li>Track progress towards full pay restoration</li>
  <li>Estimate the costs associated with different pay deals</li>
</ol>

<p>With these goals in mind, I set out to create a user-friendly, data-driven application that could handle these complex calculations and present the results in an easily digestible format.</p>

<h3 id="choosing-the-right-tools">Choosing the Right Tools</h3>

<p>After considering various options, I decided to build the FPR Modeller using:</p>

<ul>
  <li><strong>Python</strong>: For its robust data processing capabilities and extensive libraries</li>
  <li><strong>Streamlit</strong>: For building the interactive web interface directly in Python</li>
  <li><strong>Plotly</strong>: For interactive and visually appealing charts</li>
  <li><strong>Pandas</strong>: For efficient data manipulation and analysis</li>
</ul>

<p>These tools allowed me to rapidly develop a prototype and iterate on the design based on user feedback.</p>

<h3 id="key-features-of-the-fpr-modeller">Key Features of the FPR Modeller</h3>

<p>Let’s dive into some of the core features I implemented in the app:</p>

<h4 id="1-flexible-input-parameters">1. Flexible Input Parameters</h4>

<p>Users can adjust various parameters, including:</p>

<ul>
  <li>Inflation measure (RPI or CPI)</li>
  <li>FPR calculation period</li>
  <li>Number of years in the pay deal</li>
  <li>Doctor counts at each nodal point</li>
  <li>Year-by-year pay rises and inflation projections</li>
</ul>

<p>This flexibility allows for a wide range of scenarios to be modeled and compared.</p>

<p>Here’s a glimpse into how the sidebar inputs are set up using Streamlit:</p>

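<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">streamlit</span> <span class="k">as</span> <span class="n">st</span>


<span class="k">def</span> <span class="nf">setup_sidebar</span><span class="p">():</span>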
    <span class="n">initialize_session_state</span><span class="p">()</span>
    
    <span class="n">st</span><span class="p">.</span><span class="n">sidebar</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"Modeller Settings ⚙️"</span><span class="p">)</span>
    
    <span class="n">st</span><span class="p">.</span><span class="n">sidebar</span><span class="p">.</span><span class="n">subheader</span><span class="p">(</span><span class="s">"Calculate Pay Restoration Targets."</span><span class="p">)</span>
    
    <span class="n">inflation_type</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">sidebar</span><span class="p">.</span><span class="n">radio</span><span class="p">(</span><span class="s">"Select inflation measure:"</span><span class="p">,</span> <span class="p">(</span><span class="s">"RPI"</span><span class="p">,</span> <span class="s">"CPI"</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="s">"inflation_type"</span><span class="p">,</span> <span class="n">on_change</span><span class="o">=</span><span class="n">update_fpr_targets</span><span class="p">,</span> <span class="n">horizontal</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    
    <span class="n">col1</span><span class="p">,</span> <span class="n">col2</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">sidebar</span><span class="p">.</span><span class="n">columns</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">col1</span><span class="p">:</span>
        <span class="n">fpr_start_year</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">selectbox</span><span class="p">(</span>
            <span class="s">"FPR start year"</span><span class="p">,</span>
            <span class="n">options</span><span class="o">=</span><span class="n">AVAILABLE_YEARS</span><span class="p">,</span>
            <span class="n">index</span><span class="o">=</span><span class="n">AVAILABLE_YEARS</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">fpr_start_year</span><span class="p">),</span>
            <span class="n">key</span><span class="o">=</span><span class="s">"fpr_start_year"</span><span class="p">,</span>
            <span class="n">on_change</span><span class="o">=</span><span class="n">update_end_year_options</span>
        <span class="p">)</span>
    <span class="k">with</span> <span class="n">col2</span><span class="p">:</span>
        <span class="n">fpr_end_year</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">selectbox</span><span class="p">(</span>
            <span class="s">"FPR end year"</span><span class="p">,</span>
            <span class="n">options</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">end_year_options</span><span class="p">,</span>
            <span class="n">index</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">end_year_options</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">fpr_end_year</span><span class="p">),</span>
            <span class="n">key</span><span class="o">=</span><span class="s">"fpr_end_year"</span><span class="p">,</span>
            <span class="n">on_change</span><span class="o">=</span><span class="n">update_fpr_targets</span>
        <span class="p">)</span>
    
    <span class="c1"># ... [rest of the function]
</span>
    <span class="k">return</span> <span class="n">inflation_type</span><span class="p">,</span> <span class="n">fpr_start_year</span><span class="p">,</span> <span class="n">fpr_end_year</span><span class="p">,</span> <span class="n">num_years</span><span class="p">,</span> <span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">fpr_targets</span><span class="p">,</span> <span class="n">st</span><span class="p">.</span><span class="n">session_state</span><span class="p">.</span><span class="n">doctor_counts</span><span class="p">,</span> <span class="n">year_inputs</span><span class="p">,</span> <span class="n">additional_hours</span><span class="p">,</span> <span class="n">out_of_hours</span>
</code></pre></div></div>

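<p>This function builds the sidebar widgets and keeps their values in <code class="language-plaintext highlighter-rouge">st.session_state</code>. The <code class="language-plaintext highlighter-rouge">on_change</code> callbacks refresh dependent values, such as the list of valid end years and the FPR targets, whenever the user changes an input.</p>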
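
<p>The body of <code class="language-plaintext highlighter-rouge">update_end_year_options</code> is not shown in this post. As a rough sketch of the kind of logic such a callback might apply, the helpers below assume the available years are a sorted list and that the end year must come after the start year. The function names and sample years are hypothetical, and the sketch is kept free of Streamlit so it can run on its own.</p>

```python
def end_year_options(available_years, start_year):
    """Return the end years that can follow the chosen start year."""
    # Assumption: the list is sorted ascending and the end year must be later.
    start_pos = available_years.index(start_year)
    return available_years[start_pos + 1:]


def keep_or_reset_end_year(options, current_end_year):
    """Keep the current end year if it is still valid, else use the latest one."""
    if not options:
        return None  # the start year was the last available year
    if current_end_year in options:
        return current_end_year
    return options[-1]


years = [2008, 2010, 2015, 2020, 2023]  # made-up values for illustration
options = end_year_options(years, 2015)
print(options)                                # [2020, 2023]
print(keep_or_reset_end_year(options, 2010))  # 2023
```

<p>In a Streamlit callback, the two return values would typically be written back to <code class="language-plaintext highlighter-rouge">st.session_state</code> before the next rerun.</p>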

<h4 id="2-interactive-visualizations">2. Interactive Visualizations</h4>

<p>One of the most powerful aspects of the FPR Modeller is its ability to visually represent complex data. I implemented several interactive charts using Plotly:</p>

<ul>
  <li>Pay Progression Chart: Shows baseline pay, pay increases, FPR progress, and pay erosion over time</li>
  <li>Pay Increase Curve: Displays nominal increases, real increases, and cumulative costs</li>
</ul>

<p>These visualizations help users quickly grasp the long-term implications of different pay scenarios.</p>

<p>Here’s how the pay progression chart is built:</p>

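<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">plotly.graph_objects</span> <span class="k">as</span> <span class="n">go</span>
<span class="kn">from</span> <span class="nn">plotly.subplots</span> <span class="kn">import</span> <span class="n">make_subplots</span>


<span class="k">def</span> <span class="nf">create_pay_progression_chart</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">num_years</span><span class="p">):</span>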
    <span class="n">years</span> <span class="o">=</span> <span class="p">[</span><span class="sa">f</span><span class="s">"Year </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s"> (</span><span class="si">{</span><span class="mi">2023</span><span class="o">+</span><span class="n">i</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="mi">2024</span><span class="o">+</span><span class="n">i</span><span class="si">}</span><span class="s">)"</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_years</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]</span>
    <span class="n">nominal_pay</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"Pay Progression Nominal"</span><span class="p">]</span>
    <span class="n">baseline_pay</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"Base Pay"</span><span class="p">]</span>
    <span class="n">pay_increase</span> <span class="o">=</span> <span class="p">[</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">pay</span> <span class="o">-</span> <span class="n">baseline_pay</span><span class="p">)</span> <span class="k">for</span> <span class="n">pay</span> <span class="ow">in</span> <span class="n">nominal_pay</span><span class="p">]</span>
    <span class="n">percent_increase</span> <span class="o">=</span> <span class="p">[(</span><span class="n">increase</span> <span class="o">/</span> <span class="n">baseline_pay</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span> <span class="k">for</span> <span class="n">increase</span> <span class="ow">in</span> <span class="n">pay_increase</span><span class="p">]</span>
    <span class="n">pay_erosion</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">result</span><span class="p">[</span><span class="s">"Real Terms Pay Cuts"</span><span class="p">]]</span>
    <span class="n">fpr_progress</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"FPR Progress"</span><span class="p">]</span>

    <span class="n">fig</span> <span class="o">=</span> <span class="n">make_subplots</span><span class="p">(</span><span class="n">specs</span><span class="o">=</span><span class="p">[[{</span><span class="s">"secondary_y"</span><span class="p">:</span> <span class="bp">True</span><span class="p">}]])</span>

    <span class="n">fig</span><span class="p">.</span><span class="n">add_trace</span><span class="p">(</span>
        <span class="n">go</span><span class="p">.</span><span class="n">Bar</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">years</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="n">baseline_pay</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">years</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">"Baseline Pay"</span><span class="p">,</span> <span class="n">marker_color</span><span class="o">=</span><span class="s">'rgb(0, 123, 255)'</span><span class="p">),</span>
        <span class="n">secondary_y</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">fig</span><span class="p">.</span><span class="n">add_trace</span><span class="p">(</span>
        <span class="n">go</span><span class="p">.</span><span class="n">Bar</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">years</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">pay_increase</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Pay Increase"</span><span class="p">,</span> <span class="n">marker_color</span><span class="o">=</span><span class="s">'rgb(255, 165, 0)'</span><span class="p">,</span>
               <span class="n">hovertemplate</span><span class="o">=</span><span class="s">'Year: %{x}&lt;br&gt;Total Pay: £%{customdata[0]:,.2f}&lt;br&gt;Increase: £%{y:,.2f} (%{customdata[1]:.2f}%)&lt;extra&gt;&lt;/extra&gt;'</span><span class="p">,</span>
               <span class="n">customdata</span><span class="o">=</span><span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">nominal_pay</span><span class="p">,</span> <span class="n">percent_increase</span><span class="p">))),</span>
        <span class="n">secondary_y</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">fig</span><span class="p">.</span><span class="n">add_trace</span><span class="p">(</span>
        <span class="n">go</span><span class="p">.</span><span class="n">Scatter</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">years</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">fpr_progress</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"FPR Progress"</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s">'rgb(0, 200, 0)'</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">2</span><span class="p">)),</span>
        <span class="n">secondary_y</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">fig</span><span class="p">.</span><span class="n">add_trace</span><span class="p">(</span>
        <span class="n">go</span><span class="p">.</span><span class="n">Scatter</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">years</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">pay_erosion</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Pay Erosion"</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s">'rgb(255, 0, 0)'</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">2</span><span class="p">)),</span>
        <span class="n">secondary_y</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="c1"># ... [layout configuration]
</span>
    <span class="k">return</span> <span class="n">fig</span>
</code></pre></div></div>

<p>This function combines several data series in a single chart. Plotly’s <code class="language-plaintext highlighter-rouge">make_subplots</code> adds a secondary y-axis, so the pay bars (in pounds) share one figure with the FPR progress and pay erosion lines (in percent).</p>
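
<p>To make the derived series concrete, here is a small sketch that repeats the list comprehensions from the chart function on a made-up <code class="language-plaintext highlighter-rouge">result</code> dictionary. The helper name <code class="language-plaintext highlighter-rouge">derive_chart_series</code> and the sample figures are illustrative assumptions, not part of the app.</p>

```python
def derive_chart_series(result):
    """Compute the bar and line series for one nodal point's results."""
    baseline_pay = result["Base Pay"]
    nominal_pay = result["Pay Progression Nominal"]
    # Increase over baseline, floored at zero so the stacked bar never shrinks.
    pay_increase = [max(0, pay - baseline_pay) for pay in nominal_pay]
    percent_increase = [(inc / baseline_pay) * 100 for inc in pay_increase]
    # Assumption: pay cuts are negative fractions, so flip the sign and scale to percent.
    pay_erosion = [-cut * 100 for cut in result["Real Terms Pay Cuts"]]
    return pay_increase, percent_increase, pay_erosion


sample = {  # made-up numbers for illustration
    "Base Pay": 40000,
    "Pay Progression Nominal": [40000, 42000, 44000],
    "Real Terms Pay Cuts": [-0.25, -0.22, -0.20],
}
increase, percent, erosion = derive_chart_series(sample)
print(increase)                        # [0, 2000, 4000]
print([round(p, 2) for p in percent])  # [0.0, 5.0, 10.0]
print([round(e, 2) for e in erosion])  # [25.0, 22.0, 20.0]
```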

<h4 id="3-detailed-cost-breakdown">3. Detailed Cost Breakdown</h4>

<p>Understanding the financial impact of pay deals is crucial. The app provides a comprehensive cost breakdown, including:</p>

<ul>
  <li>Basic pay costs</li>
  <li>Pension contributions</li>
  <li>Additional hours and out-of-hours costs</li>
  <li>Employer National Insurance contributions</li>
  <li>Tax recouped</li>
</ul>

<p>This level of detail allows for more informed decision-making in pay negotiations.</p>

<p>The cost breakdown calculations involve several components. Here’s a snippet showing how the costs are calculated:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calculate_costs</span><span class="p">(</span><span class="n">pay_progression_nominal</span><span class="p">,</span> <span class="n">doctor_count</span><span class="p">,</span> <span class="n">year_inputs</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">post_ddrb_pay</span><span class="p">,</span> <span class="n">additional_hours</span><span class="p">,</span> <span class="n">out_of_hours</span><span class="p">):</span>
    <span class="n">yearly_basic_costs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">yearly_total_costs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">yearly_tax_recouped</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">yearly_net_costs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">yearly_employer_ni_costs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">yearly_pension_costs</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">calculate_total_pay</span><span class="p">(</span><span class="n">basic_pay</span><span class="p">):</span>
        <span class="n">additional_pay</span> <span class="o">=</span> <span class="p">(</span><span class="n">basic_pay</span> <span class="o">/</span> <span class="mi">40</span><span class="p">)</span> <span class="o">*</span> <span class="n">additional_hours</span>
        <span class="n">ooh_pay</span> <span class="o">=</span> <span class="p">(</span><span class="n">basic_pay</span> <span class="o">/</span> <span class="mi">40</span><span class="p">)</span> <span class="o">*</span> <span class="n">out_of_hours</span> <span class="o">*</span> <span class="mf">0.37</span>
        <span class="k">return</span> <span class="n">basic_pay</span><span class="p">,</span> <span class="n">additional_pay</span><span class="p">,</span> <span class="n">ooh_pay</span>

    <span class="k">def</span> <span class="nf">calculate_tax</span><span class="p">(</span><span class="n">total_pay</span><span class="p">,</span> <span class="n">basic_pay</span><span class="p">):</span>
        <span class="n">pension_contribution</span> <span class="o">=</span> <span class="n">calculate_pension_contribution</span><span class="p">(</span><span class="n">basic_pay</span><span class="p">)</span>
        <span class="n">taxable_pay</span> <span class="o">=</span> <span class="n">total_pay</span> <span class="o">-</span> <span class="n">pension_contribution</span>
        <span class="n">income_tax</span> <span class="o">=</span> <span class="n">calculate_income_tax</span><span class="p">(</span><span class="n">taxable_pay</span><span class="p">)</span>
        <span class="n">ni</span> <span class="o">=</span> <span class="n">calculate_national_insurance</span><span class="p">(</span><span class="n">taxable_pay</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">income_tax</span><span class="p">,</span> <span class="n">ni</span><span class="p">,</span> <span class="n">pension_contribution</span>

    <span class="k">for</span> <span class="n">year</span><span class="p">,</span> <span class="p">(</span><span class="n">current_pay</span><span class="p">,</span> <span class="n">prev_pay</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">pay_progression_nominal</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">pay_progression_nominal</span><span class="p">)):</span>
        <span class="n">current_basic</span><span class="p">,</span> <span class="n">current_additional</span><span class="p">,</span> <span class="n">current_ooh</span> <span class="o">=</span> <span class="n">calculate_total_pay</span><span class="p">(</span><span class="n">current_pay</span><span class="p">)</span>
        <span class="n">current_total_pay</span> <span class="o">=</span> <span class="n">current_basic</span> <span class="o">+</span> <span class="n">current_additional</span> <span class="o">+</span> <span class="n">current_ooh</span>
        <span class="n">current_income_tax</span><span class="p">,</span> <span class="n">current_ni</span><span class="p">,</span> <span class="n">current_pension</span> <span class="o">=</span> <span class="n">calculate_tax</span><span class="p">(</span><span class="n">current_total_pay</span><span class="p">,</span> <span class="n">current_basic</span><span class="p">)</span>

        <span class="c1"># ... [cost calculations for each year]
</span>
        <span class="n">yearly_basic_costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">basic_pay_cost</span><span class="p">)</span>
        <span class="n">yearly_total_costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">total_cost</span><span class="p">)</span>
        <span class="n">yearly_tax_recouped</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">tax_recouped</span><span class="p">)</span>
        <span class="n">yearly_net_costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">net_cost</span><span class="p">)</span>
        <span class="n">yearly_employer_ni_costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">employer_ni_cost</span><span class="p">)</span>
        <span class="n">yearly_pension_costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">pension_cost</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">yearly_basic_costs</span><span class="p">,</span> <span class="n">yearly_total_costs</span><span class="p">,</span> <span class="n">yearly_tax_recouped</span><span class="p">,</span> <span class="n">yearly_net_costs</span><span class="p">,</span> <span class="n">yearly_employer_ni_costs</span><span class="p">,</span> <span class="n">yearly_pension_costs</span>
</code></pre></div></div>

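<p>This function shows how the components of pay and taxation feed into the cost calculations. Additional hours are paid pro rata against a 40-hour week, out-of-hours work adds a 37% enhancement on those hours, and the pension contribution is based on basic pay and deducted before income tax and National Insurance are calculated.</p>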
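
<p>The pay split inside <code class="language-plaintext highlighter-rouge">calculate_total_pay</code> is easier to follow with numbers. The sketch below restates it as a standalone function. The name <code class="language-plaintext highlighter-rouge">split_pay</code>, the <code class="language-plaintext highlighter-rouge">ooh_uplift</code> parameter, and the sample salary and hours are assumptions made for illustration; only the 40-hour divisor and the 0.37 factor come from the snippet above.</p>

```python
def split_pay(basic_pay, additional_hours, out_of_hours, ooh_uplift=0.37):
    """Split annual pay into basic, additional-hours and out-of-hours parts."""
    hourly_share = basic_pay / 40  # annual pay attributable to one weekly hour
    additional_pay = hourly_share * additional_hours
    ooh_pay = hourly_share * out_of_hours * ooh_uplift
    return basic_pay, additional_pay, ooh_pay


# Illustrative figures: 40,000 basic pay, 8 additional hours, 4 out-of-hours.
basic, additional, ooh = split_pay(40000, additional_hours=8, out_of_hours=4)
print(basic, additional, round(ooh, 2))    # 40000 8000.0 1480.0
print(round(basic + additional + ooh, 2))  # 49480.0
```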

<h4 id="4-fpr-achievement-tracking">4. FPR Achievement Tracking</h4>

<p>The app calculates and displays whether the proposed pay deal achieves full pay restoration for each nodal point. This feature helps users quickly assess the effectiveness of different scenarios in meeting FPR targets.</p>

<p>The FPR achievement is calculated and displayed using the following function:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">display_fpr_achievement</span><span class="p">(</span><span class="n">results</span><span class="p">):</span>
    <span class="n">st</span><span class="p">.</span><span class="n">subheader</span><span class="p">(</span><span class="s">":blue-background[👈 Will FPR be achieved from this pay deal? 🕵️]"</span><span class="p">)</span>
    <span class="n">fpr_achieved</span> <span class="o">=</span> <span class="nb">all</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s">"FPR Progress"</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="mi">100</span> <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">)</span>
    
    <span class="k">if</span> <span class="n">fpr_achieved</span><span class="p">:</span>
        <span class="n">st</span><span class="p">.</span><span class="n">success</span><span class="p">(</span><span class="s">"Yes, FPR will be achieved for all nodal points."</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">st</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="s">"No, FPR will not be achieved for all nodal points. But some progress has been made...  </span><span class="se">\n</span><span class="s">Note the residual pay erosion figures in % below."</span><span class="p">)</span>
    
    <span class="n">cols</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">columns</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">results</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">result</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">results</span><span class="p">):</span>
        <span class="k">with</span> <span class="n">cols</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
            <span class="n">fpr_progress</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"FPR Progress"</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">pay_erosion</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"Real Terms Pay Cuts"</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            
            <span class="n">pay_erosion_formatted</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">pay_erosion</span> <span class="o">*</span> <span class="mi">100</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">%"</span>
            <span class="n">fpr_progress_formatted</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"FPR: </span><span class="si">{</span><span class="n">fpr_progress</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">%"</span>
            
            <span class="n">st</span><span class="p">.</span><span class="n">metric</span><span class="p">(</span>
                <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">result</span><span class="p">[</span><span class="s">'Nodal Point'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
                <span class="n">value</span><span class="o">=</span><span class="n">pay_erosion_formatted</span><span class="p">,</span>
                <span class="n">delta</span><span class="o">=</span><span class="n">fpr_progress_formatted</span><span class="p">,</span>
                <span class="n">delta_color</span><span class="o">=</span><span class="s">"normal"</span>
            <span class="p">)</span>
</code></pre></div></div>

<p>This function determines whether FPR is achieved and then shows one Streamlit <code class="language-plaintext highlighter-rouge">metric</code> component per nodal point, with the residual pay erosion as the value and the FPR progress as the delta.</p>
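
<p>The pass/fail check can be tried without Streamlit. The sketch below reuses the same <code class="language-plaintext highlighter-rouge">all(...)</code> test on made-up results and adds a hypothetical <code class="language-plaintext highlighter-rouge">shortfalls</code> helper, which is not part of the app, to list the nodal points that miss the target.</p>

```python
def fpr_achieved(results):
    """True only when every nodal point ends at or above 100% FPR progress."""
    return all(result["FPR Progress"][-1] >= 100 for result in results)


def shortfalls(results):
    """Map each nodal point that misses the target to its final FPR progress."""
    return {
        result["Nodal Point"]: result["FPR Progress"][-1]
        for result in results
        if result["FPR Progress"][-1] < 100
    }


sample_results = [  # made-up values for illustration
    {"Nodal Point": "Nodal 1", "FPR Progress": [0, 60, 100]},
    {"Nodal Point": "Nodal 2", "FPR Progress": [0, 40, 85]},
]
print(fpr_achieved(sample_results))  # False
print(shortfalls(sample_results))    # {'Nodal 2': 85}
```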

<h3 id="the-heart-of-the-calculations-fpr-and-pay-erosion">The Heart of the Calculations: FPR and Pay Erosion</h3>

<p>At the core of the FPR Modeller are the calculations for Full Pay Restoration progress and pay erosion. Let’s take a closer look at how these are implemented:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calculate_real_terms_change</span><span class="p">(</span><span class="n">pay_rise</span><span class="p">,</span> <span class="n">inflation</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">((</span><span class="mi">1</span> <span class="o">+</span> <span class="n">pay_rise</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">inflation</span><span class="p">))</span> <span class="o">-</span> <span class="mi">1</span>

<span class="k">def</span> <span class="nf">calculate_new_pay_erosion</span><span class="p">(</span><span class="n">current_erosion</span><span class="p">,</span> <span class="n">real_terms_change</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">((</span><span class="mi">1</span> <span class="o">+</span> <span class="n">current_erosion</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">real_terms_change</span><span class="p">))</span> <span class="o">-</span> <span class="mi">1</span>
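
<span class="c1"># Worked example with illustrative figures (not taken from the app's data):</span>
<span class="c1"># a 5% pay rise against 3% inflation is a real-terms change of</span>
<span class="c1"># (1.05 / 1.03) - 1, roughly +1.94%. Applied to an existing erosion of -26%,</span>
<span class="c1"># the new erosion is (0.74 * 1.0194) - 1, roughly -24.56%.</span>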

<span class="k">def</span> <span class="nf">calculate_fpr_and_erosion</span><span class="p">(</span><span class="n">base_pay</span><span class="p">,</span> <span class="n">pay_progression_nominal</span><span class="p">,</span> <span class="n">pay_progression_real</span><span class="p">,</span> <span class="n">fpr_percentage</span><span class="p">,</span> <span class="n">year_inputs</span><span class="p">):</span>
    <span class="n">real_terms_pay_cuts</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="n">fpr_percentage</span> <span class="o">/</span> <span class="mi">100</span><span class="p">]</span>
    <span class="n">fpr_progress</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span>

    <span class="k">for</span> <span class="n">year</span><span class="p">,</span> <span class="p">(</span><span class="n">nominal_pay</span><span class="p">,</span> <span class="n">real_pay</span><span class="p">,</span> <span class="n">year_input</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">pay_progression_nominal</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">pay_progression_real</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">year_inputs</span><span class="p">)):</span>
        <span class="n">total_pay_rise</span> <span class="o">=</span> <span class="p">(</span><span class="n">nominal_pay</span> <span class="o">-</span> <span class="n">pay_progression_nominal</span><span class="p">[</span><span class="n">year</span><span class="p">])</span> <span class="o">/</span> <span class="n">pay_progression_nominal</span><span class="p">[</span><span class="n">year</span><span class="p">]</span>
        <span class="n">inflation_rate</span> <span class="o">=</span> <span class="n">year_input</span><span class="p">[</span><span class="s">"inflation"</span><span class="p">]</span>
        
        <span class="n">real_terms_change</span> <span class="o">=</span> <span class="n">calculate_real_terms_change</span><span class="p">(</span><span class="n">total_pay_rise</span><span class="p">,</span> <span class="n">inflation_rate</span><span class="p">)</span>
        <span class="n">current_pay_cut</span> <span class="o">=</span> <span class="n">calculate_new_pay_erosion</span><span class="p">(</span><span class="n">real_terms_pay_cuts</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">real_terms_change</span><span class="p">)</span>
        
        <span class="n">real_terms_pay_cuts</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">current_pay_cut</span><span class="p">)</span>
        <span class="n">current_progress</span> <span class="o">=</span> <span class="p">(</span><span class="n">fpr_percentage</span> <span class="o">/</span> <span class="mi">100</span> <span class="o">+</span> <span class="n">current_pay_cut</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">fpr_percentage</span> <span class="o">/</span> <span class="mi">100</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span>
        <span class="n">fpr_progress</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">current_progress</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">real_terms_pay_cuts</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">fpr_progress</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
</code></pre></div></div>

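<p>The first helper converts a nominal pay rise into a real-terms change by deflating it by inflation, and the second compounds that change onto the erosion accumulated so far. The main function then steps through each year, tracking cumulative pay erosion and expressing progress towards the FPR target as a percentage: 0% at the starting erosion and 100% once erosion has been fully reversed.</p>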
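
<p>To make the arithmetic concrete, here is a minimal worked example of the two helper functions above. The figures (a 5% nominal rise, 10% inflation, and a starting erosion of 26%) are hypothetical and chosen only for illustration; they are not taken from the app or from any real pay data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def calculate_real_terms_change(pay_rise, inflation):
    # Deflate the nominal rise by inflation; both are fractions (0.05 means 5%).
    return ((1 + pay_rise) / (1 + inflation)) - 1


def calculate_new_pay_erosion(current_erosion, real_terms_change):
    # Compound this year's real-terms change onto the erosion so far.
    return ((1 + current_erosion) * (1 + real_terms_change)) - 1


# Hypothetical inputs, for illustration only.
pay_rise = 0.05           # 5% nominal pay rise
inflation = 0.10          # 10% inflation
starting_erosion = -0.26  # pay already 26% below its real-terms baseline

real_change = calculate_real_terms_change(pay_rise, inflation)
new_erosion = calculate_new_pay_erosion(starting_erosion, real_change)

print(f"Real-terms change: {real_change:.2%}")
print(f"Erosion after this year: {new_erosion:.2%}")
</code></pre></div></div>

<p>With these numbers, a 5% rise against 10% inflation works out to a real-terms cut of about 4.5% rather than 5%, and because the change is compounded rather than added, the 26% erosion grows to roughly 29.4%.</p>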

<h3 id="overcoming-challenges">Overcoming Challenges</h3>

<p>Developing the FPR Modeller wasn’t without its challenges. Some of the key hurdles I faced and overcame included:</p>

<ol>
  <li>
    <p><strong>Complex Calculations</strong>: Ensuring accuracy in the various interrelated calculations required careful planning and extensive testing.</p>
  </li>
  <li>
    <p><strong>Performance Optimization</strong>: With multiple interactive elements and real-time calculations, optimizing performance was crucial for a smooth user experience.</p>
  </li>
  <li>
    <p><strong>User Interface Design</strong>: Balancing the need for detailed inputs with a clean, intuitive interface required several iterations.</p>
  </li>
  <li>
    <p><strong>Data Visualization</strong>: Choosing the right types of charts and ensuring they effectively communicated the key insights took experimentation and user feedback.</p>
  </li>
</ol>

<h3 id="the-development-process">The Development Process</h3>

<p>Here’s a brief overview of my development process:</p>

<ol>
  <li>
    <p><strong>Requirements Gathering</strong>: I started by clearly defining the app’s requirements based on the needs of potential users.</p>
  </li>
  <li>
    <p><strong>Prototype Development</strong>: Using Streamlit, I quickly built a basic prototype to test the core functionality.</p>
  </li>
  <li>
    <p><strong>Iterative Refinement</strong>: I continuously improved the app based on self-testing and feedback, adding features and refining the user interface.</p>
  </li>
  <li>
    <p><strong>Testing and Validation</strong>: Rigorous testing was performed to ensure the accuracy of calculations and the reliability of the app across different scenarios.</p>
  </li>
  <li>
    <p><strong>Documentation and Deployment</strong>: Finally, I created user documentation and deployed the app for wider access.</p>
  </li>
</ol>

<h3 id="future-enhancements">Future Enhancements</h3>

<p>While the current version of the FPR Modeller is fully functional, there’s always room for improvement. Some ideas for future enhancements include:</p>

<ol>
  <li>Integration with live economic data sources for up-to-date inflation figures</li>
  <li>Additional visualization options, such as comparative charts for different scenarios</li>
  <li>Export functionality for reports and data</li>
  <li>Collaborative features allowing multiple users to work on scenarios together</li>
</ol>

<h3 id="conclusion">Conclusion</h3>

<p>Developing the FPR Modeller has been an exciting journey, blending data science, web development, and domain-specific knowledge in healthcare economics. This project showcases the power of modern web technologies and data visualization techniques in creating practical tools for complex real-world problems.</p>

<p>I hope this behind-the-scenes look at the FPR Modeller inspires you to tackle similar challenges in your field. Whether you’re a developer, a data scientist, or simply someone interested in the intersection of technology and specialized domains, there’s always an opportunity to create tools that can make a difference.</p>

<p>Feel free to check out the <a href="https://modeltest.streamlit.app/">FPR Modeller</a> and let me know your thoughts or suggestions for improvement. Happy modeling!</p>]]></content><author><name>U Bhalraam</name></author><category term="Data Science" /><category term="Web Development" /><category term="Streamlit" /><category term="Python" /><category term="Data Visualization" /><category term="Healthcare" /><summary type="html"><![CDATA[Disclaimer: This blog post was generated with the assistance of AI technology. While the content has been reviewed for accuracy, some phrasings or structures may reflect AI-generated text.]]></summary></entry><entry><title type="html">Data Science: Weather and Happiness Analysis</title><link href="https://coolshades.github.io/data%20science/Project-1/" rel="alternate" type="text/html" title="Data Science: Weather and Happiness Analysis" /><published>2024-06-02T00:00:00+00:00</published><updated>2024-06-02T00:00:00+00:00</updated><id>https://coolshades.github.io/data%20science/Project%201</id><content type="html" xml:base="https://coolshades.github.io/data%20science/Project-1/"><![CDATA[<h2 id="can-data-solve-the-age-old-question-does-weather-influence-happiness">Can Data Solve the Age-Old Question: Does Weather Influence Happiness?</h2>

<p>Have you ever wondered if those sunny days actually boost your mood, or if rainy days truly bring you down? This age-old question has puzzled scientists, philosophers, and just about everyone else for generations. But what if we could use data to finally find an answer? In this post, I’ll dive into the fascinating intersection of meteorology and psychology to explore whether the weather really has a tangible effect on our happiness. By leveraging data science, I aim to uncover patterns and insights that could shed light on how our environment influences our well-being. So, let’s embark on this journey to see if we can decode the weather’s impact on happiness using the power of data.</p>

<p>For the full analysis and more details, visit <a href="https://ubhalraam.com/DSTI_Assignment3">this link</a>.</p>

<h3 id="part-0-data-cleaning-and-preparation">Part 0: Data Cleaning and Preparation</h3>

<p>To begin, I needed comprehensive weather data from various stations across the UK. I wrote a Python script to download the data from the Met Office and prepare it for analysis. Here’s a glimpse of the data preparation process:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">requests</span>

<span class="n">folder_name</span> <span class="o">=</span> <span class="s">'/path/to/weather_data'</span>
<span class="n">os</span><span class="p">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">folder_name</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">download_file</span><span class="p">(</span><span class="n">station_name</span><span class="p">):</span>
    <span class="n">url</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/</span><span class="si">{</span><span class="n">station_name</span><span class="si">}</span><span class="s">data.txt'</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="n">response</span><span class="p">.</span><span class="n">raise_for_status</span><span class="p">()</span>
    <span class="n">save_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">folder_name</span><span class="p">,</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">station_name</span><span class="si">}</span><span class="s">data.txt'</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">save_path</span><span class="p">,</span> <span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="nb">file</span><span class="p">:</span>
        <span class="nb">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'data/stations.txt'</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="nb">file</span><span class="p">:</span>
    <span class="n">station_names</span> <span class="o">=</span> <span class="nb">file</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">splitlines</span><span class="p">()</span>

<span class="k">for</span> <span class="n">station_name</span> <span class="ow">in</span> <span class="n">station_names</span><span class="p">:</span>
    <span class="n">download_file</span><span class="p">(</span><span class="n">station_name</span><span class="p">)</span>
</code></pre></div></div>

<p>This script downloads one text file per station. I then parsed the files into a single structured table using pandas.</p>
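
<p>The parsing step itself is not shown here. As a rough sketch of what it might involve, the snippet below turns the whitespace-separated text of a station file into a list of row dictionaries, which could then be handed to <code class="language-plaintext highlighter-rouge">pandas.DataFrame</code>. It uses only the standard library. The sample text is made up, the column names are borrowed from the R code later in this post, and the helper names <code class="language-plaintext highlighter-rouge">parse_value</code> and <code class="language-plaintext highlighter-rouge">parse_station_text</code> are hypothetical; real station files have longer headers and may need extra handling.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Made-up sample in the layout of a station data file (not real readings).
SAMPLE = """\
Example Station
Location: 000000E 000000N, 10 metres amsl
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   2000   1    7.1     2.3       4    80.5    50.2
   2000   2    ---     2.9*      3    60.0    70.1#
   2000   3    9.8     3.5       1    45.2    99.9  Provisional
"""

# Assumed column names, matching those used in the R code in this post.
COLUMNS = ["year", "month", "t_max", "t_min", "af_inmth", "rain", "sun"]


def parse_value(token):
    # '---' marks a missing value; a trailing '*' or '#' is a flag, not data.
    cleaned = token.rstrip("*#")
    return None if cleaned == "---" else float(cleaned)


def parse_station_text(text):
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a four-digit year; everything else is header text.
        if len(tokens[:7]) == 7 and re.fullmatch(r"\d{4}", tokens[0]):
            values = [parse_value(tok) for tok in tokens[:7]]
            row = dict(zip(COLUMNS, values))
            row["year"], row["month"] = int(row["year"]), int(row["month"])
            rows.append(row)
    return rows


records = parse_station_text(SAMPLE)
print(records[1])
</code></pre></div></div>

<p>Missing readings come through as <code class="language-plaintext highlighter-rouge">None</code>, which pandas would treat as missing values, so later imputation or filtering can deal with them explicitly.</p>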

<p>I then used R for more sophisticated data wrangling, converting coordinates and validating the data:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">

</span><span class="n">S_weather</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'path/to/combined_weather_data.csv'</span><span class="p">)</span><span class="w">

</span><span class="c1"># Convert to spatial dataframe</span><span class="w">
</span><span class="n">weather_sf</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">S_weather</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"easting"</span><span class="p">,</span><span class="w"> </span><span class="s2">"northing"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">27700</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w">
  </span><span class="n">st_transform</span><span class="p">(</span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w">

</span><span class="c1"># Extract coordinates</span><span class="w">
</span><span class="n">coords</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">st_coordinates</span><span class="p">(</span><span class="n">weather_sf</span><span class="p">)</span><span class="w">
</span><span class="n">S_weather</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">S_weather</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">Longitude</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">coords</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">Latitude</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">coords</span><span class="p">[,</span><span class="w"> </span><span class="m">2</span><span class="p">])</span><span class="w">

</span><span class="c1"># Data validation</span><span class="w">
</span><span class="n">validate_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">df</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
    </span><span class="n">filter</span><span class="p">(</span><span class="n">year</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">1850</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">year</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">2024</span><span class="p">,</span><span class="w">
           </span><span class="n">month</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">month</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w">
           </span><span class="n">latitude</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">49.9</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">latitude</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">60.9</span><span class="p">,</span><span class="w">
           </span><span class="n">longitude</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">-8</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">longitude</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w">
           </span><span class="n">altitude</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">-2.75</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">altitude</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">1343</span><span class="p">,</span><span class="w">
           </span><span class="n">t_min</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">-50</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">t_min</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
           </span><span class="n">t_max</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">-50</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">t_max</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
           </span><span class="n">rain</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">rain</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">1000</span><span class="p">,</span><span class="w">
           </span><span class="n">sun</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">sun</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">S_weather</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">validate_data</span><span class="p">(</span><span class="n">S_weather</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<h3 id="part-1-clustering-by-weather-data">Part 1: Clustering by Weather Data</h3>

<p>Once I had the data, I performed clustering to identify patterns. Using k-means clustering, I grouped weather stations based on key weather parameters like minimum temperature, maximum temperature, rainfall, sunshine duration, and air frost days.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">cluster</span><span class="p">)</span><span class="w">

</span><span class="n">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span><span class="w">
</span><span class="n">scaled_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">scale</span><span class="p">(</span><span class="n">S_weather</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">t_min</span><span class="p">,</span><span class="w"> </span><span class="n">t_max</span><span class="p">,</span><span class="w"> </span><span class="n">rain</span><span class="p">,</span><span class="w"> </span><span class="n">sun</span><span class="p">,</span><span class="w"> </span><span class="n">af_inmth</span><span class="p">))</span><span class="w">

</span><span class="n">wss</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sapply</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">k</span><span class="p">){</span><span class="w">
  </span><span class="nf">sum</span><span class="p">(</span><span class="n">kmeans</span><span class="p">(</span><span class="n">scaled_data</span><span class="p">,</span><span class="w"> </span><span class="n">centers</span><span class="o">=</span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">nstart</span><span class="o">=</span><span class="m">25</span><span class="p">)</span><span class="o">$</span><span class="n">withinss</span><span class="p">)</span><span class="w">
</span><span class="p">})</span><span class="w">

</span><span class="n">plot</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="n">wss</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s1">'b'</span><span class="p">,</span><span class="w"> </span><span class="n">xlab</span><span class="o">=</span><span class="s1">'Number of Clusters'</span><span class="p">,</span><span class="w"> </span><span class="n">ylab</span><span class="o">=</span><span class="s1">'Within groups sum of squares'</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Using the elbow method, I settled on three clusters and visualised them with ggpubr, which builds on ggplot2:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggpubr</span><span class="p">)</span><span class="w">

</span><span class="n">kmeans_result</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">kmeans</span><span class="p">(</span><span class="n">scaled_data</span><span class="p">,</span><span class="w"> </span><span class="n">centers</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">nstart</span><span class="o">=</span><span class="m">25</span><span class="p">)</span><span class="w">
</span><span class="n">S_weather</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">S_weather</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">cluster</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.factor</span><span class="p">(</span><span class="n">kmeans_result</span><span class="o">$</span><span class="n">cluster</span><span class="p">))</span><span class="w">

</span><span class="n">ggscatter</span><span class="p">(</span><span class="n">S_weather</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sun"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"af_inmth"</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cluster"</span><span class="p">,</span><span class="w"> 
          </span><span class="n">ellipse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"right"</span><span class="p">,</span><span class="w"> </span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"jco"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">ggtitle</span><span class="p">(</span><span class="s2">"Clusters based on Sunshine Duration and Air Frost Days"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
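
<p>The elbow is usually read off the plot by eye. As a rough complement, the sketch below applies a simple second-difference heuristic to a list of within-cluster sums of squares. This is not the method used in the analysis above; the function name <code class="language-plaintext highlighter-rouge">elbow_by_second_difference</code> and the numbers are hypothetical and serve only to show the idea.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def elbow_by_second_difference(ks, wss):
    # Second difference at each interior point: large values mean the curve
    # flattens sharply there, which is what the "elbow" looks like on a plot.
    bends = {
        ks[i]: wss[i - 1] - 2 * wss[i] + wss[i + 1]
        for i in range(1, len(wss) - 1)
    }
    return max(bends, key=bends.get)


# Hypothetical within-cluster sums of squares for k = 2..6 (not real results).
ks = [2, 3, 4, 5, 6]
wss = [1000.0, 520.0, 450.0, 400.0, 370.0]

elbow = elbow_by_second_difference(ks, wss)
print(f"Suggested number of clusters: {elbow}")
</code></pre></div></div>

<p>The heuristic can only pick an interior value of k, and on noisy curves it may disagree with a visual reading, so it is best treated as a sanity check rather than a replacement for looking at the plot.</p>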

<h3 id="part-2-predicting-geographic-regions">Part 2: Predicting Geographic Regions</h3>

<p>Next, I used the weather data to predict whether a station is located in the Northern, Middle, or Southern UK. This classification was done using a Random Forest model, implemented in the tidymodels framework.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span><span class="w">

</span><span class="n">region_rec</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">recipe</span><span class="p">(</span><span class="n">region</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">S_weather</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">step_impute_knn</span><span class="p">(</span><span class="n">all_predictors</span><span class="p">())</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">step_zv</span><span class="p">(</span><span class="n">all_predictors</span><span class="p">())</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">step_normalize</span><span class="p">(</span><span class="n">all_numeric_predictors</span><span class="p">())</span><span class="w">

</span><span class="n">rf_spec</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rand_forest</span><span class="p">(</span><span class="n">trees</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">set_mode</span><span class="p">(</span><span class="s2">"classification"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">set_engine</span><span class="p">(</span><span class="s2">"ranger"</span><span class="p">)</span><span class="w">

</span><span class="n">region_wf</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">workflow</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">add_recipe</span><span class="p">(</span><span class="n">region_rec</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">add_model</span><span class="p">(</span><span class="n">rf_spec</span><span class="p">)</span><span class="w">

</span><span class="n">set.seed</span><span class="p">(</span><span class="m">234</span><span class="p">)</span><span class="w">
</span><span class="n">region_folds</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">vfold_cv</span><span class="p">(</span><span class="n">S_weather</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w">
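</span><span class="n">rf_rs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">region_wf</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">fit_resamples</span><span class="p">(</span><span class="n">region_folds</span><span class="p">,</span><span class="w"> </span><span class="n">control</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">control_resamples</span><span class="p">(</span><span class="n">save_pred</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">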

</span><span class="n">region_pred</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rf_rs</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">collect_predictions</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">correct</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">region</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">.pred_class</span><span class="p">)</span><span class="w">
</span><span class="n">region_accuracy</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">accuracy</span><span class="p">(</span><span class="n">region_pred</span><span class="p">,</span><span class="w"> </span><span class="n">truth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">region</span><span class="p">,</span><span class="w"> </span><span class="n">estimate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.pred_class</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>The model classified regions accurately under 10-fold cross-validation, which suggests that meteorological data alone carries a strong geographic signal.</p>

<h3 id="conclusion">Conclusion</h3>

<p>Through this data-driven exploration, I’ve seen that weather data can indeed reveal interesting patterns and even predict geographic locations based on climatic conditions. While this analysis doesn’t directly answer whether weather influences individual happiness, it lays the groundwork for understanding the broader environmental factors at play. Future research could integrate psychological data to directly link weather patterns with mood changes, providing deeper insights into this intriguing question.</p>

<p>Stay tuned as I continue to explore the fascinating connections between our environment and well-being! For the full analysis and more details, visit <a href="https://ubhalraam.com/DSTI_Assignment3">this link</a>.</p>]]></content><author><name>U Bhalraam</name></author><category term="Data Science" /><summary type="html"><![CDATA[Can Data Solve the Age-Old Question: Does Weather Influence Happiness?]]></summary></entry></feed>