0

GenAI Stole My Work

 7 months ago
source link: https://www.stevebrownlee.com/genai-stole-my-work/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

GenAI Stole My Work

10 February 2024

I maintain a bespoke learning management platform application that I built for my team to track all kinds of student data. This week, I finally decided I wanted to tackle the slowest request in the entire application - students by cohort. I knew it would be a challenge to take all of the individual ORM statements I have written over the past 2 years and consolidate them into a speedier process. I was planning on spending about a week to analyze, tinker, refine and deploy a solution.

GenAI beat me to it.

It stole all that time and allowed me to do the entire process in about 10 minutes. Here's the tale about how it was able to steal my work from me.

Data Context

First, I learned that this infernal tool can ingest the DBML for my entire database as context and understand how the structure and relationships work. Here's a screenshot of my database. It's certainly not a large database, but 38 tables can challenge the most structured mind when trying to get data that spans 10 of them, and also requires conditions, grouping, and aggregate functions.

learnops

Since I know that it can parse DBML and generate high quality SQL statement from it, I decided to give this bugaboo the opportunity to fail. There was no chance that it could work at this level of complexity. I was sure of it.

I am providing the DBML for my API project below. I need you to take on the role of a business analyst and an expert SQL developer to help me generate SQL statements for new features for the project, and optimizing existing queries.

```dmbl
{ all my DBML here }
```

It was over-confident right from the get go. I couldn't wait to rub its smug face in how much I could do as a human that it couldn't.

Based on your DBML for the API project, I will help generate SQL statements for new features you might be considering and making existing queries more efficient.

Yeah, we'll see.

Slow Django Serialization

Again, my most egregious crime in this app is the slow build-up of more and more data being added to the information to be sent as JSON about the students in a cohort.

At first, it was just what project was being worked on. Then it was if an assessment was being worked on. Then weighted learning scores, assessment status, capstone submission, capstone submission statues, on and on and on. Finally, after all those additions, the HTTP request was taking over 2 full seconds to complete - unacceptable.

Now it was time to show GenAI how much it sucks and it will never replace humans. It's only for losers who don't know how to code in the first place. I copy-pastad my entire Django model, which has 6 methods on it used by the Serializer for producing a response. I asked it to produce a single SQL statment that would replace all of the separate ORM statements.

Below is an abbreviated version of the model. An experienced developer will immediately notice how much of a performance hit these will incur when run separately when used by a serializer build the JSON.

Time to break GenAI. Below is my prompt, and I knew that I would get complete garbage as a response, and with further prompting, I might get something that could save me a little bit of time.

The Big Prompt

Here is a Django model based on the DBML above. Currently there are 6 methods - book, assessment_status, proposals, score, assessment_overview, and current_cohort - that are all used by a serializer to produce a JSON response with all of this information about a student. Because all of these execute separately, the operation can take over 2 seconds to complete. I want all of these combined into either a single Django ORM statement, or the full SQL in order to create a database function.

class NssUser(models.Model):
    """Model for NSS-specific user information beyond Django user"""

    @property
    def book(self):
        student_project = StudentProject.objects.filter(student=self).last()
        assigned_cohort = self.assigned_cohorts.order_by("-id").last()

        if student_project is None:
            cohort_course = CohortCourse.objects.get(cohort=assigned_cohort.cohort, index=0)
            project = Project.objects.get(
                book__course=cohort_course.course, book__index=0, index=0)

        ...
        
    @property
    def assessment_status(self):
        try:
            student_assessment = self.assessments.last()
            if student_assessment.assessment.book.id != self.book["id"]:
                return 0

            status = student_assessment.status.status
        ...
        
    @property
    def proposals(self):
        try:
            lastest_status = CapstoneTimeline.objects.filter(capstone=OuterRef("pk")).order_by("-pk")

            proposals = self.capstones.annotate(
                course_name=F("course__name"),
                current_status_id=Subquery(
                    lastest_status.values('status__id')[:1]
                ),
                current_status=Subquery(
                    lastest_status.values('status__status')[:1])
                )
                .values(
                    'id', 'current_status', 
                    'course_name', 'proposal_url', 
                    'current_status_id'
                )
        ...
        
    @property
    def score(self):
        scores = self.learning_records.filter(achieved=True) \
            .annotate(total_score=Sum("weight__weight")) \
            .values_list('total_score', flat=True)
        total = sum(list(scores))

        core_skill_records = list(
            self.core_skills.values_list('level', flat=True))
        ...

    @property
    def assessment_overview(self):
        for assessment in self.assessments.all().order_by("-assessment__book__index"):
            assessment_list.append({
                "id": assessment.id,
                "name": assessment.assessment.name,
                "status": assessment.status.status,
                "book": assessment.assessment.assigned_book,
                "reviewed_by": assessment.instructor.user.first_name
            })
        ...

    @property
    def current_cohort(self):
        assignment = self.assigned_cohorts.order_by("-id").last()

        return {
            "name": assignment.cohort.name,
            "id": assignment.cohort.id,
            "start": assignment.cohort.start_date,
            "end": assignment.cohort.end_date,
            "courses": assignment.cohort.courses
                .order_by('index')
                .values('course__name', 'course__id', 'active'),
            }

ChatGPT's Disgusting, Useless Response

Combining the operations of the six methods (book, assessment_status, proposals, score, assessment_overview, and current_cohort) into a single Django ORM statement or SQL query to fetch all required information in one go is quite complex and challenging. This is mainly due to the diverse nature of the data these methods retrieve, which spans across different tables with various relationships.

Aha!! Gotcha!!

As soon as I read that, I knew I'd won and proven my superiority. This tool is already admitting that it can't do it. One point for humans.

uthenbryt_A_human_warrior_like_Aragorn_standing_victorious_over_0347fff6-bcce-4cf9-be04-3eb699aee009

Wait.... what? Are there yet more foes that need to be vanquished?

However, I can outline an approach to create a more efficient SQL query or a set of queries that could potentially reduce the total execution time. The goal here would be to minimize the number of database hits by fetching related data in bulk and then processing it in Python.

Recommendation: Database Function or Stored Procedure:

Given the complexity, consider implementing a database function or stored procedure that encapsulates this logic within the database. This allows you to maintain complex logic in one place and potentially improve performance by utilizing database-specific optimizations.

After creating the stored procedure, you can call it from Django using a raw SQL query:

from django.db import connection

def get_student_details(user_id):
    with connection.cursor() as cursor:
        cursor.execute("CALL get_student_details(%s);", [user_id])
        result = cursor.fetchone()  # or fetchall() based on your needs
    return result

PostgreSQL Function for Student Details

This PostgreSQL function aims to provide a comprehensive overview of a student's details, including their latest book project, current cohort, assessment status, and a simplified score calculation. Given the varied nature of the data, some simplifications and assumptions are made for the sake of brevity and clarity.

-- I have removed much of the SQL, also for the 
-- sake of brevity of this blog post

CREATE OR REPLACE FUNCTION get_student_details(student_user_id INT)
RETURNS TABLE (
    full_name TEXT,
    latest_project_name TEXT,
    latest_book_name TEXT,
    ...
) AS $$
BEGIN
    RETURN QUERY
    WITH student_info AS (
        SELECT
            au.id,
            au.first_name || ' ' || au.last_name AS full_name
        FROM auth_user au
        WHERE au.id = student_user_id
    ),
    latest_project AS (
        SELECT
            sp.student_id,
            p.name AS project_name,
            p.id as project_id,
            ...
        FROM "LearningAPI_studentproject" sp
        ...
    ),
    current_cohort AS (
        SELECT
            nuc.nss_user_id,
            c.name AS cohort_name,
        ...
    ),
    assessment_status_agg AS (
        SELECT
            sa.student_id,
            sa.status_id,
            ...
        FROM "LearningAPI_studentassessment" sa
        ...
    ),
    score_calc AS (
        SELECT
            lr.student_id,
            SUM(lw.weight) AS total_score
        FROM "LearningAPI_learningrecord" lr
        JOIN ...
    )
    SELECT
        si.full_name,
        ...
        lp.book_id AS latest_book_id,
        coalesce(asa.status, 'Not Started') AS assessment_status,
        coalesce(sc.total_score, 0) AS total_score
    FROM student_info si
    LEFT JOIN latest_project lp ON si.id = lp.student_id
    ...;
END; $$
LANGUAGE plpgsql;

Stunned Silence

Ok, so I copy that into my SQL file and execute it. Dubious 🤨, but hopeful because my initial scan of the query had me feeling that it looked... kinda right.

I ran it for one student. It was right.

I ran it for another student in a different project. It was right.

I ran it for a student that wasn't working on an assessment. It was right.

I ran it for a student that was working on an assessment. It was right.

I ran it for a student that hadn't submitted a proposal. It was right.

I ran it for a student that had a capstone proposal and it was currently being reviewed. It was right.

I'm not going to lie. I absolutely could have completed all of this work myself using Google and my own knowledge. It also would have taken me a week or two, multiple dozens of Google searches, myriad mistakes, and multiple versions until I got a working version.

One thing I'm not confident about is that my SQL would have been the most efficient. Looking at what ChatGPT produced, I can state with good confidence that mine would likely have been worse.

The True Power of GenAI

The point of this article was to demonstrate how ridiculously sophisticated these LLMs are, and how much time can be saved by software developers... if they learn how to generate powerful prompts and provide the context the tool needs.

This likely just saved me weeks of very complex, and cognitively challenging work, but it didn't do it for me.

It's not like I was working at my desk and it came up behind me like, "Hey dude, it's Gene. I saw you working on this ticket, but it looked cool so I did it for you. Here's the code."

(side note: I didn't like Gene)

  • I had to understand the real problem.
  • I had to know how to prompt a GenAI for valuable responses.
  • I had to have the DBML for my database generated.
  • I had to know which model, and which methods on that model were the bad actors.
  • I had to design a very comprehensive, and specific, prompt that told it exactly what problem needed to be solved.
  • I had to tell it how I wanted it to act and respond to my queries.
  • I made sure to verify each solution and provide adjustments 3 times.

I feel pity for the low to mid-value developers on the Web that bash GenAI as a hack or a cheat that only losers use.

It's a new tool with a tremendous amount of power, for good or evil. It has some people (like me) thrilled and excited about the benefits that it brings to the world of software development. Some people are curmudgeons and say how much it sucks, or that it's a crutch for bad software developers.

I get it. Coming from a place of fear and anxiety is pretty normal for humans when confronted with something new like this.

"I learned it {insert some old way} and I turned out just fine! You kids these days want to take the easy way out."

To which I always call bullshit. If this tool was available to any of these folks when they were just starting out, it would have been like a miracle and they all would have used it.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK