Effortlessly Tag People’s Names in Your Google Sheets with Python and SpaCy

Have you ever wished you could automatically identify and tag people’s names in a Google Sheet? Maybe you’re managing a list of contacts, analyzing survey responses, or working on a project where extracting this information is key. Well, with a bit of Python magic and the power of SpaCy’s natural language processing capabilities, we can do just that!

The Challenge

The task at hand was to create a Python script that could:

  1. Connect to Google Sheets: Securely access a specific spreadsheet using Google’s API.
  2. Extract Data: Read keywords or phrases from a designated column.
  3. Identify Names: Use SpaCy’s named entity recognition (NER) to determine which of these keywords are people’s names.
  4. Update the Sheet: Write “Person” into another column, but only if the cell is empty to avoid overwriting existing data.

Solution: Python and SpaCy to the Rescue!

We tackled this challenge by building a custom Python script leveraging the following tools and techniques:

  • gspread: A Python library that makes it a breeze to interact with Google Sheets.
  • Google Sheets API: This allowed our script to authenticate and access the spreadsheet remotely.
  • SpaCy: A powerful NLP library renowned for its named entity recognition capabilities. We chose the “en_core_web_lg” model for greater accuracy.

The Script: A Step-by-Step Breakdown

  1. Authentication: We started by setting up secure authentication with Google Sheets using a service account and its credentials.
  2. Opening the Sheet: The script connected to the specific spreadsheet and targeted the desired worksheet (“MASTER” in our case).
  3. Reading Keywords: We extracted the keywords from the designated column (column C).
  4. Identifying Names: SpaCy’s NER model worked its magic, analyzing each keyword to determine if it was a person’s name.
  5. Conditional Update: To prevent data loss, the script carefully checked if the target cell (in column D) was empty before writing “Person.”
  6. Batch Updates and Error Handling: To ensure efficiency and avoid hitting API limits, we implemented batch updates. We also included error handling to retry if we ran into rate limiting issues.

Code Snippet:

Here’s a glimpse of the core functionality:

				
					# ... (authentication, sheet opening, keyword extraction)

labels = [label_person(keyword) for keyword in keywords]

cells_to_update = []
existing_values = worksheet.col_values(4) 
for i, label in enumerate(labels):
    while True:
        try:
            if not existing_values[i + 1] and label:
                cells_to_update.append((i + 2, 4, label))
            break 
        except gspread.exceptions.APIError as e:
            if e.response.status_code == 429:
                time.sleep(60)  
            else:
                raise  

if cells_to_update:
    worksheet.update_cells(cells_to_update)

				
			

Conclusion

With this Python script, you can automate the process of identifying and tagging people’s names in your Google Sheets,saving you valuable time and effort. Feel free to customize the code, experiment with different SpaCy models, or even extend it to identify other types of entities like organizations or locations!

SEO Strategies: Using Python to Detect Names in a List of Words

  Using Python to Detect Proper Nouns in a List of Words. I was working on keyword research for a client and was working to categorize all the keywords. Google Gemini helped me create scripts to go through a list of 2500 keywords and flag if it is a person’s name. Here’s a summary of […]

Cyclist Capstone

Cyclistic Capstone Project Cyclistic Capstone Project Brian Hafner 2024-01-28 Cyclistic Rides Analysis Background of Cyclistic Cyclistic is a bike-share program in Chicago, established in 2016, with 5,824 bicycles across 692 stations. They offer various pricing plans, categorizing customers into casual riders (using single-ride or full-day passes) and members (holding annual memberships). The company’s financial analysis […]

Reasons for discrepancies between old Google Analytics Universal Analytics Properties, and the new Google Analytics 4 Properties

Have you noticed discrepancies between your old Google Analytics Universal Analytics properties, and the new Google Analytics 4 properties? As we delve into the differences between Google Analytics 4 (GA4) and Universal Analytics (UA), it becomes evident that discrepancies in session or user numbers are often attributed to the shift in measurement methods. There are several […]

Upgrading from Google Analytics UA property to a Google Analytics 4 (GA4) property 

If you have recently upgraded to a Google Analytics GA4 property, you may have noticed some differences in the data reported by your new property compared to your old UA (universal analytics) property. This is because GA4 and UA use different methods of collecting, processing, and presenting data. In this blog post, we will explain […]

How to Use ChatGPT to Create Visual Assets in Canva

If you are looking for a way to spice up your visual content, you might be interested in a new feature that Canva has recently launched: ChatGPT. ChatGPT is a powerful tool that uses artificial intelligence to generate text and images based on your input. You can use ChatGPT to create catchy headlines, captions, slogans, […]