Bulk Add

Simplified example of bulk add for InterLex terms.

Prerequisite code

With gspread, you can automate various tasks such as retrieving and updating tabular data, clearing sheets, and handling multiple tabs within a Google Sheet. This can significantly streamline workflows involving InterLex terms.gspread is a powerful tool for interacting with Google Sheets, allowing users to add and update InterLex terms efficiently. It offers functionality to open, modify, and manage Google Sheets using Python, making it easier to handle data programmatically.

All the needed code for gspread is found below:

import gspread 

class GSheet:
    """Can Opens/Modify Google sheets 1 tab at a time :D"""

    def __init__(self, sheet_name: str) -> None:
        gc = gspread.oauth()
        self.sheet_name = sheet_name
        try:
            self.sheet = gc.open(sheet_name)
        except:
            self.sheet = gc.open_by_key(sheet_name)

    def get_tab(self, tab_name: str) -> pd.DataFrame:
        """
        Get Google Sheet tab as a pandas DataFrame

        Parameters
        ----------
        tab_name : str
            Specific Google Sheet tab name (Google has the default to "Sheet1")

        Returns
        -------
        pd.DataFrame
        """
        worksheet = self.sheet.worksheet(tab_name)
        try:
            header, *body = worksheet.get_all_values()
            return pd.DataFrame(body, columns=header)
        except:  # worksheet is empty
            return pd.DataFrame()

    def update_tab(
        self,
        tab_name: str,
        df: pd.DataFrame,
        cell_range: str = None,
        include_header: bool = True,
    ) -> None:
        """
        Updates only the text of a single tab within the Google Sheet initialized. This does not update the
        Google Sheet settings, colors, fonts, or funcitons. Just the text value.

        Parameters
        ----------
        tab_name : str
            Google Sheet tab name
        df : pd.DataFrame
            DataFrame of the data to be completely overwritten on the tab
        """
        worksheet = self.sheet.worksheet(tab_name)
        if include_header:
            data = [df.columns.values.tolist()] + df.values.tolist()
        else:
            data = df.values.tolist()
        if cell_range:
            worksheet.update(cell_range, data)
        else:
            worksheet.update(data)
            
    def clear(self, tab_name: str) -> None:
        """deletes contents of sheet"""
        worksheet = self.sheet.worksheet(tab_name)
        worksheet.clear()         

Gspread Usage

Gspread should be unique in naming scheme in order to avoid using ID hashes for the input.

Curate new Terms

With the string_profiler function you can clean the existing labels with parenthesis for special

Cleaned label column

(Optional) Add Synonyms

Synonyms can be inferred as labels for exact ontology types. If you want to check synonyms for a data source you know is not unique, add as so.

Add Terms and update Google tab

Update Google Tabe

! Warning ! Please make sure that there is an existing 'ILX UUID' column in the sheet to write over or it will append the column and might remove the annotation meaning such as colors and grouped sets made on the Google sheets.

Last updated