Mining-Away/jupyter-notes/Panda Bamboo.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas Crash Course\n",
    "As usual, documenting what is being used from pandas here ig\n",
    "\n",
    "Docs:\n",
    "- https://pandas.pydata.org/docs/getting_started/index.html#getting-started"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning Game/Score/Rating Dataset\n",
    "Error found: Game Names had Reviews attached to them"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Game Datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from tkinter.filedialog import askopenfilename\n",
    "filename = askopenfilename()\n",
    "df1= pd.read_csv(filename)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Cleaning:  Removing the word review and anything after it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Unclean showcase\n",
    "unclean = df1\n",
    "#limit this output 3 rows pls\n",
    "print(unclean[['GameName']].head(5))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CLeaning\n",
    "nuke=df1['GameName'].to_list()\n",
    "nuke2 = list()\n",
    "\n",
    "for orphan in nuke : \n",
    "    orphan = orphan.split('Review')[0]\n",
    "    nuke2.append(orphan)\n",
    "\n",
    "df1['GameName']=nuke\n",
    "\n",
    "\n",
    "\n",
    "nuke_frame = pd.DataFrame(nuke2)\n",
    "clean=df1.drop(columns=['GameName'])\n",
    "\n",
    "clean['Name'] = nuke2\n",
    "#limit this output 3 rows pls\n",
    "print(clean[['Name']].head(5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CSV output\n",
    "df1.to_csv('cleaned_games.csv')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Integrating Game Datasets together"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "filename = askopenfilename()\n",
    "df2 = pd.read_csv(filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Rank                                Name         Genre ESRB_Rating  \\\n",
      "0    1.0                          Wii Sports        Sports           E   \n",
      "1    2.0                   Super Mario Bros.      Platform         NaN   \n",
      "2    3.0                      Mario Kart Wii        Racing           E   \n",
      "3    4.0       PlayerUnknown's Battlegrounds       Shooter         NaN   \n",
      "4    5.0                   Wii Sports Resort        Sports           E   \n",
      "5    6.0  Pokemon Red / Green / Blue Version  Role-Playing           E   \n",
      "6    7.0               New Super Mario Bros.      Platform           E   \n",
      "7    8.0                              Tetris        Puzzle           E   \n",
      "8    9.0           New Super Mario Bros. Wii      Platform           E   \n",
      "9   10.0                           Minecraft          Misc         NaN   \n",
      "10  11.0                           Duck Hunt       Shooter         NaN   \n",
      "11  12.0                            Wii Play          Misc           E   \n",
      "12  13.0                  Kinect Adventures!         Party           E   \n",
      "13  14.0                          Nintendogs    Simulation           E   \n",
      "14  15.0                       Mario Kart DS        Racing           E   \n",
      "15  16.0       Pokemon Gold / Silver Version  Role-Playing           E   \n",
      "16  17.0                             Wii Fit        Sports           E   \n",
      "17  18.0                        Wii Fit Plus        Sports           E   \n",
      "18  19.0                   Super Mario World      Platform           E   \n",
      "19  20.0                  Grand Theft Auto V        Action           M   \n",
      "\n",
      "   Platform               Publisher              Developer  Critic_Score  \\\n",
      "0       Wii                Nintendo           Nintendo EAD           7.7   \n",
      "1       NES                Nintendo           Nintendo EAD          10.0   \n",
      "2       Wii                Nintendo           Nintendo EAD           8.2   \n",
      "3        PC        PUBG Corporation       PUBG Corporation           NaN   \n",
      "4       Wii                Nintendo           Nintendo EAD           8.0   \n",
      "5        GB                Nintendo             Game Freak           9.4   \n",
      "6        DS                Nintendo           Nintendo EAD           9.1   \n",
      "7        GB                Nintendo  Bullet Proof Software           NaN   \n",
      "8       Wii                Nintendo           Nintendo EAD           8.6   \n",
      "9        PC                  Mojang              Mojang AB          10.0   \n",
      "10      NES                Nintendo          Nintendo R&D1           NaN   \n",
      "11      Wii                Nintendo           Nintendo EAD           5.9   \n",
      "12     X360  Microsoft Game Studios    Good Science Studio           6.7   \n",
      "13       DS                Nintendo           Nintendo EAD           8.4   \n",
      "14       DS                Nintendo           Nintendo EAD           9.1   \n",
      "15       GB                Nintendo             Game Freak           9.2   \n",
      "16      Wii                Nintendo           Nintendo EAD           7.9   \n",
      "17      Wii                Nintendo           Nintendo EAD           8.0   \n",
      "18     SNES                Nintendo           Nintendo EAD           8.5   \n",
      "19      PS3          Rockstar Games         Rockstar North           9.4   \n",
      "\n",
      "    User_Score  Total_Shipped  Global_Sales  NA_Sales  PAL_Sales  JP_Sales  \\\n",
      "0          NaN          82.86           NaN       NaN        NaN       NaN   \n",
      "1          NaN          40.24           NaN       NaN        NaN       NaN   \n",
      "2          9.1          37.14           NaN       NaN        NaN       NaN   \n",
      "3          NaN          36.60           NaN       NaN        NaN       NaN   \n",
      "4          8.8          33.09           NaN       NaN        NaN       NaN   \n",
      "5          NaN          31.38           NaN       NaN        NaN       NaN   \n",
      "6          8.1          30.80           NaN       NaN        NaN       NaN   \n",
      "7          NaN          30.26           NaN       NaN        NaN       NaN   \n",
      "8          9.2          30.22           NaN       NaN        NaN       NaN   \n",
      "9          NaN          30.01           NaN       NaN        NaN       NaN   \n",
      "10         NaN          28.31           NaN       NaN        NaN       NaN   \n",
      "11         4.5          28.02           NaN       NaN        NaN       NaN   \n",
      "12         NaN          24.00           NaN       NaN        NaN       NaN   \n",
      "13         NaN          23.96           NaN       NaN        NaN       NaN   \n",
      "14         9.4          23.60           NaN       NaN        NaN       NaN   \n",
      "15         NaN          23.10           NaN       NaN        NaN       NaN   \n",
      "16         NaN          22.67           NaN       NaN        NaN       NaN   \n",
      "17         NaN          21.13           NaN       NaN        NaN       NaN   \n",
      "18         NaN          20.61           NaN       NaN        NaN       NaN   \n",
      "19         NaN            NaN         20.32      6.37       9.85      0.99   \n",
      "\n",
      "    Other_Sales    Year  Unnamed: 0 Console Review  Score  \n",
      "0           NaN  2006.0         NaN     NaN    NaN    NaN  \n",
      "1           NaN  1985.0         NaN     NaN    NaN    NaN  \n",
      "2           NaN  2008.0         NaN     NaN    NaN    NaN  \n",
      "3           NaN  2017.0         NaN     NaN    NaN    NaN  \n",
      "4           NaN  2009.0         NaN     NaN    NaN    NaN  \n",
      "5           NaN  1998.0         NaN     NaN    NaN    NaN  \n",
      "6           NaN  2006.0         NaN     NaN    NaN    NaN  \n",
      "7           NaN  1989.0         NaN     NaN    NaN    NaN  \n",
      "8           NaN  2009.0         NaN     NaN    NaN    NaN  \n",
      "9           NaN  2010.0         NaN     NaN    NaN    NaN  \n",
      "10          NaN  1985.0         NaN     NaN    NaN    NaN  \n",
      "11          NaN  2007.0         NaN     NaN    NaN    NaN  \n",
      "12          NaN  2010.0         NaN     NaN    NaN    NaN  \n",
      "13          NaN  2005.0         NaN     NaN    NaN    NaN  \n",
      "14          NaN  2005.0         NaN     NaN    NaN    NaN  \n",
      "15          NaN  2000.0         NaN     NaN    NaN    NaN  \n",
      "16          NaN  2008.0         NaN     NaN    NaN    NaN  \n",
      "17          NaN  2009.0         NaN     NaN    NaN    NaN  \n",
      "18          NaN  1991.0         NaN     NaN    NaN    NaN  \n",
      "19         3.12  2013.0         NaN     NaN    NaN    NaN  \n"
     ]
    }
   ],
   "source": [
    "# merged = pd.merge(df1,df2, how='inner', sort=True) DOES NOT WORK\n",
    "# print(merged.head(10))\n",
    "columns = df1.columns\n",
    "merged = pd.concat([df2,df1], sort=False, ignore_index=True) #Good\n",
    "merged = pd.concat([df1,merged], sort=False, ignore_index=True)\n",
    "print(merged.head(20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "merged.to_csv('merged_games.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(df1[['Name']].head(100))\n",
    "print(df2[['Name']].head(100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}