Description
Code Sample, a copy-pastable example if possible
Updated without yfinance, as it was a poor choice of example; this version requires only standard numpy and datetime. A DatetimeIndex exacerbates the slowdown.
import timeit

setup = '''
import numpy as np
import pandas as pd
col = np.random.uniform(low=-1, high=1, size=10000)
hist = pd.DataFrame({"Col1": col, "Col2": col, "Col3": col, "Col4": col, "Col5": col})
'''

timeit.timeit(stmt='for x in range(len(hist)): _ = hist.iloc[x].Col1', setup=setup, number=3)
#3.553613900000073
timeit.timeit(stmt='for x in range(len(hist)): _ = hist.Col1[x]', setup=setup, number=3)
#0.5215671999999358

# Same comparison, but with a DatetimeIndex:
setup_dt = setup + '''
import datetime
hist.index = [datetime.datetime.now() + datetime.timedelta(seconds=x) for x in range(10000)]
'''

timeit.timeit(stmt='for x in range(len(hist)): _ = hist.iloc[x].Col1', setup=setup_dt, number=3)
#4.485623600000508
timeit.timeit(stmt='for x in range(len(hist)): _ = hist.Col1[x]', setup=setup_dt, number=3)
#0.7185356999998476
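One likely contributor to the gap (a sketch of the mechanism, not a profile of pandas internals): each `hist.iloc[x]` call materializes the entire row as a brand-new Series, while `hist.Col1[x]` merely indexes one scalar out of an existing column array. Using the same synthetic DataFrame as the benchmark setup:

```python
import numpy as np
import pandas as pd

col = np.random.uniform(low=-1, high=1, size=10000)
hist = pd.DataFrame({"Col1": col, "Col2": col, "Col3": col,
                     "Col4": col, "Col5": col})

# iloc[x] constructs a new Series holding the whole row on every call...
row = hist.iloc[0]
print(type(row))   # a pandas Series, freshly built per access

# ...whereas column indexing pulls a single scalar from an existing array.
val = hist.Col1[0]
print(type(val))   # a NumPy scalar
```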
Old code example:
#requires pip install yfinance for this example - could substitute with any data set
timeit.timeit(stmt='for x in range(len(hist)): _ = hist.iloc[x].Close', setup='import yfinance as yf; hist = yf.Ticker("MSFT").history(period="max")', number=3)
#5.178573199999846
timeit.timeit(stmt='for x in range(len(hist)): _ = hist.Close[x]', setup='import yfinance as yf; hist = yf.Ticker("MSFT").history(period="max")', number=3)
#0.5886300999998184
Problem description
One would naturally expect iloc to be close to column indexing in efficiency. Some overhead is understandable, since returning a row is logically more work than returning a single data point, but a 10x slowdown is dramatic enough that this access pattern should be avoided. If this limitation is not well documented and known, it is more than likely a hidden performance trap in a lot of code.
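Until the gap is closed (or documented), per-row loops can sidestep iloc entirely. A minimal sketch of two common workarounds, using the same synthetic DataFrame as the benchmark setup:

```python
import numpy as np
import pandas as pd

col = np.random.uniform(low=-1, high=1, size=10000)
hist = pd.DataFrame({"Col1": col, "Col2": col, "Col3": col,
                     "Col4": col, "Col5": col})

# itertuples() yields one lightweight namedtuple per row, far cheaper
# than constructing a Series via iloc on every iteration.
total = 0.0
for row in hist.itertuples(index=False):
    total += row.Col1

# Or drop to NumPy entirely and avoid per-row Python objects altogether.
total_np = hist["Col1"].to_numpy().sum()
```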
Expected Output
Row access via iloc taking less than double the time of single-column value access, or at least thorough documentation of this limitation.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1